Internet DRAFT - draft-hoffman-genarea-random-candidate-selection

draft-hoffman-genarea-random-candidate-selection







Network Working Group                                         P. Hoffman
Internet-Draft                                                     ICANN
Intended status: Informational                               29 May 2023
Expires: 30 November 2023


                   Simple Random Candidate Selection
          draft-hoffman-genarea-random-candidate-selection-00

Abstract

   This document describes a process to randomly select a subset of
   candidates from a larger set of candidates.  The process uses an
   unpredictable value can be trusted by all candidates.  It uses
   randomizing based on a hash function to make the description of the
   process easy to understand.

   This draft has a GitHub repository (https://github.com/paulehoffman/
   draft-hoffman-genarea-random-candidate-selection).  Issues and pull
   requests can be made there.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 30 November 2023.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components



Hoffman                 Expires 30 November 2023                [Page 1]

Internet-Draft             Candidate Selection                  May 2023


   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Earlier Work  . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Overview of the Process . . . . . . . . . . . . . . . . . . .   3
     2.1.  Basic Steps . . . . . . . . . . . . . . . . . . . . . . .   3
   3.  Specifics for the Process . . . . . . . . . . . . . . . . . .   4
     3.1.  Start of Ceremony . . . . . . . . . . . . . . . . . . . .   4
     3.2.  Name Submission . . . . . . . . . . . . . . . . . . . . .   5
     3.3.  Closing Submissions . . . . . . . . . . . . . . . . . . .   5
     3.4.  Selecting _D_ . . . . . . . . . . . . . . . . . . . . . .   6
     3.5.  Calculating Hashes  . . . . . . . . . . . . . . . . . . .   6
     3.6.  Selecting _S_ Candidates  . . . . . . . . . . . . . . . .   6
   4.  Handling Process Issues . . . . . . . . . . . . . . . . . . .   6
   5.  Sample Code . . . . . . . . . . . . . . . . . . . . . . . . .   7
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  10
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  10
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  11
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  11
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  11
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .  11

1.  Introduction

   It is common to need to pick a subset of people from a larger group
   using a random selection method.  This is often done on an ad hoc
   basis, but for some selections, a more formal process is needed,
   particularly if the people in the larger group don't all trust the
   administrator of the selection process to be unbiased.

   This document gives a simple, understandable process that can be done
   for groups and subsets of arbitrary size.  The process is purposely
   transparent and reproducible.  It works with any group of entities
   that nave names: people, companies, locations, dates ("18 August
   2022"), and so on.

   As a simple example, a future leadership committee will have a fixed
   size.  The members of the committee will be selected from a large
   pool of volunteers.  Someone is in charge of collecting the names of
   the volunteers and making a randomized selection among them for the
   leadership committee.  They can use the process in this document to
   make that selection in a way that is both provably random and
   understandable.




Hoffman                 Expires 30 November 2023                [Page 2]

Internet-Draft             Candidate Selection                  May 2023


   Due to the formatting used in this document, the reader is encouraged
   to read the HTML version, although the text version is still usable.

1.1.  Earlier Work

   RFC 3797 [NomCom] defines the process that the IETF uses to select
   the NomCom each year.  It uses a random input to select from a list
   of candidate names in ASCII using modulo aritmetic the position of
   the names in alphabetical order.  That process has some significant
   pitfalls described in the document updating RFC 3797, [NomCom-bis].
   The process in this document uses a very different selection process
   to avoid those pitfalls.

   After this document has passed IETF review, the IETF might consider
   using the process described here as a replacement for the process in
   RFC 3797, but the document is valuable even if the IETF does not want
   to change the NomCom process.

2.  Overview of the Process

   A few terms are used throughout this document:

   ceremony:  The act of collecting names, making the random selection,
      and publishing the entire process.

   ceremony administrator (CA):  The person who performs the steps of
      the ceremony.

   candidate:  The person, organization, or other namable entity that is
      possibly being selected during the ceremony.

   candidate name:  The name selected by each candidate for the
      selection process.  The candidate name is expressed as a string of
      Unicode characters [Unicode] in UTF-8 format [UTF-8].

   difficult-to-predict number (_D_):  A publicly-visible number that is
      not known before the pool of candidates has been closed.  Note
      that this is different from what is normally called a "random
      number".  True random numbers are designed to be nearly impossible
      to predict, whereas _D_ in this process has weak (but sufficient)
      randomness.

2.1.  Basic Steps

   The steps in a ceremony that follows this process is given here.  See
   Section 3 for more detail on the steps.





Hoffman                 Expires 30 November 2023                [Page 3]

Internet-Draft             Candidate Selection                  May 2023


   1.  The CA starts the ceremony by performing the following steps at
       the same time:

       *  Announces an end date for when the pool will be complete.

       *  Announces a later date on which the difficult-to-predict
          number, _D_, will be selected.

       *  Announces the source where _D_ will be found.

       *  Announces the number of candidates that will be selected
          (called _S_).

       *  Opens up the pool of candidates for submission.

   2.  Candidates submit their names to the pool until the closing date.

   3.  On the closing date, the CA publishes the set of candidate names
       with the hexadecimal value of the UTF-8 encoding for each
       candidate name.

   4.  On the date for selecting _D_, the CA gets _D_ from the announced
       source.

   5.  The CA calculates the hashes used to make the selection.  They
       concatenate each candidate name with _D_ (name first, then _D_),
       uses the SHA-256 hash function [SHA-2] on the new string, and
       records the value of the output as a UTF-8 string.

   6.  The CA arranges the set of hash values in alphabetic order from
       highest to lowest.  They then select the _S_ candidates from the
       top of the list (that is, the names whose hash values are
       largest).

3.  Specifics for the Process

3.1.  Start of Ceremony

   Much of the trust in the selection process is based on the CA not
   being able to influence the selection.  If the CA can choose, or
   influence, the value of _D_, they can establish the ordering of the
   differences of the integers.  Similarly, if one or more of the
   candidates can influence the value of _D_, they can increase their
   chance of being selected.

   To make the process trustworthy, the value of _D_ must be unrelated
   to the CA or the candidates, and it must be selected after the list
   of candidates is completed There are many sources of such values:



Hoffman                 Expires 30 November 2023                [Page 4]

Internet-Draft             Candidate Selection                  May 2023


   stock market closing values, numbers chosen for large public
   lotteries, and so on.  Section 3.1 of RFC 3797 lists many such
   sources.

   The most important things for a ceremony is that the source is
   announced before the ceremony starts, that all participants and
   viewers of a ceremony can find the source on the date specified by
   the CA, and that everyone gets the same value when they go to the
   source for that day.

   If the CA chooses to use stock market closing values, a common open
   source of those values is the Wall Street Journal.  For example, the
   FTSE 100 Index is a long-established index based on 100 stocks; it is
   sometimes known by its stock ticker as "UXK".  The daily closing for
   the FTSE 100 Index at the Wall Street Journal can currently be found
   here (https://www.wsj.com/market-data/quotes/index/UK/UKX/historical-
   prices).

   Note that the location for sources of daily closing values can change
   over time.  The CA must check that the source they intende to use is
   still active, and still available when the ceremony starts.

3.2.  Name Submission

   The CA is the sole arbitrator for whether a candidate is allowed to
   enter the pool.  The CA is also the sole arbitrator of what name
   string (in UTF-8) the candidate can use in the pool.

   The order that the candidates join the pool does not affect the
   outcome of the selection process.  Said another way, the pool is kept
   as an unordered set of candidates, not an ordered list of candidates.

3.3.  Closing Submissions

   At the closing of submissions, the CA verifies that the length of the
   set of candidates in the pool is larger than _S_.  If the length is
   the same as _S_, the rest of the steps are unneeded (and could be
   confusing), because all candidates will automatically be selected.
   If the length is shorter than _S_, the ceremony stops because there
   are too few candidates.

   The method for publishing the set of candidates is determined by the
   CA.  Figure 3 gives an example of how a CA might publish this
   information.







Hoffman                 Expires 30 November 2023                [Page 5]

Internet-Draft             Candidate Selection                  May 2023


3.4.  Selecting _D_

   On the day that the CA announced for the selection of _D_, the CA
   goes the the source they announced and gets _D_. They encode _D_ as a
   UTF-8 string, which is fairly easy if the source is numeric.  In the
   example of the FTSE 100 from Figure 4, a closing value for the day
   announced at the beginning of the ceremony might be "7623.10".  This
   would be encoded in UTF-8 as the string of characters whose value is
   U+373632372e3130.

3.5.  Calculating Hashes

   Different programming libraries have different requirements for the
   input to hash functions.  Section 5 uses the built-in hashlib library
   in Python, which requires that text strings come with a specified
   encoding.

3.6.  Selecting _S_ Candidates

   The process of selecting is simply taking the _S_ candidates whose
   hash value is.  This can easily be determined by sorting the text
   representation of the hash values because in UTF-8 and ASCII, digits
   have lower codepoints than letters.

   To complete the process, the CA should publish all known data for the
   ceremony.  This includes _S_, _D_, the hexadecimal value of _D_, all
   of the information for each candidate, and the full list of selected
   candidates.  Figure 5 shows an example of what this publication might
   look like.

4.  Handling Process Issues

   Ceremonies don't always go as planned.  For example, after a ceremony
   completes, one or more of the selected candidates might be removed
   from the selected set due to voluntary withdrawal or established
   rules (such as no two candidates being from the same geographic
   region).  In such cases, no new ceremony is needed: the CA simply
   selects the next candidate(s) on the list that is ordered by hash
   values.

   (Note that the process in this document is quite different than that
   in [NomCom] and [NomCom-bis], where unselecting an already-selected
   candidate can cause a new random selection for all the remaining
   candidates.  Such reestablishing of the candidate pool can cause
   confusion and hurt among the candidates who were first selected, then
   later unselected, due to no fault of the their own.)





Hoffman                 Expires 30 November 2023                [Page 6]

Internet-Draft             Candidate Selection                  May 2023


   Similarly, if after the selection process is completed, the size _S_
   of the selected set needs to increase, the CA simply selects the next
   candidate(s) on the list that is ordered by hash values.

5.  Sample Code

   The following is a list of figures for an implementation of the
   procedure shown in this document.

   *  The Python script in Figure 1 implements the algorithm from this
      document.

   *  The file that contains the list of names is shown in Figure 2.
      (The names are the winners of the Nobel laureates in Literature
      for 2016 through 2021.)

   *  A file showing the UTF-8 representation of the names from Figure 2
      is shown in Figure 3.  This file is suitable for showing to the
      candidates.

   *  The file that contains the _S_ and _D_ on separate lines is shown
      in Figure 4.

   *  Figure 5 shows the result of running the program with that file as
      input.

   #!/usr/bin/env python3

   # Program to randomly select some candidates from a group
   #  See draft-hoffman-genarea-reandom-candidate-selection

   import hashlib, sys
   from pathlib import Path

   # Helper function to turn a UTF-8 string into its hex representation
   def hexify(in_str):
     return "".join([hex(c)[2:] for c in in_str.encode("utf8")])

   # Santity check the input files given on the command line
   if len(sys.argv) == 1:
     exit("Must give the name of the candidate file, and possibly " + \
       "the selection file, on the command line. Exiting.")
   candidate_path = Path(sys.argv[1])
   if not candidate_path.exists():
     exit(f"The file {str(candidate_path)} does not exist. Exiting.")
   try:
     file_contents = candidate_path.open(mode="rt", encoding="utf8")
   except:



Hoffman                 Expires 30 November 2023                [Page 7]

Internet-Draft             Candidate Selection                  May 2023


     exit("The candidates file do not appear to be in UTF-8. Exiting.")
   candidate_lines = file_contents.read().splitlines()
   # See if there is a second file for selecting
   if len(sys.argv) == 3:
     run_including_selection = True
     selection_path = Path(sys.argv[2])
     if not selection_path.exists():
       exit(f"The file {str(selection_path)} does not exist. Exiting.")
     try:
       file_contents = selection_path.open(mode="rt", encoding="utf8")
     except:
       exit("The selection file does not appear to be UTF-8. Exiting.")
     selection_lines = file_contents.read().splitlines()
     # Extract D and S from the selection file
     S_str = selection_lines[0]
     try:
       S = int(S_str)
     except:
       print(f"The first line of the selection file, '{S_str}', " + \
         "is not an integer. Exiting.")
     D_str = selection_lines[1]
     D_hex = hexify(D_str)
   else:
     run_including_selection = False

   # Get the candidates information
   C_info = []
   c_names = candidate_lines
   diff_set = set()
   for C_str in c_names:
     C_hex = "".join([hex(c)[2:] for c in C_str.encode("utf8")])
     if run_including_selection:
       C_with_D_str = C_str + D_str
       C_with_D_hex = hexify(C_with_D_str)
       C_with_D_hash = hashlib.sha256(C_with_D_hex.encode("utf-8"))
       C_info.append([C_str, C_hex, C_with_D_str, C_with_D_hex, \
         C_with_D_hash.hexdigest()])
     else:
       C_info.append([C_str, C_hex])

   # Print the results
   if run_including_selection:
     print(f"S is {S}")
     print(f"D is \"{D_str}\"")
     print(f" U+{D_hex}\n")
     print("Candidate information, sorted by hash of name including D")
     selected = []
     for this_info in sorted(C_info, key=lambda a: a[4], reverse=True):



Hoffman                 Expires 30 November 2023                [Page 8]

Internet-Draft             Candidate Selection                  May 2023


       if S > 0:
         selected.append(this_info[0])
         S -= 1
       print(f"{this_info[2]}")
       print(f" U+{this_info[3]}")
       print(f" {this_info[4]}")
     print("\nSelected:\n    " + "\n    ".join(selected))
   else:
     for this_info in C_info:
       print(f"{this_info[0]}")
       print(f" U+{this_info[1]}")

              Figure 1: Example Python code for this procedure

   Bob Dylan
   石黒 一雄
   Olga Tokarczuk
   Peter Handke
   Louise Glück
   Abdulrazak Gurnah

                      Figure 2: Sample name list file

   Bob Dylan
    U+426f622044796c616e
   石黒 一雄
    U+e79fb3e9bb9220e4b880e99b84
   Olga Tokarczuk
    U+4f6c676120546f6b6172637a756b
   Peter Handke
    U+50657465722048616e646b65
   Louise Glück
    U+4c6f7569736520476cc3bc636b
   Abdulrazak Gurnah
    U+416264756c72617a616b204775726e6168

                  Figure 3: Full information for the names

   3
   7623.10

                Figure 4: Sample selection information file









Hoffman                 Expires 30 November 2023                [Page 9]

Internet-Draft             Candidate Selection                  May 2023


   S is 3
   D is "7623.10"
    U+373632332e3130

   Candidate information, sorted by hash of name including D
   石黒 一雄7623.10
    U+e79fb3e9bb9220e4b880e99b84373632332e3130
    f2e0d3bbd8eac635d799702bead0fbdf07ff79ef94a261789de50e81adb38a13
   Louise Glück7623.10
    U+4c6f7569736520476cc3bc636b373632332e3130
    a54e282cbaa1f29543cd13d9a29e07e3a38413360172b722f8259c2baa3c38dd
   Peter Handke7623.10
    U+50657465722048616e646b65373632332e3130
    8bb3bc197c6462b033e4d8e8cf703b13b1c55172572a85d56c639db5c57d3866
   Olga Tokarczuk7623.10
    U+4f6c676120546f6b6172637a756b373632332e3130
    56166c4e0e6ca027f4150bac5ce83fbf5652e440214fd255308472fed9f8fb1b
   Abdulrazak Gurnah7623.10
    U+416264756c72617a616b204775726e6168373632332e3130
    340413dc6b2574f5ddc5e88e1c986a229d9defccbae249789b07a5d2337981ff
   Bob Dylan7623.10
    U+426f622044796c616e373632332e3130
    05eb403f4f59f5a7b21f5e5a4e8dbfbac59344fd5e8708ab618b5e2ed27a52de

   Selected:
       石黒 一雄
       Louise Glück
       Peter Handke

      Figure 5: Output of running the program on the list of names and
                           selection information

6.  IANA Considerations

   This document has no IANA considerations.

7.  Security Considerations

   The value _D_ used in this process is explicitly not
   cryptographically strong; in fact, it might provide only a few bits
   of randomness.  For example, stock indexes that contain many stocks
   might be predictable after the third digit from the right, meaning
   that they only have randomness of about 10 bits.  A candidate who has
   a lot of leeway in choosing their name can possibly increase their
   chance of being selected by as much as 0.1% with such source of
   randomness.  If the CA feels that candidates have too much leeway in
   selecting their names and is concerned about candidates gaming the
   ceremony even to that tiny extent, that CA needs to choose a source



Hoffman                 Expires 30 November 2023               [Page 10]

Internet-Draft             Candidate Selection                  May 2023


   for _D_ with more randomness.

   Such selection is outside the scope of this document because it would
   make the process more complicated or less understandable.  Instead,
   this document relies on the CA to only allow sensible representation
   of candidate names and to accept a tiny chance that candidates can
   both predict _D_ and be able to change the name they use to reflect
   that predicted value.

8.  References

8.1.  Normative References

   [SHA-2]    Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
              (SHA and SHA-based HMAC and HKDF)", RFC 6234,
              DOI 10.17487/RFC6234, May 2011,
              <https://www.rfc-editor.org/rfc/rfc6234>.

   [Unicode]  The Unicode Consortium, "The Unicode Standard (latest
              version)", n.d.,
              <https://www.unicode.org/versions/latest/>.

   [UTF-8]    Yergeau, F., "UTF-8, a transformation format of ISO
              10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
              2003, <https://www.rfc-editor.org/rfc/rfc3629>.

8.2.  Informative References

   [NomCom]   Eastlake 3rd, D., "Publicly Verifiable Nominations
              Committee (NomCom) Random Selection", RFC 3797,
              DOI 10.17487/RFC3797, June 2004,
              <https://www.rfc-editor.org/rfc/rfc3797>.

   [NomCom-bis]
              Eastlake, D. E., "Publicly Verifiable Nominations
              Committee (NomCom) Random Selection", Work in Progress,
              Internet-Draft, draft-eastlake-rfc3797bis-02, 16 April
              2023, <https://datatracker.ietf.org/doc/html/draft-
              eastlake-rfc3797bis-02>.

Author's Address

   Paul Hoffman
   ICANN
   Email: paul.hoffman@icann.org






Hoffman                 Expires 30 November 2023               [Page 11]