Benchmarking Working Group                            M. Konstantynowicz
Internet-Draft                                                  V. Polak
Intended status: Informational                             Cisco Systems
Expires: 11 January 2024                                    10 July 2023


                       Multiple Loss Ratio Search
                      draft-ietf-bmwg-mlrsearch-04

Abstract

   This document proposes improvements to [RFC2544] throughput search by
   defining a new methodology called Multiple Loss Ratio search
   (MLRsearch).  The main objectives for MLRsearch are to minimize the
   total test duration, search for multiple loss ratios and improve
   results repeatibility and comparability.

   The main motivation behind MLRsearch is the new set of challenges and
   requirements posed by testing Network Function Virtualization (NFV)
   systems and other software based network data planes.

   MLRsearch offers several ways to address these challenges, giving
   user configuration options to select their preferred way.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.


Konstantynowicz & Polak  Expires 11 January 2024                [Page 1]

Internet-Draft                  MLRsearch                      July 2023


   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Purpose and Scope . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   4
     2.1.  General notions . . . . . . . . . . . . . . . . . . . . .   4
       2.1.1.  General and specific quantities . . . . . . . . . . .   4
       2.1.2.  Composite . . . . . . . . . . . . . . . . . . . . . .   5
       2.1.3.  SUT . . . . . . . . . . . . . . . . . . . . . . . . .   5
       2.1.4.  Trial . . . . . . . . . . . . . . . . . . . . . . . .   6
       2.1.5.  Load  . . . . . . . . . . . . . . . . . . . . . . . .   6
       2.1.6.  Duration  . . . . . . . . . . . . . . . . . . . . . .   6
       2.1.7.  Duration sum  . . . . . . . . . . . . . . . . . . . .   7
       2.1.8.  Width . . . . . . . . . . . . . . . . . . . . . . . .   7
       2.1.9.  Loss ratio  . . . . . . . . . . . . . . . . . . . . .   8
       2.1.10. Exceed ratio  . . . . . . . . . . . . . . . . . . . .   8
     2.2.  Architecture  . . . . . . . . . . . . . . . . . . . . . .   8
       2.2.1.  Manager . . . . . . . . . . . . . . . . . . . . . . .   9
       2.2.2.  Measurer  . . . . . . . . . . . . . . . . . . . . . .   9
       2.2.3.  Controller  . . . . . . . . . . . . . . . . . . . . .  11
       2.2.4.  Controller input  . . . . . . . . . . . . . . . . . .  12
       2.2.5.  Controller internals  . . . . . . . . . . . . . . . .  15
       2.2.6.  Controller output . . . . . . . . . . . . . . . . . .  26
   3.  Problems  . . . . . . . . . . . . . . . . . . . . . . . . . .  26
     3.1.  Long Test Duration  . . . . . . . . . . . . . . . . . . .  26
     3.2.  DUT within SUT  . . . . . . . . . . . . . . . . . . . . .  27
     3.3.  Repeatability and Comparability . . . . . . . . . . . . .  28
     3.4.  Throughput with Non-Zero Loss . . . . . . . . . . . . . .  29
     3.5.  Inconsistent Trial Results  . . . . . . . . . . . . . . .  30
   4.  How the problems are addressed  . . . . . . . . . . . . . . .  30
   5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  31
   6.  Security Considerations . . . . . . . . . . . . . . . . . . .  31
   7.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  32
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  32
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  32
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  32
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  32


Konstantynowicz & Polak  Expires 11 January 2024                [Page 2]

Internet-Draft                  MLRsearch                      July 2023


1.  Purpose and Scope

   The purpose of this document is to describe Multiple Loss Ratio
   search (MLRsearch), a throughput search methodology optimized for
   software DUTs.

   Applying vanilla [RFC2544] throughput bisection to software DUTs
   results in a number of problems:

   *  Binary search takes too long as most of trials are done far from
      the eventually found throughput.

   *  The required final trial duration (and pauses between trials) also
      prolong the overall search duration.

   *  Software DUTs show noisy trial results (noisy neighbor problem),
      leading to big spread of possible discovered throughput values.

   *  Throughput requires loss of exactly zero packets, but the industry
      frequently allows for small but non-zero losses.

   *  The definition of throughput is not clear when trial results are
      inconsistent.

   MLRsearch aims to address these problems by applying the following
   set of enhancements:

   *  Allow searching for multiple search goals, with differing goal
      loss ratios.

      -  Each trial result can affect any search goal in principle
         (trial reuse).

   *  Multiple preceding targets for each search goal, earlier ones need
      to spend less time on trials.

      -  Earlier targets also aim at lesser precision.

      -  Use Forwarding Rate (FR) at maximum offered load [RFC2285]
         (section 3.6.2) to initialize the initial targets.

   *  Take care when dealing with inconsistent trial results.

      -  Loss ratios goals are handled in an order that minimizes the
         chance of interference from later trials to earlier goals.

   *  Apply several load selection heuristics to save even more time by
      trying hard to avoid unnecessarily narrow bounds.


Konstantynowicz & Polak  Expires 11 January 2024                [Page 3]

Internet-Draft                  MLRsearch                      July 2023


   MLRsearch configuration options are flexible enough to support both
   conservative settings (unconditionally compliant with [RFC2544], but
   longer search duration and worse repeatability) and aggressive
   settings (shorter search duration and better repeatability but not
   compliant with [RFC2544]).

   No part of [RFC2544] is intended to be obsoleted by this document.

2.  Terminology

   When a subsection is defining a term, the first paragraph acts as a
   definition.  Other paragraphs are treated as a description, they
   provide additional details without being needed to define the term.

   Definitions should form a directed acyclic graph of dependencies.  If
   a section contains subsections, the section definition may depend on
   the subsection definitions.  Otherwise, any definition may depend on
   preceding definitions.  In other words, if the section definition
   were to come after subsections, there would be no forward
   dependencies for people reading just definitions from start to
   finish.

   Descriptions provide motivations and explanations, they frequently
   reference terms defined only later.  Motivations in section
   descriptions are the reason why section text comes before subsection
   text.

2.1.  General notions

   General notions are the terms defined in this section.

   It is useful to define the following notions before delving into
   MLRsearch architecture, as the notions appear in multiple places with
   no place being special enough to host definition.

2.1.1.  General and specific quantities

   General quantity is a quantity that may appear multiple times in
   MLRsearch specification, perhaps each time in a different role.  The
   quantity when appearing in a single role is called a specific
   quantity.

   It is useful to define the general quantity, so definitions of
   specific quantities may refer to it.  We say a specific quantity is
   based on a general quantity, if the specific quantity definition
   refers to and relies on the general quantity definition.


Konstantynowicz & Polak  Expires 11 January 2024                [Page 4]

Internet-Draft                  MLRsearch                      July 2023


   It is natural to name specific quantities by adding an adjective (or
   a noun) to the name of the general quantity.  But existing RFCs
   typically explicitly define a term acting in a specific role, so the
   RFC name directly refers to a specific quantity, while the
   corresponding general quantity is defined only implicitly.  Therefore
   this documents defines general quantities explicitly, even if the
   same term already appears in an RFC.

   In practice, it is required to know which unit of measurement is used
   to accompany a numeric value of each quantity.  The choice of a
   particular unit of measurement is not important for MLRsearch
   specification though, so specific units mentioned in this document
   are just examples or recommendations, not requirements.

   When reporting, it is REQUIRED to state the units used.

2.1.2.  Composite

   A composite is a set of named attributes.  Each attribute is either a
   specific quantity or a composite.

   MLRsearch specification frequently groups multiple specific
   quantities into a composite.  Description of such a composite brings
   an insight to motivations why this or other terms are defined as they
   are.  Such insight will be harder to communicate with the specific
   quantities alone.

   Also, it simplifies naming of specific quantities, as they usually
   can share a noun or adjective referring to their common composite.
   Most of relations between composites and their specific quantities
   can be described using plain English.

   Perhaps the only exception involves referring to specific quantities
   as attributes.  For example if there is a composite called 'target',
   and one of its specific quantities is 'target width' defined using a
   general quantity 'width', we can say 'width is one of target
   attributes'.

2.1.3.  SUT

   As defined in RFC 2285: The collective set of network devices to
   which stimulus is offered as a single entity and response measured.

   While RFC 2544 mostly refers to DUT as a single (network
   interconnecting) device, section 19 makes it clear multiple DUTs can
   be treated as a single system, so most of RFC 2544 also applies to
   testing SUT.


Konstantynowicz & Polak  Expires 11 January 2024                [Page 5]

Internet-Draft                  MLRsearch                      July 2023


   MLRsearch specification only refers to SUT (not DUT), even if it
   consists of just a single device.

2.1.4.  Trial

   A trial is the part of test described in RFC 2544 section 23.

   When traffic has been sent and SUT response has been observed, we say
   the trial has been performed, or the trial has been measured.  Before
   that happens, multiple possibilities for upcoming trial may be under
   consideration.

2.1.5.  Load

   Intended, constant load for a trial, usually in frames per second.

   Load is the general quantity implied by Constant Load of RFC 1242,
   Data Rate of RFC 2544 and Intended Load of RFC 2285.  All three
   specify this value applies to one (input or output) interface, so we
   can talk about unidirectional load also when bidirectional or multi-
   port traffic is applied.

   MLRsearch does not rely on this distinction, it works also if the
   load values correspond to an aggregate rate (sum over all SUT tested
   input or output interface unidirectional loads), as long as all loads
   share the same semantics.

   Several RFCs define useful quantities based on Offered Load (instead
   of Intended Load), but MLRsearch specification works only with
   (intended) load.  Those useful quantities still serve as motivations
   for few specific quantities used in MLRsearch specification.

   MLRsearch assumes most load values are positive.  For some (but not
   all) specific quantities based on load, zero may also be a valid
   value.

2.1.6.  Duration

   Intended duration of the traffic for a trial, usually in seconds.

   This general quantity does not include any preparation nor waiting
   described in section 23 of RFC 2544.  Section 24 of RFC 2544 places
   additional restrictions on duration, but those restriction apply only
   to some of the specific quantities based on duration.

   Duration is always positive in MLRsearch.


Konstantynowicz & Polak  Expires 11 January 2024                [Page 6]

Internet-Draft                  MLRsearch                      July 2023


2.1.7.  Duration sum

   For a specific set of trials, this is the sum of their durations.

   Some of specific quantities based on duration sum are derived
   quantities, without a specific set of trials to sum their durations.

   Duration sum is never negative in MLRsearch.

2.1.8.  Width

   General quantity defined for an ordered pair (lower and higher) of
   load values, which describes a distance between the two values.

   The motivation for the name comes from binary search.  The binary
   search tries to approximate an unknown value by repeatedly bisecting
   an interval of possible values, until the interval becomes narrow
   enough.  Width of the interval is a specific quantity and the
   termination condition compares that to another specific quantity
   acting as the threshold.  The threshold value does not have a
   specific interval associated, but corresponds to a 'size' of the
   compared interval.  As size is a word already used in definition of
   frame size, a more natural word describing interval is width.

   The MLRsearch specification does use (analogues of) upper bound and
   lower bound, but does not actually need to talk about intervals.
   Still, the intervals are implicitly there, so width is the natural
   name.

   Actually, there are two popular options for defining width.  Absolute
   width is based on load, the value is the higher load minus the lower
   load.  Relative width is dimensionless, the value is the absolute
   width divided by the higher load.  As intended loads for trials are
   positive, relative width is between 0.0 (including) and 1.0
   (excluding).

   Relative width as a threshold value may be useful for users who do
   not presume what is the typical performance of SUT, but absolute
   width may be a more familiar concept.

   MLRsearch specification does not prescribe which width has to be
   used, but widths MUST be either all absolute or all relative, and it
   MUST be clear from report which option was used (it is implied from
   the unit of measurement of any width value).


Konstantynowicz & Polak  Expires 11 January 2024                [Page 7]

Internet-Draft                  MLRsearch                      July 2023


2.1.9.  Loss ratio

   The loss ratio is a general quantity, dimensionless floating point
   value assumed to be between 0.0 and 1.0, both including.  It is
   computed as the number of frames forwarded by SUT, divided by the
   number of frames that should have been forwarded during the trial.

   If the number of frames that should have been forwarded is zero, the
   loss ratio is considered to be zero (but it is better to use high
   enough loads to prevent this).

   Loss ratio is basically the same quantity as Frame Loss Rate of RFC
   1242, just not expressed in percents.

   RFC1242 Frame Loss Rate: Percentage of frames that should have been
   forwarded by a network device under steady state (constant) load that
   were not forwarded due to lack of resources.

   (RFC2544 restricts Frame Loss Rate to a type of benchmark, for loads
   100% of 'maximum rate', 90% and so on.)

2.1.10.  Exceed ratio

   This general quantity is a dimensionless floating point value,
   defined using two duration sum quantities.  One duration sum is
   referred to as the good duration sum, the other is referred to as the
   bad duration sum.  The exceed ratio value is computed as the bad
   duration sum value divided by the sum of the two sums.  If both sums
   are zero, the exceed ratio is undefined.

   As there are no negative duration sums in MLRsearch, exceed ratio
   values are between 0.0 and 1.0 (both including).

2.2.  Architecture

   MLRsearch architecture consists of three main components: the
   manager, the controller and the measurer.

   The search algorithm is implemented in the controller, and it is the
   main focus of this document.

   Most implementation details of the manager and the measurer are out
   of scope of this document, except when describing how do those
   components interface with the controller.


Konstantynowicz & Polak  Expires 11 January 2024                [Page 8]

Internet-Draft                  MLRsearch                      July 2023


2.2.1.  Manager

   The manager is the component that initializes SUT, traffic generator
   (called tester in RFC 2544), the measurer and the controller with
   intended configurations.  It then handles the execution to the
   controller and receives its result.

   Managers can range from simple CLI utilities to complex Continuous
   Integration systems.  From the controller point of view it is
   important that no additional configuration (nor warmup) is needed for
   SUT and the measurer to perform trials.

   The interface between the manager and the controller is defined in
   the controller section.

   One execution of the controller is called a search.  Some benchmarks
   may execute multiple searches on the same SUT (for example when
   confirming the performance is stable over time), but in this document
   only one invocation is concerned (others may be understood as the
   part of SUT preparation).

   Creation of reports of appropriate format can also be understood as
   the responsibility of the manager.  This document places requirements
   on which information has to be reported.

2.2.2.  Measurer

   The measurer is the component which performs one trial as described
   in RFC 2544 section 23, when requested by the controller.

   From the controller point of view, it is a function that accepts
   trial input and returns trial output.

   This is the only way the controller can interact with SUT.  In
   practice, the measurer has to do subtle decisions when converting the
   observed SUT behavior into a single trial loss ratio value.  For
   example how to deal with out of order frames or duplicate frames.

   On software implementation level, the measurer is a callable,
   injected by the manager into the controller instance.

   The act of performing one trial (act of turning trial input to trial
   output) is called a measurement, or trial measurement.  This way we
   can talk about trials that were measured already and trials that are
   merely planned (not measured yet).


Konstantynowicz & Polak  Expires 11 January 2024                [Page 9]

Internet-Draft                  MLRsearch                      July 2023


2.2.2.1.  Trial input

   The load and duration to use in an upcoming trial.

   This is a composite.

   Other quantities needed by the measurer are assumed to be constant
   and set up by the manager before search starts (see traffic profile),
   so they do not count as trial input attributes.

2.2.2.1.1.  Trial load

   Trial load is the intended load for the trial.

   This is a specific quantity based on load, directly corresponding to
   RFC 2285 intended load.

2.2.2.1.2.  Trial duration

   Trial duration is the intended duration for the trial.

   This is a specific quantity based on duration, so it specifies only
   the traffic part of the trial, not the waiting parts.

2.2.2.2.  Traffic profile

   Any other configuration values needed by the measurer to perform a
   trial.

   The measurer needs both trial input and traffic profile to perform
   the trial.  As trial input contains the only values that vary during
   one the search, traffic profile remains constant during the search.

   Traffic profile when understood as a composite is REQUIRED by RFC
   2544 to contain some specific quantities (for example frame size).
   Several more specific quantities may be RECOMMENDED.

   Depending on SUT configuration (e.g. when testing specific
   protocols), additional values need to be included in the traffic
   profile and in the test report.  (See other IETF documents.)

2.2.2.3.  Trial ouput

   A composite consisting of trial loss ratio and trial forwarding rate.

   Those are the only two specific quantities (among other quantities
   possibly measured in the trial, for example offered load) that are
   important for MLRsearch.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 10]

Internet-Draft                  MLRsearch                      July 2023


2.2.2.3.1.  Trial loss ratio

   Trial loss ratio is a specific quantity based on loss ratio.  The
   value is related to a particular measured trial, as measured by the
   measurer.

2.2.2.3.2.  Trial forwarding rate

   Trial forwarding rate is a derived quantity.  It is computed as one
   minus trial loss ratio, that multiplied by trial load.

   Despite the name, the general quantity this specific quantity
   corresponds to is load (not rate).  The name is inspired by RFC 2285,
   which defines Forwarding Rate specific to one output interface.

   As the definition of loss ratio is not neccessarily per-interface
   (one of details left for the measurer), using the definition above
   (instead of RFC 2285) makes sure trial forwarding rate is always
   between zero and the trial load (both including).

2.2.2.4.  Trial result

   Trial result is a composite consisting of trial input attributes and
   trial output attributes.

   Those are all specific quantites related to a measured trial
   MLRsearch needs.

   While distinction between trial input and output is important when
   defining the interface between the controller and the measurer, it is
   easier to talk about trial result when describing how measured trials
   influence the controller behavior.

2.2.3.  Controller

   The component of MLRsearch architecture that calls the measurer and
   returns conditional throughputs to the manager.

   This component implements the search algorithm, the main content of
   this document.

   Contrary to Throughput as defined in RFC 1242, the definition of
   conditional throughput is quite sensitive to the controller input (as
   provided by the manager), and its full definition needs several terms
   which would otherwise be hidden as internals of the controller
   implementation.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 11]

Internet-Draft                  MLRsearch                      July 2023


   The ability of conditional throughput to be less sensitive to
   performance variance, and the ability of the controller to find
   conditional throughputs for multiple search goals within one search
   (and in short overall search time) are strong enough motivations for
   the need of increased complexity.

2.2.4.  Controller input

   A composite of max load, min load, and a set of search goals.

   The search goals (as elements of the set of search goals) are usually
   not named and unordered.

   It is fine if all search goals of the set have the same value of a
   particular attribute.  In that case, the common value may be treated
   as a global attribute (similarly to max and min load).

   The set of search goals MUST NOT be empty.  Two search goals within
   the set MUST differ in at least one attribute.  The manager MAY avoid
   both issues by presenting empty report or de-duplicating the search
   goals, but it is RECOMMENDED for the manager to raise an error to its
   caller, as the two conditions suggest the test is improperly
   configured.

2.2.4.1.  Max load

   Max load is a specific quantity based on load.  No trial load is ever
   higher than this value.

   RFC 2544 section 20 defines maximum frame rate based on theoretical
   maximum rate for the frame size on the media.  RFC 2285 section 3.5.3
   specifies Maximum offered load (MOL) which may be lower than maximum
   frame rate.  There may be other limitations preventing high loads,
   for examples resources available to traffic generator.

   The manager is expected to provide a value that is not greater than
   any known limitation.  Alternatively, the measurer is expected to
   work at max load, possibly reporting as lost any frames that were not
   able to leave Traffic Generator.

   From the controller point of view, this is merely a global upper
   limit for any trial load candidates.

2.2.4.2.  Min load

   Min load is a specific quantity based on load.  No trial load is ever
   lower than this value.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 12]

Internet-Draft                  MLRsearch                      July 2023


   The motivation of this quantity is to prevent trials with too few
   frames sent to SUT.

   Also, practically if a SUT is able to reach only very small
   forwarding rates (min load indirectly serves as a threshold for how
   small), it may be considered faulty (or perhaps the test is
   misconfigured).

2.2.4.3.  Search goal

   A composite of 7 attributes (see subsections).

   If not otherwise specified, 'goal' always refers to a search goal in
   this document.

   The controller input may contain multiple search goals.  The name
   Multiple Loss Ratio search was created back when goal loss ratio was
   the only attribute allowed to vary between goals.

   Each goal will get its conditional throughput discovered and reported
   at the end of the search.

   The definitions of the 7 attributes are not very informative by
   themselves.  Their motivation (and naming) becomes more clear from
   the impact they have on conditional throughput.

2.2.4.3.1.  Goal loss ratio

   A specific quantity based on loss ratio.  A threshold value for trial
   loss ratios.  MUST be lower than one.

   Trial loss ratio values will be compared to this value, a trial will
   be considered bad if its loss ratio is higher than this.

   For example, RFC 2544 throughput has goal loss ratio of zero, a trial
   is bad once a sigle frame is lost.

   Loss ratio of one would classify each trial as good (regardless of
   loss), which is not useful.

2.2.4.3.2.  Goal initial trial duration

   A specific quantity based on duration.  A threshold value for trial
   durations.  MUST be positive.

   MLRsearch is allowed to use trials as short as this when focusing on
   this goal.  The conditional throughput may be influenced by shorter
   trials, (measured when focusing on other search goals).


Konstantynowicz & Polak  Expires 11 January 2024               [Page 13]

Internet-Draft                  MLRsearch                      July 2023


2.2.4.3.3.  Goal final trial duration

   A specific quantity based on duration.  A threshold value for trial
   durations.  MUST be no smaller than goal initial trial duration.

   MLRsearch is allowed to use trials as long as this when focusing on
   this goal.  If more data is needed, repeated trials at the same load
   and duration are requested by the controller.

2.2.4.3.4.  Goal min duration sum

   A specific quantity based on duration sum.  A threshold value for a
   particular duration sum.

   MLRsearch requires at least this amount of (effective) trials for a
   particular load to become part of MLRsearch outputs.

   It is possible (though maybe not prectical) for goal min duration sum
   to be smaller than goal final trial duration.

   In practice, the sum of durations actually spent on trial measurement
   can be smaller (when trial results are quite one-sided) or even
   larger (in presence of shorter-than-final trial duration results at
   the same load).

   If the sum of all (good and bad) long trials is at least this, and
   there are no short trials, then the load is guaranteed to be
   classified as either an upper or a lower bound.

   In some cases, the classification is known sooner, when the 'missing'
   trials cannot change the outcome.

   When short trials are present, the logic is more complicated.

2.2.4.3.5.  Goal exceed ratio

   A specific quantity based on exceed ratio.  A threshold value for
   particulat sets of trials.

   An attribute used for classifying loads into upper and lower bounds.

   If the duration sum of all (current duration) trials is at least min
   duration sum, and more than this percentage of the duration sum comes
   from bad trials, this load is an upper bound.

   If there are shorter duration trials, the logic is more complicated.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 14]

Internet-Draft                  MLRsearch                      July 2023


2.2.4.3.6.  Goal width

   A specific quantity based on width.  A threshold value for a
   particular width.  MUST be positive.

   This defines the exit condition for this search goal.

   Relevant bounds (of the final target) need to be this close before
   conditional throughput can be reported.

2.2.4.3.7.  Preceding targets

   A non-negative integer affecting the behavior of the controller.

   How many additional non-final targets to add.  Each next preceding
   target has double width and min duration sum geometrically closer to
   initial trial duration.

   The usage of preceding targets is an important source of MLRsearch
   time savings (compared to simpler search algorithms).

   Having this value configurable lets the manager tweak the overall
   search duration based on presumed knowledge of SUT performance
   stability.

2.2.5.  Controller internals

   Terms not directly corresponding to the controller's input nor
   output, but needed indirectly as dependencies of the conditional
   throughput definition.

   Following these definitions specifies virtually all of the controller
   (MLRsearch algorithm) logic.

2.2.5.1.  Pre-initial trials

   Up to three special trials executed at the start of the search.  The
   first trial load is max load, subsequent trial load are computed from
   preceding trial forwarding rate.

   The main loop of the controller logic needs at least one trial
   result, and time is saved if the trial results are close to future
   conditional throughput values.

   The exact way to compute load for second and third trial (and whether
   even measure second or third trial) are not specified here, as the
   implementation details have negligible effect on the reported
   conditional throughput.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 15]

Internet-Draft                  MLRsearch                      July 2023


2.2.5.2.  Search target

   A composite of 5 specific quantites (see subsections).  Frequently
   called just target.

   Similar to (but distinct from) the search goal.

   Each search goal prescribes a final target, probably with a chain of
   preceding targets.

   More details in the Derived targets section.

2.2.5.2.1.  Target loss ratio

   Same as loss ratio of the corresponding goal.

2.2.5.2.2.  Target exceed ratio

   Same as exceed ratio of the corresponding goal.

2.2.5.2.3.  Target width

   Similar to goal width attribute.  Doubled from goal width for each
   level of preceding target.

2.2.5.2.4.  Target min duration sum

   Similar to goal min duration sum attribute.  Geometrically
   interpolated between initial target duration and goal min duration
   sum.

2.2.5.2.5.  Target trial duration

   When MLRsearch focuses on this target, it measures trials with this
   duration.  The value is equal to the minimum of goal final trial
   duration and target min duration sum.

   Also, this value is used to classify trial results as short (if trial
   duration is shorter than this) or long.

2.2.5.3.  Derived targets

   After receiving the set of search goals, MLRsearch internally derives
   a set of search targets.

   The derived targets can be seen as forming a chain, from initial
   target to final target.  The chain is linked by a reference from a
   target to its preceding (towarsds initial) target.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 16]

Internet-Draft                  MLRsearch                      July 2023


   The reference may be implemented as 6th attribute od target.

2.2.5.3.1.  Final target

   The final target is the target where the most of attribute values are
   directly copied from the coresponding search goal.  Final target
   width is the same as goal width, final target trial duration is the
   same as goal final trial duration, and final target min duration sum
   is the same as the goal min duration sum.

   The conditional throughput is found when focusing on the final
   target.  All non-final targets do not directly affect the conditional
   throughput, they are there just as an optimization.

2.2.5.3.2.  Preceding target

   Each target may have a preceding target.  Goal attribute Preceding
   targets governs how many targets are created in addition to the final
   target corresponding to the search goal.

   Any preceding target has double width, meaning one balanced bisection
   is needed to reduce preceding target width to the next target width.

   Preceding target min duration sum is exponentially smaller, aiming
   for prescribed initial target min duration sum.

   Preceding target trial duration is either its min duration sum, or
   the corresponding goal's final trial duration, whichever is smaller.

   As the preceding min duration sum is shorter than the next duration
   sum, MLRsearch is able to achieve the preceding target width sooner
   (than with the next target min duration sum).

   This way an approximation of the conditional throughput is found,
   with the next target needing not as much time to improve the
   approximation (compared to not starting with the approximation).

2.2.5.3.3.  Initial target

   Initial target is a target without any other target preceding it.
   Initial target min duration sum is equal to the corresponding goal's
   initial trial duration.

   As a consequence, initial target trial duration is equal to its min
   duration sum.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 17]

Internet-Draft                  MLRsearch                      July 2023


2.2.5.4.  Trial classification

   Any trial result can be classified according to any target along two
   axes.

   The two classifications are independent.

   This classification is important for defining the conditional
   throughput.

2.2.5.4.1.  Short trial

   If the (measured) trial duration is shorter than the target trial
   duration, the trial is called long.

2.2.5.4.2.  Long trial

   If the (measured) trial duration is not shorter than the target trial
   duration, the trial is called long.

2.2.5.4.3.  Bad trial

   If the (measured) trial loss ratio is larger than the target loss
   ratio, the trial is called bad.

   For example, if the target loss ratio is zero, a trial is bad as soon
   as one frame was lost.

2.2.5.4.4.  Good trial

   If the (measured) trial loss ratio is not larger than the target loss
   ratio, the trial is called good.

   For example, if the target loss ratio is zero, a trial is good only
   when there were no frames lost.

2.2.5.5.  Load stat

   A composite of 8 quantities (see subsections) The quantites depend on
   a target and a load, and are computed from all trials measured at
   that load so far.

   The MLRsearch output is the conditional througput, which is a
   specific quantity based on load.  As MLRsearch may measure multiple
   trials at the same load, and those trials may not have the same
   duration, we need a way to classify a set of trial results at the
   same load.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 18]

Internet-Draft                  MLRsearch                      July 2023


   As the logic is not as straightforward as in other parts of MLRsearch
   algorithm, it is best defined using the following derived quantities.

   Load stat is the composite for one load and one target.  Set of load
   stats for one load an all targets is commonly called load stats.

2.2.5.5.1.  Long good duration sum

   Sum of durations of all long good trials (at this load, according to
   this target).

2.2.5.5.2.  Long bad duration sum

   Sum of durations of all long bad trials (at this load, according to
   this target).

2.2.5.5.3.  Short good duration sum

   Sum of durations of all short good trials (at this load, according to
   this target).

2.2.5.5.4.  Short bad duration sum

   Sum of durations of all short bad trials (at this load, according to
   this target).

2.2.5.5.5.  Effective bad duration sum

   One divided by tagret exceed ratio, that plus one.  Short good
   duration sum divided by that.  Short bad duration sum minus that, or
   zero if that would be negative.  Long bad duration sum plus that is
   the effective bad duration sum.

   Effective bad duration sum is the long bad duration sum plus some
   fraction of short bad duration sum.  The fraction is between zero and
   one (both possibly including).

   If there are no short good trials, effective bad duration sum becomes
   the duration sum of all bad trials (long or short).

   If an exceed ratio computed from short good duration sum and short
   bad duration sum is equal or smaller than the target exceed ratio,
   effective bad duration sum is equal to just long bad duration sum.

   Basically, short good trials can only lessen the impact of short bad
   trials, while short bad trials directly contribute (unless lessened).


Konstantynowicz & Polak  Expires 11 January 2024               [Page 19]

Internet-Draft                  MLRsearch                      July 2023


   A typical example of why a goal needs higher final trial duration
   than initial trial duration is when SUT is expected to have large
   buffers, so a trial may be too short to see frame losses due to a
   buffer becoming full.  So a short good trial does not give strong
   information.  On the other hand, short bad trial is a strong hint SUT
   would lose many frames at that load and long duration.  But if there
   is a mix of short bad and short good trials, MLRsearch should not
   cherry-pick only the short bad ones.

   The presented way of computing the effective bad duration sum aims to
   be a fair treatment of short good trials.

   If the target exceed ratio is zero, the given definition contains
   positive infinty as an intermediate value, but still simplifies to a
   finite result (long bad duration sum plus short bad duration sum).

2.2.5.5.6.  Missing duration sum

   The target min duration sum minus effective bad duration sum and
   minus long good duration sum, or zero if that would be negative.

   MLRsearch may need up to this duration sum of additional long trials
   before classifing the load.

2.2.5.5.7.  Optimistic exceed ratio

   The specific quantity based on exceed ratio, where bad duration sum
   is the effective bad duration sum, and good duration sum is the long
   good duration sum plus the missing duration sum.

   This is the value MLRsearch would compare to target exceed ratio
   assuming all of the missing duration sum ends up consisting of long
   good trials.

   If there was a bad long trial, optimistic exceed ratio becomes larger
   than zero.  Additionally, if the target exceed ratio is zero,
   optimistic exceed ratio becomes larger than zero even on one short
   bad trial.

2.2.5.5.8.  Pessimistic exceed ratio

   The specific quantity based on exceed ratio, where bad duration sum
   is the effective bad duration sum plus the missing duration sum, and
   good duration sum is the long good duration sum.

   This is the value MLRsearch would compare to target exceed ratio
   assuming all of the missing duration sum ends up consisting of bad
   good trials.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 20]

Internet-Draft                  MLRsearch                      July 2023


   Note that if the missing duration sum is zero, optimistic exceed
   ratio becomes equal to pessimistic exceed ratio.

   This is the role target min duration sum has, it guarantees the two
   load exceed ratios eventually become the same.  Otherwise,
   pessimistic exceed ratio is always bigger than the optimistic exceed
   ratio.

   Depending on trial results, the missing duration sum may not be large
   enough to change optimistic (or pessimistic) exceed ratio to move to
   the other side compared to target exceed ratio.  In that case,
   MLRsearch does not need to measure more trials at this load when
   focusing on this target.

2.2.5.6.  Target bounds

   With respect to a target, some loads may be classified as upper or
   lower bound, and some of the bounds are treated as relevant.

   The subsequent parts of MLRsearch rely only on relevant bounds,
   without the need to classify other loads.

2.2.5.6.1.  Upper bound

   A load is classified as an upper bound for a target, if and only if
   both optimistic exceed ratio and pessimstic load exceed ratio are
   larger than the target exceed ratio.

   During the search, it is possible there is no upper bound, for
   example because every measured load still has too high missing
   duration sum.

   If the target exceed ratio is zero, and the load has at least one bad
   trial (short or long), the load becomes an upper bound.

2.2.5.6.2.  Lower bound

   A load is classified as a lower bound for a target, if and only if
   both optimistic exceed ratio and pessimstic load exceed ratio are no
   larger than the target exceed ratio.

   During the search, it is possible there is no lower bound, for
   example because every measured load still has too high missing
   duration sum.

   If the target exceed ratio is zero, all trials at the load of a lower
   bound must be good trials (short or long).


Konstantynowicz & Polak  Expires 11 January 2024               [Page 21]

Internet-Draft                  MLRsearch                      July 2023


   Note that so far it is possible for a lower bound to be higher than
   an upper bound.

2.2.5.6.3.  Relevant upper bound

   For a target, a load is the relevant upper bound, if and only if it
   is an upper bound, and all other upper bounds are larger (as loads).

   In some cases, the max load when classified as a lower bound is also
   effectively treated as the relevant upper bound.  (In that case both
   relevant bounds are equal.)

   If that happens for a final target at the end of the search, the
   controller output may contain max load as the relevant upper bound
   (even if the goal exceed ratio was not exceeded), signalling SUT
   performs well even at max load.

   If the target exceed ratio is zero, the relevant upper bound is the
   smallest load where a bad trial (short or long) has been measured.

2.2.5.6.4.  Relevant lower bound

   For a target, a load is the relevant lower bound if two conditions
   hold.  Both optimistic exceed ratio and pessimstic load exceed ratio
   are no larger than the target exceed ratio, and there is no smaller
   load classified as an upper bound.

   This is a second place where MLRsearch is not symmetric (the first
   place was effective bad duration sum).

   While it is not likely for a MLRsearch to find a smaller upper bound
   and a larger load satisfying first condition for the lower bound, it
   still may happen and MLRsearch has to deal with it.  The second
   condition makes sure the relevant lower bound is smaller than the
   relevant upper bound.

   In some cases, the min load when classified as an upper bound is also
   effectively treated as the relevant lower bound.  (In that case both
   relevant bounds are equal.)

   If that happens for a final target at the end of the search, the
   controller output may contain min load as the relevant lower bound
   even if the exceed ratio was 'overstepped', signalizing the SUT does
   not even reach the minimal required performance.

   The manager has to make sure this is distingushed in report from
   cases where min rate is a legitimate conditional throughput (e.g. the
   exceed ratio was not overstepped at the min load).


Konstantynowicz & Polak  Expires 11 January 2024               [Page 22]

Internet-Draft                  MLRsearch                      July 2023


2.2.5.6.5.  Relevant bounds

   The pair of the relevant lower bound and the relevant upper bound.

   Useful for determining the width of the relevant bounds.  Any of the
   bounds may be the effective one (max load or min load).

   A goal is achieved (at the end of the search) when the final target's
   relevant bounds have width no larger than the goal width.

2.2.5.7.  Candidate selector

   A stateful object (a finite state machine) focusing on a single
   target, used to determine next trial input.

   Initialized for a pair of targets: the current target and its
   preceding target (if any).

   Private state (not shared with other selectors) consists of mode and
   flags.  Public state (shared with all selectors) is the actual
   relevant bounds for both targets (current and precedinig).

   After accepting a trial result, each selector can nominate one
   candidate (or no candidate) for the next trial measurement.

2.2.5.7.1.  Current target

   This is the target this selector tries to achieve.

2.2.5.7.2.  Preceding target

   The target (if any) preceding to the current target.

   While this selector does not focus on the preceding target, the
   relevant bounds for the preceding target are used as hints when the
   current bound does not have enough of its relevant bounds.

2.2.5.7.3.  Candidate

   The trial input (if any) this selecor nominates.

   The trial duration attribute is always the current target trial
   duration.  The trial load attribute depends on the selector state.

   Candidates have defined ordering, to simplify finding the winner.  If
   load differs, the candidate with lower load is preferred.  If load is
   the same but duration differs, the candidate with larger duration is
   preferred.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 23]

Internet-Draft                  MLRsearch                      July 2023


2.2.5.7.4.  Selector mode

   During its lifetime, selector proceeds through the following modes.
   In order, but some modes may be skipped or revisited.

   Each mode has its own strategy of determining the candidate load (if
   any).

2.2.5.7.4.1.  Waiting

   Not enough relevant bounds (even for the preceding target).  In this
   mode, the selector abstains from nominating a candidate.

   This selector leaves this mode when preceding target's selector is
   done.

2.2.5.7.4.2.  Halving

   Candidate is in the middle of the relevant bounds of the preceding
   target.

   If the relevant bounds are narrow enough already, this mode is
   skipped.  As the preceding target had double width, just one halving
   load needs to be measured.

   Selector uses a flag to avoid re-entering this mode once it finished
   measuring the halved load.

2.2.5.7.4.3.  Upgrading

   This mode activates when one relevant bound for the current target is
   present and there is a matching relevant bound of the preceding
   target within the current target width.  Candidate is the load of the
   matching bound from the preceding target.

   At most one bound load is measured, depending on halving outcome.
   Private flags are used to avoid upgrading at later times once
   selector finished measuring the upgraded load.

2.2.5.7.4.4.  Extending

   Refined already but the other relevant bound for the current target
   is still missing.  Nominate new candidate according to external
   search.  Initial target selectors skip all previous modes.

   A private value is used to track the width to be used in next load
   extension (increasing geometrically).  For initial target selectors,
   the starting width may be chosen based on pre-initial trial results.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 24]

Internet-Draft                  MLRsearch                      July 2023


   If both relevant bounds are present at the current load, but the
   lower bound is far away (compared to tracked width), the candidate
   from this mode is preferred (as long as the load is larger than the
   candidate load of bisecting mode).

2.2.5.7.4.5.  Bisecting

   Both relevant bounds for the current target are available, but they
   are too far from each other.  Candidate is in the middle.

   Contrary to halving, the candidate load does not need to be at the
   exact middle.  For example if the width of the current relevant
   bounds is three times as large as the target width, it is
   advantageous to split the interval in 1:2 ratio (choosing the lower
   candidate load), as it can save one bisect.

2.2.5.7.4.6.  Done

   Both relevant bounds for the current target are available, the width
   is no larger than the target width.  No candidate.

   If a selector reaches the done state, it is still possible later
   trials invalidate its relevant lower bound (by proving a lower load
   is in fact a new uper bound), making the selector transition into
   extending or bisecting mode.

2.2.5.7.5.  Active selector

   Derived from a common goal, the earliest selector which nominates a
   candidate is considered to be the active selector for this goal.
   Candidates from other selectors of the same goal are ignored.

   It is quite possible selectors focusing on other goals have already
   found a lower bound relevant to multiple targets in a chain.  In that
   case, we want the most-initial of the target selectors (not already
   in done mode) to have the nomination.

   Otherwise (when in extending mode and missun relevant upper bound)
   the closer-to-final selectors would nominate candidates at lower load
   but at too high duration sum, preventing some of the time savings.

2.2.5.7.6.  Winner

   If the candidate previously nominated by a selector was the one that
   got measured, the candidate is called a winner.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 25]

Internet-Draft                  MLRsearch                      July 2023


   A selector observing its previous candidate was a winer can use
   simplified logic when determining the mode, as it knows no other
   selectors may have changed the relevant loads unexpectedly.

2.2.6.  Controller output

   The output object the controller returns to the manager is a mapping
   assigning each search goal its conditional output (if it exists).

   The controller MAY include more information (if manager accepts it),
   for example load stat at relevant bounds.

   There MAY be several ways how to communicate the fact a conditional
   output does not exist (e.g. min load is classified as an upper
   bound).  The manager MUST NOT present min load as a conditional
   output in that case.

   If max load is a lower bound, it leads to a valid conditional output
   value.

2.2.6.1.  Conditional throughput

   The conditional throughput is the average of trial forwarding rates
   across long good trials measured at the (offered load classified as)
   relevant lower bound (for the goal, at the end of the search).  The
   average is the weighted arithmetic mean, weighted by trial duration.

   If the goal exceed ratio is zero, the definition of the relevant
   bounds simplifies significantly.  If additionally the goal loss ratio
   is zero, and the goal min duration sum is equal to goal final trial
   duration, conditional throughput becomes conditionally compliant with
   RFC 2544 throughput.  If the goal final trial duration is at least 60
   seconds, the conditional througput becomes unconditionally compliant
   with RFC 2544 throughput.

3.  Problems

3.1.  Long Test Duration

   Emergence of software DUTs, with frequent software updates and a
   number of different packet processing modes and configurations,
   drives the requirement of continuous test execution and bringing down
   the test execution time.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 26]

Internet-Draft                  MLRsearch                      July 2023


   In the context of characterising particular DUT's network
   performance, this calls for improving the time efficiency of
   throughput search.  A vanilla bisection (at 60sec trial duration for
   unconditional [RFC2544] compliance) is slow, because most trials
   spend time quite far from the eventual throughput.

   [RFC2544] does not specify any stopping condition for throughput
   search, so users can trade-off between search duration and achieved
   precision.  But, due to exponential behavior of bisection, small
   improvement in search duration needs relatively big sacrifice in the
   throughput precision.

3.2.  DUT within SUT

   [RFC2285] defines: - _DUT_ as - The network forwarding device to
   which stimulus is offered and response measured [RFC2285] (section
   3.1.1). - _SUT_ as - The collective set of network devices to which
   stimulus is offered as a single entity and response measured
   [RFC2285] (section 3.1.2).

   [RFC2544] specifies a test setup with an external tester stimulating
   the networking system, treating it either as a single DUT, or as a
   system of devices, an SUT.

   In case of software networking, the SUT consists of a software
   program processing packets (device of interest, the DUT), running on
   a server hardware and using operating system functions as
   appropriate, with server hardware resources shared across all
   programs and the operating system.

   DUT is effectively "nested" within SUT.

   Due to a shared multi-tenant nature of SUT, DUT is subject to
   interference (noise) coming from the operating system and any other
   software running on the same server.  Some sources of noise can be
   eliminated (e.g. by pinning DUT program threads to specific CPU cores
   and isolating those cores to avoid context switching).  But some
   noise remains after all such reasonable precautions are applied.
   This noise does negatively affect DUT's network performance.  We
   refer to it as an _SUT noise_.

   DUT can also exhibit fluctuating performance itself, e.g. while
   performing some "stop the world" internal stateful processing.  In
   many cases this may be an expected per-design behavior, as it would
   be observable even in a hypothetical scenario where all sources of
   SUT noise are eliminated.  Such behavior affects trial results in a
   way similar to SUT noise.  We use _noise_ as a shorthand covering
   both _DUT fluctuations_ and genuine SUT noise.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 27]

Internet-Draft                  MLRsearch                      July 2023


   A simple model of SUT performance consists of a baseline _noiseless
   performance_, and an additional noise.  The baseline is assumed to be
   constant (enough).  The noise varies in time, sometimes wildly.  The
   noise can sometimes be negligible, but frequently it lowers the
   observed SUT performance in a trial.

   In this model, SUT does not have a single performance value, it has a
   spectrum.  One end of the spectrum is the noiseless baseline, the
   other end is a _noiseful performance_. In practice, trial results
   close to the noiseful end of the spectrum happen only rarely.  The
   worse performance, the more rarely it is seen in a trial.

   Focusing on DUT, the benchmarking effort should aim at eliminating
   only the SUT noise from SUT measurement.  But that is not really
   possible, as there are no realistic enough models able to distinguish
   SUT noise from DUT fluctuations.

   However, assuming that a well-constructed SUT has the DUT as its
   performance bottleneck, the "DUT noiseless performance" can be
   defined as the noiseless end of SUT performance spectrum.  (At least
   for throughput.  For other quantities such as latency there will be
   an additive difference.)  By this definition, DUT noiseless
   performance also minimizes the impact of DUT fluctuations.

   In this document, we reduce the "DUT within SUT" problem to
   estimating the noiseless end of SUT performance spectrum from a
   limited number of trial results.

   Any improvements to throughput search algorithm, aimed for better
   dealing with software networking SUT and DUT setup, should employ
   strategies recognizing the presence of SUT noise, and allow discovery
   of (proxies for) DUT noiseless performance at different levels of
   sensitivity to SUT noise.

3.3.  Repeatability and Comparability

   [RFC2544] does not suggest to repeat throughput search.  And from
   just one throughput value, it cannot be determined how repeatable
   that value is.  In practice, poor repeatability is also the main
   cause of poor comparability, e.g. different benchmarking teams can
   test the same SUT but get different throughput values.

   [RFC2544] throughput requirements (60s trial, no tolerance to single
   frame loss) force the search to fluctuate close the noiseful end of
   SUT performance spectrum.  As that end is affected by rare trials of
   significantly low performance, the resulting throughput repeatability
   is poor.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 28]

Internet-Draft                  MLRsearch                      July 2023


   The repeatability problem is the problem of defining a search
   procedure which reports more stable results (even if they can no
   longer be called "throughput" in [RFC2544] sense).  According to
   baseline (noiseless) and noiseful model, better repeatability will be
   at the noiseless end of the spectrum.  Therefore, solutions to the
   "DUT within SUT" problem will help also with the repeatability
   problem.

   Conversely, any alteration to [RFC2544] throughput search that
   improves repeatability should be considered as less dependent on the
   SUT noise.

   An alternative option is to simply run a search multiple times, and
   report some statistics (e.g. average and standard deviation).  This
   can be used for "important" tests, but it makes the search duration
   problem even more pronounced.

3.4.  Throughput with Non-Zero Loss

   [RFC1242] (section 3.17) defines throughput as: The maximum rate at
   which none of the offered frames are dropped by the device.

   and then it says: Since even the loss of one frame in a data stream
   can cause significant delays while waiting for the higher level
   protocols to time out, it is useful to know the actual maximum data
   rate that the device can support.

   Contrary to that, many benchmarking teams settle with non-zero
   (small) loss ratio as the goal for a "throughput rate".

   Motivations are many: modern protocols tolerate frame loss better;
   trials nowadays send way more frames within the same duration; impact
   of rare noise bursts is smaller as the baseline performance can
   compensate somewhat by keeping the loss ratio below the goal; if SUT
   noise with "ideal DUT" is known, it can be set as the loss ratio
   goal.

   Regardless of validity of any and all similar motivations, support
   for non-zero loss goals makes any search algorithm more user-
   friendly.  [RFC2544] throughput is not friendly in this regard.

   Searching for multiple goal loss ratios also helps to describe the
   SUT performance better than a single goal result.  Repeated wide gap
   between zero and non-zero loss conditional throughputs indicates the
   noise has a large impact on the overall SUT performance.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 29]

Internet-Draft                  MLRsearch                      July 2023


   It is easy to modify the vanilla bisection to find a lower bound for
   intended load that satisfies a non-zero-loss goal, but it is not that
   obvious how to search for multiple goals at once, hence the support
   for multiple loss goals remains a problem.

3.5.  Inconsistent Trial Results

   While performing throughput search by executing a sequence of
   measurement trials, there is a risk of encountering inconsistencies
   between trial results.

   The plain bisection never encounters inconsistent trials.  But
   [RFC2544] hints about possibility if inconsistent trial results in
   two places.  The first place is section 24 where full trial durations
   are required, presumably because they can be inconsistent with
   results from shorter trial durations.  The second place is section
   26.3 where two successive zero-loss trials are recommended,
   presumably because after one zero-loss trial there can be subsequent
   inconsistent non-zero-loss trial.

   Examples include:

   *  a trial at the same load (same or different trial duration)
      results in a different packet loss ratio.

   *  a trial at higher load (same or different trial duration) results
      in a smaller packet loss ratio.

   Any robust throughput search algorithm needs to decide how to
   continue the search in presence of such inconsistencies.  Definitions
   of throughput in [RFC1242] and [RFC2544] are not specific enough to
   imply a unique way of handling such inconsistencies.

   Ideally, there will be a definition of a quantity which both
   generalizes throughput for non-zero-loss (and other possible
   repeatibility enhancements), while being precise enough to force a
   specific way to resolve trial inconsistencies.  But until such
   definition is agreed upon, the correct way to handle inconsistent
   trial results remains an open problem.

4.  How the problems are addressed

   Configurable loss ratio in MLRsearch search goals are there in direct
   support for non-zero-loss conditional throughput.  In practice the
   conditional throughput results' stability increases with higher loss
   ratio goals.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 30]

Internet-Draft                  MLRsearch                      July 2023


   Multiple trials with noise tolerance enhancement, as implemented in
   MLRsearch using non-zero goal exceed ratio value, also indirectly
   increases the result stability.  That allows MLRsearch to achieve all
   the benefits of Binary Search with Loss Verification, as recommended
   in [RFC9004] (section 6.2) and specified in [TST009] (section
   12.3.3).

   The main factor improving the overall search time is the introduction
   of preceding targets.  Less impactful time savings are achieved by
   pre-initial trials, halving mode and smart splitting in bisecting
   mode.

   In several places, MLRsearch is "conservative" when handling
   (potentially) inconsistent results.  This includes the requirement
   for the relevant lower bound to be smaller than any upper bound, the
   unequal handling of good and bad short trials, and preference to
   lower load when choosing the winner among candidates.

   While this does no guarantee good search stability (goals focusing on
   higher loads may still invalidate existing bounds simply by requiring
   larger min duration sums), it lowers the change of SUT having an area
   of poorer performance below the reported conditional througput loads.
   In any case, the definition of conditional throughput is precise
   enough to dictate "conservative" handling of trial inconsistencies.

5.  IANA Considerations

   No requests of IANA.

6.  Security Considerations

   Benchmarking activities as described in this memo are limited to
   technology characterization of a DUT/SUT using controlled stimuli in
   a laboratory environment, with dedicated address space and the
   constraints specified in the sections above.

   The benchmarking network topology will be an independent test setup
   and MUST NOT be connected to devices that may forward the test
   traffic into a production network or misroute traffic to the test
   management network.

   Further, benchmarking is performed on a "black-box" basis, relying
   solely on measurements observable external to the DUT/SUT.

   Special capabilities SHOULD NOT exist in the DUT/SUT specifically for
   benchmarking purposes.  Any implications for network security arising
   from the DUT/SUT SHOULD be identical in the lab and in production
   networks.


Konstantynowicz & Polak  Expires 11 January 2024               [Page 31]

Internet-Draft                  MLRsearch                      July 2023


7.  Acknowledgements

   Many thanks to Alec Hothan of OPNFV NFVbench project for thorough
   review and numerous useful comments and suggestions.

8.  References

8.1.  Normative References

   [RFC1242]  Bradner, S., "Benchmarking Terminology for Network
              Interconnection Devices", RFC 1242, DOI 10.17487/RFC1242,
              July 1991, <https://www.rfc-editor.org/info/rfc1242>.

   [RFC2285]  Mandeville, R., "Benchmarking Terminology for LAN
              Switching Devices", RFC 2285, DOI 10.17487/RFC2285,
              February 1998, <https://www.rfc-editor.org/info/rfc2285>.

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology for
              Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999,
              <https://www.rfc-editor.org/info/rfc2544>.

   [RFC9004]  Morton, A., "Updates for the Back-to-Back Frame Benchmark
              in RFC 2544", RFC 9004, DOI 10.17487/RFC9004, May 2021,
              <https://www.rfc-editor.org/info/rfc9004>.

8.2.  Informative References

   [FDio-CSIT-MLRsearch]
              "FD.io CSIT Test Methodology - MLRsearch", November 2022,
              <https://csit.fd.io/cdocs/methodology/measurements/
              data_plane_throughput/mlr_search/>.

   [PyPI-MLRsearch]
              "MLRsearch 0.4.0, Python Package Index", April 2021,
              <https://pypi.org/project/MLRsearch/0.4.0/>.

   [TST009]   "TST 009", n.d., <https://www.etsi.org/deliver/etsi_gs/
              NFV-TST/001_099/009/03.04.01_60/gs_NFV-
              TST009v030401p.pdf>.

Authors' Addresses

   Maciek Konstantynowicz
   Cisco Systems
   Email: mkonstan@cisco.com


Konstantynowicz & Polak  Expires 11 January 2024               [Page 32]

Internet-Draft                  MLRsearch                      July 2023


   Vratko Polak
   Cisco Systems
   Email: vrpolak@cisco.com


Konstantynowicz & Polak  Expires 11 January 2024               [Page 33]