INTERNET-DRAFT                                          A. Selcuk
draft-selcuk-probabilistic-lkh-01.txt                   C. McCubbin
Expires July 2000                                       D. Sidhu
                                                        UMBC MCTR
                                                        January 2000


Probabilistic Optimization of LKH-based Multicast Key Distribution Schemes
               <draft-selcuk-probabilistic-lkh-01.txt>

Status of this Memo

   This document is an Internet-Draft and is NOT offered in accordance
   with Section 10 of RFC2026, and the author does not provide the IETF
   with any rights other than to publish as an Internet-Draft.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Currently, the most efficient techniques for multicast key management
   are based on the Logical Key Hierarchy (LKH) scheme. In current
   practice, LKH trees are organized as balanced binary trees, which
   give a uniform O(log n) complexity for compromise recovery in an n-
   member group. In this draft, we propose organizing the LKH trees with
   respect to the members' compromise probabilities instead of keeping a
   balanced tree, in a spirit similar to data compression techniques
   such as Huffman and Shannon-Fano coding. We describe two such
   algorithms, which improve the performance of an LKH scheme by
   organizing the tree with respect to these probabilities. Preliminary
   simulation results show that significant gains can be obtained by
   these techniques.

1. Introduction

   One of the biggest challenges in multicast security is to maintain a
   group key that is shared by all the group members and nobody else.
   Such a group key can be used to provide secrecy and integrity
   protection for the group communication.  The problem of maintaining
   the group key efficiently becomes a bigger challenge for larger
   groups where members join and leave very frequently.

   Currently, the most efficient methods for multicast key management
   are based on the Logical Key Hierarchy (LKH) scheme of Wallner et al.
   [WHA97]. In LKH, group members are organized as leaves of a tree with
   logical internal nodes. The cost of a compromise recovery operation
   in LKH is proportional to the depth of the compromised member in the
   LKH tree. The original LKH scheme proposes maintaining a balanced
   tree, which gives a uniform cost of O(log n) rekeys for compromise
   recovery in an n-member group.

   Performance of an LKH scheme can be improved by organizing the tree
   with respect to compromise probabilities of members instead of
   keeping a balanced tree, in a spirit similar to data compression
   algorithms such as Huffman and Shannon-Fano coding. In this draft,
   we describe two such probabilistic algorithms. In Section 2, we
   describe the basic LKH scheme. In Section 3, we discuss the problem
   of probabilistic LKH
   optimization and its challenges.  In Section 4, we describe the
   design criteria for our algorithms. In Section 5, we describe the
   algorithms. In Section 6, we discuss the problem of assigning
   probabilities (or, "weights") to members in the group, and propose a
   heuristic method for weight assignment. In Section 7, we summarize
   some computer simulation results regarding the performance of the
   algorithms.

   Regarding the notation in this draft, log(...) denotes the base-2
   logarithm, and sum(...) denotes summation for i from 1 to n (i.e.,
   all summations in this draft are over the variable "i", for its
   values from 1 to "n").

2. The LKH Scheme

   In an LKH tree, there is a leaf node corresponding to each group
   member. The internal nodes are "logical" entities which do not
   correspond to any real-life entities in the multicast group, and are
   only used for key distribution purposes. There is a key associated
   with each node in the tree, and each member holds a copy of every key
   on the path from its corresponding leaf node to the root of the tree.
   Hence, the key corresponding to the root node is shared by all
   members, and serves as the group key. An instance of an LKH tree is
   shown in Figure 1.

           Figure 1: An example LKH tree.


                                 K_r
                    ______________|________________
                   |                               |
                   |                               |
                  K_0                             K_1
            _______|_______                 _______|_______
           |               |               |               |
           |               |               |               |
          K_00            K_01            K_10            K_11
        ___|___         ___|___         ___|___         ___|___
       |       |       |       |       |       |       |       |
       |       |       |       |       |       |       |       |
     K_000   K_001   K_010   K_011   K_100   K_101   K_110   K_111
      M_1     M_2     M_3     M_4     M_5     M_6     M_7     M_8

   In this figure, member M_1 holds a copy of the keys K_000, K_00, K_0,
   and K_r; member M_2 holds a copy of K_001, K_00, K_0, and K_r; and so
   on.  In case of a compromise, the compromised keys are changed, and
   the new keys are multicast to the group, encrypted by their
   children's keys. For example, assume the keys of M_2 are
   compromised. First
   K_001 is changed and sent to M_2 over a secure unicast channel. Then
   K_00 is changed, two copies of the new key are encrypted by K_000 and
   K_001, and sent to the group. Then K_0 is changed and sent to the
   group, encrypted by K_00 and K_01; and finally K_r is changed and
   sent to the group, encrypted by K_0 and K_1.  From each encrypted
   message, the new keys are extracted by the group members who have a
   valid copy of one of the (child) encryption keys.
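
   As a non-normative illustration of this rekeying procedure, the
   following Python sketch enumerates the rekey messages generated for
   a compromised leaf. The Node class and the message representation
   are assumptions of the sketch (they are not part of any LKH
   specification); keys are modeled as random byte strings.

      import os

      class Node:
          """A node of a binary LKH tree; leaves have left == right ==
          None. The field 'p' (used in Sections 4-5) is the cumulative
          weight of the subtree rooted at this node."""
          def __init__(self, p=0.0, left=None, right=None, parent=None):
              self.p, self.left, self.right = p, left, right
              self.parent = parent
              self.key = os.urandom(16)   # key associated with this node

      def rekey_path(leaf):
          """Rekey every node from a compromised leaf to the root.
          Returns the new leaf key (to be sent over a secure unicast
          channel) and the list of encrypted multicast messages."""
          leaf.key = os.urandom(16)
          unicast = leaf.key
          multicasts = []
          node = leaf.parent
          while node is not None:         # one new key per tree level
              node.key = os.urandom(16)
              # the new key is multicast twice, once encrypted under
              # each child key; ('enc', k, m) denotes m encrypted by k
              for child in (node.left, node.right):
                  multicasts.append(('enc', child.key, node.key))
              node = node.parent
          return unicast, multicasts

   For the tree of Figure 1, running this procedure on M_2's leaf
   yields the unicast of the new K_001 and six multicast messages: the
   new K_00 under K_000 and K_001, the new K_0 under K_00 and K_01,
   and the new K_r under K_0 and K_1.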

   If the security policy requires backward and forward secrecy for
   group communication (i.e. a new member should not be able to decrypt
   the communication that took place before its joining, and a former
   member should not be able to decrypt the communication that takes
   place after its leaving) then the keys on the leaving/joining
   member's path in the tree should be changed in a way similar to that
   described above for compromise recovery.

   Although the tree in Figure 1 is a binary tree, an LKH scheme can be
   implemented with a tree of an arbitrary degree. Nevertheless, the
   most efficient and practical implementations are obtained with a
   binary tree. In a binary LKH tree implementation, the tree is always
   full
   (i.e., every non-leaf node has exactly two children) and any node
   with a single child (e.g., parent of a deleted leaf node) is always
   removed and its parent and child are linked to each other. In this
   draft, we follow the convention and focus our attention on binary
   trees. We assume the deletion operation which removes a leaf node
   also removes any node left with a single child, and therefore the
   tree is always kept full.


3. Probabilistic LKH Optimization

   A rekey operation in an LKH scheme can be caused by a join, leave, or
   compromise event. A rekey operation in case of a join can be done
   with a single multicast message [BMS99].  For a leave or compromise
   event, the number of rekeys will be proportional to d_i, the depth of
   the evicted/compromised member in the LKH tree. More specifically,
   the number of rekeys is linear in d_i, as c1*d_i + c0. The values of
   the coefficients c0 and c1 depend on the specifics of the LKH
   implementation. For our purposes, the exact values of these
   coefficients do not matter, and our results are applicable for any
   values of them, given c1 > 0.

   The original LKH scheme uses a balanced tree for key distribution,
   where the depth of each member is log(n) in an n-member group. Hence,
   the cost of each rekey operation is proportional to log(n).  Instead
   of keeping a balanced tree regardless of the membership dynamics,
   performance of an LKH scheme can be improved by organizing the tree
   with respect to the rekeying likelihood of members. That is, the
   average rekey cost can be reduced by putting the more likely-to-rekey
   members closer to the root, and moving less likely members further
   down the tree.

   The problem we address in this draft is how to reduce the average
   rekey cost of an LKH scheme by organizing the members in the tree
   with respect to their rekey probabilities. In general, finding the
   optimal solution to this problem which minimizes average rekey cost
   is quite an intractable task. The optimal solution would be the
   choice of the sequence of operations that would minimize the cost
   averaged over all possible join, leave, and compromise events in the
   future, which depends on the probability distribution of these events
   for all current and prospective members of the group.  Even if we
   assume that these probability distributions are known for all members
   and non-members in the system, finding the optimal move at each step
   that will minimize the average cost of all future rekey events is
   practically impossible.

   Instead, we restrict our attention to a more tractable problem,
   namely minimizing the expected cost of the next rekey operation. The
   expected cost of the next rekey operation (due to a leave or
   compromise event) is equal to

           sum(p_i * d_i)

   where p_i is the probability that member M_i will be the next to be
   evicted/compromised, and d_i is its depth in the tree.  This problem
   has many similarities to the data compression problem with code
   trees, where the average code length per message is

           sum(p_i * d_i),

   which is known as the "average external path length" (AEPL) of the
   tree, where p_i is the probability that message m_i will be the next
   to appear, and d_i is its depth in the code tree. For the problem of
   minimizing the AEPL, the optimal solution is given by Huffman trees
   [BCW90].  Shannon-Fano trees are another alternative solution, which
   give very good compression in practice, but are slightly sub-optimal
   [BCW90].
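
   To make this objective concrete, the following sketch (in the same
   illustrative Python setting as the sketch in Section 2) computes
   sum(p_i * d_i) for a given tree by a depth-first traversal:

      def aepl(root):
          """Average external path length: sum of p_i * d_i over the
          leaves, where d_i is the leaf's depth in the tree."""
          total, stack = 0.0, [(root, 0)]
          while stack:
              node, d = stack.pop()
              if node.left is None:       # leaf: contributes p_i * d_i
                  total += node.p * d
              else:                       # internal: visit children
                  stack.append((node.left, d + 1))
                  stack.append((node.right, d + 1))
          return total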

3.1. Differences from Data Compression Trees

   In an LKH key distribution tree, a change in the tree structure, such
   as changing the location of an existing member in the tree, causes
   extra rekey operations, which adversely affects the objective
   function (i.e. minimizing the average number of rekeys). On the other
   hand, in a data compression tree, a structural change does not
   directly induce an overhead on the objective function (i.e.
   minimizing the average code length). So, changes in the tree
   structure can be freely utilized in data compression algorithms, such
   as dynamic Huffman algorithms [Knuth85], to maintain the optimal tree
   structure; whereas they cannot be utilized so freely in dynamic LKH
   algorithms. Therefore, an LKH scheme with sub-optimal sum(p_i * d_i)
   can have a better overall performance than one that maintains the
   minimal sum(p_i * d_i) all the time.

   Another difference between LKH trees and data compression trees is
   that, when member evictions are the main cause of rekey operations
   (i.e., few compromise events happen besides member evictions), each
   member in the tree causes only a single rekey operation during its
   time in the tree, whereas a symbol in a code tree can be encoded
   many times.

4. Design Criteria

   As mentioned above, finding the optimal solution that minimizes the
   average number of rekey messages over all future sequence of join,
   leave, compromise events is not possible in practice. Therefore, we
   focus our attention on minimizing the expected cost of the next rekey
   event, sum(p_i * d_i). The proven optimal solution for minimizing
   sum(p_i * d_i) is given by a Huffman tree. However, maintaining a
   Huffman tree requires changes in the locations of the existing
   members in the tree, which means extra rekey operations. We choose to
   avoid such extra rekey operations, and we restrict our
   attention to algorithms which do not require changing the location of
   the existing members.

   Given the condition that the location of existing members will not be
   changed in the tree, the main structural decision for the tree
   organization is where to put a new member at insertion time. Also,
   the insertion operation should preserve the current locations of
   existing members. That is, the keys each member is holding after an
   insertion operation should be the same as those it was holding before
   the insertion (or the corresponding keys, for the keys that were
   changed), plus possibly some keys newly added to the tree. Therefore,
   we will focus our attention on insertion operations of the following
   form, which is implied by the conditions above.

           Figure 2: An insertion operation which keeps the relative
           positions of the existing members the same.

             Root                                   Root
              .                                      .
             .                                      .
              .                Put(M,X)              .
               Y               -------->             Y
                \                                     \
                 X                                     N
                . .                                   / \
               .   .                                 M   X
                                                        . .
                                                       .   .


   That is, to insert a new member M into the group, a new internal
   node N is inserted somewhere in the tree, and M is linked underneath.
   To denote this insertion operation at a given location X for a given
   new member M, we will write Put(M, X).  Note that the traditional LKH
   insertion, where every new member is inserted as a sibling to a leaf
   node, is a specific case of Put(M, X) where X is a leaf node.

   In our probabilistic LKH trees, each node X in the tree has a
   probability field X.p which is the cumulative probability of the
   members in the subtree rooted at X (as in Huffman trees), i.e., X.p
   is equal to the probability of the corresponding
   member if X is a leaf node, and it is equal to X.left.p + X.right.p
   if X is an internal node. The "Put" procedure shown above should also
   update the "p" field of all nodes affected by the insertion, as well
   as setting up the appropriate links for M and N.
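
   A minimal sketch of the Put procedure, using the hypothetical Node
   class from the sketch in Section 2, is given below. It splices a
   new internal node N into X's position as shown in Figure 2, and
   propagates the added probability up to the root; the caller must
   treat the returned node as the new root when X was the root. (The
   generation and distribution of the new keys is omitted here.)

      def put(m, x):
          """Put(M, X): insert leaf m by placing a new internal node n
          at x's position; m and x become n's children (Figure 2)."""
          y = x.parent
          n = Node(p=m.p + x.p, left=m, right=x, parent=y)
          m.parent = x.parent = n
          if y is not None:               # link n where x used to be
              if y.left is x:
                  y.left = n
              else:
                  y.right = n
          while y is not None:            # update the p fields on the
              y.p += m.p                  # path up to the root
              y = y.parent
          return n                        # new root if x was the root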

5. Insertion Algorithms

   In this section, we describe two LKH insertion algorithms which seek
   to minimize the expected number of rekeys for the next member
   eviction or compromise. The first algorithm does not induce any
   additional computational cost over the basic balanced-tree LKH
   insertion. The second algorithm provides further improvement over the
   first algorithm in message complexity, but induces an O(n)
   computational overhead for an n-member group.


   Algorithm 1:

   The first algorithm, which we refer to as "Insert1", organizes the
   LKH tree in a way which imitates the Shannon-Fano data compression
   trees. In Shannon-Fano coding [BCW90], a tree is constructed from a
   given set of probabilities by dividing the set into two parts of
   (roughly) equal probability repeatedly until every set includes a
   single element. Shannon-Fano coding guarantees a maximum redundancy
   of 1; i.e.  sum(p_i * d_i)  <=  sum(p_i * -log(p_i)) + 1, for
   sum(p_i) = 1.  Even though finding the best partition is NP-hard,
   there are many partition heuristics that maintain the redundancy
   bound of 1.

   The principle of Insert1 is to insert a new node in a way which
   obtains the best partitioning at every level so that the resulting
   tree will have an AEPL close to the optimal bound of sum(p_i *
   -log(p_i)). The algorithm works as follows:

   Insert1(M, X):
   --------------

   if (X is a leaf node)
       Put(M, X);
   else if ( (M.p >= X.left.p) and (M.p >= X.right.p) )
       Put(M, X);
   else if (X.left.p >= X.right.p)
       Insert1(M, X.right);
   else
       Insert1(M, X.left);


   To insert M in a tree with root node "Root", the procedure is called
   as "Insert1(M, Root)".


   Algorithm 2:

   The second algorithm, which we refer to as "Insert2", finds the best
   insertion point for member M by searching all possible insertion
   points in the tree. The increase in AEPL caused by Put(M, X) for a
   node X at depth d is

           M.p * (d + 1) + X.p

   since the subtree under X is pushed one level down (adding X.p to
   the AEPL) and M is placed at depth d+1. The additive term M.p is
   the same for every candidate X, so it suffices to minimize

           Cost[X] = (M.p * d) + X.p .

   Insert2 searches the whole tree to find the location which minimizes
   this quantity.

   Insert2(M, Root):
   -----------------

   For every node X in the subtree under node Root, compute
       Cost[X] = (M.p * d) + X.p, where d is the depth of X;
   Put(M, MIN_X), where MIN_X is the X that minimizes Cost[X];

   Computational performance of Insert2 can be improved by taking
   shortcuts in finding MIN_X. For example, when X.p <= M.p the subtree
   under X need not be searched. More sophisticated shortcuts which
   improve the performance further are also possible. In the worst case,
   Cost[X] must be computed for all nodes in the tree. Nevertheless,
   the formula for Cost[X] is simple and can be evaluated efficiently.
   So, when computation power is plentiful compared to bandwidth,
   Insert2 can be the method of choice, obtaining a further reduction
   in the number of rekeys at the expense of computational cost.
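
   The following sketch implements Insert2 together with the
   X.p <= M.p shortcut described above, in the same illustrative
   Python setting as the earlier sketches:

      def insert2(m, root):
          """Put m at the node x minimizing (m.p * d) + x.p. A subtree
          with x.p <= m.p is pruned, since any descendant then costs
          at least m.p * (d + 1) >= m.p * d + x.p."""
          best, best_cost = root, float('inf')
          stack = [(root, 0)]
          while stack:
              x, d = stack.pop()
              cost = m.p * d + x.p
              if cost < best_cost:
                  best, best_cost = x, cost
              if x.left is not None and x.p > m.p:  # pruning shortcut
                  stack.append((x.left, d + 1))
                  stack.append((x.right, d + 1))
          return put(m, best)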

6. Assignment of Probabilities


   An important issue that should be resolved before utilizing the
   algorithms described above is how to assign probabilities to members.

   First, it should be noted that the tree techniques we mentioned
   (Insert1, Insert2, Huffman, Shannon-Fano) work just the same whether
   sum(p_i) = 1  or not. So, we can use any non-negative weights w_i
   (which do not necessarily add up to 1) instead of probabilities p_i.
   For some other purposes where actual probabilities are needed, such
   as entropy and redundancy calculations, the probabilities can be
   computed as

           p_i = w_i / W

   where W = sum(w_i).

   As we discussed in the problem definition in Section 3, the quantity
   we seek to minimize is the average cost of the next rekey operation,
   sum(p_i * d_i). In this summation, p_i is the probability that member
   M_i will be the next to rekey (by leaving or by compromise).
   Computing the exact value of p_i depends on the probability
   distributions of all the members' inter-rekey times, and is
   extremely difficult in general. As a heuristic, we propose using

           w_i = 1 / t_i

   for the weight assignments, where t_i is the expected inter-rekey
   time for member M_i. There are two reasons for our choice of 1/t_i as
   the weight measure among infinitely many other candidates:

     1. Its simplicity and convenience.
     2. In the special case where the members' inter-rekey times are
        exponentially distributed, p_i = w_i/W gives exactly the
        probability that M_i will be the next to rekey.

   Estimation of the expected inter-rekey times of the members is
   outside the scope of this draft. The average of a member's past
   inter-rekey times would be a good estimator if the member has caused
   sufficiently many rekey events in the group. If not, the member's
   history from other, similar groups can be used if this kind of
   information is available.
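
   As an illustration of this heuristic, the following sketch
   estimates a member's weight from its observed inter-rekey times;
   the fallback value for members with no history is an assumption of
   the sketch, not part of the proposal:

      DEFAULT_WEIGHT = 1.0   # hypothetical fallback for members with
                             # no observed rekey history

      def estimate_weight(inter_rekey_times):
          """Heuristic weight w_i = 1 / t_i, with t_i estimated as the
          average of the member's past inter-rekey times."""
          if not inter_rekey_times:
              return DEFAULT_WEIGHT
          t = sum(inter_rekey_times) / len(inter_rekey_times)
          return 1.0 / t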

7. Experimental Comparison of the Methods

   We compared Insert1, Insert2, and the basic balanced-tree LKH method
   with computer simulations. In these simulations, various
   distributions (normal, exponential, uniform) were assumed for the
   rekey time of a member, and also for the distribution of the
   expected rekey time t_i across different members. The inter-arrival
   time of new members was always assumed to be exponentially
   distributed. For the group sizes, we experimented with various
   values up to 100,000.

   The simulations showed that significant gains can be obtained by
   probabilistic organization of the LKH tree. In our experiments,
   Insert1 performed 5-40% better than the balanced LKH in terms of the
   average number of rekeys needed for compromise recovery.  Insert2
   provided a further 2-10% improvement over Insert1. More typical
   values were 15-20% for Insert1, and 15-25% for Insert2 as the
   improvement rate over the basic balanced scheme. A more complete
   presentation of the experimental results will be given in future
   versions of this draft.

   Of course, it is possible to obtain arbitrarily high improvement
   figures for the algorithms by increasing the variance in the
   distribution of t_i values.  The results we report here are for more
   "realistic" choices of the distribution parameters. To test how these
   probabilistic schemes will behave for real-life multicast groups,
   empirical data on real-life member behaviors should be obtained. For
   now, we conjecture that a 15-20% improvement rate over the basic
   balanced LKH scheme would be realistic for fairly large groups (e.g.
   for groups with more than 1000 members). The improvement rate should
   be higher for larger and more diverse groups.

8. Conclusions

   We described two methods which reduce the rekey message cost of the
   LKH method for multicast key management by probabilistic
   organization of the LKH tree. Besides the basic LKH scheme of Wallner
   et al. [WHA97], these probabilistic methods are also applicable to
   more sophisticated LKH-based techniques such as the Balenson-McGrew-
   Sherman scheme [BMS99] and the scheme of Canetti et al. [CGIMNP99].


Acknowledgments

   We would like to thank Eric Harder for many helpful suggestions and
   informative discussions.


References

   [BCW90] T. C. Bell, J. G. Cleary and I. H. Witten, Text Compression,
   Prentice-Hall, 1990.

   [BMS99]  D. Balenson, D. McGrew, and A. Sherman, Key Management for
   Large Dynamic Groups: One-Way Function Trees and Amortized
   Initialization, Internet draft draft-balenson-groupkeymgmt-oft-
   01.txt, July 1999.

   [CGIMNP99]  R. Canetti, J. Garay, G. Itkis, D. Micciancio, M. Naor,
   and B. Pinkas, Multicast Security: A Taxonomy and Some Efficient
   Constructions, in Proceedings of IEEE INFOCOM '99, 1999.

   [Knuth85]  D. E. Knuth, Dynamic Huffman Coding, Journal of
   Algorithms, volume 6, pages 163-180, 1985.

   [WHA97]  D. Wallner, E. Harder, and R. Agee, Key Management for
   Multicast:  Issues and Architectures, Internet draft draft-wallner-
   key-arch-00.txt, July 1997.


Authors' Addresses:

   Ali A. Selcuk, Christopher B. McCubbin, Deepinder Sidhu
   Maryland Center for Telecommunications Research
   University of Maryland Baltimore County
   1000 Hilltop Circle
   Baltimore, MD 21250, USA
   Email: {aselcu1, cmccub1, sidhu}@umbc.edu
   Expires July 2000