INTERNET-DRAFT expires  6/1/2006
draft-kilsdonk-redundant-tcp-01.txt
Network Working Group                                        D. Kilsdonk
Internet-Draft:                                      
                                                           June 13, 2005

              Redundant TCP


By submitting this Internet-Draft, each author represents that any applicable patent or 
other IPR claims of which he or she is aware have been or will be disclosed, and any of 
which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and
may be updated, replaced, or obsoleted by other documents at any time. It
is inappropriate to use Internet-Drafts as reference material or to cite
them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

Abstract

   This memo documents a process that can be used to avoid network 
outages while a router's software is upgraded, so that disruption to 
the network community is avoided.  The process defines requirements 
for inserting a replacement router, or upgrading the software of a 
working router, within an operational network.  It relies on accepted 
Internet standards with minimal alteration.




RFC DONK      Redundant TCP         June 2005



 Table of Contents

 1.0 INTRODUCTION
 2.0 MINIMUM REQUIREMENTS
 2.1 REDUNDANT CONFIGURATION
 2.2 AWARENESS OF REDUNDANT CONFIGURATION
 2.3 AWARENESS OF STATE OF REDUNDANT COUNTERPART
 3.0 REDUNDANT BOOTUP
 4.0 TIME SYNCHRONIZED CASE
 4.1 REDUNDANT OPERATIONS
 4.1.1 OPERATOR INITIATED CONFIGURATION CHANGES
 4.1.2 BACKPLANE INITIATED CONFIGURATION CHANGES
 4.1.2.1 OPERATIONAL-STATION DISTINCT PROCESSING
 4.1.2.2 STANDBY-STATION DISTINCT PROCESSING
 4.1.2.2.1 STANDBY MAINTENANCE OF TCP REDUNDANCY
 5.0 UNSYNCHRONIZED CASE
 5.1 STANDBY DATA SYNCHRONIZATION
 5.2 STANDBY TIMING SYNCHRONIZATION
 6.0 ALTERNATIVES TO THIS APPROACH
 7.0 CRYPTOGRAPHY
 8.0 REFERENCES
 9.0 OTHER BENEFITS
 9.0 DEFINITIONS OF TERMS
10.0 AUTHOR'S ADDRESS
APPENDIX A: GLOSSARY OF ACRONYMS







1.  INTRODUCTION

   Disruption in the network community arises whenever a router or 
endstation crashes or is brought down for maintenance, usually to 
perform a software upgrade.  

   In many cases, this disruption propagates through the network, 
causing other disturbances.  To minimize these disruptions, this memo 
presents a practical operational framework built on a generally 
accepted model of a simple, generic router, making maximal use of 
existing protocol specifications with minimal change.

2. MINIMUM REQUIREMENTS

  This approach uses redundancy to support ongoing operations while 
critical equipment is upgraded or replaced.  Having two like-designed 
pieces of equipment is not sufficient by itself; the minimum 
functionality described in this section must also be present.  To the 
extent these requirements are met, the switchover will be seamless.  
Where they are only partly met, the disruption at switchover is 
reduced in proportion to the degree to which they have been met.



2.1 REDUNDANT CONFIGURATION

  To support switchover and redundancy in a meaningful sense, the two 
pieces of equipment must share physical connectivity with all 
neighbors and must have the same IP and machine addresses on all 
general traffic ports.  This is essentially the definition of a 
redundant configuration.  Each station also has a distinct management 
IP address of which the other is made aware (in-band management).  One 
station is designated "operational"; its redundant counterpart is 
henceforth in "standby" mode.  (Where redundant addressing cannot be 
achieved at the machine layer, the standby station must be put in 
promiscuous mode, and after switchover it must transmit an 
unsolicited ARP response correlating the extant IP address with the 
new machine address for the benefit of all listening peers.)

  The degree to which both stations exhibit truly redundant operation 
can be gauged by how well they share current information.  In lieu of 
shared memory, this approach exploits the information already 
available on the network to keep the two stations current.

2.2 AWARENESS OF REDUNDANT CONFIGURATION

  The operational and standby equipment must support two indicators.  
One indicates that a redundant configuration exists (i.e., a redundant 
counterpart is present), while the other indicates that the station is 
functioning properly.

   For example, the presence of a backup station may be indicated 
through the human interface, whether by a command issued at the 
command-line interface (CLI) or by an SNMP setting, or more simply by 
a bit set to "one" by the station (I_AM_HERE).  In the latter case, 
the bit must be readable by the station's redundant counterpart.

2.3 AWARENESS OF STATE OF REDUNDANT COUNTERPART

  A generic router typically has an operational-signal (LED) which 
indicates that its software is actively performing the routing function.  
The router's software turns this signal off during exception handling, 
such as a software crash, thereby indicating a software malfunction 
(possibly brought on by a hardware fault).  An operator-initiated 
deactivation also resets this operational-signal.  This indicator must 
be readable by the station's redundant counterpart.
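
   The two indicators of paragraphs 2.2 and 2.3 can be pictured as a 
small status word that each station publishes and its counterpart 
reads.  The following Python sketch is illustrative only.  The bit 
positions and the names OPERATIONAL, clear_operational and 
counterpart_is_operational are assumptions of this example (only 
I_AM_HERE is named in this memo), and the means by which the word is 
exposed to the counterpart is left open.

```python
# Hypothetical bit layout of the status word read by the counterpart.
I_AM_HERE = 0x1     # paragraph 2.2: this station is present
OPERATIONAL = 0x2   # paragraph 2.3: routing software is running normally


def clear_operational(status: int) -> int:
    """A crash or operator-initiated deactivation resets the signal."""
    return status & ~OPERATIONAL


def counterpart_is_operational(status: int) -> bool:
    """What a station learns by reading its counterpart's status word."""
    return bool(status & OPERATIONAL)
```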







3. REDUNDANT BOOTUP

   As each station boots, if it detects the presence of a redundant 
counterpart, it immediately signals that it is operationally unready.  
It then waits until it detects the same signal from its counterpart.  
Once it does, it signals readiness and waits to detect readiness from 
its counterpart.  It waits until it detects readiness or a timeout 
value expires.  This timeout value may be as great as the default TCP 
timeout for the station, but for practical purposes it should be set 
to two minutes.  If this step times out, the station is designated 
"standby" and "unsynchronized" but may otherwise proceed with its 
bootup.

    If this step does not time out, each station is time-synchronized 
with the other and may proceed with its bootup in much the same way it 
would without redundancy concerns.  The station with the lower 
management IP address is designated "operational".  The station with 
the higher management IP address is designated "standby".
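
   The bootup exchange above can be sketched in Python as follows.  
This is a minimal illustration, not a definitive implementation.  The 
function names, the string values used for the signals, and the 
polling loop are assumptions of this example, as is the choice to 
apply one deadline to both waiting steps (the text states the timeout 
only for the second).

```python
import ipaddress
import time
from typing import Callable, Tuple

UNREADY, READY = "unready", "ready"   # hypothetical signal values


def elect_role(own_mgmt_ip: str, peer_mgmt_ip: str) -> str:
    """Lower management IP address is operational, higher is standby."""
    own = ipaddress.ip_address(own_mgmt_ip)
    peer = ipaddress.ip_address(peer_mgmt_ip)
    return "operational" if own < peer else "standby"


def boot_handshake(
    set_own_signal: Callable[[str], None],
    read_peer_signal: Callable[[], str],
    own_mgmt_ip: str,
    peer_mgmt_ip: str,
    timeout_s: float = 120.0,          # the two minutes suggested above
    poll_s: float = 0.01,
) -> Tuple[str, bool]:
    """Return (role, synchronized) after the two-step exchange."""
    deadline = time.monotonic() + timeout_s

    def wait_for(expected: str) -> bool:
        # Poll the counterpart's indicator until it matches or time is up.
        while time.monotonic() < deadline:
            if read_peer_signal() == expected:
                return True
            time.sleep(poll_s)
        return False

    set_own_signal(UNREADY)
    if wait_for(UNREADY):
        set_own_signal(READY)
        if wait_for(READY):
            return elect_role(own_mgmt_ip, peer_mgmt_ip), True
    # Timed out: the counterpart did not take part in the exchange.
    return "standby", False
```

   A station whose counterpart is already operational never sees the 
"unready" signal, so the call returns ("standby", False), which is the 
late-joiner outcome.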

4. TIME SYNCHRONIZED CASE

4.1 REDUNDANT OPERATIONS

   Operations at the station generally consist of two parts.  The first 
is operator-initiated configuration changes, such as adding a static 
route.  To maintain pure redundancy, it is imperative that such a 
change be instantiated at both stations at the same time.  The second 
part comprises configuration changes, such as routing table updates, 
which originate from external sources (peer routers) connected on the 
general switching ports (backplane redundancy). 

4.1.1 OPERATOR INITIATED CONFIGURATION CHANGES

   Configuration changes and commands issued at the station must be 
conveyed to the other station.  For a command line interface, this is 
done simply by establishing a telnet session to the redundant 
counterpart's management IP address and echoing all commands as they 
are entered.  The responses to these commands may be ignored.

   Configuration changes and commands issued directly to the station via 
SNMP may be conveyed by copying each SNMP packet upon its reception by 
the station and sending the copy to the redundant counterpart (via its 
distinct management IP address).
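
   The echoing described above amounts to a small wrapper around the 
station's command handling.  In the Python sketch below, 
mirror_command, apply_local and send_to_peer are hypothetical names; 
send_to_peer stands in for writing the command to the telnet session 
(or forwarding the copied SNMP packet) on the counterpart's management 
IP address.

```python
from typing import Callable


def mirror_command(
    command: str,
    apply_local: Callable[[str], str],
    send_to_peer: Callable[[str], object],
) -> str:
    """Run an operator command locally and echo it to the counterpart."""
    result = apply_local(command)
    send_to_peer(command)   # the counterpart's response is ignored
    return result
```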

4.1.2 BACKPLANE INITIATED CONFIGURATION CHANGES

   The processing for handling backplane-initiated configuration changes 
is predicated on the operational-signal of the redundant counterpart.  
Each time a packet is sent, this bit is read to determine whether the 
counterpart is functioning as the operational station.  If the 
counterpart is operational (that is, it is not the "standby" station), 
it may be safely assumed that this station is the "standby".




4.1.2.1  OPERATIONAL-STATION DISTINCT PROCESSING

    The operational station operates in the same manner as in the 
non-redundant configuration case.  In this way, the operational 
software's only modification is to detect, with each packet to be 
transmitted, whether or not it is in the operational mode.  Its 
software is minimally altered and therefore its inherent reliability is 
not diminished as a result of introducing redundancy functionality.

4.1.2.2 STANDBY-STATION DISTINCT PROCESSING

    The standby station executes the same software as the operational 
station.  However, at layer one of its network handling, it does not 
transmit the packet but still returns a status indicating that the 
packet was transmitted.  The net effect is the same, inasmuch as the 
operational station DID send the packet.  Both stations may now await 
an acknowledgment (in TCP parlance).
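
   The per-packet check of paragraphs 4.1.2 through 4.1.2.2 can be 
sketched as a single driver-level send routine shared by both 
stations.  The Python below is a minimal illustration for the 
time-synchronized case only; the names transmit, 
counterpart_is_operational and put_on_wire are assumptions of this 
example.

```python
from typing import Callable


def transmit(
    packet: bytes,
    counterpart_is_operational: Callable[[], bool],
    put_on_wire: Callable[[bytes], None],
) -> bool:
    """Driver-level send used identically by both stations.

    Returns True in both cases, so the upper layers cannot tell
    whether the packet actually left this station.
    """
    if counterpart_is_operational():
        # This station is the standby: suppress the packet, report success.
        return True
    # The counterpart is not operational, so this station carries traffic.
    put_on_wire(packet)
    return True
```

   Because the indicator is read on every call, a station that has 
been suppressing its output starts transmitting with the first packet 
sent after its counterpart's operational-signal goes off.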

4.1.2.2.1  STANDBY MAINTENANCE OF TCP REDUNDANCY 

    The station may be seen as an Input/Output (I/O) engine driven only 
by packet timing marks, so the two stations need only be synchronized 
at that timing granularity.  Initial timing synchronization was 
achieved in paragraph 3.  Packets sent from backplane stations will 
tend to re-calibrate the two stations to the same packet timing.  
Finally, the TCP timeout window is usually very large, which adds to 
the confidence that the stations' configurations can be kept redundant. 

    Since the timing granularity is only held to within one packet time, 
it is possible that an acknowledgment is seen at the standby before the 
sequence to be acknowledged has been "sent".  Therefore, the TCP input 
algorithm (but not its interface) must be modified so that, if this 
condition is detected, the acknowledgment packet is buffered and 
recycled into the input stream after TCP outputs the sequence that the 
acknowledgment covers.
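
   Ordinary TCP input processing (RFC 793) discards an acknowledgment 
for data that has not yet been sent, which is why the standby needs 
this change.  The Python sketch below illustrates the buffering under 
simplifying assumptions: the class and method names are hypothetical, 
only acknowledgment numbers are tracked, and sequence-number 
wraparound is ignored.

```python
from typing import List


class StandbyAckBuffer:
    """Holds ACKs that arrive before the standby has 'sent' the data."""

    def __init__(self, snd_nxt: int = 0) -> None:
        self.snd_nxt = snd_nxt          # next sequence number to be output
        self.pending: List[int] = []    # early ACK numbers, arrival order
        self.delivered: List[int] = []  # ACKs passed to normal TCP input

    def on_ack(self, ack_no: int) -> None:
        if ack_no > self.snd_nxt:
            self.pending.append(ack_no)     # covers data not yet output
        else:
            self.delivered.append(ack_no)   # normal case

    def on_output(self, new_snd_nxt: int) -> None:
        """Called after TCP output advances; recycle ACKs now covered."""
        self.snd_nxt = new_snd_nxt
        still_early = []
        for ack_no in self.pending:
            if ack_no <= self.snd_nxt:
                self.delivered.append(ack_no)
            else:
                still_early.append(ack_no)
        self.pending = still_early
```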

   Now, should the operational station become defective, its exception 
handler will mark it as "standby".  Since its redundant counterpart 
checks this indicator every time it sends a backplane packet, 
operations can switch over to what was the standby within one packet 
time.  All operations carried over TCP will continue, since protocols 
such as BGP run on TCP.

   The crashed or operator-terminated station may now have its software 
upgraded.  When it reboots, it will attempt the synchronization steps 
of paragraph 3 and will time out (since the other station is continuing 
in operational mode), causing the newly introduced station to be 
designated "unsynchronized".  As an unsynchronized standby, its 
transmissions are never allowed onto the backplane connections.
 


5 UNSYNCHRONIZED CASE

   In the unsynchronized case, the station has failed to complete the 
steps of paragraph 3.  It can be assumed that the failure occurred 
because the redundant counterpart of this station was already 
operational and therefore did not take part in the time 
synchronization and testing (toggling) of its operational-indicator.  
This is the case of a station joining the redundant configuration 
late, and is therefore referred to as the "late-joiner" case.

    The same requirements apply to the late-joiner.  The operational 
station can detect the presence of the late-joiner through the means of 
paragraph 2.3.  What remains is for the standby station to achieve data 
and timing synchronization.  These steps may be processor intensive 
and should therefore be scheduled for a time at which critical network 
operations are expected to be at a minimum.  In this way, 
configuration changes during the steps of time and data 
synchronization are also minimized.

5.1 STANDBY DATA SYNCHRONIZATION

   The standby continues its bootup and then issues an SNMP MIB WALK (GET) 
to the operational station's management IP address, starting at object 
number one.  This returns all objects comprising the operational 
station's MIB definition.  On reception, the standby must convert each 
GET_RESULT to a SET command and issue it to itself.

   Typically, routers do not issue SNMP-GETs to other routers.  However, 
nothing in the SNMP RFC precludes this.  The operational station may 
assume that an SNMP-MIB WALK ALL from the standby station indicates 
that a late-joiner synchronization is in progress.

   Also, some SNMP MIB objects are read-only.  This prevents operators 
from setting data that reflect items not under operator control, such 
as the system-uptime of the router.  While this is a sound safeguard 
in itself, the standby may trust that the operational station reports 
this data faithfully.  The data at the operational station therefore 
has the same integrity as if the standby had obtained it directly from 
its source.  Consequently, during data synchronization, SNMP MIB 
objects that are read-only may be written in the standby.  This is a 
manageable, very specific software change.  The result for 
system-uptime, for example, is that the value reflects the uptime of 
the system as a whole as opposed to that of the individual station.
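
   The conversion of walk results into local SETs, including the 
read-only exception, can be sketched as follows.  This Python example 
uses a toy in-memory MIB rather than a real SNMP agent or library; the 
class LocalMib, the function apply_walk and the synchronizing flag are 
all assumptions of this example.

```python
from typing import Dict, Iterable, Tuple


class LocalMib:
    """Toy in-memory MIB; a real agent enforces access in its own way."""

    def __init__(self, read_only: Iterable[str]) -> None:
        self.values: Dict[str, object] = {}
        self.read_only = set(read_only)

    def set(self, oid: str, value: object, synchronizing: bool = False) -> None:
        # Read-only objects stay protected from operators, but may be
        # written while the standby copies the operational station.
        if oid in self.read_only and not synchronizing:
            raise PermissionError(f"{oid} is read-only")
        self.values[oid] = value


def apply_walk(walk: Iterable[Tuple[str, object]], mib: LocalMib) -> int:
    """Turn each (oid, value) returned by the walk into a local SET."""
    count = 0
    for oid, value in walk:
        mib.set(oid, value, synchronizing=True)
        count += 1
    return count
```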

   Finally, files containing configuration information may now be 
obtained by the standby station by issuing appropriate FTP commands 
to the operational station's management IP address.  All that remains 
for complete redundancy at this point is achieving timing 
synchronization of the backplane.


5.2 STANDBY TIMING SYNCHRONIZATION
 
   Once data synchronization is achieved, the two stations re-establish 
telnet connectivity for in-band management redundancy.  The first 
command sent initiates timing synchronization.  At this point, both 
stations declare that timing synchronization does not exist, so both 
are considered operationally unready.  Since neither is operational 
during timing resynchronization, neither transmits packets to the 
backplane.  Furthermore, TCP sessions are brought down and 
re-established so that initial sequence numbers are set consistently 
in both stations.  Initial TCP sequence numbers can be unrandomized 
and statically based on the IP address of the peer router to which 
connectivity is sought.  This is because peer routers are typically in 
the same IP domain, usually point-to-point connected, and seldom more 
than one hop away.  Because no routing is needed between them, there 
are no routing loops and no transient packets wandering the network, 
especially during a catastrophic failure, and thus the need to 
randomize the initial sequence numbers is obviated.

   Alternatively, if the MIB for TCP contains current sequence numbers, 
then during the SNMP-GET-ALL step of data synchronization (paragraph 
5.1) backplane activity can be suspended until the operational station 
performs an SNMP-GET of the standby's TCP sequence numbers to ensure 
that the two are coherent before network activity on the backplane is 
allowed to proceed.
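
   The first option above, static initial sequence numbers, might look 
like the following Python sketch.  The mapping used here (the peer's 
IPv4 address taken directly as the 32-bit value) and the name 
static_isn are assumptions of this example; the memo requires only 
that both stations derive the same value from the peer's address.

```python
import ipaddress


def static_isn(peer_ip: str) -> int:
    """Derive a repeatable 32-bit initial sequence number from the peer."""
    # For an IPv4 peer the address is already a 32-bit quantity; the
    # mask keeps the result in range for any other address family.
    return int(ipaddress.ip_address(peer_ip)) & 0xFFFFFFFF
```

   Both stations calling static_isn with the same peer address obtain 
the same starting sequence number without exchanging any state.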

6 ALTERNATIVES TO THIS APPROACH

  To preclude outages due to a catastrophic hardware event, designers 
have produced dual-station systems that stay synchronized by using a 
common clock and shared memory.  This is a very expensive solution, 
and it does not address problems caused by a hardware malfunction of 
the clock itself.  If the hardware problem manifests as a memory 
location or semaphore "stuck-high" in the shared memory, it becomes a 
problem for the standby almost immediately.  Finally, approaches of 
this kind require timing synchronization in the megahertz range, when 
the inputs and outputs of the system need only be correct at 
packet-time granularity.


6 OTHER BENEFITS

   Since reliable hardware and software mean that the investment in 
redundancy is, ideally, never called upon, the standby station might 
never be used in a theoretical 100% reliability case.  Therefore, the 
standby can be put to other uses, such as backplane debugging, wherein 
it serves as a sniffer to promiscuously record and display packets 
appearing on backplane links.  This can be a considerable asset in 
tuning and debugging the network.  If there is enough station 
throughput, the sniffing can be done simultaneously with the 
functionality already presented.

7 CRYPTOGRAPHY
 
    As cryptography is imposed by the station operator, so too will it be 
invoked on the redundant station by way of the telnet echo of the same 
commands.  Specialized cryptography may also be created to support this 
echoing and the echoing of SNMP management requests to the management 
port.


8 REFERENCES

o Structure and Identification of Management Information for 
  TCP/IP-based Internets, RFC 1155, M. Rose, K. McCloghrie 

o Management Information Base for Network Management of TCP/IP-based 
  Internets, RFC 1156, K. McCloghrie, M. Rose 

o SNMPv2 Management Information, RFC 2013, K. McCloghrie

o Simple Network Management Protocol (SNMP), RFC 1157, J. Case, 
  M. Fedor, M. Schoffstall, C. Davin 

o Management Information Base for Network Management of TCP/IP-based 
  Internets: MIB-II, RFC 1158, M. Rose

o Transmission Control Protocol, RFC 793, J. Postel 

o Requirements for Internet Hosts--Communication Layers, RFC 1122, 
  R. Braden, ed. 

o TCP Extensions for High Performance, RFC 1323, V. Jacobson, R. Braden, 
  D. Borman

o Internet Protocol, RFC 791, J. Postel

o Ethernet Address Resolution Protocol, RFC 826, D. Plummer

9 DEFINITIONS OF TERMS

Internet address: A 32-bit address assigned to hosts using TCP/IP.

Operational station: Of two completely redundant stations, the station 
   actually transmitting packets.

protocol: A formal description of messages to be exchanged and rules
   to be followed for two or more systems to exchange information.
 
router: A system responsible for making decisions about which of
   several paths network (or Internet) traffic will follow.  To do this
   it uses a routing protocol to gain information about the network, and
   algorithms to choose the best route based on several criteria known
   as "routing metrics."  In OSI terminology, a router is a Network
   Layer intermediate system. 

Standby station: Of two completely redundant stations, the station which 
  behaves as though it is transmitting packets but is prevented from 
  doing so at the network driver layer.  It relies on the operational 
  station's transmission. 
 
Telnet: The virtual terminal protocol in the Internet suite of
  protocols.  Allows users of one host to log into a remote host and
  interact as normal terminal users of that host.

Unsynchronized station:  A station unable to complete time synchronization 
  with its redundant counterpart.

10  AUTHOR'S ADDRESS
        Daniel D. Kilsdonk 
        65 Lake Shore Drive North 
        Westford, MA 01886 

Phone: (978) 692-3383

EMail: dan@prospeed.net

APPENDIX A: GLOSSARY OF ACRONYMS

  
FTP: File Transfer Protocol.  The Internet protocol (and program)
   used to transfer files between hosts.
IP: Internet Protocol.  The network layer protocol for the Internet
   protocol suite.
IS-IS: Intermediate system to Intermediate system protocol.  The OSI
   protocol by which intermediate systems exchange routing information.
MIB: Management Information Base.  A collection of objects that can
   be accessed via a network management protocol.  
OSPF: Open Shortest Path First.  A "Proposed Standard" interior
   gateway protocol for the Internet.
SNMP: Simple Network Management Protocol.  The network management
   protocol of choice for TCP/IP-based internets. 
TCP: Transmission Control Protocol.  The major transport protocol in
   the Internet suite of protocols providing reliable, connection-
   oriented, full-duplex streams.  Uses IP for delivery.

INTERNET-DRAFT expires 6/13/2006
Copyright (C) The Internet Society (2005). 

This document is subject to the rights, licenses and restrictions contained in BCP 78, 
and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and
THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE 
INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR 
FITNESS FOR A PARTICULAR PURPOSE.