Internet DRAFT - draft-dong-ospf-maxage-flush-problem-statement
draft-dong-ospf-maxage-flush-problem-statement
Network Working Group J. Dong
Internet-Draft X. Zhang
Intended status: Informational Huawei Technologies
Expires: May 4, 2017 Z. Li
China Mobile
October 31, 2016
OSPF LSA Flushing Problem Statement
draft-dong-ospf-maxage-flush-problem-statement-01
Abstract
In OSPF protocol, Link State Advertisements (LSAs) are exchanged in
Link State Update (LSU) packets to achieve link state database (LSDB)
synchronization and consistent route calculation. OSPF protocol
specifies several scenarios in which an LSA is flushed with the LS
age field set to MaxAge. In some cases, the flushing of MaxAge LSAs
may cause flooding storm of OSPF packets and severely impact the
services provided by the network.
This document describes the problem of OSPF LSA flushing, and ask for
solutions to solve this problem.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 4, 2017.
Dong, et al. Expires May 4, 2017 [Page 1]
Internet-Draft OSPF MaxAge Flush Problem Statement October 2016
Copyright Notice
Copyright (c) 2016 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Typical Scenarios of LSA Flushing . . . . . . . . . . . . . . 3
3. Consequence of LSA Flushing . . . . . . . . . . . . . . . . . 3
4. Requirements on Potential Solutions . . . . . . . . . . . . . 4
4.1. Solution for Problem Localization . . . . . . . . . . . . 4
4.2. Solution for Impact Mitigation . . . . . . . . . . . . . 4
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 5
6. Security Considerations . . . . . . . . . . . . . . . . . . . 5
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 5
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 5
8.1. Normative References . . . . . . . . . . . . . . . . . . 5
8.2. Informative References . . . . . . . . . . . . . . . . . 6
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 6
1. Introduction
In OSPF protocol [RFC2328], Link State Updates (LSAs) are exchanged
in Link State Update (LSU) packets to achieve link state database
(LSDB) synchronization and consistent route calculation. OSPF
specifies several scenarios in which an LSA is flushed with the LS
age field set to MaxAge. In some cases, the flushing of MaxAge LSAs
may cause flooding storm of OSPF packets and severely impact the
services in the network. Since the MaxAge LSA may be flushed by any
OSPF router, usually it would take a long time for troubleshooting
and could cause huge damage to both the network provider and its
customers.
Dong, et al. Expires May 4, 2017 [Page 2]
Internet-Draft OSPF MaxAge Flush Problem Statement October 2016
2. Typical Scenarios of LSA Flushing
[RFC2328] specifies several scenarios in which an LSA should be
flushed with the LS age field set to MaxAge. Under normal
circumstances, the LSA flushing happens when the LS age of an LSA
naturally reaches MaxAge, this can be done by any OSPF router. Since
OSPF router would generate a new instance of the self-originated LSA
when its LS age reaches LSRefreshTime, which is usually the half of
the value of MaxAge, the naturally aging to MaxAge case would only
happen when the originator of the LSA is not reachable in the network
and cannot refresh the LSA.
Another case of LSA flushing is "Premature aging", which is to set
the LS age of a self-originated LSA to MaxAge and then flood the LSA.
Premature aging is used when the self-originated LSA's sequence
number field is about to wrap, or all the external routes previously
advertised by the LSA are no longer reachable. Premature aging and
flushing of LSA can also happen when a router is changed from the
Designated Router (DR) to a non-DR, or in some rare cases the
router's Router ID is changed.
Field experience has shown several circumstances where MaxAge LSA
flushing may be generated by the misbehaved router in the network.
For example, the LS age may be corrupted to reach the MaxAge much
earlier than normally expected. This is difficult to detect with the
existing OSPF checksum mechanism, as the LS age field is excluded
from the checksum calculation of LSA. Besides, OSPF cryptographic
authentication can not detect the corruption of the LS age field if
it happens before the LSA is assembled to LSU packet.
3. Consequence of LSA Flushing
While MaxAge LSA flushing is important for fast convergence and the
consistency of the Link-State DataBase (LSDB) of all OSPF routers, as
shown in several accidents happened in the production network,
improper LSA flushing can have severe impact to the network and the
services provided by the network. This section evaluates the impacts
of MaxAge LSA flushing.
According to section 14 of [RFC2328], the MaxAge LSA can be flushed
by any router, no matter whether this LSA is self-originated or not.
According to the flooding scope of the LSA, this MaxAge LSA would be
flooded either in the whole routing domain or in the specific area.
On all the routers receiving this MaxAge LSA, this would cause the
old LSA instance being replaced, and consequently triggers route
calculation and installation. When the MaxAge LSA is received by the
originating router of this LSA, the originating router would increase
the LSA's LS sequence number one past the received LS sequence
Dong, et al. Expires May 4, 2017 [Page 3]
Internet-Draft OSPF MaxAge Flush Problem Statement October 2016
number, and originate a new instance of the LSA. If the LSA flushing
is due to systematic problem and cannot recover automatically, this
flooding and processing would last forever, which severely impacts
network reachability and stability. Since OSPF is the fundamental
protocol to build the infrastructure for other protocols such as BGP,
LDP, etc., and various services provided by the network, it will
cause huge damage to both the network provider and its customers.
As the MaxAge LSA may be flushed by any OSPF router, usually it would
take a long time for troubleshooting to locate the misbehaved router
in the network, and during this time the LSA flushing could have
caused huge damage to both the network provider and its customers.
4. Requirements on Potential Solutions
Considering the importance of OSPF protocol to the networks and the
services carried in the networks, and the potential severe impact of
MaxAge LSA flooding, this document calls for solutions to protect
against or mitigate the impact of improper MaxAge LSA flushing.
The potential solutions can be classified into two categories, and
the requirements are provided in following sections respectively.
4.1. Solution for Problem Localization
Since OSPF allows the flushing of non-self originated LSAs, for
troubleshooting and problem localization, some mechanism to identify
the misbehaved router quickly is needed. If the improper MaxAge LSA
flushing is caused by systematic problem, operators would need to
locate the misbehaved router and shut it down to stop the flooding
storm.
[RFC6232] proposes to add the Purge Originator Identification (POI)
TLV into IS-IS Purge LSPs to identify the originator of IS-IS Purges.
Although a similar TLV may be added into the OSPF extended LSAs as
defined in [RFC7684] and [I-D.ietf-ospf-ospfv3-lsa-extend], the
structure of the legacy OSPF LSAs as defined in [RFC2328] is not TLV-
based and such mechanism does not apply. Some problem localization
solution which is backward compatible and applicable to all the OSPF
LSAs would be preferred.
4.2. Solution for Impact Mitigation
Since the flooding storm caused by improper LSA flushing can have
severe impact to network stability and the services provided by the
network, it is important to alleviate such impact even before the
root cause or the misbehaved router can be identified. In addition,
some problem localization mechanisms may rely on the availability of
Dong, et al. Expires May 4, 2017 [Page 4]
Internet-Draft OSPF MaxAge Flush Problem Statement October 2016
the network, which means the impact mitigation mechanism is necessary
to ensure that the problem localization mechanisms do work when
severe flooding storm caused by LSA flushing happens in the network.
It is important that the impact mitigation solution is backward
compatible and can support incremental deployment. Preferably, the
mitigation solution should not delay the route convergence triggered
by normal LSA flushing.
5. IANA Considerations
This document makes no request of IANA.
Note to RFC Editor: this section may be removed on publication as an
RFC.
6. Security Considerations
This document describes the problem of MaxAge LSA flushing, which in
some cases is due to the lack of integrity protection of the LS age
field. The LS age field may be altered as a result of software or
hardware problem, such modification cannot be detected by LSA
checksum nor OSPF packet cryptographic authentication. LSA flushing
could have severe impact on network stability and the services
provided by the network. This may be considered as a security
vulnerability.
7. Acknowledgements
The authors would like to thank Bruno Decraene, Acee Lindom and Les
Ginsberg for the discussion on this topic.
8. References
8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
[RFC2328] Moy, J., "OSPF Version 2", STD 54, RFC 2328,
DOI 10.17487/RFC2328, April 1998,
<http://www.rfc-editor.org/info/rfc2328>.
Dong, et al. Expires May 4, 2017 [Page 5]
Internet-Draft OSPF MaxAge Flush Problem Statement October 2016
8.2. Informative References
[I-D.ietf-ospf-ospfv3-lsa-extend]
Lindem, A., Mirtorabi, S., Roy, A., and F. Baker, "OSPFv3
LSA Extendibility", draft-ietf-ospf-ospfv3-lsa-extend-13
(work in progress), October 2016.
[RFC6232] Wei, F., Qin, Y., Li, Z., Li, T., and J. Dong, "Purge
Originator Identification TLV for IS-IS", RFC 6232,
DOI 10.17487/RFC6232, May 2011,
<http://www.rfc-editor.org/info/rfc6232>.
[RFC7684] Psenak, P., Gredler, H., Shakir, R., Henderickx, W.,
Tantsura, J., and A. Lindem, "OSPFv2 Prefix/Link Attribute
Advertisement", RFC 7684, DOI 10.17487/RFC7684, November
2015, <http://www.rfc-editor.org/info/rfc7684>.
Authors' Addresses
Jie Dong
Huawei Technologies
Huawei Campus, No.156 Beiqing Rd.
Beijing 100095
China
Email: jie.dong@huawei.com
Xudong Zhang
Huawei Technologies
Huawei Campus, No.156 Beiqing Rd.
Beijing 100095
China
Email: zhangxudong@huawei.com
Zhenqiang Li
China Mobile
No.32 Xuanwumenxi Ave., Xicheng District
Beijing 100032
China
Email: li_zhenqiang@hotmail.com
Dong, et al. Expires May 4, 2017 [Page 6]