REPUTE M. Kucherawy Internet-Draft May 5, 2013 Intended status: Informational Expires: November 6, 2013 Operational Considerations Regarding Reputation Services draft-ietf-repute-considerations-01 Abstract The use of reputation systems is has become a common tool in many applications that seek to apply collected intelligence about traffic sources. Often this is done because it is common or even expected operator practice. It is therefore important to be aware of a number of considerations for both operators and consumers of the data. This document includes a collection of the best advice available regarding providers and consumers of reputation data, based on experience to date. Much of this is based on experience with email reputation systems, but the concepts are generally applicable. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on November 6, 2013. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect Kucherawy Expires November 6, 2013 [Page 1] Internet-Draft Reputation Operations May 2013 to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3. Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4. Reputation Clients . . . . . . . . . . . . . . . . . . . . . . 4 5. Reputation Service Providers . . . . . . . . . . . . . . . . . 6 6. Security Considerations . . . . . . . . . . . . . . . . . . . . 7 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 7 8. Informative References . . . . . . . . . . . . . . . . . . . . 8 Appendix A. Acknowledgements . . . . . . . . . . . . . . . . . . . 8 Kucherawy Expires November 6, 2013 [Page 2] Internet-Draft Reputation Operations May 2013 1. Introduction Reputation services involve collecting feedback from the community about sources of Internet traffic and aggregating that feedback into a rating of some kind. Common examples include feedback about traffic associated with specific email addresses, URIs or parts of URIs, IP addresses, etc. The specific collection, analysis, and rating methods vary from one service to the next and one problem domain to the next, but several operational concepts appear to be common to all of these. The promise of the protection that reputation services offers can be enticing, and many users and operators alike typically engage those services merely because it is expected of them. A critical notion, however, is that doing so explicitly involves a third party in the flow of data those parties receive. This is often taken for granted, with potentially disastrous results. This document highlights this and other considerations in providing and consuming reputation data services. 2. Background The community has historically focused on identifying sources that misbehave, i.e., that earn negative reputations. The purpose here is to identify and filter traffic from bad actors. This grew out of operational need. As the Internet grew, so did the occurence of problematic traffic, especially in email. The pragmatics of email (i.e., the fact that the total IP address space is more constrained than the total email address space) drove the focus on using IP addresses as the focus of reputation, in addition to the fact that IP addresses have a degree of validation (via the TCP/IP infrastructure) where email addresses have had none. A specific example of a reputation service in common use in the email space is the DNS blacklist [DNSBL]. This is a method of querying a database as to whether a source of incoming [SMTP] email traffic should be allowed to relay email, based on previous observations and feedback. The method uses the IP address of the source as the basis for a query to the database using the Domain Name System [DNS] as the interface. [DNSBL] includes several points in its Security Considerations document that are repeated and further developed here. However, regardless of the identifier used as the identifier for a reputation, bad actors can evade detection or the effects of their observed behavior by changing identifiers (e.g., move to a new IP address, register a new domain name, use a sub-domain). This makes the problem space effectively boundless, especially as IPv6 rolls Kucherawy Expires November 6, 2013 [Page 3] Internet-Draft Reputation Operations May 2013 out. 3. Evolution More modern thinking is evolving toward the identification of good actors rather than bad actors, and giving them preferential treatment. This drastically reduces the problem space: There are vastly more IP addresses and email addresses used by bad actors to generate problematic traffic than are used by good actors to generate desirable traffic. Moreover, good actors tend to be represented by stable names and addresses, allowing users to rely on these to identify and give preferential treatment to their traffic. Good actors have no need to hop around to different addresses, and already work to keep their traffic clean. This notion has only been tried to date using manually edited whitelists, but has shown promising results on that scale. 4. Reputation Clients Operators that choose to make use of reputation services to influence content allowed to pass into or through their infrastructures need to understand that they are granting a third party (the reputation service provider, or RSP) the ability to affect incoming traffic, for better or worse. Of course, this is the whole point of engaging an RSP when everything is working properly, but a number of issues are worthy of consideration before establishing such a relationship. Some cases have occurred where an RSP made the unilateral decision to terminate its service. To encourage its clients to stop issuing queries, it began reporting a maximally negative reputation about all subjects, causing rejection of all incoming traffic during the incident period. Although one would hope such incidents to be rare, automated means to detect such unfortunate returns (malicious or otherwise) and take remedial should be considered. RSPs will be the subject of attacks once it is understood that sucess in doing so will allow malicious content to evade detection and filtering. Users of RSPs need to be aware of possible interruptions in service availability or quality. Similarly, some actors will try to "game" the service, which is to say that such actors will attempt to determine patterns of behavior that result in the reporting of favorable reputations, and in doing so, acquire artifically inflated reputations. One could reasonably assume that a reputation service is inherently fragile. For Kucherawy Expires November 6, 2013 [Page 4] Internet-Draft Reputation Operations May 2013 operational clients, this should prompt balanced and comparative, rather than unilateral, use of the service. It is suggested that, when engaging an RSP, an operator should try to learn the following things about the RSP in order to understand the exposure potential: o the RSP's basis for listing or not listing particular subjects; o if an RSP is paid by its listees, the rate and criteria for rejection from being listed; o how the RSP collects data about subjects; o how many data points are input to the reported reputation; o whether reputation is based on a reliable identifier; o how the RSP establishes reliability and authenticity of those data; o how data validity is maintained (e.g., on-going monitoring of the reported data and sources); o how actively data validity is tracked (e.g., how changes are detected); o how disputed reputations are handled; o how often input data expire; o whether older information more or less influential than newer; o whether the reported reputation a scalar, a Boolean value, a collection of values, or something else; o when transitioning among RSPs, the differences between them among these above points; that is, whether a particular score from one means the same thing from another. An operator using an RSP would be wise to ensure it has the capability to effect local overrides for cases where the client expects to disagree with the reported reputation. An operator should be able limit the impact of a negative reputation on content acceptance. For example, rather than rejecting content outright when a negative reputation is returned, simply subject it to additional (i.e., more thorough) local analyis before permitting the Kucherawy Expires November 6, 2013 [Page 5] Internet-Draft Reputation Operations May 2013 traffic to pass. A sensible default should apply when the RSP is not available. This may also be a query to a different RSP known to be less robust than the primary one. Recent proposals have focused on tailoring operation to prefer or emphasize content whose sources have positive reputations. As stated above, negative reputations are easy to shed, and the universe of things that will earn and maintain positive reputations is relatively small. Designing a filtering system that observes these notions is expected to be more lightweight to operate and harder to game. One choice is to query and cross-referencing multiple RSPs. This can help to detect which ones under comparison are reliable, and offsets the effect of anomalous replies. 5. Reputation Service Providers Operators intending to provide a reptuation service need to consider that there are many flavors of clients. There will be clients that are prepared to make use of a reputation service blindly, while others will be interested in understanding more fully the nature of the service being provided. An operator of an RSP should be prepared to answer as may of the questions identified in Section 4 as possible, not only because wise clients will ask, but also because they reflect issues that have arisen over the years, and exploration of the points they raise will result in a more robust reputation service. Obviously, in computing reputations via traffic analysis, some private algorithms may come into play. For some RSPs, such "secret sauce" comprises their competitive advantage over others in the same space. This document is not suggesting that all private algorithms need to be exposed for a reputation service to be acceptable. Instead, it is anticipated that enough of the above details need to be available to ensure consumers (and in some cases, industry or the general public) that the RSP can be trusted to influence key local policy decisions. Reptuations should be based on accurate identifiers, i.e., some property of the content under analysis that is difficult to falsify. For example, in the realm of email, the address found in the From: field of a message is typically not verifiable, while the domain name found in a validated domain-level signature is. In this case, constructing a reputation system based on the domain name is more useful than one based on the From: field. Kucherawy Expires November 6, 2013 [Page 6] Internet-Draft Reputation Operations May 2013 The biggest frustration with most RSPs to date has been the absence of a visible, accessible, and transparent process for remediating the errant addition of an identifier to a negative reputation list. An RSP in widespread use is perceived to have enormous power when its results are used to reject traffic outright; when a "bad" entry is added referencing a good actor, it can have destructive effects, so an effective mechanism to fix such problems needs to exist. To accommodate clients with varying sensitivities, it is advisable for the query mechanism used to access the RSP to provide the ability to request details in the returned result about how the result was reached, allowing the client to decide if the result should be applied. For example, it shoudl be possible for the reply to contain: o the result itself; o the number of data points used to compute the result; o the age range of the data; o source diversity of the input data; o currency of the result (i.e., when it was computed); o basis of the result (i.e., which identifier was used). The systems and algorithms used by the RSP to compute the reported reputation will need to be hardened as much as practicable against gaming or other forms of data poisoning. Larger source diversities are harder to overcome with poisoned input, but are expensive to build in terms of both infrastructure and time. Systems focused on assigning positive reputations rather than negtive ones are promising since positive reputations, if made difficult to earn, put a large cost on bad actors, which may be enough to dissuade them entirely. 6. Security Considerations Several points are raised above that can be described as threats to the delivery of valid user data. This document highlights and discusses those matters, but introduces no new security issues. 7. IANA Considerations This memo contains no actions for IANA. Kucherawy Expires November 6, 2013 [Page 7] Internet-Draft Reputation Operations May 2013 [RFC Editor: Please remove this section prior to publication.] 8. Informative References [DNS] Mockapetris, P., "Domain Names -- Concepts and Facilities", RFC 1034, November 1987. [DNSBL] Levine, J., "DNS Blacklists and Whitelists", RFC 5782, February 2010. [SMTP] Klensin, J., "Simple Mail Transfer Protocol", RFC 5321, October 2008. Appendix A. Acknowledgements The author wishes to acknowledge the following for their review and constructive criticism of this proposal: Chris Barton, Vincent Schonau Author's Address Murray S. Kucherawy EMail: superuser@gmail.com Kucherawy Expires November 6, 2013 [Page 8]