INTERNET DRAFT                                    Srinivas Mantripragada
Category: Informational                                     NetContinuum
Title: draft-srinivas-wat-00.txt                         Prasad Vellanki
Date: December 1, 2003                                      NetContinuum
Expires: June 1, 2004                                      Sridhar Raman
                                                            NetContinuum
                                                         Venkata Nambula
                                                            NetContinuum

                     Web Address Translation (WAT)

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at:
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at:
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This draft specifies a Web Address Translation (WAT) mechanism. The
scheme allows a user to hide or rewrite backend (internal) domain
addresses. It is based on a suite of URL translation schemes and does
not require any of the backend application servers to be reconfigured.
This makes it possible to host multiple applications on different
virtual servers behind a single domain. The scheme is analogous to NAT,
except that it operates at the web application layer and moves web
address translation up into the network rather than performing it only
in the web servers at the end point.

Srinivas                  Expires May 24, 2004                  [Page 1]
Internet-Draft                      WAT                         Nov 2003

Table of Contents

   1.0 Introduction
   2.0 Terminology
   3.0 WAT Implementation
      3.1 Website Cloaking
         3.1.1 Status
         3.1.2 Suppress Return Code
         3.1.3 Filter Response Header
         3.1.4 Headers to Filter
      3.2 URL Translations
         3.2.1 Status
         3.2.2 External URL
         3.2.3 External Domain
         3.2.4 Internal URL
         3.2.5 Internal Domain
      3.3 URL Rewrite
         3.3.1 Status
         3.3.2 Matching Rule
         3.3.3 Sequence Number
         3.3.4 Action
            3.3.4.1 Insert Header
            3.3.4.2 Remove Header
            3.3.4.3 Replace Header
            3.3.4.4 Rewrite URL
            3.3.4.5 Redirect URL
         3.3.5 Header
         3.3.6 Continue Processing other Rewrites
         3.3.7 New Value
   4.0 References
   5.0 Authors
   6.0 Full Copyright Statement

1.0 Introduction

Enterprises are actively migrating business applications to web
technologies to improve access and control costs. At the same time, the
threat of attack is growing exponentially, with the majority of attacks
now exploiting application-layer vulnerabilities. While traditional
network firewalls address network access control by blocking
unauthorized network-level requests, application firewalls address the
application layer by enforcing security policies within application
sessions. An application firewall specifically protects the Web
application communication stream and all associated application
resources from attacks that happen via the Web protocol. The logical
place to add this protection is at the corporate edge, where the
traditional firewall currently sits.

A major portion of web attacks involves tampering with HTTP-compliant
URLs and header fields. One of the prerequisites of a true web
application firewall is therefore URL and header protection. In this
draft, we propose a Web Address Translation (WAT) scheme that can be
implemented as a standard mechanism to provide URL and header
protection at the edge.

The key highlights of the WAT implementation scheme include:

(1) The ability to hide the internal structure of a company's web
    site.
(2) A homogeneous and consistent URL layout over all WWW servers
    within an intranet web cluster.
(3) A consistent, server-independent layout for the WWW namespace.
(4) A consistent URL translation mechanism by which:
    (4.1) Exported URLs do not need to be bound to any physically
          correct target server.
    (4.2) Applications do not need to be altered to work outside the
          firewall.

The authors feel that WAT is a natural extension of the NAT
implementation (RFC 1631), with a different goal in mind. NAT presents
a technique that lets end hosts with IP addresses in a public (or
external) network communicate with end hosts in a private (or internal)
network and vice versa. NAT works by using the several million private
addresses that have been set aside by the Internet Engineering Task
Force, turning a public IP address such as 192.156.136.22 into a
private address, such as 10.0.0.4, for delivery to a user's PC. Private
IP addresses cannot be "seen" by the Internet, and therefore may be
reused by various enterprise networks.

The WAT implementation adopts a similar philosophy and proposes a
series of techniques to modify and translate URLs and headers that are
globally visible in the WWW namespace into a private URL namespace that
is not visible to the external world. The WAT implementation specifics
are described in the subsequent sections.

2.0 Terminology

The following terms are used in the rest of the document.

2.1 Network Address Translation (NAT)

The term NAT in this document refers to the translation of a
private/internal IP address to a public/external IP address and vice
versa.

2.2 Uniform Resource Identifier (URI)

The W3C's codification of the name and address syntax of present and
future objects on the Internet.
In its most basic form, a URI consists of a scheme name (such as file,
http, ftp, news, mailto, gopher) followed by a colon, followed by a
path whose nature is determined by the scheme that precedes it (see RFC
1630). URI is the umbrella term for URNs, URLs, and all other Uniform
Resource Identifiers.

2.3 World Wide Web (WWW)

The World Wide Web is a collection of information servers linked
together through a language called hypertext. This allows a user to
select a hypertext link on one page that may lead to a different server
halfway around the world.

2.4 Uniform Resource Locator (URL)

The World Wide Web address of a site on the Internet.

2.5 Hypertext Reference (HREF)

An attribute used to set the URL of an object that is being referenced.
This attribute is used in many tags, but most often in the anchor (A)
tag.

3.0 WAT Implementation

The proposed WAT implementation is split into three main techniques:

(1) Website Cloaking
(2) URL Translations
(3) URL Rewrite (Request and Response)

3.1 Website Cloaking

Website Cloaking is a method to conceal enterprise web resources from
hackers and worms scanning for vulnerabilities. Almost every successful
attack is preceded by probing websites for weaknesses. Readily
available tools on the Internet such as Whisker, Nessus and Nikto make
it easy for potential intruders to scan any website and determine
exactly how applications were built, what kind of servers they are
running on, and which URLs contain vulnerabilities. Worms such as Code
Red automatically scan the Internet for specific server types with
known vulnerabilities in order to launch an attack.

In the proposed implementation, website cloaking hides URL return
codes, HTTP headers and backend IP addresses. As a result, an outside
observer gains no visibility into which web servers, application
servers, operating systems, directory structures and patches are in use
on the protected web sites.
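
The cloaking behaviour described above can be sketched as follows. This
is a minimal illustration only, not part of the proposed mechanism: the
draft does not say what a client receives in place of a suppressed
status code, so the sketch assumes a fixed generic status and body. The
names CloakingPolicy and cloak_response, the default header names, and
the generic values are all hypothetical; the fields loosely mirror the
parameters of Sections 3.1.1 through 3.1.4.

```python
from dataclasses import dataclass, field


@dataclass
class CloakingPolicy:
    """Hypothetical container for the cloaking parameters (names assumed)."""
    status: bool = True                  # policy enabled or disabled
    suppress_return_code: bool = True    # hide 4xx/5xx status codes
    filter_response_header: bool = True  # strip the listed headers
    # Assumed defaults; compared case-insensitively.
    headers_to_filter: set = field(
        default_factory=lambda: {"server", "x-powered-by"})
    # Assumption: the draft does not define what replaces an error code.
    generic_status: int = 404
    generic_body: bytes = b"Not available"


def cloak_response(status, headers, body, policy):
    """Return (status, headers, body) as the external client would see them."""
    if not policy.status:
        return status, dict(headers), body

    out_headers = dict(headers)
    if policy.filter_response_header:
        out_headers = {
            name: value for name, value in out_headers.items()
            if name.lower() not in policy.headers_to_filter
        }

    if policy.suppress_return_code and 400 <= status <= 599:
        # Client (4xx) and server (5xx) errors become indistinguishable.
        status = policy.generic_status
        body = policy.generic_body
        out_headers = {
            name: value for name, value in out_headers.items()
            if name.lower() != "content-length"
        }
        out_headers["Content-Length"] = str(len(body))

    return status, out_headers, body
```

With the default policy, a backend "500" response carrying a Server
banner would reach the client with the generic status and body and
without the banner header, while successful responses only lose the
filtered headers.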

The implementation details follow:

3.1.1 Status

This parameter is used to enable or disable this policy.

3.1.2 Suppress Return Code

When enabled, this parameter blocks the return of an HTTP status code
in a response header. These codes are returned from a server if there
is a problem with the browser or with the Web server itself. The two
types of response error codes that are suppressed are:

o  4xx (client): These are "400-series" error codes. These codes are
   intended for instances where a client seems to have erred when
   attempting to access a Web page. For example, "404 Not Found".

o  5xx (server): These are "500-series" error codes. These codes are
   intended to indicate that a Web server is aware that it has a
   problem or that it is incapable of performing a request. For
   example, "500 Internal Server Error".

With these codes suppressed, weaknesses in the infrastructure are also
concealed, since the hacker cannot tell whether a problem lies with the
client or with the Web server.

3.1.3 Filter Response Header

When enabled, this parameter filters specific HTTP headers out of a
response. The headers to be filtered are defined by the "Headers to
Filter" option (defined below).

3.1.4 Headers to Filter

This parameter defines the banner header in a response that needs to be
filtered. Expects a string input.

3.2 URL Translations

When a Web site sends a page to a user, it typically includes a variety
of embedded references to other objects on the site. If the references
are relative, meaning that they do not include the name of the server
(/content.html), rather than absolute
(http://www.example.com/content.html), there is no problem. However,
most Web sites do embed absolute links. Two problems arise.

The first frequently occurs in situations where a proxy is performing
SSL acceleration.
When links embedded in the document are prefixed with "http" instead of
"https", users' clicks go to the unencrypted pages, which are sometimes
delivered without question or simply do not work. The URL translation
mechanism should allow parsing the response and rewriting "http" to
"https".

The second problem occurs when a proxy's domain name is different from
the server's name, for example, a server named server.example.com and a
proxy called www.example.com. Applications that look to the host name
might end up embedding links such as
http://server.example.com/content.html when they should say
http://www.example.com/content.html.

JavaScript and HTTP cookies compound the problem. JavaScript-driven
pages often dynamically assemble URLs on the client side, and HTTP
cookies are sent from the server such that the client will only send
them back when communicating with the server directly and not through a
proxy.

In most cases, site administrators lack the resources to change the
applications to fix the problem. Instead, what is needed is a
rewriting/mapping of incorrect URLs to the correct form. The
rewriting/mapping has to happen both for links being sent from the
server to the client and for HTTP requests from the client to the
server. The URL translation is able to rewrite URLs embedded within
HTML, DHTML, XHTML, Cascading Style Sheets, JavaScript, HTTP cookies
and Flash.

A link that once appeared as

   http://intranet.company.com/content.html

will now appear as

   https://proxy.company.com/prx/000/http/intranet.company.com/content.html

The translation is performed so that the result remains syntactically
and semantically correct.

Step1: User requests https://www.example.com
Step2: Server responds with content that includes the link
       http://server.example.com/images/logo.jpg
Step3: URL translation rewrites the outgoing response and sends it to
       the user.
Step4: User requests
       https://www.example.com/prx/00/http/server.example.com/images/logo.jpg
Step5: URL translation rewrites the incoming request and the server
       receives:
       http://server.example.com/images/logo.jpg

Because of these operations, the server is unaware that the content was
modified in any way. This helps provide application security.

The implementation details follow:

3.2.1 Status

Expects a Boolean input [Yes/No]. The parameter is used to enable or
disable this feature.

3.2.2 External URL

Expects a string input. The External URL should be a publicly exported
URL in the WWW namespace and has to be unique. An empty value means
that no translations need to be performed on this external URL.
Requests coming from the client with the matching input string are
mapped to a unique URL translation rule. The domain part of the
outgoing requests is rewritten back with the input value. The string
"*" means that all absolute URLs in the response data are rewritten.
The domain can be a suffix pattern or a simple string.

3.2.3 External Domain

Expects a string input. The External Domain should be the publicly
exported domain in the WWW namespace and has to be unique, for example
www.mysite.com or www.mydomain.com.

3.2.4 Internal URL

Expects a string input. The Internal URL should always start with a '/'
character and should be locally visible.

3.2.5 Internal Domain

Expects a string input. The Internal Domain represents the local
namespace server or IP address that is not visible (or exported) to the
external user.

3.2.6 Example

The following example configuration can be used to translate a URL
internally mounted as /bugzilla to an externally visible URL,
http://www.mydomain.com/bugs. As a result, the internally mounted URL
is invisible to the external user.

   http://www.mydomain.com/bugs => /bugzilla

   Name:            bugs
   Status:          On/Off
   External URL:    /bugs
   External Domain: www.mydomain.com
   Internal URL:    /
   Internal Domain: bugzilla

3.3 URL Rewrite

The WAT implementation proposes URL rewrite for both incoming requests
and outgoing responses. The specific implementation details follow:

3.3.1 Status

This parameter is used to enable or disable this feature.

3.3.2 Matching Rule

Expects a string input. The rule can be a regular expression or a
prefix-suffix pattern, and multiple rules can be specified. The pattern
is matched against the URL or the header, depending on the Action field
described below.

3.3.3 Sequence Number

Expects a non-negative integer value. The number specifies the order in
which the matching rules specified in 3.3.2 are processed.

3.3.4 Action

The Action field specifies the operation to be performed once the rule
is matched. The actions apply only to Header and URL field items and
are listed below:

3.3.4.1 Insert Header

The matching rule specified in 3.3.2 applies to the Header field. This
action applies to both incoming requests and outgoing responses. If the
rule matches, a header field is inserted; its value is taken from the
"New Value" field specified in 3.3.7.

3.3.4.2 Remove Header

The matching rule specified in 3.3.2 applies to the Header field. This
action applies to both incoming requests and outgoing responses. If the
rule matches, the header field is removed.

3.3.4.3 Replace Header

The matching rule specified in 3.3.2 applies to the Header field. This
action applies to both incoming requests and outgoing responses. If the
rule matches, the old header value is replaced with the new value
specified in 3.3.7.

3.3.4.4 Rewrite URL

The matching rule specified in 3.3.2 applies to the URL field. This
action applies to incoming requests only. If the rule matches, the URL
is rewritten with the new URL specified in 3.3.7.

3.3.4.5 Redirect URL

The matching rule specified in 3.3.2 applies to the URL field.
This action applies to incoming requests only. If the rule matches, the
request is redirected to a new location. The new URL value is specified
in 3.3.7.

3.3.5 Header

Expects a string input. Specifies which header field is to be matched
and acted upon by the action specified in 3.3.4.

3.3.6 Continue Processing other Rewrites

Expects a Boolean input [Yes/No]. Provides an option for the rewrite
engine either to stop after the first match or to continue processing
all the rules specified.

3.3.7 New Value

Expects a string input. This specifies the new value that the action
specified in 3.3.4 operates upon.

4.0 References

[NAT]       Egevang, K. and P. Francis, "The IP Network Address
            Translator (NAT)", RFC 1631, May 1994.

[NAT-TERM]  Srisuresh, P. and M. Holdrege, "IP Network Address
            Translator (NAT) Terminology and Considerations", RFC 2663,
            August 1999.

5.0 Authors

   Srinivas Mantripragada
   1705 Wyatt Drive
   Santa Clara, CA 95054
   USA

   Phone: 408-961-5600
   Fax:   408-986-8997
   Email: srinivas@netcontinuum.com

6.0 Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain it or
assist in its implementation may be prepared, copied, published and
distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing the
copyright notice or references to the Internet Society or other
Internet organizations, except as needed for the purpose of developing
Internet standards in which case the procedures for copyrights defined
in the Internet Standards process must be followed, or as required to
translate it into languages other than English.

The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assignees.

This document and the information contained herein is provided on an
"AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL
NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.