Network Working Group M. Ohye
Internet-Draft J. Kupke
Intended status: Informational March 2012
Expires: August 31, 2012

The Canonical Link Relation
draft-ohye-canonical-link-relation-05

Abstract

RFC5988 specified a way to define relationships between links on the web. This document describes a new type of such relationship, "canonical", to designate an IRI as preferred over resources with duplicative content.

Editorial Note (To be removed by RFC Editor)

Distribution of this document is unlimited. Comments should be sent to the IETF Apps-Discuss mailing list (see https://www.ietf.org/mailman/listinfo/apps-discuss).

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 31, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

The canonical link relation specifies the preferred IRI from resources with duplicative content. The relation is commonly used to specify the preferred version of an IRI from duplicate pages created with the addition of IRI parameters (e.g., session IDs), or to specify the single-page version as preferred over the same content separated on multiple component pages.

In regard to the link relation type, "canonical" can be described informally as the author's preferred version of a resource. More formally, the canonical link relation specifies the preferred IRI from a set of resources that return the context IRI's content in duplicated form. Once specified, applications such as search engines can focus processing on the canonical, and references to the context (referring) IRI can be updated to reference the target (canonical) IRI.

2. Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3. The Canonical Link Relation

The target (canonical) IRI MUST identify content that is either duplicative or a superset of the content at the context (referring) IRI. Authors who declare the canonical link relation ought to anticipate that applications such as search engines can:

The target (canonical) IRI MAY:

To better ensure that applications properly handle the canonical link relation, administrators ought to consider the following guidelines:

When the canonical link relation is declared improperly, such as creating chained canonicals (i.e., target IRI specifies the source IRI of a permanent redirect) or designating a target IRI which returns a 4xx response, applications can use their own heuristics when processing the resource. For instance, an application can choose to ignore any improper canonical designation and continue to process the remaining content on a page.
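As an illustration of the kind of heuristic described above, the following Python sketch ignores a canonical designation whose target returns a 4xx response, is the source of a permanent redirect, or itself names a different canonical. The Resource record, the in-memory "web" mapping, and the function name are hypothetical and exist only for this example; an application is free to apply different rules.

```python
from typing import Dict, NamedTuple, Optional


class Resource(NamedTuple):
    """Hypothetical record of what an application observed for one IRI."""
    status: int                # HTTP status code returned for the IRI
    canonical: Optional[str]   # canonical target declared by the resource, if any


def usable_canonical(context: str, web: Dict[str, Resource]) -> Optional[str]:
    """Return the declared canonical for `context`, or None if it looks improper."""
    observed = web.get(context)
    if observed is None or observed.canonical is None:
        return None
    declared = observed.canonical
    target = web.get(declared)
    if target is None:
        # The target was never fetched, so nothing can be verified.
        return None
    if 400 <= target.status < 500:
        # The target returns a 4xx response.
        return None
    if target.status == 301:
        # The target is the source of a permanent redirect.
        return None
    if target.canonical not in (None, declared):
        # The target names yet another canonical (a chained canonical).
        return None
    return declared
```

When the function returns None, the application simply continues to process the remaining content of the context resource as if no canonical had been declared.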

4. Examples

The following example illustrates the use of the canonical link relation.

If the preferred version of an IRI and its content exists at:

http://www.example.com/page.php?item=purse

Then duplicate content IRIs such as:

http://www.example.com/page.php?item=purse&category=bags
http://www.example.com/page.php?item=purse&category=bags&sid=1234

may designate the canonical link relation in HTML as specified in [REC-html401-19991224]:

<link rel="canonical"
        href="http://www.example.com/page.php?item=purse">

or as a relative IRI:

<link rel="canonical" href="page.php?item=purse">

or alternatively, in the HTTP header field as specified in Section 5 of [RFC5988]:

Link: <http://www.example.com/page.php?item=purse>; rel="canonical"

This signals to applications, such as search engines, that these are duplicates of the target (canonical) IRI: http://www.example.com/page.php?item=purse.
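For illustration, an application might read the HTML form of the designation along the lines of the following Python sketch, which uses only the standard library. The class and function names are hypothetical. The sketch assumes that the first link element carrying rel="canonical" is the one to use, and it resolves a relative IRI against the context IRI; neither choice is mandated by this document.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class CanonicalLinkParser(HTMLParser):
    """Collects the href of every <link> element whose rel includes "canonical"."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of link types, compared case-insensitively.
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "canonical" in rels and href:
            self.hrefs.append(href)


def canonical_from_html(context_iri: str, html: str) -> Optional[str]:
    """Return the absolute canonical IRI declared in `html`, or None."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    if not parser.hrefs:
        return None
    # Assumption of this sketch: use the first declaration found.
    return urljoin(context_iri, parser.hrefs[0])
```

Applied to either duplicate IRI above, both the absolute and the relative form of the link element resolve to http://www.example.com/page.php?item=purse.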

Applications may then select the canonical value as the display IRI (such as in search results), and additional IRI properties, such as indexing and ranking signals, can be transferred as well.
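The HTTP header form shown earlier in this section can be read in a similar way. The following Python sketch is a deliberately simplified reading of the Link header field syntax: it assumes that no parameter value contains a "<" character and it looks only at the first rel parameter of each link-value. The function name and regular expressions are hypothetical and are not a complete parser for [RFC5988].

```python
import re
from typing import List
from urllib.parse import urljoin

# One link-value: "<target>" followed by its parameters, up to the next "<".
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def canonical_from_link_header(context_iri: str, header_value: str) -> List[str]:
    """Return the absolute targets of all link-values with rel="canonical"."""
    targets: List[str] = []
    for target, params in _LINK_VALUE.findall(header_value):
        match = _REL_PARAM.search(params)
        if match is None:
            continue
        rel_value = match.group(1) if match.group(1) is not None else match.group(2)
        # rel may list several space-separated relation types.
        if "canonical" in rel_value.lower().split():
            targets.append(urljoin(context_iri, target))
    return targets
```

For the header field shown in the example, the function returns a list containing the single IRI http://www.example.com/page.php?item=purse.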

5. Recommendations

Before adding the canonical link relation, verification of the following is RECOMMENDED:

  1. The content of the context IRI is duplicated within the content of the target (canonical) IRI.
  2. For HTTP, permanent HTTP redirects (Section 10.3.2 of [RFC2616]), the traditional strong indicator that an IRI's content has been permanently moved, cannot be implemented in place of the canonical link relation.
  3. In the case where the target (canonical) IRI is a superset of content from the context IRI (e.g., page-1.html and page-2.html designate page-all.html as the canonical), the user experience is strongly taken into consideration, both in regard to possible increased load time and potential complexity in navigation.
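The first check in the list above could be approximated, very roughly, as in the following Python sketch. It assumes that the visible text of both resources has already been extracted, and it treats "duplicated within" as a whitespace-insensitive and case-insensitive substring test. Both the function names and that definition of duplication are assumptions made for this example; real content rarely matches so exactly.

```python
import re


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text before comparing."""
    return re.sub(r"\s+", " ", text).strip().lower()


def target_covers_context(context_text: str, target_text: str) -> bool:
    """True if the context's text appears, in full, within the target's text."""
    context_norm = _normalize(context_text)
    # An empty context would trivially match, so it is rejected explicitly.
    return bool(context_norm) and context_norm in _normalize(target_text)
```

In the superset case of item 3, the text of page-1.html would be passed as the context and the text of page-all.html as the target.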

6. IANA Considerations

IANA is asked to register the Canonical Link Relation below as per [RFC5988].

Relation Name:

Description:

Reference:

Notes:

Application Data:

7. Security Considerations

When a site is compromised, the canonical link relation can be implemented with malicious intent to designate the attacker's IRI as the preferred version of the content. While this technique is largely unnoticeable to humans, automated programs may cluster the compromised resource as duplicative of the attacker's target IRI, transferring properties such as link popularity away from the compromised resource to the attacker's designated canonical. (Naturally, even a site that is not compromised could provide inaccurate or misleading information about which IRI is canonical.)

8. Internationalisation Considerations

In designating a canonical IRI, see Section 8 of [RFC5988] for information on URI encoding.

9. References

[REC-html401-19991224] Le Hors, A., Raggett, D. and I. Jacobs, "HTML 4.01 Specification", W3C Recommendation REC-html401-19991224, December 1999.

Latest version available at

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC5988] Nottingham, M., "Web Linking", RFC 5988, October 2010.

Appendix A. Implementations

Automated programs that implement functionality with regard for the canonical link relation include:

Authors' Addresses

Maile Ohye
EMail: maileohye@gmail.com
URI: http://maileohye.com/

Joachim Kupke
EMail: joachim@kupke.za.net
