Network Working Group             A. Bryan
Internet-Draft                    N. McNab
Intended status: Standards Track  H. Nordstrom
Expires: July 24, 2011            T. Tsujikawa
                                  P. Poeml
                                  MirrorBrain
                                  A. Ford
                                  Roke Manor Research
                                  January 20, 2011


Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Header Fields
draft-bryan-metalinkhttp-19

Abstract

This document specifies Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP header fields, a different way to get information that is usually contained in the Metalink XML-based download description format. Metalink/HTTP describes multiple download locations (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures, and other information using existing standards for HTTP header fields. Clients can use this information to make file transfers more robust and reliable.

Editorial Note (To be removed by RFC Editor)

Discussion of this draft should take place on the HTTPBIS working group mailing list (ietf-http-wg@w3.org), although this draft is not a WG item.

The changes in this draft are summarized in Appendix C (Document History).

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

This Internet-Draft will expire on July 24, 2011.

Copyright Notice

Copyright (c) 2011 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.



Table of Contents

1.  Introduction
    1.1.  Operation Overview
    1.2.  Examples
    1.3.  Notational Conventions
2.  Requirements
3.  Mirrors / Multiple Download Locations
    3.1.  Mirror Priority
    3.2.  Mirror Geographical Location
    3.3.  Coordinated Mirror Policies
    3.4.  Mirror Depth
4.  Peer-to-Peer / Metainfo
    4.1.  Metalink/XML Files
5.  OpenPGP Signatures
6.  Cryptographic Hashes of Whole Files
7.  Client / Server Multi-source Download Interaction
    7.1.  Error Prevention, Detection, and Correction
        7.1.1.  Error Prevention (Early File Mismatch Detection)
        7.1.2.  Error Correction
8.  Multi-server Performance
9.  IANA Considerations
10.  Security Considerations
    10.1.  URIs and IRIs
    10.2.  Spoofing
    10.3.  Cryptographic Hashes
    10.4.  Signing
11.  References
    11.1.  Normative References
    11.2.  Informative References
Appendix A.  Acknowledgements and Contributors
Appendix B.  Comparisons to Similar Options
Appendix C.  Document History
§  Authors' Addresses





1.  Introduction

Metalink/HTTP is an alternative representation of Metalink information, which is usually presented as an XML-based document format [RFC5854] (Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, “The Metalink Download Description Format,” June 2010.). Metalink/HTTP attempts to provide as much functionality as the Metalink/XML format by using existing standards such as Web Linking [RFC5988] (Nottingham, M., “Web Linking,” October 2010.), Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.), and Entity Tags (also known as ETags) [RFC2616] (Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.). Metalink/HTTP is used to list information about a file to be downloaded. This can include lists of multiple URIs (mirrors), Peer-to-Peer information, cryptographic hashes, and digital signatures.

Identical copies of a file are frequently accessible in multiple locations on the Internet over a variety of protocols (such as FTP, HTTP, and Peer-to-Peer). In some cases, users are shown a list of these multiple download locations (mirrors) and must manually select a single one on the basis of geographical location, priority, or bandwidth. This distributes the load across multiple servers, and should also increase throughput and resilience. At times, however, individual servers can be slow, outdated, or unreachable, but this cannot be determined until the download has been initiated. Users will rarely have sufficient information to choose the most appropriate server, and will often choose the first in a list, which might not be optimal for their needs and leads to a particular server getting a disproportionate share of load. The use of suboptimal mirrors can lead to the user canceling and restarting the download to try to manually find a better source. During downloads, errors in transmission can corrupt the file. There are no easy ways to repair these files. For large downloads this can be extremely troublesome. Any of the problems that can occur during a download can frustrate users.

Some popular sites automate the process of selecting mirrors using DNS load balancing, both to approximately balance load between servers, and to direct clients to nearby servers with the hope that this improves throughput. Indeed, DNS load balancing can balance long-term server load fairly effectively, but it is less effective at delivering the best throughput to users when the bottleneck is not the server but the network.

This document describes a mechanism by which the benefit of mirrors can be automatically and more effectively realized. All the information about a download, including mirrors, cryptographic hashes, digital signatures, and more can be transferred in coordinated HTTP header fields hereafter referred to as a Metalink. This Metalink transfers the knowledge of the download server (and mirror database) to the client. Clients can fall back to other mirrors if the current one has an issue. With this knowledge, the client is able to work its way to a successful download even under adverse circumstances. All this can be done without complicated user interaction and the download can be much more reliable and efficient. In contrast, a traditional HTTP redirect to a mirror conveys only minimal information (one link to one server), and there is no provision in the HTTP protocol to handle failures. Furthermore, in order to provide better load distribution across servers and potentially faster downloads to users, Metalink/HTTP facilitates multi-source downloads, where portions of a file are downloaded from multiple mirrors (and optionally, Peer-to-Peer) simultaneously.




1.1.  Operation Overview

Detailed discussion of Metalink operation is covered in Section 2 (Requirements); this section will present a very brief, high-level overview of how Metalink achieves its goals.

Upon connection to a Metalink/HTTP server, a client will receive information about other sources of the same resource and a cryptographic hash of the whole resource. The client will then be able to request chunks of the file from the various sources, scheduling appropriately in order to maximise the download rate.




1.2.  Examples

A brief Metalink server response with ETag, mirrors, .metalink, OpenPGP signature, and a cryptographic hash of the whole file:

Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: <http://www2.example.com/example.ext>; rel=duplicate
Link: <ftp://ftp.example.com/example.ext>; rel=duplicate
Link: <http://example.com/example.ext.torrent>; rel=describedby;
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel=describedby;
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel=describedby;
type="application/pgp-signature"
Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==



1.3.  Notational Conventions

This specification describes conformance of Metalink/HTTP.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, [RFC2119] (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.), as scoped to those conformance targets.




2.  Requirements

In this context, "Metalink" refers to Metalink/HTTP which consists of mirrors and cryptographic hashes in HTTP header fields as described in this document. "Metalink/XML" refers to the XML format described in [RFC5854] (Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, “The Metalink Download Description Format,” June 2010.).

Metalink resources include Link header fields [RFC5988] (Nottingham, M., “Web Linking,” October 2010.) to present a list of mirrors in the response to a client request for the resource. Metalink servers MUST include the cryptographic hash of a resource via Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.). Valid algorithms are found in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest Algorithm Values" at <http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml>. SHA-256 and SHA-512 were added by [RFC5843] (Bryan, A., “Additional Hash Algorithms for HTTP Instance Digests,” April 2010.).

Metalink servers are HTTP servers with one or more Metalink resources. Metalink servers MUST support the Link header fields for listing mirrors and MUST support Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.). Metalink servers MUST return the same Link header fields and Instance Digests on HEAD requests. Metalink servers and their associated mirror servers SHOULD all share the same ETag policy. To have the same ETag policy means that ETags are synchronized across servers for resources that are mirrored, i.e. byte-for-byte identical files will have the same ETag on mirrors that they have on the Metalink server. ETags could be based on the file contents (cryptographic hash) and not server-unique filesystem metadata. The emitted ETag could be implemented the same as the Instance Digest for simplicity. Metalink servers can offer Metalink/XML documents that contain cryptographic hashes of parts of the file and other information.

Mirror servers are typically FTP or HTTP servers that "mirror" another server. That is, they provide identical copies of (at least some) files that are also on the mirrored server. Mirror servers can also be Metalink servers. Mirror servers SHOULD support serving partial content. HTTP mirror servers SHOULD share the same ETag policy as the originating Metalink server. HTTP Mirror servers SHOULD support Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.).

Metalink clients use the mirrors provided by a Metalink server with Link header fields [RFC5988] (Nottingham, M., “Web Linking,” October 2010.). Metalink clients MUST support HTTP and SHOULD support FTP [RFC0959] (Postel, J. and J. Reynolds, “File Transfer Protocol,” October 1985.). Metalink clients MAY support BitTorrent [BITTORRENT] (Cohen, B., “The BitTorrent Protocol Specification,” February 2008.), or other download methods. Metalink clients SHOULD switch downloads from one mirror to another if a mirror becomes unreachable. Metalink clients MAY support multi-source, or parallel, downloads, where portions of a file can be downloaded from multiple mirrors simultaneously (and optionally, from Peer-to-Peer sources). Metalink clients MUST support Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.) by requesting and verifying cryptographic hashes. Metalink clients MAY make use of digital signatures if they are offered.




3.  Mirrors / Multiple Download Locations

Mirrors are specified with the Link header fields [RFC5988] (Nottingham, M., “Web Linking,” October 2010.) and a relation type of "duplicate" as defined in Section 9 (IANA Considerations).

A brief Metalink server response with two mirrors only:

Link: <http://www2.example.com/example.ext>; rel=duplicate;
pri=1; pref
Link: <ftp://ftp.example.com/example.ext>; rel=duplicate;
pri=2; geo=gb; depth=1

[[Some organizations have many mirrors. Only send a few mirrors, or only use the Link header fields if Want-Digest is used?]]

It is up to the server to choose how many Link header fields to send. Such a decision could be a hard-coded limit, a random selection, based on file size, or based on server load.
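
On the client side, the mirror entries shown above could be extracted along the following lines. This is a minimal sketch, not a complete Web Linking parser: the function names parse_link and mirrors are hypothetical, each value is assumed to carry a single link, and parameter values are assumed not to contain ";" inside quotes.

```python
import re

# Matches "<uri>" followed by the raw parameter string.
LINK_RE = re.compile(r'<([^>]*)>(.*)')


def parse_link(value):
    """Parse one Link header field value into (uri, params)."""
    m = LINK_RE.match(value.strip())
    if not m:
        raise ValueError("not a Link header field value: %r" % value)
    uri, rest = m.group(1), m.group(2)
    params = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue
        name, sep, val = part.partition("=")
        # Parameters without a value (such as "pref") are stored as True.
        params[name.strip().lower()] = val.strip().strip('"') if sep else True
    return uri, params


def mirrors(link_values):
    """Return (uri, params) for every link with rel=duplicate, in listed order."""
    found = []
    for value in link_values:
        uri, params = parse_link(value)
        if params.get("rel") == "duplicate":
            found.append((uri, params))
    return found
```

Parameter values are kept as strings here, so a caller would still convert "pri" and "depth" to integers before using them.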




3.1.  Mirror Priority

Entries for mirror servers are listed in order of priority (from most preferred to least) or have a "pri" value, where mirrors with lower values are used first.

This is purely an expression of the server's preferences; it is up to the client what it does with this information, particularly with reference to how many servers to use at any one time.
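
One way a client might turn these preferences into an ordering is sketched below. The function name order_by_priority is hypothetical, and placing entries without a "pri" value after those with one is an assumption of this sketch, not something this document requires.

```python
def order_by_priority(mirrors):
    """Order mirrors by "pri" value, lowest first.

    mirrors: list of (uri, pri) tuples in the order the server listed
    them, where pri is an int or None when no "pri" value was given.
    """
    def key(item):
        index, (uri, pri) = item
        # Assumption: entries without "pri" sort last and keep their
        # listed order, which itself expresses the server's preference.
        return (pri is None, pri if pri is not None else 0, index)

    return [mirror for _, mirror in sorted(enumerate(mirrors), key=key)]
```

The client remains free to ignore this ordering, for example once it has measured actual throughput.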




3.2.  Mirror Geographical Location

Entries for mirror servers can have a "geo" value, which is a [ISO3166‑1] (International Organization for Standardization, “ISO 3166-1:2006. Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes,” November 2006.) alpha-2 two letter country code for the geographical location of the physical server the URI is used to access. A client can use this information to select a mirror, or set of mirrors, that are geographically near (if the client has access to such information), with the aim of reducing network load at inter-country bottlenecks.
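
A client that knows its own country code could, for instance, move same-country mirrors to the front of its candidate list. This is only a sketch: prefer_nearby is a hypothetical name, and treating "same country" as "near" is a simplifying assumption.

```python
def prefer_nearby(mirrors, client_country):
    """Move mirrors whose "geo" matches the client's country to the front.

    mirrors: list of (uri, geo) tuples, geo being a two-letter country
    code or None. The relative order within each group is preserved.
    """
    country = client_country.lower()

    def is_near(mirror):
        geo = mirror[1]
        return geo is not None and geo.lower() == country

    near = [m for m in mirrors if is_near(m)]
    far = [m for m in mirrors if not is_near(m)]
    return near + far
```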




3.3.  Coordinated Mirror Policies

There are two types of mirror servers: preferred and normal. Preferred mirror servers are HTTP mirror servers that MUST share the same ETag policy as the originating Metalink server. Preferred mirrors make it possible to detect early on, before data is transferred, if the file requested matches the desired file. Entries for preferred HTTP mirror servers have a "pref" value. By default, if unspecified then mirrors are considered "normal" and do not necessarily share the same ETag policy. FTP mirrors, as they do not emit ETags, are considered "normal". ([draft‑ietf‑ftpext2‑hash] (Bryan, A., Kosse, T., and D. Stenberg, “FTP Extensions for Cryptographic Hashes,” November 2010.) allows for FTP mirrors to be coordinated and provide file hashes).

HTTP Mirror servers SHOULD support Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.). Optimally, mirror servers will share the same ETag policy and support Instance Digests in HTTP.




3.4.  Mirror Depth

Some mirrors can mirror single files, whole directories, or multiple directories.

Entries for mirror servers can have a "depth" value, where "depth=0" is the default. A value of 0 means ONLY that file is mirrored and that other URI path segments are not. A value of 1 means that file and all other files and URI path segments contained in the rightmost URI path segment are mirrored. For values of N, you go up N-1 URI path segments above. A value of 2 means going up one URI path segment above, and all files and URI path segments contained are mirrored. For each higher value, another URI path segment closer to the Host is mirrored.

A mirror with a depth value of 4:

Link: <http://www2.example.com/dir1/dir2/dir3/dir4/dir5/example.ext>;
rel=duplicate; pri=1; pref; depth=4

In the above example, 4 URI path segments up are mirrored, from /dir2/ on down.
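
The depth rule can be sketched as a small function that returns the URI prefix below which content is mirrored. The name mirrored_prefix is hypothetical, and clamping at the root of the host when the depth exceeds the number of path segments is an assumption of this sketch.

```python
from urllib.parse import urlsplit, urlunsplit


def mirrored_prefix(uri, depth):
    """Return the URI prefix under which everything is mirrored.

    depth=0 returns the URI itself (only that file is mirrored);
    depth=1 returns the directory holding the file; each further step
    moves one URI path segment closer to the host.
    """
    if depth < 0:
        raise ValueError("depth must not be negative")
    if depth == 0:
        return uri
    parts = urlsplit(uri)
    # "/dir1/dir2/file" -> ["", "dir1", "dir2"] once the file name is dropped.
    dirs = parts.path.split("/")[:-1]
    # Assumption: never go above the root of the host.
    keep = max(1, len(dirs) - (depth - 1))
    path = "/".join(dirs[:keep]) + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For the example above, a depth of 4 yields http://www2.example.com/dir1/dir2/, which is the "/dir2/ on down" prefix.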




4.  Peer-to-Peer / Metainfo

Entries for metainfo files, which describe ways to download a file over Peer-to-Peer networks or otherwise, are specified with the Link header fields [RFC5988] (Nottingham, M., “Web Linking,” October 2010.) and a relation type of "describedby" and a type parameter that indicates the MIME type of the metadata available at the URI. Since metainfo files can sometimes describe multiple files, or the filename may not be the same on the Metalink server and in the metainfo file but still have the same content, an optional name parameter can be used.

A brief Metalink server response with .torrent and .metalink:

Link: <http://example.com/example.ext.torrent>; rel=describedby;
type="application/x-bittorrent"; name="differentname.ext"
Link: <http://example.com/example.ext.metalink>; rel=describedby;
type="application/metalink4+xml"

Metalink clients MAY support the use of metainfo files for downloading files.




4.1.  Metalink/XML Files

Full Metalink/XML files for a given resource can be specified as shown in Section 4 (Peer-to-Peer / Metainfo). This is particularly useful for providing metadata such as cryptographic hashes of parts of a file, allowing a client to recover from partial errors (see Section 7.1.2 (Error Correction)).




5.  OpenPGP Signatures

OpenPGP signatures [RFC3156] (Elkins, M., Del Torto, D., Levien, R., and T. Roessler, “MIME Security with OpenPGP,” August 2001.) are specified with the Link header fields [RFC5988] (Nottingham, M., “Web Linking,” October 2010.) and a relation type of "describedby" and a type parameter of "application/pgp-signature".

A brief Metalink server response with OpenPGP signature only:

Link: <http://example.com/example.ext.asc>; rel=describedby;
type="application/pgp-signature"

Metalink clients MAY support the use of OpenPGP signatures.




6.  Cryptographic Hashes of Whole Files

Metalink servers MUST provide Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.) for files they describe with mirrors via Link header fields. Mirror servers SHOULD as well. If Instance Digests are not provided by the Metalink servers, the Link header fields MUST be ignored.

A brief Metalink server response with cryptographic hash:

Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
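
As an illustration of how a server-side tool might produce such a header field value, the sketch below hashes the file contents with SHA-256 and base64-encodes the binary hash output, which is our reading of RFC 3230 and RFC 5843. The function name sha256_instance_digest is hypothetical.

```python
import base64
import hashlib


def sha256_instance_digest(chunks):
    """Return a Digest header field value for the given content.

    chunks: an iterable of bytes objects that together make up the file,
    so that large files need not be held in memory at once.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    # Base64 of the raw 32-byte hash output, not of its hexadecimal text.
    return "SHA-256=" + base64.b64encode(h.digest()).decode("ascii")
```

A caller could feed it blocks read from an open file, for example iter(lambda: f.read(65536), b"").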



7.  Client / Server Multi-source Download Interaction

Metalink clients begin a download with a standard HTTP [RFC2616] (Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” June 1999.) GET request to the Metalink server. A Range limit is optional, not required. Alternatively, Metalink clients can begin with a HEAD request to the Metalink server to discover mirrors via Link header fields. After that, the client follows with a GET request to the desired mirrors.

GET /distribution/example.ext HTTP/1.1
Host: www.example.com

The Metalink server responds with the data and these header fields:

HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 14867603
Content-Type: application/x-cd-image
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: <http://www2.example.com/example.ext>; rel=duplicate; pref
Link: <ftp://ftp.example.com/example.ext>; rel=duplicate
Link: <http://example.com/example.ext.torrent>; rel=describedby;
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel=describedby;
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel=describedby;
type="application/pgp-signature"
Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==

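From the Metalink server response the client learns some or all of the following metadata about the requested object, in addition to also starting to receive the object:

o Object size (Content-Length).

o ETag.

o Mirror locations (Link header fields with a "duplicate" relation type), optionally with their priority, geographical location, depth, and whether they share the ETag policy of the Metalink server.

o Peer-to-Peer metainfo (a "describedby" link to a .torrent file).

o Metalink/XML, which can include partial file cryptographic hashes.

o OpenPGP signature.

o Instance Digest, the cryptographic hash of the whole file.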

(Alternatively, the client could have requested a HEAD only, and then skipped to making the following decisions on every available mirror server found via the Link header fields.)

If the object is large and gets delivered slower than expected then the Metalink client starts a number of parallel ranged downloads (one per selected mirror server other than the first) using mirrors provided by the Link header fields with "duplicate" relation type, using the location of the original GET request in the "Referer" header field. The size and number of ranges requested from each server is for the client to decide, based upon the performance observed from each server. Further discussion of performance considerations is presented in Section 8 (Multi-server Performance).
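
Since the size and number of ranges are left to the client, the sketch below shows just one possible policy: dividing the outstanding bytes into contiguous, nearly equal ranges. The names split_ranges and range_header are hypothetical, and an equal split ignores the per-server performance that a real client would take into account.

```python
def split_ranges(start, size, parts):
    """Split bytes start..size-1 into contiguous inclusive (first, last) ranges.

    start: offset of the first byte still needed.
    size:  total object size in bytes, as given by Content-Length.
    parts: number of ranges wanted, for example one per connection.
    """
    total = size - start
    if parts < 1 or total < 1:
        raise ValueError("nothing to split")
    parts = min(parts, total)
    base, extra = divmod(total, parts)
    ranges = []
    first = start
    for i in range(parts):
        # The first `extra` ranges are one byte longer than the rest.
        length = base + (1 if i < extra else 0)
        ranges.append((first, first + length - 1))
        first += length
    return ranges


def range_header(byte_range):
    """Format one (first, last) pair as a Range header field value."""
    return "bytes=%d-%d" % byte_range
```

Splitting the 14867603-byte object of the example in two gives a second range that starts at byte 7433802, the offset used in the mirror request shown in this section.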

If no range limit was given in the original request then work from the tail of the object (the first request is still running and will eventually catch up), otherwise continue after the range requested in the first request. If no Range was provided, the original connection must be terminated once all parts of the resource have been retrieved. It is recommended that a HEAD request is undertaken first, so that the client can find out if there are any Link header fields, and then Range-based requests are undertaken to the mirror servers as well as on the original connection.

Preferred mirrors have coordinated ETags, as described in Section 3.3 (Coordinated Mirror Policies), and If-Match conditions based on the ETag SHOULD be used to quickly detect out-of-date mirrors by using the ETag from the Metalink server response. If no indication of ETag synchronisation/knowledge is given then If-Match should not be used, and optimally there will be an Instance Digest in the mirror response which the client can use to detect a mismatch early, and if not then a mismatch won't be detected until the completed object is verified. Early file mismatch detection is described in detail in Section 7.1.1 (Error Prevention (Early File Mismatch Detection)).

One of the client requests to a mirror server:

GET /example.ext HTTP/1.1
Host: www2.example.com
Range: bytes=7433802-
If-Match: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Referer: http://www.example.com/distribution/example.ext

The mirror servers respond with a 206 Partial Content HTTP status code and appropriate "Content-Length" and "Content-Range" header fields. The mirror server response, with data, to the above request:

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 7433801
Content-Range: bytes 7433802-14867602/14867603
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
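
A client might sanity-check such a partial response before accepting its data, roughly as sketched below. The function name check_partial_response is hypothetical, and the headers argument is assumed to be a plain dict keyed exactly as in the example above (a real client would compare header field names case-insensitively).

```python
import re

# Matches a Content-Range value such as "bytes 7433802-14867602/14867603".
CONTENT_RANGE_RE = re.compile(r"bytes (\d+)-(\d+)/(\d+)$")


def check_partial_response(headers, expected_first, expected_size,
                           expected_etag=None):
    """Return True if a 206 response is consistent with what was requested."""
    m = CONTENT_RANGE_RE.match(headers.get("Content-Range", ""))
    if not m:
        return False
    first, last, total = (int(g) for g in m.groups())
    # A mirror whose object size differs from the Metalink server's is unusable.
    if total != expected_size or first != expected_first:
        return False
    if int(headers.get("Content-Length", "-1")) != last - first + 1:
        return False
    # Only compare ETags when the mirror is known to share the ETag policy.
    if expected_etag is not None and headers.get("Etag") != expected_etag:
        return False
    return True
```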

If the first request was not Range limited then abort it by closing the connection when it catches up with the other parallel downloads of the same object.

Downloads from mirrors that do not have the same file size as the Metalink server are considered unusable and the client can deal with it as it sees fit.

If a Metalink client does not support certain download methods (such as FTP or BitTorrent) that a file is available from, and there are no available download methods that the client supports, then the download will have no way to complete.

Once the download has completed, the Metalink client MUST verify the cryptographic hash of the file. If the cryptographic hash offered by the Metalink server with Instance Digests does not match the cryptographic hash of the downloaded file, see Section 7.1.2 (Error Correction) for a possible way to repair errors.
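
The verification step could look roughly like the following. This is a sketch that only handles SHA-256 and holds the whole file in memory; the name verify_digest is hypothetical, and the comparison assumes the header carries the base64 encoding of the binary hash output, as we understand RFC 3230 and RFC 5843.

```python
import base64
import hashlib
import hmac


def verify_digest(digest_header, data):
    """Check downloaded bytes against a Digest header field value.

    digest_header: for example "SHA-256=<base64>", possibly with several
    comma-separated algorithm=value pairs.
    """
    for item in digest_header.split(","):
        algorithm, _, value = item.strip().partition("=")
        if algorithm.upper() == "SHA-256":
            actual = base64.b64encode(hashlib.sha256(data).digest())
            return hmac.compare_digest(actual.decode("ascii"), value)
    raise ValueError("no SHA-256 digest offered")
```

A False result is the point at which a client would turn to the repair procedure referenced above.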

If the download can not be repaired, it is considered corrupt. The client can attempt to re-download the file.




7.1.  Error Prevention, Detection, and Correction

Error prevention, or early file mismatch detection, is possible before file transfers with the use of file sizes, ETags, and cryptographic hashes. Error detection requires Instance Digests, or cryptographic hashes, to determine after transfers if there has been an error. Error correction, or download repair, is possible with partial file cryptographic hashes.

Note that cryptographic hashes obtained from Instance Digests are in base64 encoding, while those from Metalink/XML and FTP HASH are in hexadecimal.
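
To compare hashes from these sources, a client has to bring them into one encoding first. A minimal sketch of the conversion, with hypothetical function names:

```python
import base64


def digest_b64_to_hex(value):
    """Convert a base64-encoded hash (Instance Digest style) to hexadecimal."""
    return base64.b64decode(value).hex()


def digest_hex_to_b64(value):
    """Convert a hexadecimal hash (Metalink/XML or FTP HASH style) to base64."""
    return base64.b64encode(bytes.fromhex(value)).decode("ascii")
```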




7.1.1.  Error Prevention (Early File Mismatch Detection)

In HTTP terms, the requirement is that merging of ranges from multiple responses must be verified with a strong validator, which in this context is the same as either Instance Digest or a strong ETag. In most cases it is sufficient that the Metalink server provides mirrors and Instance Digest information, but operation will be more robust and efficient if the mirror servers do implement a synchronized ETag as well. In fact, the emitted ETag can be implemented the same as the Instance Digest for simplicity, but there is no need to specify how the ETag is generated, just that it needs to be shared among the mirror servers. If the mirror server provides neither a synchronized ETag nor an Instance Digest, then early detection of mismatches is not possible unless file length also differs. Finally, the error is still detectable, after the download has completed, when the merged response is verified.

ETags cannot be used for verifying the integrity of the received content. An ETag is, however, a guarantee issued by the Metalink server that the content is correct for that ETag. If the ETag given by the mirror server matches the ETag given by the master server, then there is a chain of trust in which the master server authorizes these responses as valid for that object.

This ensures that a mismatch can be detected using only the synchronized ETags of the master server and the mirror server. The mirror servers themselves can even signal the mismatch by responding with an error, which prevents accidental merges of ranges from different versions of files with the same name. This also covers many, but not all, malicious attacks where the data on the mirror has been replaced by some other file.

A synchronized ETag cannot strictly protect against malicious attacks, or against server or network errors that replace content. Neither can an Instance Digest from the mirror servers: an attacker can almost certainly make the server respond with the expected Instance Digest even if the file contents have been modified, just as with the ETag, and the same holds for various system failures that cause bad data to be returned. The Metalink client has to rely on the Instance Digest returned by the Metalink master server in the first response for the verification of the downloaded object as a whole.

If the mirror servers do return an Instance Digest, then that is a bonus, just as having them return the right set of Link header fields is. The set of trusted mirrors doing that can be substituted as master servers accepting the initial request if one likes.

The benefit of having slave mirror servers (those not trusted as masters) return Instance Digest is that the client then can detect mismatches early even if ETag is not used. Both ETag and slave mirror Instance Digest do provide value, but just one is sufficient for early detection of mismatches. If neither is provided then early detection of mismatches is not possible unless the file length also differs, but the error is still detected when the merged response is verified.

If FTP servers support the FTP HASH command [draft‑ietf‑ftpext2‑hash] (Bryan, A., Kosse, T., and D. Stenberg, “FTP Extensions for Cryptographic Hashes,” November 2010.) and the same hash algorithm as the originating Metalink server, then that information can be used for early file mismatch detection.
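
The checks described in this section can be combined into a single decision, sketched below. The function name early_mismatch and the dict keys "size", "etag", and "digest" are assumptions made for illustration; the digests are assumed to be already normalized to the same algorithm and encoding.

```python
def early_mismatch(master, mirror, preferred):
    """Decide before merging ranges whether a mirror serves a different file.

    master, mirror: dicts with optional "size", "etag" and "digest" keys.
    preferred: True if the mirror shares the master's ETag policy.
    Returns True (mismatch found), False (a validator matched), or
    None (nothing available to decide early).
    """
    if master.get("size") is not None and mirror.get("size") is not None:
        if master["size"] != mirror["size"]:
            return True
    checked = False
    # ETags are only comparable when the ETag policy is coordinated.
    if preferred and master.get("etag") and mirror.get("etag"):
        if master["etag"] != mirror["etag"]:
            return True
        checked = True
    if master.get("digest") and mirror.get("digest"):
        if master["digest"] != mirror["digest"]:
            return True
        checked = True
    # Equal sizes alone do not confirm a match.
    return False if checked else None
```

A result of None corresponds to the case where the error is only caught when the merged response is verified against the master's Instance Digest.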




7.1.2.  Error Correction

Partial file cryptographic hashes can be used to detect errors during the download. Metalink servers are not required to offer partial file cryptographic hashes in Metalink/XML as specified in Section 4.1 (Metalink/XML Files), but they are encouraged to do so.

If the object cryptographic hash does not match the Instance Digest then fetch the Metalink/XML if available, where partial file cryptographic hashes can be found, allowing detection of which server returned incorrect data. If the Instance Digest computation does not match then the client needs to fetch the partial file cryptographic hashes, if available, and from there figure out what of the downloaded data can be recovered and what needs to be fetched again. If no partial cryptographic hashes are available, then the client MUST fetch the complete object from other mirrors.
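
Locating the damaged parts with partial file hashes could be sketched as follows. The function name corrupt_pieces is hypothetical; the piece length and the hexadecimal piece hashes are assumed to have been read from the Metalink/XML document already, and the whole download is assumed to be available as one bytes object.

```python
import hashlib


def corrupt_pieces(data, piece_length, piece_hashes_hex, algorithm="sha256"):
    """Return the indices of pieces whose hash does not match.

    data: the downloaded bytes.
    piece_length: size of every piece except possibly the last.
    piece_hashes_hex: expected hexadecimal hashes, one per piece, in order.
    """
    bad = []
    for index, expected in enumerate(piece_hashes_hex):
        start = index * piece_length
        piece = data[start:start + piece_length]
        actual = hashlib.new(algorithm, piece).hexdigest()
        if actual != expected.lower():
            bad.append(index)
    return bad
```

Piece i covers bytes i * piece_length up to, but not including, (i + 1) * piece_length, so each index returned maps directly to a Range request for re-fetching that piece from another mirror.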




8.  Multi-server Performance

When opting to download simultaneously from multiple mirrors, there are a number of factors (both within and outside the influence of the client software) that are relevant to the performance achieved:

Obviously we do not want to use too many simultaneous connections, or other traffic sharing a bottleneck link will be starved. But at the same time, good performance requires that the client can simultaneously download from at least one fast mirror while exploring whether any other mirror is faster. Based on laboratory experiments, we suggest a good default number of simultaneous connections is probably four, with three of these being used for the best three mirrors found so far, and one being used to evaluate whether any other mirror might offer better performance.

The size of chunks chosen by the client should be sufficiently large that the chunk request header fields and response header fields represent negligible overhead, and sufficiently large that they can be pipelined effectively without needing a very high rate of chunk requests. At the same time, the amount of time wasted waiting for the last chunk to download from the last server after all the other servers have finished should be minimized. Note that Range requests impose an overhead on servers and clients need to be aware of that and not abuse them.
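
The suggested default of four connections, three for the best mirrors found so far and one for exploration, might be realized along these lines. This is only a sketch: choose_mirrors is a hypothetical name, throughput is assumed to be measured by the client in bytes per second, and using the exploratory slot only for mirrors that have not been measured yet is an assumption.

```python
def choose_mirrors(throughput, candidates, max_connections=4):
    """Pick the mirrors to use for the next round of chunk requests.

    throughput: dict mapping mirror URI to observed bytes per second,
                for mirrors that have already been measured.
    candidates: all mirror URIs, in the server's order of preference.
    """
    measured = sorted((c for c in candidates if c in throughput),
                      key=lambda c: throughput[c], reverse=True)
    untested = [c for c in candidates if c not in throughput]
    # Reserve one slot for exploration while untested mirrors remain.
    if untested:
        best = measured[:max_connections - 1]
    else:
        best = measured[:max_connections]
    return best + untested[:max_connections - len(best)]
```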




9.  IANA Considerations

Accordingly, IANA will make the following registration to the Link Relation Type registry.

o Relation Name: duplicate

o Description: Refers to a resource whose available representations are byte-for-byte identical with the corresponding representations of the context IRI.

o Reference: This specification.

o Notes: This relation is for static resources. That is, an HTTP GET request on any duplicate will return the same representation. It does not make sense for dynamic or POSTable resources and should not be used for them.




10.  Security Considerations




10.1.  URIs and IRIs

Metalink clients handle URIs and IRIs. See Section 7 of [RFC3986] (Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” January 2005.) and Section 8 of [RFC3987] (Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” January 2005.) for security considerations related to their handling and use.




10.2.  Spoofing

There is potential for spoofing attacks where the attacker publishes Metalinks with false information. This could deceive unaware downloaders into downloading a malicious or worthless file. Also, malicious publishers could attempt a distributed denial of service attack by inserting unrelated URIs into Metalinks.




10.3.  Cryptographic Hashes

Currently, some of the digest values defined in Instance Digests in HTTP [RFC3230] (Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” January 2002.) are considered insecure. These include the whole Message Digest family of algorithms which are not suitable for cryptographically strong verification. Malicious people could provide files that appear to be identical to another file because of a collision, i.e. the weak cryptographic hashes of the intended file and a substituted malicious file could match.

If a Metalink contains whole file hashes as described in Section 6 (Cryptographic Hashes of Whole Files), it SHOULD include SHA-256, as specified in [FIPS‑180‑3] (National Institute of Standards and Technology (NIST), “Secure Hash Standard (SHS),” October 2008.), or stronger. It MAY also include other hashes.




10.4.  Signing

Metalinks should include digital signatures, as described in Section 5 (OpenPGP Signatures).

Digital signatures provide authentication, message integrity, and non-repudiation with proof of origin.




11.  References




11.1. Normative References

[BITTORRENT] Cohen, B., “The BitTorrent Protocol Specification,” BITTORRENT 11031, February 2008.
[FIPS-180-3] National Institute of Standards and Technology (NIST), “Secure Hash Standard (SHS),” FIPS PUB 180-3, October 2008.
[ISO3166-1] International Organization for Standardization, “ISO 3166-1:2006. Codes for the representation of names of countries and their subdivisions -- Part 1: Country codes,” November 2006.
[RFC0959] Postel, J. and J. Reynolds, “File Transfer Protocol,” STD 9, RFC 0959, October 1985.
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, “Hypertext Transfer Protocol -- HTTP/1.1,” RFC 2616, June 1999.
[RFC3156] Elkins, M., Del Torto, D., Levien, R., and T. Roessler, “MIME Security with OpenPGP,” RFC 3156, August 2001.
[RFC3230] Mogul, J. and A. Van Hoff, “Instance Digests in HTTP,” RFC 3230, January 2002.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, “Uniform Resource Identifier (URI): Generic Syntax,” STD 66, RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, “Internationalized Resource Identifiers (IRIs),” RFC 3987, January 2005.
[RFC5854] Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, “The Metalink Download Description Format,” RFC 5854, June 2010.
[RFC5988] Nottingham, M., “Web Linking,” RFC 5988, October 2010.
[draft-ietf-ftpext2-hash] Bryan, A., Kosse, T., and D. Stenberg, “FTP Extensions for Cryptographic Hashes,” draft-ietf-ftpext2-hash-00 (work in progress), November 2010.



11.2. Informative References

[RFC5843] Bryan, A., “Additional Hash Algorithms for HTTP Instance Digests,” RFC 5843, April 2010.



Appendix A.  Acknowledgements and Contributors

Thanks to the Metalink community, Alexey Melnikov, Julian Reschke, Mark Nottingham, Daniel Stenberg, Matt Domsch, Micah Cowan, and David Morris.

Mark Handley and Javier Vela Diago did work on simultaneous download from multiple mirrors, which also provided validation of the benefits of this approach.




Appendix B.  Comparisons to Similar Options

[[ to be removed by the RFC editor before publication as an RFC. ]]

This draft, compared to the Metalink/XML format [RFC5854] (Bryan, A., Tsujikawa, T., McNab, N., and P. Poeml, “The Metalink Download Description Format,” June 2010.) :




Appendix C.  Document History

[[ to be removed by the RFC editor before publication as an RFC. ]]

Known issues concerning this draft:

-19 : January 20, 2011.

-18 : January 1, 2011.

-17 : September 13, 2010.

-16 : April 16, 2010.

-15 : February 20, 2010.

-14 : December 31, 2009.

-13 : November 22, 2009.

-12 : November 11, 2009.

-11 : October 23, 2009.

-10 : October 15, 2009.

-09 : October 13, 2009.

-08 : October 4, 2009.

-07 : September 29, 2009.

-06 : September 24, 2009.

-05 : September 19, 2009.

-04 : September 17, 2009.

-03 : September 16, 2009.

-02 : September 7, 2009.

-01 : September 1, 2009.

-00 : August 24, 2009.




Authors' Addresses

  Anthony Bryan
  Pompano Beach, FL
  USA
Email:  anthonybryan@gmail.com
URI:  http://www.metalinker.org
  
  Neil McNab
Email:  neil@nabber.org
URI:  http://www.nabber.org
  
  Henrik Nordstrom
Email:  henrik@henriknordstrom.net
URI:  http://www.henriknordstrom.net/
  
  Tatsuhiro Tsujikawa
  Shiga
  Japan
Email:  tatsuhiro.t@gmail.com
URI:  http://aria2.sourceforge.net
  
  Dr. med. Peter Poeml
  MirrorBrain
  Venloer Str. 317
  Koeln 50823
  DE
Phone:  +49 221 6778 333 8
Email:  peter@poeml.de
URI:  http://mirrorbrain.org/~poeml/
  
  Alan Ford
  Roke Manor Research
  Old Salisbury Lane
  Romsey, Hampshire SO51 0ZN
  UK
Phone:  +44 1794 833 465
Email:  alan.ford@roke.co.uk