Internet Engineering Task Force                           E. Zierau, Ed.
Internet-Draft                                      Royal Danish Library
Intended status: Informational                               May 2, 2019
Expires: November 3, 2019

A Persistent Web IDentifier (PWID) URN Namespace
draft-pwid-urn-specification-07

Abstract

This document specifies a Uniform Resource Name (URN) for Persistent Web IDentifiers for web material in web archives using the 'pwid' namespace identifier.

The main purpose of the standard is to support specification of references that are not covered by other reference techniques, in particular references to material in web archives with restricted access. Furthermore, it supports persistent, technology agnostic references to web archives in general, in a form that can work as an algorithmic basis for finding web archive resources.

An additional important benefit is that the standard can be used for specifying web collections, which can then form a persistent computational basis for extracting the archived collection parts. Since these parts can be specified generally, this further allows collections to be specified with elements from one or more web archives.

The PWID URN is designed to meet the requirements for proper referencing needed by researchers. Therefore, it is designed to provide general, global, sustainable, human readable, technology agnostic, persistent and precise references to web materials in web archives.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.
It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

Zierau                  Expires November 3, 2019                [Page 1]
Internet-Draft  A Persistent Web IDentifier (PWID) URN Namespace  May 2019

This Internet-Draft will expire on November 3, 2019.

Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

1. Introduction
  1.1. Requirements Language
2. Namespace Registration Template
3. Acknowledgements
4. References
  4.1. Normative References
  4.2. Informative References
Author's Address

1. Introduction

The PWID URN is a supplement to existing reference standards. It supports references to web archives, including an area that is not supported today: references to material in web archives with restricted access. Furthermore, the PWID URN enables technology agnostic references to web archives in general, which can be needed, for instance, for references to dynamic web material with frequent updates (e.g. a news site) or to a specific version of web material (e.g. a specific version of the DOI handbook).

The PWID URN is in a form which can work as an algorithmic basis for finding the resource. This also enables computation of the archived web parts of a collection drawn from one or more web archives, if the collection parts are specified by PWID URNs. Furthermore, the PWID URN includes information about the resource which makes it possible to find alternative resources in cases where the original precise resource has become unavailable.

The PWID URN is designed to be a persistent reference that is general, global and technology agnostic in order to enhance its chances of being sustainable. Furthermore, it is designed to be human readable and able to specify precisely what the referenced web archive resource covers. This design enables a PWID URN to:

o be used in technical solutions, e.g. to make them resolvable

o cover references to all sorts of materials in web archives

o cover references to materials from all sorts of web archives

The motivation for defining a PWID namespace is the growing number of challenges in referencing archived web resources; the PWID as a URN can assist in overcoming many of these challenges. The standard is needed to reference web materials with precision and persistency on par with traditional references for analogue material. Furthermore, it is needed in order to address web archive resources that are not freely available online. The PWID URN covers both referencing of web resources from research papers and definition of web collections/corpora. In detail, the challenges are:

o Persistent Identifier systems (like DOI [DOI]) will only cover registered resources. In general, citation guidelines do not cover general and persistent referencing techniques for web resources that are not registered.
However, an increasing number of references point to resources that only exist on the web, e.g. blogs that turn out to have a historical impact. In order to obtain persistency for a reference, the target needs to be stable. For non-registered web resources, the common rule is that the resource will change, since the live web is constantly changing. Persistency can only be obtained by referring to something stable, i.e. an archived snapshot of the resource from the web. The PWID URN is therefore focused on referencing archived web material in a technology agnostic way (research documented in [IPRES2016] and [ResawRef]).

o References to materials which only exist in web archives (i.e. no longer on the live web) are not well supported, especially not for materials that only exist in archives with restricted access. There are many new initiatives for web archive referencing, most of which are centralized solutions offering harvesting and referencing, but these cannot be used for materials that only exist in web archives. The PWID URN can be used for all web archives, including web archives with restricted access.

o One of the referencing initiatives for open web archives uses URLs which depend on the current setup of the web archive's access platform. These URLs are usually technology and placement dependent, and therefore such a reference style is not suited for references that need to be retraceable over a long period. The PWID URN can be used for such reference purposes, since it is technology agnostic.

o Another referencing initiative for open web archives omits specification of the web archive where the resource was found. This strategy is used in order to open the possibility of using alternatives from other archives.
However, this also adds a risk of imprecision, since different archives tend to have different versions even when harvesting at the same time. Therefore, such a reference style is not suited for references where it is important that the reference points precisely to the verified resource. The PWID URN can provide an exact reference to where the resource was validated. Additionally, the PWID contains the information needed to search for alternative resources, if required.

o For references to web collections/corpora (possibly across different web archives), recent research has found that various legal and sustainability issues have led to a need for collections to be defined by references to their web parts. Furthermore, there is a need for similar persistent referencing of all parts for calculation and sustainability reasons. So far, there has been no stable standard for definition of such collection parts. The PWID URN can be used for such definitions in order to fulfil these requirements (research documented in [ResawColl]).

The PWID URN is especially useful for web material where precision is in focus and/or there are references to materials from web archives requiring special permissions in order to gain access. The precision concerns both pointing to the archive where the material was found and validated against its purpose (other archived versions in other web archives may differ regarding both completeness and contents, even within short time periods) and precision in what is actually referred to by the reference (e.g. is it the page or the whole website).

Furthermore, the PWID URN is very useful in specifying the contents of a web collection. Definitions of web collections are often needed for extraction of data used in production of research results, e.g. for future evaluations. Current practices are not persistent, as they often use some CDX version, which varies between implementations.
Strict syntax is needed for the PWID URN in order to ensure that it can act as a reference which can be used for computational purposes. This is especially relevant for automatic extraction of parts from web collection definitions. Furthermore, today's readers of research papers expect to be able to access a referenced resource by clicking an actionable URI. A similar possibility will therefore be expected for references to available archived web material, and this is possible with a strict syntax. Examples of technical solutions that are enabled are:

o Resolving of a reference to a web collection and automatic extraction of the parts of a web collection defined by PWID URNs [ResawRef] [ResawColl]

o Resolving of a PWID URN by resolving services. To begin with, a prototype has been developed for the Danish web archive data and open web archives with standard patterns for the current technologies. Implementations for resolution of PWID URNs for other web archives may be developed.

The purpose of the PWID URN is also to express a web archive reference as simply as possible while meeting the requirements for sustainability, usability and scope. Therefore, the PWID URN is focused on having only the minimum required information to make a precise identification of a resource in an arbitrary web archive. Recent research has shown that this can be obtained by the following information [ResawRef]:

o Identification of web archive

o Identification of source:

  * Archived URI or identifier

  * Archival timestamp

o Intended precision (page, part, subsite etc.)

The PWID URN represents this information in a way that is both human readable and well-defined, enabling technical solutions to interpret the URN.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Namespace Registration Template

Namespace Identifier: PWID

Version: 1

Date: 2019-05-02

Registrant:

   Eld Maj-Britt Olmuetz Zierau
   Royal Danish Library
   Soeren Kierkegaards Plads 1
   1219 Copenhagen
   Denmark
   ph: +45 9132 4690
   email: elzi@kb.dk

Purpose:

The PWID URN is a supplement to existing reference standards. It supports references to web archives, including an area that is not supported today: references to material in web archives with restricted access. Furthermore, the PWID URN enables technology agnostic references to web archives in general, which can be needed, for instance, for references to dynamic web material with frequent updates (e.g. a news site) or to a specific version of web material (e.g. a specific version of the DOI handbook).

The PWID URN is in a form which can work as an algorithmic basis for finding the resource. This also enables computation of the archived web parts of a collection drawn from one or more web archives, if the collection parts are specified by PWID URNs. Furthermore, the PWID URN includes information about the resource which makes it possible to find alternative resources in cases where the original precise resource has become unavailable.

The PWID URN is designed to be a persistent reference that is general, global and technology agnostic in order to enhance its chances of being sustainable. Furthermore, it is designed to be human readable and able to specify precisely what the referenced web archive resource covers.
This design enables a PWID URN to:

* be used in technical solutions, e.g. to make them resolvable

* cover references to all sorts of materials in web archives

* cover references to materials from all sorts of web archives

The motivation for defining a PWID namespace is the growing number of challenges in referencing archived web resources; the PWID as a URN can assist in overcoming many of these challenges. The standard is needed to reference web materials with precision and persistency on par with traditional references for analogue material. Furthermore, it is needed in order to address web archive resources that are not freely available online. The PWID URN covers both referencing of web resources from research papers and definition of web collections/corpora. In detail, the challenges are:

* Persistent Identifier systems (like DOI [DOI]) will only cover registered resources. In general, citation guidelines do not cover general and persistent referencing techniques for web resources that are not registered. However, an increasing number of references point to resources that only exist on the web, e.g. blogs that turn out to have a historical impact. In order to obtain persistency for a reference, the target needs to be stable. For non-registered web resources, the common rule is that the resource will change, since the live web is constantly changing. Persistency can only be obtained by referring to something stable, i.e. an archived snapshot of the resource from the web. The PWID URN is therefore focused on referencing archived web material in a technology agnostic way (research documented in [IPRES2016] and [ResawRef]).

* References to materials which only exist in web archives (i.e. no longer on the live web) are not well supported, especially not for materials that only exist in archives with restricted access.
There are many new initiatives for web archive referencing, most of which are centralized solutions offering harvesting and referencing, but these cannot be used for materials that only exist in web archives. The PWID URN can be used for all web archives, including web archives with restricted access.

* One of the referencing initiatives for open web archives uses URLs which depend on the current setup of the web archive's access platform. These URLs are usually technology and placement dependent, and therefore such a reference style is not suited for references that need to be retraceable over a long period. The PWID URN can be used for such reference purposes, since it is technology agnostic.

* Another referencing initiative for open web archives omits specification of the web archive where the resource was found. This strategy is used in order to open the possibility of using alternatives from other archives. However, this also adds a risk of imprecision, since different archives tend to have different versions even when harvesting at the same time. Therefore, such a reference style is not suited for references where it is important that the reference points precisely to the verified resource. The PWID URN can provide an exact reference to where the resource was validated. Additionally, the PWID contains the information needed to search for alternative resources, if required.

* For references to web collections/corpora (possibly across different web archives), recent research has found that various legal and sustainability issues have led to a need for collections to be defined by references to their web parts. Furthermore, there is a need for similar persistent referencing of all parts for calculation and sustainability reasons. So far, there has been no stable standard for definition of such collection parts.
The PWID URN can be used for such definitions in order to fulfil these requirements (research documented in [ResawColl]).

The PWID URN is especially useful for web material where precision is in focus and/or there are references to materials from web archives requiring special permissions in order to gain access. The precision concerns both pointing to the archive where the material was found and validated against its purpose (other archived versions in other web archives may differ regarding both completeness and contents, even within short time periods) and precision in what is actually referred to by the reference (e.g. is it the page or the whole website).

Furthermore, the PWID URN is very useful in specifying the contents of a web collection. Definitions of web collections are often needed for extraction of data used in production of research results, e.g. for future evaluations. Current practices are not persistent, as they often use some CDX version, which varies between implementations.

Strict syntax is needed for the PWID URN in order to ensure that it can act as a reference which can be used for computational purposes. This is especially relevant for automatic extraction of parts from web collection definitions. Furthermore, today's readers of research papers expect to be able to access a referenced resource by clicking an actionable URI. A similar possibility will therefore be expected for references to available archived web material, and this is possible with a strict syntax. Examples of technical solutions that are enabled are:

* Resolving of a reference to a web collection and automatic extraction of the parts of a web collection defined by PWID URNs [ResawRef] [ResawColl]

* Resolving of a PWID URN by resolving services.
To begin with, a prototype has been developed for the Danish web archive data and open web archives with standard patterns for the current technologies. Implementations for resolution of PWID URNs for other web archives may be developed.

The purpose of the PWID URN is also to express a web archive reference as simply as possible while meeting the requirements for sustainability, usability and scope. Therefore, the PWID URN is focused on having only the minimum required information to make a precise identification of a resource in an arbitrary web archive. Recent research has shown that this can be obtained by the following information [ResawRef]:

* Identification of web archive

* Identification of source:

  + Archived URI or identifier

  + Archival timestamp

* Intended precision (page, part, subsite etc.)

The PWID URN represents this information in a way that is both human readable and well-defined, enabling technical solutions to interpret the URN.

Syntax:

The syntax of the PWID URN is specified below in Augmented Backus-Naur Form (ABNF) [RFC5234] and conforms to the URN syntax defined in [RFC8141]. The syntax definition of the PWID URN is:

   pwid-urn = "urn:" pwid-NID ":" pwid-NSS
   pwid-NID = "pwid"
   pwid-NSS = archive-id ":" archival-time ":" precision-spec ":"
              archived-item-id
   archive-id = domain / ( "~" registered-archive-id )
   registered-archive-id = 1*unreserved
   archival-time = utc-date ["T" utc-time] "Z"
   utc-date = utc-year "-" utc-month "-" utc-day
   utc-year = 4DIGIT
   utc-month = 2DIGIT ; 01-12
   utc-day = 2DIGIT ; 01-28, 01-29, 01-30, 01-31 based on
                    ; month/year in UTC time
   utc-time = utc-hour ":" utc-minute [":" utc-second [secfrac]]
   utc-hour = 2DIGIT ; 00-23
   utc-minute = 2DIGIT ; 00-59
   utc-second = 2DIGIT ; 00-58, 00-59, 00-60 based on leap second
                       ; rules
   secfrac = "." 1*9DIGIT
   precision-spec = "part" / "page" / "subsite" / "site"
                    / "collection" / "recording" / "snapshot"
                    / extension
   extension = 1*ALPHA
   archived-item-id = uri-string / ( "~" registered-item-id )
   registered-item-id = 1*unreserved

where

* All parts of the pwid-NSS are case insensitive, except for archived-item-id in cases where the archived-item-id is a URI with case sensitive parts. According to [RFC8141] (section 3.1) this means that PWID URNs in general are case insensitive, except in cases where they include a case sensitive URI as archived-item-id.

* 'DIGIT' is defined as in [RFC5234].

* 'ALPHA' is defined as in [RFC5234].

* 'unreserved' is defined as in [RFC3986].

* 'domain' is defined as in section 3.5 of [RFC1034].

* 'uri-string' is defined as 'URI' in [RFC3986] but where occurrences of "[", "]", "?", "#" and "%" are %-encoded in order not to clash with URN reserved characters [RFC8141] as well as having unambiguous use of "%".

* 'archive-id' must either be the domain for the archive, which can lead to descriptions of how to access (or apply for access to) materials in the archive, or it must be a registered archive identifier (registry still to be defined and created). Distinction between the two types of identifiers is made by matching the first character with "~". A match means that the rest of the identifier is a registered archive identifier, since the syntax requires such identifiers to be prefixed with "~", while no domain name is allowed to start with this character.

* 'archival-time' is a UTC timestamp which conforms to the W3C profile of [ISO8601] [W3CDTF] and a subset of date-time specified in [RFC3339] (except for allowing partial time specification).
The archival-time may be specified at any of the levels of granularity, as long as it reflects exactly the granularity of the timestamp recorded in the archive (which is in accordance with the WARC standard [ISO28500]).

* 'archived-item-id' must either be the archived URI for the source or a registered item identifier. Distinction between the two types of identifiers is made by matching the first character with "~". A match means that the rest of the identifier is a registered archive item identifier, since the syntax requires such identifiers to be prefixed with "~", while no URI is allowed to start with this character.

The precision specification expresses the intended precision of the reference, which is needed for specification of:

* precise coverage of the reference, e.g. to an html file, since the precise meaning of what the reference covers can vary widely (the html file itself? the web page it renders to? a subsite that it represents? etc.), or to a collection of web parts with a precise specification of which web parts are included.

* the degree of precision of the reference with respect to what can be viewed in the future: the html file itself will probably be the same, and a collection specification is also quite precise (see also the example of a collection provided in the section about assignment). However, for web pages and websites there is interpretation involved, which means the result of rendering them in the web archive can change over time. This may happen if the web archive's algorithm for calculating the referred web parts has changed, or if additional parts have been added which will be picked up by the algorithm.

* how resolvers should display the referred source in order to correspond to the precision specification: if it is an html file, it can e.g. be displayed as a text file if the precision specification is 'part', via Wayback if the precision specification is 'page', or by choosing a snapshot version of the page if the precision specification is 'snapshot', etc. If access is limited to the referenced part (the html page), then the application would also need to make sure that all parts/pages belonging to the site/subsite are available.

The following precision-spec values are defined:

* 'part'
  Meaning the single archived file/web part harvested from the specified URI. If the URI is for the file behind a web page (e.g. an html or asp file), this means the actual file with the html code. It is relevant to refer to web pages this way if the page is part of a collection specification or if it is the html itself that is of interest (e.g. javascripts or hidden links which are not visible when rendering the web page). For all other file types, the URI refers to a single file that is interpreted as a file. The precision-spec will always be 'part' for such single non-web related files.

* 'page'
  Meaning that an application like Wayback calculates a resulting web page based on calculated referenced web parts (display templates, images etc.). For example, an html page displaying an image will need both the html and the referred image. If it is important for the reference to be sure that it is the same image, the most precise reference to a picture in the context of a web page is to provide both the PWID URN for the page (with 'page' precision) and the PWID URN for the image file part which contains the referred picture (with 'part' precision).

* 'subsite'
  Meaning the subsite defined by the referred web page (as described under 'page') and underlying web pages (referred by the page) that have URIs starting with the same path as the archived URI. Calculation of all the relevant parts for the subsite is done by an application like Wayback (similarly to what is described under 'page').

* 'site'
  Meaning the site defined by the referred web page (as described under 'page') and underlying web pages (referred by the page) that have URIs starting with the same path as the archived URI. Calculation of all the relevant parts for the site is done by an application like Wayback (similarly to what is described under 'page').

* 'collection'
  Meaning the collection which is defined in a specification which can be identified by an identifier (e.g. a collection specification in the XML format enabling interpretation as in the example provided in [ResawColl]).

* 'snapshot'
  Meaning a snapshot (image) representation of web material, e.g. a web page.

* 'recording'
  Meaning a representation of a web recording specification where the web archive applications will decide how it is rendered (interpretation could e.g. depend on the file suffix of the web recording). An example is a web recording coded in a WARC file.

The option of making an extension value is included to allow reference to a resource of any kind with an assigned identifier, even if it is not covered by the other values. In all cases, it will be up to the application serving the web archive to interpret how this item should be rendered.

Assignment:

PWID URNs do not have to be assigned by an authority, as they are based on the information created at the time of archiving. In other words, a PWID URN is created independently, but following an algorithm which ensures that the referred item can be found if it is still available. A PWID URN also has the benefit that it includes the information needed to look for alternative resources, e.g. via Memento for some open web archives [MEMENTO] or via possible future web archive infrastructures.
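Because the parts of a PWID URN appear in a fixed order, the information it carries can be extracted mechanically, for example as a starting point for looking up alternative resources. The following Python sketch is illustrative only and not part of this specification. The names Pwid, PWID_RE and parse_pwid are hypothetical, and the regular expression only approximates the ABNF: it does not validate 'domain', 'unreserved' or 'uri-string', nor the calendar ranges of the timestamp.

```python
import re
from typing import NamedTuple


class Pwid(NamedTuple):
    """The four parts of the pwid-NSS (hypothetical container type)."""
    archive_id: str
    archival_time: str
    precision_spec: str
    archived_item_id: str


# Approximation of the pwid-urn ABNF; an assumption of this sketch,
# not a normative pattern. The timestamp is matched explicitly because
# it contains ":" itself, as may the archived URI.
PWID_RE = re.compile(
    r"urn:pwid:"
    r"(?P<archive_id>[^:]+):"
    r"(?P<archival_time>\d{4}-\d{2}-\d{2}"
    r"(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d{1,9})?)?)?Z):"
    r"(?P<precision_spec>[A-Za-z]+):"
    r"(?P<archived_item_id>.+)",
    re.IGNORECASE | re.ASCII,
)


def parse_pwid(urn: str) -> Pwid:
    """Split a PWID URN into its parts, or raise ValueError."""
    match = PWID_RE.fullmatch(urn)
    if match is None:
        raise ValueError(f"not a PWID URN: {urn!r}")
    return Pwid(
        archive_id=match["archive_id"],
        archival_time=match["archival_time"],
        # precision-spec is case insensitive, so it is normalized here.
        precision_spec=match["precision_spec"].lower(),
        # The archived URI may be case sensitive and is left untouched.
        archived_item_id=match["archived_item_id"],
    )


if __name__ == "__main__":
    print(parse_pwid("urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk"))
```

A leading "~" in archive_id or archived_item_id indicates a registered identifier rather than a domain or URI; the sketch leaves that distinction to the caller.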
A PWID URN is created by finding the relevant information for the syntax parts of the PWID:

   "urn:pwid:" archive-id ":" archival-time ":" precision-spec ":"
   archived-item-id

The PWID URN for an archived item at hand can be constructed by replacing the unspecified PWID parts with the relevant information, as explained in the following:

* archive-id (identification of web archive):
  In this version of the standard, it is recommended to use the domain of the web archive as the identifier for the web archive (e.g. archive.org for Internet Archive's open web archive and netarkivet.dk for the Danish web archive with restricted access). This is recommended, since browsing the domain page will typically lead to a description of how to access the web archive, e.g. by online access or by applying for access grants. Furthermore, it is more precise than e.g. the name of the archive, since there may be more than one installation of web archives at the same organization, e.g. archive.org and archive-it.org are both covered by Internet Archive. When a registry of web archives is established, it will be more precise and persistent to use the web archive identifier specified in this registry (e.g. DKWA for the Danish web archive with the domain netarkivet.dk). The syntax requires that such identifiers are prefixed with the character "~".

* archival-time (archival timestamp):
  The archival time for the archived item must be specified with as much granularity as possible in order to make sure it uniquely identifies the resource at hand. The archival time may be displayed along with the archived item, but implementations differ, so it is important to check whether a more precise timestamp can be found and whether the correct timestamp is used.
  In many Wayback implementations, the precise timestamp can be found as part of the URI used for viewing the archived item. For example, in the archive URI https://web.archive.org/web/20160122112029/http://www.dr.dk for an archived resource viewable via the Internet Archive's Wayback installation, the number 20160122112029 represents the archival time 2016-01-22T11:20:29Z. In other installations, the most precise timestamp may be found in the URI from a search result leading to the resource (which usually redirects on the basis of a call to the underlying archive index). Especially for web pages with frames, there may be cases where the actual time is not displayed with the source, since only the times for the contents of the frames are displayed.

* precision-spec (intended precision: page, part, site, snapshot etc.):
  The precision specification specifies how the user should view the referred item, either as a specific representation (with inherited precision) or by use of tools (e.g. browsing a web site based on calculations, or browsing on the basis of a collection of specific parts). Inherited precision is implicitly indicated by the precision specification through how the information is used in resolution and location. The most precise reference is 'part', e.g. for an image which can be located and accessed independently. Less precise references are those where calculation of other parts is needed in order to resolve and view the item, e.g. 'page', 'site' or 'subsite'.

* archived-item-id (archived URI or registered identifier):
  The archived item identifier will either be the archived URI of the displayed archived item at hand, or it will be an identifier assigned to a resource by the archive. In the latter case, the syntax requires that such identifiers are prefixed with the character "~".

A much easier way to construct PWID URNs is to use tools that construct them.
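As an illustration of the manual construction steps above, the following Python sketch derives a PWID URN from the Wayback example URI given in the archival-time description. It is a minimal sketch under stated assumptions and not the implementation of any existing tool: the function names are hypothetical, the 14-digit timestamp layout is assumed to match the Internet Archive example, and the choice of 'page' as precision-spec is only for illustration.

```python
from datetime import datetime

# Characters that the Syntax section requires to be %-encoded in an
# archived URI. "%" comes first so that the escapes produced for the
# other characters are not encoded a second time.
CHARS_TO_ENCODE = "%[]?#"


def wayback_timestamp_to_archival_time(timestamp: str) -> str:
    """Convert a 14-digit Wayback timestamp (assumed layout
    YYYYMMDDhhmmss, in UTC) to the archival-time form of a PWID URN."""
    parsed = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


def encode_archived_uri(uri: str) -> str:
    """%-encode the characters listed for 'uri-string'."""
    for char in CHARS_TO_ENCODE:
        uri = uri.replace(char, f"%{ord(char):02X}")
    return uri


def build_pwid(archive_id: str, archival_time: str,
               precision_spec: str, archived_item_id: str) -> str:
    """Assemble the parts in the order given by the pwid-NSS rule."""
    return (f"urn:pwid:{archive_id}:{archival_time}:"
            f"{precision_spec}:{archived_item_id}")


if __name__ == "__main__":
    urn = build_pwid(
        "archive.org",
        wayback_timestamp_to_archival_time("20160122112029"),
        "page",  # illustrative choice of precision-spec
        encode_archived_uri("http://www.dr.dk"),
    )
    print(urn)  # urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk
```

The sketch does not check that the timestamp granularity matches what the archive recorded, which the Syntax section requires; a caller would need to confirm that against the archive itself.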
Currently, there is also a prototype for a SOLR-Wayback tool (source at
https://github.com/netarchivesuite/solrwayback) [PWIDprovider], which
can assist in finding the most precise reference to an archived web
page.  This Wayback version can provide all PWID URNs belonging to a
shown page (with the page PWID URN at the top).  For example, in
netarkivet.dk, the archived URI for the web page
http://www.susanlegetoej.dk/shop/handskedyr-siameser-killing-8681p.html
archived 2008-11-29 01:19:16 UTC has the following parts calculated by
the SOLR-Wayback tool:

   urn:pwid:netarkivet.dk:2008-11-29T00:41:42Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Master_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:39:47Z:part:http://www.susanlegetoej.dk/shop/css/print.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:06Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Basket_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_TopMenu_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:00Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SearchPage_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:35Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_Productmenu_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:22Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceTop_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:24Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceLeft_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:23Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceBottom_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:40:25Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_SpaceRight_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:37:23Z:part:http://www.susanlegetoej.dk/images/ddcss/SK113_ProductInfo_NF.css
   urn:pwid:netarkivet.dk:2008-11-29T00:37:24Z:part:http://www.susanlegetoej.dk/Shop/js/Variants.js
   urn:pwid:netarkivet.dk:2009-03-03T11:53:00Z:part:http://www.susanlegetoej.dk/Shop/js/Media.js
   urn:pwid:netarkivet.dk:2009-03-03T11:53:02Z:part:http://www.susanlegetoej.dk/images/design/print.gif
   urn:pwid:netarkivet.dk:2009-03-03T11:54:19Z:part:http://www.susanlegetoej.dk/Shop/js/Scroll.js
   urn:pwid:netarkivet.dk:2009-03-03T11:54:09Z:part:http://www.susanlegetoej.dk/Shop/js/Shop5Common.js
   urn:pwid:netarkivet.dk:2006-11-20T20:16:03Z:part:http://www.susanlegetoej.dk/images/602551.jpg

Security and Privacy:

Security and privacy considerations are restricted to accessible web
resources in web archives.  Resolution of PWID URNs will usually only
be possible through the web archives' access tools, in which case
security and privacy are as covered by these tools.

It should be noted that an archived web page or part can be just as
dangerous as a "live" page or part; for instance, it could include
insecure scripts, malware, trackers, etc.  Furthermore, an archived
page can in fact be more dangerous, because it could include outdated
scripts with known vulnerabilities that can never be patched, since the
script is archived for all time in a vulnerable state.

Interoperability:

This is covered by comments in the Syntax description:

*  the PWID URN conforms to the URI standard defined in [RFC3986] and
   the URN standard [RFC8141]

*  the 'archival-time' of the PWID URN conforms to the UTC timestamp
   as described in the W3C profile of ISO 8601 [ISO8601] [W3CDTF] and
   is in accordance with the WARC standard ISO 28500 [ISO28500]

*  when a URI is used as the 'archived-item-id', this URI conforms to
   the URI standard defined in [RFC3986], with %-encodings of "[",
   "]", "#", "?"
   and "%" in order to conform to the URN standard [RFC8141] and to
   keep the use of "%" unambiguous

Resolution:

The information in a PWID URN can be used for locating a web archive
resource, for any kind of web archive.  It includes the minimum
information for web archive materials that enables resolvability,
manually or by a resolver.

Resolution of a PWID URN is the primary motivation for making a formal
URN definition, instead of just a textual representation of the four
needed parts of a PWID.

Resolution (manual or automatic) is based on the PWID parts:

*  Web archive identification for the web archive holding the referred
   resource

   If the identifier does not start with "~", then the identifier is
   the domain name of the web archive, where browsing this domain page
   will typically lead to a description of how to access the web
   archive.  For example, "archive.org" is the domain name leading to
   the Internet Archive's interface to their online web collection,
   and "netarkivet.dk" is the domain name leading to the website for
   the Danish web archive, with information about how to apply for
   access permission to the web collections.

   If the identifier starts with "~", the archive can be identified by
   looking up the identifier (the rest of the archive identifier,
   after the "~") in a registry of archives.  Such a registry is a
   future possibility; it should hold archive identifiers along with
   their current location on the internet.  It will be needed for
   persistent reference to the archive, since an archive may change
   its location and name, or archives may merge.  There is work in
   progress to define such a registry, but no details are available
   yet.

*  Archived URI or registered identifier of archived item

   If the identifier does not start with "~", then it is an archived
   URI, and this URI must be used in the search for, or construction
   of, the location of the resource.
   If the identifier starts with "~", then the rest of the characters
   in the identifier constitute a registered identifier assigned to
   the resource (by the archive), and it is this identifier that must
   be used in the search for, or construction of, the location of the
   resource.

*  Date and time associated with the archived item

   The archival date and time must be used in the search for, or
   construction of, the location of the resource.

*  Precision of what is referred

   The precision guides which tools to activate in order to view the
   referred item, e.g. browse the referred item as a page on the basis
   of the computed closest past, browse the referred item on the basis
   of parts specified in a collection, or view the referred item as a
   snapshot.  In the case of a snapshot, it also contains a
   specification of which resource to display.

In the following, the different resolution techniques are explained
(manual as well as via a service).

As an example, the PWID URN

   urn:pwid:archive.org:2016-01-22T11:20:29Z:page:http://www.dr.dk

has the information:

*  archive.org

   Currently known identifier in the form of the Internet Archive
   domain name for their open access web archive.  If Internet Archive
   registered their open web archive in an IANA web archive register,
   this identifier could currently be "web.archive.org/web/" for
   Wayback resolution, or it could be "archive.org/pwid/" if a PWID
   interface was created as described below.

*  2016-01-22T11:20:29Z

   UTC date and time associated with the archived URI.

*  page

   Clarification that the reference covers the full web page with all
   its inherited parts selected by the web archive.

*  http://www.dr.dk

   Archived URI of the item.

Resolution of this PWID URN can be deduced based on the current (2019)
knowledge of Internet Archive's open Wayback access web interface,
which has the pattern: https://web.archive.org/web/