Memento Project

Memento is a United States National Digital Information Infrastructure and Preservation Program (NDIIPP)–funded project aimed at making Web-archived content more readily discoverable.

The project is being led by the Los Alamos National Laboratory and Old Dominion University.

Rather than expecting people to know about the growing number of Web archives, and to guess which archive might hold an older version of the resource they’re looking for, Memento proposes to make archived content discoverable via the original URL that the searcher already knew about. Essentially, Memento is an attempt to permit users to view any web page as it looked on a given date in the past.

Technical description

A variety of web archives exist, each collecting specific revisions of web pages as they existed at particular points in time. Memento allows a user to move seamlessly between these archives in search of the archived page that best matches the desired datetime.

Memento is defined in RFC 7089[1] as an implementation of the time dimension of content negotiation, as defined by Tim Berners-Lee in 1996.[2] HTTP accomplishes content negotiation via headers. The table below shows the HTTP headers that allow clients and servers to find the content that the user desires.

Dimensions of Content Negotiation Provided by HTTP

| Request Header | Response Header | Dimension | Examples | Reference |
|---|---|---|---|---|
| Accept | Content-Type | content-type of the representation | text/html, text/plain, image/png | RFC 7231[3], RFC 2616 |
| Accept-Language | Content-Language | language of the representation | en, en-US, cz | RFC 7231, RFC 2616 |
| Accept-Encoding | Content-Encoding | medium, typically compression, that the content has been encoded with | compress, gzip, deflate | RFC 7231, RFC 2616 |
| Accept-Charset | Content-Type | the character set used by the web page | iso-8859-5, unicode-1-1 | RFC 7231, RFC 2616 |
| Accept-Datetime | Memento-Datetime | time of the representation | Fri, 15 Aug 2014 13:43:03 GMT | RFC 7089 |

Memento provides the Accept-Datetime request header so that clients can provide a date to the server, and the server can provide the best archived version of a page for that date. This is referred to as datetime negotiation.
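
As a minimal sketch of what a client sends, the snippet below formats a datetime as an HTTP date for the Accept-Datetime request header. The helper name accept_datetime_header is hypothetical, and treating naive datetimes as UTC is an assumption made for this illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Return a headers dict carrying `when` as an Accept-Datetime value."""
    # Assumption: a naive datetime is already expressed in UTC.
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    else:
        when = when.astimezone(timezone.utc)
    # usegmt=True emits the "GMT" suffix used by HTTP dates.
    return {"Accept-Datetime": format_datetime(when, usegmt=True)}


headers = accept_datetime_header(datetime(2014, 8, 15, 13, 43, 3))
print(headers["Accept-Datetime"])  # Fri, 15 Aug 2014 13:43:03 GMT
```

The resulting dictionary can be passed as the request headers of any HTTP client library.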

To understand Memento fully, one must realize that the Last-Modified header provided by HTTP[4] does not necessarily reflect when a particular version of a web page came into existence. Also, the Last-Modified header may not exist in some cases. To provide more information, the Memento-Datetime header has been introduced to indicate when a specific representation of a web page was observed on the web.[5]
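
The distinction can be illustrated with a short sketch that reads both headers from a response. The header values below are made up for illustration, and the helper name observed_at is hypothetical.

```python
from email.utils import parsedate_to_datetime

# Illustrative response headers for an archived page (values are made up).
response_headers = {
    "Memento-Datetime": "Fri, 15 Aug 2014 13:43:03 GMT",
    "Last-Modified": "Tue, 02 Jun 2015 08:00:00 GMT",
}


def observed_at(headers):
    """Return Memento-Datetime as an aware datetime, or None if it is absent."""
    value = headers.get("Memento-Datetime")
    return parsedate_to_datetime(value) if value else None


memento_dt = observed_at(response_headers)
last_modified = parsedate_to_datetime(response_headers["Last-Modified"])

# The two values answer different questions and need not agree.
print("observed on the web:", memento_dt.isoformat())
print("last modified:      ", last_modified.isoformat())
```

A client that wants to know when a representation existed on the web should rely on Memento-Datetime and treat Last-Modified as optional, unrelated metadata.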

Figure: Memento uses a TimeGate (URI-G) to find the best archived page (URI-M) for a user, given the original resource (URI-R) and a datetime.

Memento finds the best archived web page for the datetime supplied by the user in a three-step process:

  1. The Memento client contacts the original resource to see if it will return information about a TimeGate (URI-G) in the Link header.
  2. The Memento client then sends the datetime desired by the user, in the Accept-Datetime request header, to the URI-G discovered in the previous step. Because most resources on the web do not yet return a URI-G, most Memento clients fall back on a predefined list of TimeGates for this step. The TimeGate responds with a 302 redirection status code and a Location header that tells the client where to find the archived resource (URI-M).
  3. The Memento client then requests the archived resource (URI-M) like it would any other web page. The response for the URI-M contains a Memento-Datetime indicating when it was observed on the web.

In this way, Memento utilizes the existing infrastructure of HTTP to accomplish the goals of finding the best archived web page based on a user's desired datetime and URI.
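
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not a reference client: the fetch callable, the fallback TimeGate prefix, and all example.org and example.net URLs are hypothetical, header names are assumed to arrive in canonical case, and the Location value is assumed to be an absolute URL. A canned fake_fetch stands in for real HTTP traffic so the flow can be followed without a network.

```python
import re

# Hypothetical fallback TimeGate prefix; real clients ship their own list.
DEFAULT_TIMEGATE = "http://timegate.example.org/timegate/"


def find_timegate(link_header):
    """Return the target of the first link whose rel includes 'timegate'."""
    # Each link looks like: <uri>; rel="..."; other="..."
    for uri, params in re.findall(r"<([^>]*)>([^<]*)", link_header or ""):
        rel = re.search(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', params)
        value = (rel.group(1) or rel.group(2) or "") if rel else ""
        if "timegate" in value.lower().split():
            return uri
    return None


def negotiate(uri_r, accept_datetime, fetch):
    """Run the three steps; fetch(method, url, headers) returns (status, headers)."""
    # Step 1: ask the original resource whether it advertises a TimeGate.
    _, headers = fetch("HEAD", uri_r, {})
    uri_g = find_timegate(headers.get("Link")) or DEFAULT_TIMEGATE + uri_r

    # Step 2: datetime negotiation with the TimeGate.
    status, headers = fetch("HEAD", uri_g, {"Accept-Datetime": accept_datetime})
    if status != 302 or "Location" not in headers:
        raise RuntimeError("TimeGate did not redirect to a memento")
    uri_m = headers["Location"]

    # Step 3: request the memento like any other web page.
    _, headers = fetch("GET", uri_m, {})
    return uri_m, headers.get("Memento-Datetime")


def fake_fetch(method, url, headers):
    """Canned responses standing in for real servers (all values hypothetical)."""
    if url == "http://a.example.org/":
        link = '<http://a.example.org/timegate/http://a.example.org/>; rel="timegate"'
        return 200, {"Link": link}
    if url.startswith("http://a.example.org/timegate/"):
        location = "http://archive.example.net/web/20140815134303/http://a.example.org/"
        return 302, {"Location": location}
    return 200, {"Memento-Datetime": "Fri, 15 Aug 2014 13:43:03 GMT"}


uri_m, seen = negotiate(
    "http://a.example.org/", "Fri, 15 Aug 2014 13:43:03 GMT", fake_fetch
)
print(uri_m)
print(seen)
```

Passing fetch in as a parameter keeps the negotiation logic separate from the HTTP library, so a real implementation could supply a function built on any HTTP client without changing the three steps.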

Usage

One can find archived copies of a page by navigating, in a web browser, to a link in one of the following formats, replacing urltoarchive with the full URL of the desired page:[6]

To obtain a JSON description of a Memento:

http://timetravel.mementoweb.org/api/json/YYYY/urltoarchive
http://timetravel.mementoweb.org/api/json/YYYYMM/urltoarchive
http://timetravel.mementoweb.org/api/json/YYYYMMDD/urltoarchive
http://timetravel.mementoweb.org/api/json/YYYYMMDDHH/urltoarchive
http://timetravel.mementoweb.org/api/json/YYYYMMDDHHMM/urltoarchive

To be redirected to a Memento with a datetime that is close to the desired datetime:

http://timetravel.mementoweb.org/memento/YYYY/urltoarchive
http://timetravel.mementoweb.org/memento/YYYYMM/urltoarchive
http://timetravel.mementoweb.org/memento/YYYYMMDD/urltoarchive
http://timetravel.mementoweb.org/memento/YYYYMMDDHH/urltoarchive
http://timetravel.mementoweb.org/memento/YYYYMMDDHHMM/urltoarchive
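
The URL patterns above can be generated programmatically. The following sketch only builds the URL strings and does not contact the service; the function name timetravel_url, the kind values, and the precision labels are hypothetical names chosen for this illustration.

```python
from datetime import datetime

BASE = "http://timetravel.mementoweb.org"

# URL path segment for each of the two link families listed above.
KINDS = {"json": "api/json", "memento": "memento"}

# strftime patterns for the five datetime precisions listed above.
PRECISIONS = {
    "year": "%Y",
    "month": "%Y%m",
    "day": "%Y%m%d",
    "hour": "%Y%m%d%H",
    "minute": "%Y%m%d%H%M",
}


def timetravel_url(kind, when, url_to_archive, precision="day"):
    """Build a Time Travel link for `url_to_archive` at the given precision."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    if precision not in PRECISIONS:
        raise ValueError(f"unknown precision: {precision!r}")
    stamp = when.strftime(PRECISIONS[precision])
    return f"{BASE}/{KINDS[kind]}/{stamp}/{url_to_archive}"


when = datetime(2014, 8, 15, 13, 43)
print(timetravel_url("json", when, "http://example.com/", "minute"))
print(timetravel_url("memento", when, "http://example.com/", "year"))
```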

References

  1. RFC 7089: HTTP Framework for Time-Based Access to Resource States -- Memento
  2. Berners-Lee, Tim. "Web Architecture: Generic Resources". World Wide Web Consortium (W3C). 1996. http://www.w3.org/DesignIssues/Generic Archived 2015-06-02 at the Wayback Machine
  3. RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content
  4. RFC 7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests
  5. Nelson, Michael L. "2010-11-05: Memento-Datetime is not Last-Modified". Web Science and Digital Libraries Research Group. November 5, 2010. http://ws-dl.blogspot.com/2010/11/2010-11-05-memento-datetime-is-not-last.html Archived 2015-05-19 at the Wayback Machine
  6. "Time Travel APIs". timetravel.mementoweb.org. Archived from the original on 2018-05-21. Retrieved 2018-05-15.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.