Webarchive

The webarchive file format is available on macOS and Windows for saving and reviewing complete web pages using the Safari web browser.[1] It differs from a standalone HTML file in that it also saves linked files such as images, CSS, and JavaScript.[2] A webarchive is a concatenation of the source files, together with their filenames, saved in the binary plist format using NSKeyedArchiver. Support for webarchive documents was added in Safari 4 Beta on Windows and is included in subsequent versions. Safari in iOS 13 (iPhone and iPad) also supports webarchive files.[3] Previously, a third-party iOS app called Web Archive Viewer provided this functionality.
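Because a webarchive is a binary property list, a general-purpose plist reader can inspect it. The following is a minimal sketch using Python's standard plistlib module, not a description of how Safari itself reads the file. The key names (WebMainResource, WebSubresources, WebResourceURL, WebResourceMIMEType, WebResourceData) are not given in this article and are assumed here from the layout commonly seen in Safari-produced files; list_resources is a hypothetical helper, and the sample archive is built in memory purely for illustration.

```python
import plistlib


def list_resources(archive_bytes):
    """Return (url, mime_type, size_in_bytes) for each resource in a webarchive."""
    # plistlib detects the binary plist format automatically.
    root = plistlib.loads(archive_bytes)
    # Assumed layout: one main resource plus an optional list of subresources.
    resources = [root["WebMainResource"]] + list(root.get("WebSubresources", []))
    return [
        (r["WebResourceURL"], r.get("WebResourceMIMEType", ""), len(r["WebResourceData"]))
        for r in resources
    ]


# Build a tiny stand-in archive in memory so the sketch runs without a real file.
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceURL": "https://example.com/",
            "WebResourceMIMEType": "text/html",
            "WebResourceData": b"<html><img src='logo.png'></html>",
        },
        "WebSubresources": [
            {
                "WebResourceURL": "https://example.com/logo.png",
                "WebResourceMIMEType": "image/png",
                "WebResourceData": b"\x89PNG",
            }
        ],
    },
    fmt=plistlib.FMT_BINARY,
)

for url, mime, size in list_resources(sample):
    print(url, mime, size)
```

To inspect a file saved by Safari, the bytes read from that file could be passed to list_resources in place of the in-memory sample.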

Web archive
Filename extension
.webarchive
Internet media type
application/x-webarchive
Uniform Type Identifier (UTI)
com.apple.webarchive
Type of format
web page file archive
Extended from
Apple Binary Property List

Usage

  • A version of the webarchive format is used to bundle whole music albums and movies with extra content and menus inside iTunes LP and Extras.
  • Webarchives are automatically generated for ads submitted to Apple's iAd advertising platform.[4]
  • The WebKit framework's WebArchive class is used to simplify cutting-and-pasting with whole or partial web pages.[5]

Vulnerability

In February 2013, a vulnerability in the webarchive format was discovered and reported by Joe Vennix, a Metasploit Project developer. The exploit allows an attacker to send a user a crafted webarchive containing code that accesses cookies, local files, and other data. Apple responded that it would not fix the bug, most likely because exploiting it requires the user to open the file.[6]

Converting for other browsers

Workarounds that allow a webarchive to be viewed in other browsers are possible, though the contents of some web pages may hinder the process. They require one of two free tools: WebArchive Folderizer (for OS X 10.2 and higher)[1] or WebArchive Extractor (for OS X 10.4.3 and higher).[7]
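The general idea behind such a conversion, unpacking the archive into a folder of ordinary files, can be sketched in Python. This is an illustration under stated assumptions rather than the method used by either tool: the plist key names (WebMainResource, WebSubresources, WebResourceURL, WebResourceData) are assumed rather than taken from this article, folderize is a hypothetical helper, and the sketch does not rewrite links inside the HTML, so an extracted page may still point at the original URLs.

```python
import plistlib
import posixpath
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def folderize(archive_bytes, out_dir):
    """Write every resource of a webarchive into out_dir; return the file names written."""
    root = plistlib.loads(archive_bytes)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    resources = [root["WebMainResource"]] + list(root.get("WebSubresources", []))
    written = []
    for index, res in enumerate(resources):
        # Name each file after the last segment of its URL path.
        name = posixpath.basename(urlsplit(res["WebResourceURL"]).path)
        if index == 0:
            name = "index.html"  # the main page always becomes index.html
        elif name in ("", ".", ".."):
            name = f"resource-{index}"  # fallback for URLs without a usable file name
        # Note: resources that share a file name overwrite each other in this sketch.
        (out / name).write_bytes(res["WebResourceData"])
        written.append(name)
    return written


# Stand-in archive built in memory so the sketch runs without a real file.
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceURL": "https://example.com/page",
            "WebResourceData": b"<html></html>",
        },
        "WebSubresources": [
            {
                "WebResourceURL": "https://example.com/css/style.css",
                "WebResourceData": b"body {}",
            }
        ],
    },
    fmt=plistlib.FMT_BINARY,
)

with tempfile.TemporaryDirectory() as tmp:
    names = folderize(sample, tmp)
    page = (Path(tmp) / "index.html").read_bytes()
print(names)
```

A fuller converter would also need to handle name collisions and rewrite resource references in the HTML so that the page loads its local copies.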

Alternatives

MAFF is an open format (with a published specification) for saving a whole web page in a single file. It is currently supported by Firefox through an extension.[8][9] Other web browsers use the MHTML format or achieve the equivalent by saving a directory of inline resources (usually images) alongside the HTML file, sometimes compressed, as in the .war format used by Konqueror (tar+gzip or tar+bzip2). Safari does not support these alternative archive formats.

For archiving entire websites, the Internet Archive has developed the Web ARChive (WARC) format, which was standardized by ISO as ISO 28500.

HTMLD (HTML Directory) is a NeXT-developed format for saving web pages and their dependencies in a bundle that may also be served by a web server.[10]

References

  1. Frakes, Dan. "De-archive Web Archives". Macworld. IDG Communications. Retrieved 15 June 2018.
  2. Arnott, Nick. "Apple declines to fix vulnerability in Safari's Web Archive files, likely because it requires user action to exploit". iMore. Mobile Nations. Retrieved 7 February 2015.
  3. "iOS and iPadOS 13 Review". MacStories. MacStories. Retrieved 25 September 2019.
  4. "iAd JS Programming Guide: Web Archives and Manifest Files". Mac Developer Library. Apple. Retrieved 7 February 2015.
  5. "WebArchive Class Reference". Mac Developer Library. Apple. Retrieved 7 February 2015.
  6. Vennix, Joe. "Abusing Safari's webarchive file format". Rapid7 Metasploit. Rapid7. Retrieved 7 February 2015.
  7. WebArchive Extractor
  8. "Mozilla Archive Format, with MHT and Faithful Save". Archived from the original on 2 November 2017. Retrieved 8 December 2011.
  9. "WebScrapBook". Retrieved 17 November 2019.
  10. ".htmld Discussion".


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.