The Internet Archive launched the Wayback Machine in October 2001. It was set up by Brewster Kahle and Bruce Gilliat, and is maintained with content from Alexa Internet. The service enables users to see archived versions of web pages across time, which the archive calls a "three dimensional index".
Since 1996, the Wayback Machine has been archiving cached pages of websites onto its large cluster of Linux nodes. It revisits sites on occasion (see technical details below) and archives a new version. Sites can also be captured on the fly by visitors who enter the site's URL into a search box. The intent is to capture and archive content that otherwise would be lost whenever a site is changed or closed down. The overall vision of the machine's creators is to archive the entire Internet.
Information had been kept on digital tape for five years, with Kahle occasionally allowing researchers and scientists to tap into the clunky database. When the archive reached its fifth anniversary, in 2001, it was unveiled and opened to the public in a ceremony at the University of California, Berkeley.
The name Wayback Machine was chosen as a reference to the "WABAC machine" (pronounced way-back), a time-traveling device used by the characters Mr. Peabody and Sherman in The Rocky and Bullwinkle Show, an animated cartoon. In one of the animated cartoon's component segments, Peabody's Improbable History, the characters routinely used the machine to witness, participate in, and, more often than not, alter famous events in history.
Software has been developed to "crawl" the web and download all publicly accessible World Wide Web pages, the Gopher hierarchy, the Netnews (Usenet) bulletin board system, and downloadable software. The information collected by these "crawlers" does not include all the information available on the Internet, since much of the data is restricted by the publisher or stored in databases that are not accessible. To overcome inconsistencies in partially cached websites, Archive-It.org was developed in 2005 by the Internet Archive as a means of allowing institutions and content creators to voluntarily harvest and preserve collections of digital content, and create digital archives.
Crawls are contributed from various sources, some imported from third parties and others generated internally by the Archive. For example, crawls are contributed by the Sloan Foundation and Alexa, crawls run by IA on behalf of NARA and the Internet Memory Foundation, mirrors of Common Crawl. The "Worldwide Web Crawls" have been running since 2010 and capture the global Web.
The frequency of snapshot captures varies per website. Websites in the "Worldwide Web Crawls" are included in a "crawl list", with the site archived once per crawl. A crawl can take months or even years to complete depending on size. For example, "Wide Crawl Number 13" started on January 9, 2015, and completed on July 11, 2016. However, there may be multiple crawls ongoing at any one time, and a site might be included in more than one crawl list, so how often a site is crawled varies widely.