Tag Archives: Wayback Machine

Obama’s Change.gov promise to protect whistleblowers? Scrubbed from the Web

Well, this pissed me off. Long-time readers of this site may recall my interest in the Internet Archive’s Wayback Machine, which aims to preserve the historical web. I’ve previously written to criticize the Bush administration for its lengthy robots.txt exclusion file (thousands of lines long), which could be viewed as an attempt to prevent the […]


Major expansion of Wayback Machine’s archive of the historical internet

The Next Web reports that the Internet Archive has vastly increased its historical database of the web: The Internet Archive has updated its Wayback Machine with a significant bump in coverage: the service has gone from 150,000,000,000 URLs to having 240,000,000,000 URLs, a total of about 5 petabytes of data. More specifically, the Wayback Machine […]


NARA hosting “lite” Bush website archive

There are plenty of good changes in the new whitehouse.gov site, such as a better copyright policy that enables clearer copying and remix, and a much shorter robots.txt file, which makes it easier for search engines and archivists to index and archive the site. (Compare the current 4-line Obama robots file to a 2300+ version […]


Is Zoetrope the next-gen Internet Archive?

Although the Internet Archive’s Wayback Machine is a great research tool, its utility is hampered by a lack of basic search mechanisms. One can search by URL and archived links, but basic Google-style Boolean searching isn’t available. The Archive once offered a beta Boolean search tool, but it never worked and was later withdrawn. […]
