Metaproducts offers several commercial capture and off-line browsing tools.

7 years 4 weeks ago


HTTrack is a free and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, recursively building all directories and fetching HTML, images, and other files from the server to your computer. HTTrack preserves the original site's relative link structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online.


Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.


Diigo provides a browser add-on that can really improve your research productivity. As you read on the web, instead of just bookmarking, you can highlight portions of web pages that are of particular interest to you. You can also attach sticky notes to specific parts of web pages.

Find It! Keep It!

Find It! Keep It! is a tool to save and organise web content.

The DeDuplicator (Heritrix add-on module)

The DeDuplicator is an add-on module for Heritrix to reduce the amount of duplicate data collected in a series of snapshot crawls.
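
The general idea of skipping unchanged content between snapshot crawls can be sketched as follows. This is only an illustration of digest-based deduplication, not the DeDuplicator's actual implementation: the function names, the in-memory `seen` index, and the choice of SHA-1 are all assumptions made for the example.

```python
import hashlib


def content_digest(payload: bytes) -> str:
    """Return a hex digest identifying the payload (SHA-1 is an assumed choice)."""
    return hashlib.sha1(payload).hexdigest()


def deduplicate(snapshot: dict, seen: dict):
    """Split a crawl snapshot into new or changed documents and unchanged ones.

    snapshot: maps URL -> fetched bytes for the current crawl (hypothetical shape).
    seen:     maps URL -> digest recorded by earlier crawls; updated in place.
    Returns (to_store, skipped), where skipped lists URLs whose content is
    identical to what an earlier crawl already collected.
    """
    to_store = {}
    skipped = []
    for url, payload in snapshot.items():
        digest = content_digest(payload)
        if seen.get(url) == digest:
            # Same URL, same bytes as before: no need to store it again.
            skipped.append(url)
        else:
            to_store[url] = payload
            seen[url] = digest
    return to_store, skipped


# Two consecutive snapshot crawls of the same (made-up) site.
seen = {}
first = {"http://example.org/": b"<html>v1</html>", "http://example.org/a": b"A"}
second = {"http://example.org/": b"<html>v1</html>", "http://example.org/a": b"A2"}

stored1, skipped1 = deduplicate(first, seen)
stored2, skipped2 = deduplicate(second, seen)
```

In the second crawl only the changed page is kept for storage, while the unchanged home page is reported as a duplicate. A real crawler would persist the index between crawls rather than hold it in memory.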

Data Fountains

Data Fountains is a tool for discovering and describing Internet resources about a particular topic. After signing on, the user is guided through a series of Web pages that generate information describing that topic.

BAT: BnfArcTools

BAT is a Perl package for processing the Internet Archive ARC, DAT and CDX file formats. This package was developed and is still maintained by the National Library of France (BnF) and is distributed under the GPL licence.


DeepArc was developed by the National Library of France (BnF) with XQuark to transform relational database content into XML for archiving purposes. It is part of the International Internet Preservation Consortium (IIPC) tool suite for web archiving.
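
To give a rough sense of what transforming relational content into XML involves, here is a minimal sketch using Python's standard library. It does not use DeepArc or XQuark and does not reflect their mapping model: the `table_to_xml` helper, the element names (`table`, `row`, `column`), and the sample `records` table are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Dump every row of one table into a simple XML tree (assumed layout)."""
    # The table name is interpolated directly, so it must come from a trusted source.
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [desc[0] for desc in cur.description]
    root = ET.Element("table", name=table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, row):
            cell = ET.SubElement(row_el, "column", name=col)
            # NULL values become empty elements; everything else is stringified.
            cell.text = "" if value is None else str(value)
    return root


# A throwaway in-memory database standing in for the content to archive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO records (title) VALUES ('Annual report')")
conn.commit()

root = table_to_xml(conn, "records")
xml_text = ET.tostring(root, encoding="unicode")
```

A production export would normally map tables to a documented schema and preserve data types and relationships, which this flat row-and-column dump does not attempt.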


GAip (Gloucestershire Archives ingest packager) is a proof-of-concept demonstration system written in Perl. It provides archivists and others with the means to (1) ingest a digital object and create the associated Archival Information Package (AIP), (2) compile metadata for the digital object, which is included in the AIP, and (3) create Dissemination Information Packages from AIPs in order to provide access to the ingested digital object. GAip is operated through a graphical user interface.

