


Diigo provides a browser add-on that can improve your research productivity. As you read on the web, instead of just bookmarking a page, you can highlight the portions of it that are of particular interest to you. You can also attach sticky notes to specific parts of web pages.

7 years 34 weeks ago

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

7 years 34 weeks ago


HTTrack is a free and easy-to-use offline browser utility. It allows you to download a World Wide Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online.

7 years 34 weeks ago
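
To make the idea of a relative link structure in a mirrored site concrete, here is a minimal Python sketch. It is not HTTrack's actual algorithm; the function names `local_path` and `relative_link` and the "host/path" directory layout are assumptions chosen for illustration. It shows how a mirroring tool might map each URL to a file in the local directory and rewrite a link so that it still works offline.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url: str) -> str:
    """Map a URL to a file path inside the mirror directory.

    Assumed layout (hypothetical): <host>/<url path>, with directory
    URLs saved as index.html.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return posixpath.join(parts.netloc, path.lstrip("/"))


def relative_link(from_url: str, to_url: str) -> str:
    """Return the link to write into the saved copy of from_url
    so that it points at the saved copy of to_url."""
    start_dir = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start=start_dir)


if __name__ == "__main__":
    print(relative_link("http://example.org/docs/intro.html",
                        "http://example.org/images/logo.png"))
```

Because every rewritten link is relative to the page that contains it, the mirrored directory can be moved or opened from any location without breaking navigation.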

Metaproducts offers several commercial capture and off-line browsing tools.

7 years 34 weeks ago

The goal of the mod_oai project is to bring the efficiency of OAI-PMH to everyday web sites.

7 years 34 weeks ago
The Nalanda iVia Focused Crawler

The Nalanda iVia Focused Crawler (NIFC) is a focused Web crawler. It was created by Dr. Soumen Chakrabarti (Indian Institute of Technology Bombay) and developed with the support of IIT Bombay, the iVia Team and the U.S. Institute of Museum and Library Services.

7 years 34 weeks ago

The NetarchiveSuite is the complete web archiving software package developed within the project from 2004 onward. The primary function of the NetarchiveSuite is to plan, schedule and run web harvests of parts of the Internet. It scales to a wide range of tasks, from small, thematic harvests (e.g. related to special events, or special domains) to harvesting and archiving the content of an entire national domain.

7 years 34 weeks ago


pageVault supports the archiving of all unique responses generated by a web server. It allows you to know exactly what information you have published on your web site, whether static pages or dynamically generated content, and regardless of format (HTML, XML, PDF, zip, Microsoft Office formats, images, sound), regardless of rate of change.

7 years 34 weeks ago
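
As a rough illustration of what archiving only unique responses can mean, the following Python sketch stores a response body once per distinct content hash. This is not pageVault's implementation; the class `ResponseArchive`, its methods, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


class ResponseArchive:
    """Toy in-memory archive that keeps each distinct response once.

    Hypothetical design: bodies are keyed by their SHA-256 digest, and
    each URL keeps an ordered list of the digests it has served.
    """

    def __init__(self) -> None:
        self._bodies: dict[str, bytes] = {}
        self._versions: dict[str, list[str]] = {}

    def record(self, url: str, body: bytes) -> bool:
        """Store the response; return True only if it is new for this URL."""
        digest = hashlib.sha256(body).hexdigest()
        versions = self._versions.setdefault(url, [])
        if digest in versions:
            return False  # identical response already archived for this URL
        versions.append(digest)
        self._bodies.setdefault(digest, body)  # share identical bodies across URLs
        return True

    def versions(self, url: str) -> list[bytes]:
        """Return every distinct body served for the URL, in first-seen order."""
        return [self._bodies[d] for d in self._versions.get(url, [])]
```

Hashing the raw bytes makes the approach independent of format, so HTML, PDF, or image responses are handled the same way, and a page that changes frequently simply accumulates more versions.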
Spadix Software

Spadix Software can download websites from a starting URL, search engine results or web directories, and is able to follow external links. It also supports filtering and crawling of password-protected sites.

7 years 34 weeks ago

Sparkleware is a commercial off-line browser.

7 years 34 weeks ago