Alongside these efforts, the project is also launching

Posted: Wed Jul 09, 2025 9:00 am
by Noyonhasan630
In July, we announced our partnership with the Archives Unleashed project as part of our ongoing effort to make new services available for scholars and students to study the archived web. Joining the curatorial power of our Archive-It service, our work supporting text and data mining, and Archives Unleashed’s in-browser analysis tools will open up new opportunities for understanding the petabyte-scale volume of historical records in web archives.


As part of our partnership, we are releasing a series of publicly available datasets created from archived web collections. Alongside these efforts, the project is also launching a Cohort Program providing funding and technical support for research teams interested in studying web archive collections. These twin efforts aim to help build the infrastructure and services that will allow more researchers to leverage web archives in their scholarly work. More details on the new public datasets and the Cohort Program are below.

Early Web Datasets

The first in our series of public datasets from the web collections is oriented around the theme of the early web. These datasets are, of course, intended for data mining and for researchers using computational tools to study large amounts of data, so they lack the informational or nostalgic value of looking at archived webpages in the Wayback Machine. If the latter is more your interest, here is an archived Geocities page with unicorn GIFs.