“We just launched a 16TB archive of every dataset that has been available on data.gov since November. This will be updated day by day as new datasets appear. It can be freely copied, and we're sharing the code behind it to help others make their own archives of data they depend on.” Harvard Library Innovation Lab (via BlueSky)

Today we released our archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov. This is the first release in our new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use. We’ve built this project on our long-standing commitment to preserving government records and making public information available to everyone. Libraries play an essential role in safeguarding the integrity of digital information. By preserving detailed metadata and establishing digital signatures for authenticity and provenance, we make it easier for researchers and the public to cite and access the information they need over time.
Have you responded to this post on your own site? Send a webmention! Note: Webmentions are moderated for anti-spam purposes, so they will not appear immediately.