2015 UK Domain Crawl has started

Information from the team at the UK Web Archive, the Library's premier resource of archived UK websites


We are proud to announce that the 2015 UK Domain Crawl has started!

Over the coming weeks, our web crawler will visit every website in the UK, download it, and store it safely on the British Library's archive servers.

Robot icon by Bilboq (own work) [public domain], via Wikimedia Commons: https://commons.wikimedia.org/wiki/File%3ARobot_icon.svg

Previous crawls

The first-ever UK Domain Crawl ran in 2013 and resulted in:

3.8 million seeds (starting URLs)
31TB of data
1.9 billion web pages and other assets

The 2014 crawl built on that experience and yielded:

20 million seeds
a geo-IP check for UK-hosted websites (2.5 million seeds)
56TB of data
2.5 billion web pages and other assets
including 4.7GB of viruses and 3.2TB of screenshots

Guesswork

What will the 2015 crawl be like? Will we find more URLs? The web surely grows every day, but by how much? Will there be more data? Will we find more virus content?

Tweet your suggestions and thoughts about the UK Domain to @UKWebArchive, or use the hashtag #UKWebCrawl2015.

Posted by Sabine Hartmann at 12:03 PM

UK Web Archive blog
