Sunday, April 03, 2005

Size of the Textual Web

In this interesting talk, Jeff Dean gave the size of the web that Google deals with:
  • ~4 billion pages
  • ~10 KB/page
  • ~40 TB
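The total is just pages times bytes per page. A quick check, assuming decimal units (1 KB = 10^3 bytes, 1 TB = 10^12 bytes):

```python
# Back-of-the-envelope size of the indexed text, using the figures from the talk.
PAGES = 4_000_000_000      # ~4 billion pages
BYTES_PER_PAGE = 10_000    # ~10 KB/page (decimal kilobytes assumed)

total_bytes = PAGES * BYTES_PER_PAGE
total_tb = total_bytes / 1e12  # decimal terabytes
print(f"{total_tb:.0f} TB")    # 40 TB
```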
This is much smaller than I thought. If you have a 10 Mbps connection to the Internet and keep it fully loaded 24x7, you can download ~3 TB per month. So you need ~13 such connections to download the whole web in a month, which is not a small requirement, but still within reach.
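A rough sketch of the bandwidth arithmetic, assuming a 30-day month, decimal units, and a link kept 100% utilized with no protocol overhead. The function names are just for illustration. At 10 Mbps a link moves about 3.24 TB/month, which is where "~13 connections" comes from; a 100 Mbps link moves about ten times that.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 3600  # assume a 30-day month

def tb_per_month(rate_mbps: float) -> float:
    """Decimal terabytes moved by a link kept fully loaded for one month."""
    bytes_per_second = rate_mbps * 1e6 / 8  # megabits/s -> bytes/s
    return bytes_per_second * SECONDS_PER_MONTH / 1e12

def links_needed(total_tb: float, rate_mbps: float) -> int:
    """Number of fully loaded links needed to transfer total_tb within a month."""
    return math.ceil(total_tb / tb_per_month(rate_mbps))

for rate in (10, 100):
    print(f"{rate} Mbps: {tb_per_month(rate):.2f} TB/month, "
          f"{links_needed(40, rate)} links for 40 TB")
```

Real links rarely sustain full utilization, so these counts are a lower bound.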

One note: I think the ~10 KB/page figure is the per-page content Google cares about for regular search (not image, video, etc.), so it is mostly text.