Google… how?
I just wanted to share this bit of data with you, because it makes my head hurt.
I wrote a post this morning on a weird UCMA error. I published that post at 10:15:30 AM exactly. On my blog, publishing a post also updates the home page with the new post.
Now, my site is definitely not a large concern. I get excited if one person a day looks at it. That makes what I’m about to tell you even more amazing.
Google updated its cache of my home page 9 seconds later and cached the new page 15 seconds later. At that point, both pages started showing up in search results.
That just … makes my head hurt. It really does.
In my world there are trade-offs between speed of data retrieval and freshness of data. I’m aware Google uses an inverted index, and recently rolled out several changes to keep it continually up to date, but still – what did I do to trigger a re-crawl of my site?
I’m automatically keeping my sitemaps up to date, but that’s not a push technology.
I automatically post to Twitter on every new post. Maybe they continually scan Twitter, looking for URLs they haven’t seen before? Whatever it is, I’m very impressed!
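For anyone who hasn't met an inverted index before, here is a minimal sketch of the idea in Python. It is a toy of my own, not how Google actually does it: the class name `TinyInvertedIndex`, its methods, and the URLs in the usage note are all made up for illustration. It shows why lookups are quick (each word maps straight to the pages containing it) and why freshness costs something (every changed page has to be re-indexed before searches see it).

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy inverted index: maps each word to the set of page URLs containing it.

    Hypothetical illustration only; real search engines are far more involved.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # word -> {url, ...}
        self.words_by_url = {}            # url -> words seen at last indexing

    def index_page(self, url, text):
        # Re-indexing a page removes postings for words it no longer contains,
        # then records the current words. This is the "freshness" work.
        new_words = set(text.lower().split())
        for word in self.words_by_url.get(url, set()) - new_words:
            self.postings[word].discard(url)
        for word in new_words:
            self.postings[word].add(url)
        self.words_by_url[url] = new_words

    def search(self, query):
        # A page matches only if it contains every word in the query.
        words = query.lower().split()
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```

Indexing a home page at `/`, then a new post at `/ucma-error`, then re-indexing `/` with the new post's text included, makes a search for `ucma error` return both pages. Until that re-index happens, only the post itself matches, which is the gap Google closed in a matter of seconds.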
My reading for tonight:
- The Google Platform
- Presentation on Search by Jeff Dean
- Stack Overflow – How can Google be so fast?
- The Anatomy of a Large-Scale Hypertextual Web Search Engine (Brin/Page)