The Australian SEO Dan Petrovic has run an interesting experiment on Google’s Image Search. He placed 100 pictures, with the numbers 1 to 100 as filenames, on a website. How does Google index them? How long does it take until the images are indexed? In what order are the images crawled and indexed? And in what order do they appear in the search results? Dan’s results are, well, a bit confusing to me (see “SEO experiment: Google Image Search”). So I rebuilt the whole thing to see more clearly …
First, the experimental setup. I created the following 100 individual images.
The individual images were named after the number visible on each image: 01.png up to 100.png. I created a page on which the images are sorted numerically (on my German tagSeoBlog). I posted this page on Twitter, Facebook and Google+ to draw Googlebot’s attention to it. (Btw: the following is also documented here on Google+ :-)
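A setup like this can be reproduced with a few lines of Python. This is only a sketch under assumptions: the function names, the page title and the bare `<img>` markup are hypothetical, and only the file naming (01.png up to 100.png, sorted numerically) is taken from the description above.

```python
def image_filenames(count=100):
    """Return the names 01.png ... 100.png (zero-padded to at least two digits)."""
    return [f"{n:02d}.png" for n in range(1, count + 1)]


def build_test_page(filenames, title="Google pictures test"):
    """Build a minimal HTML page that lists the images in the given order.

    The title and the markup are assumptions for illustration only.
    """
    img_tags = "\n".join(
        f'  <img src="{name}" alt="{name[:-4]}">' for name in filenames
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{title}</title></head>\n"
        f"<body>\n{img_tags}\n</body>\n</html>\n"
    )


names = image_filenames()
page = build_test_page(names)
```

Because the list is built from `range(1, count + 1)`, the images end up on the page in numerical order, which is the order the later observations are compared against.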
1. The page in the organic index
As expected, the new page was indexed by Google after 15 minutes. That is a little longer than it takes with blog articles, but Google is still very fast at indexing. That makes sense, given how much emphasis they put on freshness.
2. The first picture
The first image could be found 20 minutes after uploading. Interesting: it was image no. 01.png:
3. More images following
Less than 35 minutes later, more images appeared. Strangely, though, image 01.png had disappeared.
4. Suddenly everything was gone…
Now it gets quite dubious. A few minutes later all pictures were gone. I suspect Google displays new images very quickly, but then sends them to an internal evaluation process (and they disappear for that time). Alternatively, it was a data-centre problem.
5. Back again, but different order
Again it took about 20 minutes, then (some of) the pictures were shown again: two fewer than before, now 33, and in a different order:
Why the order changed after the (supposed) internal evaluation, I don’t know. One thing is clear: the order of the pictures has nothing to do with the number in the filename or the order in which they appear on the page. It seems to be completely random. And picture number 14, which was previously at position 1, was suddenly gone. Strange …
6. Search by color
Google’s image search lets you filter the results by color (an option in the left sidebar). Again, I noticed something interesting: the “blue” results show many more blue images than the default setting “all colors”. Here’s the screenshot:
Why the hell? For example, image number 27, the first of the blue results, is not part of the “all colors” results (above). Also, the order of the blue images here is quite different from the order of the blue images in the overall results. What is the consequence of this? What is the lesson? I took a break…
6 hours later …
Six hours later, I documented the current state again. The number of displayed images has increased from 33 to 42. The order is almost unchanged, except that a set of nine new images is shown in the front positions.
A few exceptions: number 62 (above) is gone. Number 100 has re-entered at position no. 5.
Here you can see how many of the test images are currently shown in Google’s image search, and in what order.
All images already indexed …
Now, about 7 hours after the upload, I tried the following: the site: query can be refined by appending a keyword:
site: tagseoblog.de/google-pictures-test 01 (with a „01“ at the end)
To my surprise, I had to realize that all (!) images are already indexed (I tried every number as an appended keyword). But the plain site: query doesn’t show them all. And: the order of the results of a site: query is simply random or unsorted. That seems logical: how should Google rank the results? There is no logical criterion for it.
The situation is different if you’re looking for a specific phrase, as I did at the very beginning when I searched for the first and only sentence on the test page. Then, of course, the first image is displayed, because it is the one nearest to this search phrase.
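Typing 100 such queries by hand is tedious, so the list can be generated. The following is a minimal sketch, assuming the usual operator syntax `site:` followed directly by the path (no space) and two-digit zero-padded numbers as in the filenames. The function name is hypothetical, and the queries still have to be entered into the image search manually.

```python
def site_queries(path="tagseoblog.de/google-pictures-test", count=100):
    """Return one site: query per image number, e.g. 'site:<path> 01'."""
    return [f"site:{path} {n:02d}" for n in range(1, count + 1)]


queries = site_queries()
for query in queries[:3]:
    print(query)
```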
How fast and in what order were the images indexed?
At this point I realized that I was on the wrong track. If you want to learn about indexing, you have to look at the crawling behavior of Googlebot-Image/1.0 in the log files. You can click the image on the right to read it. This is what you can read out of it:
- All 100 images were crawled within 2 minutes (17:22 to 17:24)
- This was ten minutes after uploading, and about 5 minutes after posting the social links
- The crawl sequence is completely random: 46, 62, 38, 17, 14, 44, …
- Why is the bot not working from top to bottom? I don’t know.
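The points above can be read out of a log file with a short script. This is only a sketch under assumptions: it expects the common Apache/Nginx “combined” log format and a user agent containing `Googlebot-Image`, and the sample lines (IP addresses, date, sizes, paths) are invented for illustration and are not the real log.

```python
import re
from datetime import datetime

# Assumed combined log format:
# IP - - [time] "GET /path HTTP/1.1" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\[(?P<time>[^\]]+)\] "GET (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def image_crawl_order(log_lines, bot_token="Googlebot-Image"):
    """Return (timestamp, image number) pairs for the image bot, in crawl order."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or bot_token not in match.group("agent"):
            continue  # unparsable line or a different client
        name = re.search(r"/(\d+)\.png$", match.group("path"))
        if not name:
            continue  # not one of the numbered test images
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        hits.append((when, int(name.group(1))))
    hits.sort(key=lambda hit: hit[0])  # stable: log order is kept for equal timestamps
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [01/Jan/2012:17:22:05 +0100] "GET /google-pictures-test/46.png HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"',
    '192.0.2.1 - - [01/Jan/2012:17:22:07 +0100] "GET /google-pictures-test/62.png HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"',
    '192.0.2.2 - - [01/Jan/2012:17:22:08 +0100] "GET /google-pictures-test/01.png HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/Jan/2012:17:22:09 +0100] "GET /google-pictures-test/38.png HTTP/1.1" 200 5120 "-" "Googlebot-Image/1.0"',
]

hits = image_crawl_order(SAMPLE_LOG)
order = [number for _, number in hits]   # crawl sequence of image numbers
duration = hits[-1][0] - hits[0][0]      # time between first and last bot request
print(order, duration)
```

Run against a real log, `order` gives the crawl sequence and `duration` the time span of the crawl, which are the two figures listed above.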
The experiment shows one thing: the site: search on Google is crap.
- Images are indexed by Google quickly once the bot has found them.
- The random sequence in the site: query results from the lack of a ranking criterion.
- The site: query does not display all images (obviously a mistake).
- -> From the site: query, one cannot infer the actual state of the indexing!
- For a specific search request, the image shown is the one positioned closest to the searched phrase in the source (in this case the first, because the text comes before the first picture).
[This is a translation of my German blog post.]