Jun 13th

Google Doesn’t Index All File Extensions

Posted by Mark

When it comes to file extensions, not all are created equal. If your URL ends a certain way, there’s a good chance that page won’t be crawled, says Google’s head webspam fighter Matt Cutts.

Cutts writes on his blog that Google will crawl and index files with any common extension, so long as that extension doesn’t have a history of being generally useless. Common ones include .html, .htm, .php and .asp.

But .exe? Not a chance. .dll? Nope. .bin? Fahgeddaboudit.

Matt gives his standard explanation: “[T]here are some file extensions that are mostly binary data, such as .exe, where the vast majority of the time the data would be meaningless blobs, so there are a few extensions to avoid. If your files are named example.dll or example.bin and you don’t see Google crawling pages with that file extension, I’d recommend changing your file extension to something else.”

He also recommends that a webmaster who is unsure about a file extension run a Google search such as [filetype:exe] to see whether Google has actually indexed any files of that type. If it hasn’t, your page probably isn’t going to be indexed either.

That doesn’t mean Google won’t have a change of heart. After reviewing feedback about HTML pages whose URLs end in .0 (for example, a page whose address ends with Web2.0), Google decided to index pages with .0 as a file extension.
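
For anyone who wants to audit a list of URLs, the checks described above can be sketched in a few lines of Python. This is only an illustration: the set of extensions is just the three named in this post (not an official Google list), the function names are made up for the example, and the search URL format is an assumption about how you might open the [filetype:] query in a browser.

```python
import posixpath
from urllib.parse import quote_plus, urlsplit

# Extensions this post names as mostly binary data.
# Assumption: this is NOT an official or complete Google list.
BINARY_EXTENSIONS = {".exe", ".dll", ".bin"}


def url_extension(url: str) -> str:
    """Return the lowercased extension of the URL's path, or '' if it has none."""
    path = urlsplit(url).path  # ignores the query string and fragment
    return posixpath.splitext(path)[1].lower()


def looks_risky(url: str) -> bool:
    """True if the URL ends in one of the extensions the post warns about."""
    return url_extension(url) in BINARY_EXTENSIONS


def filetype_query_url(extension: str) -> str:
    """Build a search URL for a [filetype:ext] query (URL format is an assumption)."""
    ext = extension.lstrip(".")
    return "https://www.google.com/search?q=" + quote_plus(f"filetype:{ext}")


if __name__ == "__main__":
    for url in ("http://example.com/files/example.dll",
                "http://example.com/index.html",
                "http://example.com/Web2.0"):
        print(url, url_extension(url), looks_risky(url))
    print(filetype_query_url(".exe"))
```

A URL flagged by looks_risky() isn’t guaranteed to be skipped, and an unflagged one isn’t guaranteed to be indexed; running the [filetype:] search yourself is still the real test.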


Filed under Google
