In some cases, a broken or malformed link can stop search engine spiders from crawling parts of a site.
In this article we look at what those problems could be and suggest some tools that can help you diagnose and solve them.
How do you check for broken and malformed links?
There are a few ways. One is to use a search engine spider simulator. We have reviewed many different simulators, each with different abilities.
For example, the Webconfs spider simulator shows you all the spidered text, as well as the links, meta description and meta keywords.
The Delorie spider simulator goes a little further. It is a web-based version of the text-based Lynx web browser. Simply enter your URL and it returns a summary of the page as well as the text visible on the page, but no links.
What Search Engine Spiders See is another freely available spider simulator. As with the other tools, you enter a URL to see the results; however, this one lets you choose options such as displaying the title, keywords, links, keyword counts and more.
SEOBench has a Search Engine Crawler Simulator too, which returns relevant information such as text, links and even page size.
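Under the hood, a text-only spider simulator does little more than read a page's HTML and pull out the title, meta tags, links and visible text. As a rough illustration (not how any of the tools above actually work), here is a minimal Python sketch using only the standard library's `html.parser`; `SpiderView` and `simulate_spider` are hypothetical names. It works on HTML you have already saved, for example from your browser's View Source, and like Lynx it does not execute JavaScript, so links written by scripts will not appear in its output.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect roughly what a text-only spider simulator reports (a sketch)."""

    # Contents of these elements are never shown as page text.
    HIDDEN_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}   # e.g. {"description": "...", "keywords": "..."}
        self.links = []  # raw href values, in page order
        self.text = []   # visible text fragments
        self._hidden_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href") is not None:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth:
            self._hidden_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._hidden_depth:
            return  # skip script and style contents
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text.append(data.strip())


def simulate_spider(html):
    """Return the title, meta tags, links and visible text of an HTML page."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "links": parser.links,
        "text": " ".join(parser.text),
    }
```

Comparing the `links` list from `simulate_spider(html)` with the links you can see in your browser is one quick way to spot links a simple spider would miss.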
I bring these up because they are a great place to begin. If an analysis of your analytics shows where a crawler appears to stop, you can use that page as the starting point for checking whether your site really does have crawling problems.
You will want a simulator that returns URLs, so that you can review the links on your page for coding problems. It is also a good idea to have your website open in another browser window so you can compare the links you see with the links the simulator sees.
Any link that appears in your browser but not in the simulator's output is one the spider may not be able to see, so those are the links to track down and fix.
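Once you have the list of links a simulator reports, you can also screen them for obvious coding mistakes. The Python sketch below is one possible heuristic check, based on my own assumptions about what counts as malformed (misspelled schemes, missing slashes after `http:`, unencoded spaces, backslashes, `javascript:` pseudo-links and empty `#` placeholders); `link_problems` and `KNOWN_SCHEMES` are hypothetical names, and an empty result does not prove that a link is crawlable.

```python
from urllib.parse import urlsplit

# Assumption: schemes we treat as normal for links on a web page.
KNOWN_SCHEMES = {"", "http", "https", "mailto"}


def link_problems(href):
    """Return a list of reasons why href looks malformed (empty if none found)."""
    problems = []
    stripped = href.strip()
    if not stripped or stripped == "#":
        return ["empty or placeholder link"]
    if stripped != href:
        problems.append("leading/trailing whitespace")
    if " " in stripped:
        problems.append("unencoded space")
    if "\\" in stripped:
        problems.append("backslash instead of slash")
    try:
        parts = urlsplit(stripped)
    except ValueError:
        problems.append("unparseable URL")
        return problems
    scheme = parts.scheme.lower()
    if scheme == "javascript":
        problems.append("javascript: pseudo-link")
    elif scheme not in KNOWN_SCHEMES:
        problems.append(f"unknown scheme {scheme!r}")  # e.g. a typo like "htp"
    elif scheme in ("http", "https") and not parts.netloc:
        problems.append("missing host (check the slashes after the scheme)")
    return problems
```

Running every href from a page through `link_problems` and printing only the non-empty results gives you a short list of suspects to check in the page source.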
But what if the links are not immediately identified as problems?
Sometimes the above strategy just will not work. You can load the same URL in your browser and in the spider simulator and still not see what the problem is.
Perhaps the links ARE in fact spiderable, but for some reason the crawlers still are not able to see them. Or perhaps there is a more fundamental problem with the site. This is where a program that can automatically crawl your entire site comes into play.
Xenu’s Link Sleuth is just such a program. It will allow you to crawl an entire site and display all the URLs it finds.
A warning, however: if you have an extremely large site, this program may take a long time to complete or fail altogether. For that reason I would not recommend it for sites of more than about 500 pages.
You must also pay attention to the options you use. Be sure to uncheck the box below your URL so that you do not crawl external URLs. Also, you may want to let your hosting provider and/or webmaster know if you are going to use this program.
That is because if they happen to be watching your bandwidth usage logs, they may conclude that some sort of automated program is trying to download your site and ban your IP address from accessing your own site!
One thing you should know about this program is that it returns not only web page URLs but EVERY URL on your site, including CSS, JS, image and email URLs. Everything that is a viewable link gets picked up by Xenu.
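To show the idea behind a whole-site link checker like Xenu, here is a rough Python sketch using only the standard library. It is not Xenu's algorithm; `crawl_site`, `http_fetch`, `RefCollector` and the `max_urls` and `delay` parameters are hypothetical. It follows the advice above: it stays on the starting host (external and `mailto:` URLs are recorded but never fetched), stops after a fixed number of requests, pauses between requests, and records CSS, JS and image URLs (any `href` or `src` attribute) as well as pages. It assumes HTML pages are UTF-8 encoded.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import Request, urlopen


class RefCollector(HTMLParser):
    """Collect every href/src value, so CSS, JS, images and mailto: links count too."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.refs.append(value)


def http_fetch(url, timeout=10):
    """Fetch url and return (status, content_type, body_text)."""
    req = Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            ctype = resp.headers.get_content_type()
            # Only HTML bodies are read; other files just need a status code.
            body = resp.read().decode("utf-8", "replace") if ctype == "text/html" else ""
            return resp.status, ctype, body
    except HTTPError as err:  # 4xx/5xx responses
        return err.code, "", ""
    except (OSError, ValueError):  # DNS failure, timeout, invalid URL, ...
        return "no response", "", ""


def crawl_site(start_url, fetch=http_fetch, max_urls=500, delay=1.0):
    """Breadth-first crawl limited to the host of start_url.

    Returns {url: {"status": s, "found_on": set_of_referring_pages}} where s is
    an HTTP status code, "no response", "not checked" (external or non-HTTP
    URLs, never fetched) or None (not reached before max_urls was hit).
    """
    host = urlsplit(start_url).netloc
    results = {start_url: {"status": None, "found_on": set()}}
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_urls:
        url = queue.popleft()
        status, ctype, body = fetch(url)
        fetched += 1
        results[url]["status"] = status
        if delay:
            time.sleep(delay)  # be gentle on the server and your bandwidth
        if ctype != "text/html":
            continue  # CSS, JS, images: check the status but do not parse
        collector = RefCollector()
        collector.feed(body)
        collector.close()
        for ref in collector.refs:
            # Resolve relative links and drop #fragments so /page and /page#top match.
            target = urldefrag(urljoin(url, ref.strip()))[0]
            parts = urlsplit(target)
            entry = results.setdefault(target, {"status": None, "found_on": set()})
            entry["found_on"].add(url)
            if parts.scheme not in ("http", "https") or parts.netloc != host:
                entry["status"] = "not checked"
            elif target not in seen:
                seen.add(target)
                queue.append(target)
    return results
```

For example, `crawl_site("http://www.example.com/", max_urls=200)` would check up to 200 URLs on that host at roughly one request per second. The `fetch` parameter is there so you can swap in your own downloader, or a fake one for testing.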
Under the “View” menu you can choose to show broken links only. This is quite handy if Xenu appears to crawl your entire site without trouble: by viewing only the broken links, you can begin to narrow down where your problems lie.
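If you keep your own crawl results in a simple structure, a "broken links only" view is just a filter. This sketch assumes a hypothetical dict mapping each URL to its `status` (an HTTP status code, the string `"no response"` when nothing came back, or some other marker for URLs that were not checked) and a `found_on` set of the pages that link to it; `broken_links` is a hypothetical name. Grouping the results by referring page tells you which page's code to open and fix.

```python
def broken_links(results):
    """Return {referring_page: [broken_url, ...]} from crawl results.

    A URL counts as broken if its status is an HTTP error (400 or above)
    or the string "no response"; anything else (200, "not checked", ...)
    is left out.
    """
    report = {}
    for url, info in results.items():
        status = info["status"]
        broken = status == "no response" or (isinstance(status, int) and status >= 400)
        if not broken:
            continue
        for page in info["found_on"]:
            report.setdefault(page, []).append(url)
    # Sort for a stable, readable report.
    return {page: sorted(urls) for page, urls in sorted(report.items())}
```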
In the end
Using the techniques described above, you should be able to find and diagnose most crawling problems on your website. But if the problems do not become apparent, it may be worthwhile to hire a firm that specializes in search indexing issues.



