We have a website that was listed in ODP for several years and was dropped in April 2008 (we know because our log files show referrals from dmoz.org until May 2008).
Since our website is an established quality site, we assumed that there was a problem with the ODP link checker accessing our site.
So, looking again at our log files, we saw two failed attempts in April by the pmoz.info link checker to access the site. So voilà, that would explain the site removal.
But why was the link checker blocked by our site? Because the IP address of the link checker belongs to a hosting company that also hosts many spammers. We have found fake Googlebots, scrapers, email harvesters and other suspicious activity from those IP ranges.
I can only assume that we're not the only website which has accidentally blocked the ODP link checker due to bad activity in that neighborhood.
Now take Googlebot, for example. Any hit from Googlebot can be easily verified: a reverse DNS lookup on the IP address, followed by a forward DNS lookup on the resulting hostname, will confirm that the Googlebot is legitimate and not a fake agent.
Unfortunately, the ODP link checker cannot be verified in this way.
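For anyone who hasn't set this up, here is a rough Python sketch of that kind of check (reverse lookup, hostname suffix test, then forward lookup to confirm). The function names and the injectable `reverse`/`forward` parameters are my own for illustration, the suffix list is an assumption you should check against Google's current documentation, and this version only handles IPv4.

```python
import socket

# Assumption: hostname suffixes a genuine Googlebot resolves to.
# Check the crawler's own documentation before relying on this list.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the reverse DNS (PTR) hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None


def forward_lookup(hostname):
    """Return the set of IPv4 addresses that hostname resolves to."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a visiting IP address."""
    hostname = reverse(ip)
    if not hostname:
        return False  # no PTR record, so nothing to verify against
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(tuple(suffixes)):
        return False  # hostname is outside the crawler's domains
    # Anyone can publish a PTR record claiming a trusted name; the forward
    # lookup must resolve back to the same IP for the claim to hold.
    return ip in forward(hostname)
```

A firewall rule could call `is_verified_crawler(client_ip)` before deciding to block. The link checker could only pass a check like this if ODP published a hostname suffix for it and set up matching reverse DNS records.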
Webmasters could set up an exception in their security system to allow in anything calling itself "ODP link checker". But if everyone did that, then any bad bot could call itself "ODP link checker" and have a free ticket to every website, scraping email addresses, etc. So this isn't an ideal solution.
So please reconsider the draconian consequence of removing websites automatically just because the link checker bot is accidentally blocked by certain websites.
It could have worked a few years ago, but firewalls and website security measures are becoming stricter these days, and the ODP link checker has not kept up with the times.
I think that:
1) if the link checker tool cannot access a website, it should only flag the website
2) the listing should remain in ODP on 'probation' until manually reviewed to confirm that the site is gone
3) failing the above, please rewrite your link checker code so that it sends the correct headers and uses an IP address and/or reverse DNS record that any website can use to confirm it is legitimate
4) if possible, please fast-track our website to restore the listing (we've emailed the editor and the staff, but no response yet)
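
To make suggestions 1) and 2) concrete, here is a minimal sketch in Python of the flow I have in mind. I have no idea how the ODP actually stores listings, so every name here (`Listing`, `Status`, `record_check`, `manual_review`) is hypothetical. The point is only that the automated check can flag a listing but never remove it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    LISTED = "listed"
    PROBATION = "probation"  # still listed, waiting for a manual review
    REMOVED = "removed"


@dataclass
class Listing:
    url: str
    status: Status = Status.LISTED
    failed_checks: int = 0


def record_check(listing, reachable):
    """Automated link check: may flag a listing, never removes it."""
    if reachable:
        listing.failed_checks = 0
        if listing.status is Status.PROBATION:
            listing.status = Status.LISTED  # site answered again, clear the flag
    else:
        listing.failed_checks += 1
        if listing.status is Status.LISTED:
            listing.status = Status.PROBATION
    return listing


def manual_review(listing, site_is_gone):
    """Only an editor's decision can actually remove a listing."""
    if site_is_gone:
        listing.status = Status.REMOVED
    else:
        listing.status = Status.LISTED
        listing.failed_checks = 0
    return listing
```

With this flow, a site that blocks the checker would sit on probation with its listing intact until an editor opens it in a browser and sees whether it is really gone.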