We're thoroughly baffled. And our apologies in advance for being long-winded, but we think it is warranted (the windiness, as well as the apology).
Sometime in the last week to ten days, our site (http://football.refs.org) was deleted from DMOZ. Baffling to us, considering that less than three years ago, our site was THE starred site in the DMOZ category of Sports: Football: American: Officiating. (As proof, look at IA's Wayback Machine for DMOZ's listing for August 2001 - http://web.archive.org/web/20010802200232/dmoz.org/Sports/Football/American/Officiating/).
We couldn't imagine why we were suddenly removed from DMOZ until we considered the following:
In the last three months, we have taken a significantly more aggressive approach in securing our server. Malicious harvesters, spiders, and many unacceptable user-agents are being denied any access to our site on our server. We utilize a comprehensive list of banned "bad bots", banned IP addresses, and the like that are denied access to our server for security reasons. As an example, web crawlers that have violated or are known to violate the robots exclusion standard are forbidden from any and ALL access to our server; we don't even let them access the robots.txt file, since they violate it anyway.
We can only speculate that someone from DMOZ attempted to reach our site using a banned user-agent and, as such, was denied access and sent to our "forbidden" page.
Our sites are fully reachable utilizing 99.8% of available browsers - Netscape (all versions), IE (all versions), Opera (all versions), Lynx (for you die-hards); however, scores of bad user-agents, such as Webdav, Zeus, Nutch, Indy Library, etc., are not acceptable and are forbidden access.
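For anyone unfamiliar with this kind of filtering, here is a minimal Python sketch of the idea, not our actual server configuration. The four user-agent fragments are the ones named above; the function name `is_forbidden`, the substring matching, and the banned IP range (192.0.2.0/24, an address block reserved for documentation) are assumptions made purely for illustration.

```python
import ipaddress

# User-agent fragments named in the post; compared case-insensitively.
BANNED_AGENT_FRAGMENTS = ("webdav", "zeus", "nutch", "indy library")

# Hypothetical banned range; 192.0.2.0/24 is reserved for documentation.
BANNED_NETWORKS = (ipaddress.ip_network("192.0.2.0/24"),)


def is_forbidden(user_agent: str, remote_ip: str) -> bool:
    """Return True if the request should be sent to the "forbidden" page."""
    agent = user_agent.lower()
    # Deny when any banned fragment appears anywhere in the User-Agent string.
    if any(fragment in agent for fragment in BANNED_AGENT_FRAGMENTS):
        return True
    # Deny when the client address falls inside a banned network.
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BANNED_NETWORKS)


if __name__ == "__main__":
    print(is_forbidden("Mozilla/4.0 (compatible; MSIE 6.0)", "198.51.100.7"))
    print(is_forbidden("Mozilla/3.0 (compatible; Indy Library)", "198.51.100.7"))
```

Note that a check like this sees only the User-Agent string and the address. A human visitor whose tool happens to send one of the banned strings is treated exactly like a bad bot.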
One can choose to allow or deny Googlebot (or any bot) access to one's server via the robots.txt file, per the robots exclusion standard. Unfortunately, there appears to be a total lack of any semblance of a DMOZ standard. In our opinion, DMOZ's use of human editors rather than bots/spiders/etc. does not eliminate the need for standardized browser/user-agent identifiers. We believe this needs to be addressed and corrected by DMOZ.
The lack of a specific official "search bot" or standardized identifiable user-agent tag utilized by DMOZ editors does a disservice to the DMOZ because server and/or site owners are unable to clearly identify DMOZ inquiries and, therefore, cannot choose to allow or to deny a DMOZ search of a site. Our situation may exemplify how this lack of standard(s) unfairly penalizes the server owners who take server security seriously, which in turn penalizes the DMOZ with misinformation, which then ultimately penalizes the DMOZ users who are given incomplete information.
Can anyone explain any other logical reason why we would go from the premier site in our category to nonexistent on DMOZ?
Heck, our site's links section lists over 130 sites, which overwhelms the DMOZ's listing of only 35 in this category to which we belong (or to which we had belonged until a week ago).
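To illustrate what we mean by a named bot being controllable, here is a small sketch using Python's standard `urllib.robotparser` module. The robots.txt content and the agent name "SomeOtherBot" are hypothetical, made up for this example; it is not the file we actually serve.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: welcome Googlebot, turn every other crawler away.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
home = "http://football.refs.org/"

# A crawler that announces itself by name can be allowed or denied by name.
googlebot_ok = parser.can_fetch("Googlebot", home)
other_ok = parser.can_fetch("SomeOtherBot", home)
print(googlebot_ok, other_ok)
```

The rules key entirely on the name a crawler announces. A visitor with no agreed-upon name, such as a DMOZ editor using whatever tool is at hand, cannot be singled out this way, which is the gap we are describing.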
Interesting aside, FWIW - if one goes to http://www.dmoz.org/Sports/Officiating/ , our category of Football, American shows "@40", but clicking on the link shows "Sports: Football: American: Officiating (35)" - it appears to us there are several errors here in DMOZ.
Does dmoz.org/Sports/Football/American/Officiating/ have an editor? More than one? Is there any way for us to make our server DMOZ friendlier? If we knew how, we would attempt to accommodate.
Thanks to those who waded through this post. Any thoughts - anyone?