Robozilla/1.0

Tech Owner

Member
Joined
May 21, 2008
Messages
20
Is there any way of finding out if Robozilla/1.0 has a problem accessing my site?

grep Robozilla/1.0 /home/web/tech-httpd-access.log = 0 hits

grep dmoz /home/web/tech-httpd-access.log = util-n02.dmoz.aol.com
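For anyone repeating this kind of log check, a minimal Python sketch follows. It goes one step further than grep and tallies the response codes for matching requests, which shows whether a visitor was served a 200 or refused with a 403. It assumes the log is in Apache's "combined" format; the function name `status_counts` and the regular expression are illustrative, not part of any tool mentioned in this thread.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format. Adjust the pattern if the
# server uses a custom LogFormat.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def status_counts(lines, needle):
    """Count response codes for lines whose host or user agent contains needle."""
    needle = needle.lower()
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        host = match["host"].lower()
        agent = match["agent"].lower()
        if needle in host or needle in agent:
            counts[match["status"]] += 1
    return counts
```

Called as `status_counts(open("/home/web/tech-httpd-access.log"), "dmoz")`, it returns a mapping from status code to hit count. Matching on the host field only finds hostnames if the server logs resolved names rather than bare IPs.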

The site has been accessed by editors.dmoz.org and 3 URLs have been added. However, for some reason only 1 URL is currently listed. I use IP blocking software for known problem data centers and rogue bots. It is possible that I've banned the IP of an editor who may have been checking the site. If I have done so, can a URL be removed for that reason?

Thanks
 

jimnoble

DMOZ Meta
Joined
Mar 26, 2002
Messages
18,915
Location
Southern England
If editors or robots find a dysfunctional listed website, they usually move it into the pool of websites awaiting evaluation by a human. After a period of time, if the site continues to fail and an alternative URL can't be found, it's usually removed.

We have lots of editors around the world and we have quite a few QC robots, some of which are hosted on editors' servers, not AOL's. Their IP addresses aren't predictable.
 

Tech Owner

Member
Joined
May 21, 2008
Messages
20
It's Me

Thanks Jim.

Very recent log entries show that the software I'm using to combat abuse is blocking dmoz. Why, I don't know, and that is what I'm trying to discover. The software is all open source and widely used. I've emailed the "maintainers" the log files to try to make sure this doesn't continue to happen. If I can solve this problem it should save everyone involved a lot of time in the future.

<recent log entries removed>
 

motsa

Curlie Admin
Joined
Sep 18, 2002
Messages
13,294
I've removed the log entries from your last post. Please don't include that kind of information in any future posts you make here. Thanks.
 

Tech Owner

Member
Joined
May 21, 2008
Messages
20
motsa, I have no problem with that, but please note that my URL was removed from the log entry. It wasn't an attempt at URL posting, and that information is freely available using "grep" on a log file.

I'm using open source software that is used by many sites to combat abuse, comment spam and scraping. If I'm the only user that is having these problems I'd be very surprised. Solving this could save all involved a lot of time in the future.

Regards
 

jimnoble

DMOZ Meta
Joined
Mar 26, 2002
Messages
18,915
Location
Southern England
Just to be clear, we can't provide a list of IPs used by our editors and robots. They aren't centrally controlled and they aren't predictable.

If you're going to exclude some IPs, it wouldn't be surprising if they included some being used by us.
 

Tech Owner

Member
Joined
May 21, 2008
Messages
20
Jim, I did block dmoz from checking my site. My fault, I can live with that, and I'm taking steps to try to ensure it doesn't happen again. There are much larger problems that may cause issues for many of us.

Mod_Security denies access to "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)". The reason is that there is no space after the last semicolon, before the number 1813. The ";1813" token comes from an AVG link checker and it's applied to the UA of almost anyone who uses it. If an editor is using AVG, a 403 response will be delivered by a very popular open source Apache module.
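To see which UA strings in a log would trip this kind of check, here is a small Python sketch that looks for a semicolon followed directly by a non-space character inside the parenthesised part of a User-Agent. This is only an illustration of the pattern described above; it is not the actual Mod_Security rule, and the function name `tight_semicolons` is made up for the example.

```python
import re

# A semicolon followed directly by a non-space character,
# as in "Windows NT 5.1;1813".
TIGHT_SEMICOLON = re.compile(r";(?=\S)")


def tight_semicolons(user_agent):
    """Return the tokens that follow a semicolon with no separating space."""
    tokens = []
    # Only look inside the parenthesised comment sections of the UA.
    for comment in re.findall(r"\(([^)]*)\)", user_agent):
        for match in TIGHT_SEMICOLON.finditer(comment):
            rest = comment[match.end():]
            tokens.append(rest.split(";", 1)[0])
    return tokens
```

An empty list means the UA has the conventional "; " spacing throughout. A non-empty list shows the tokens, such as "1813", that were appended without a space.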

No Jim, I'm not asking for the IPs and UAs of the QC bots. The major SEs have stated that if a UA doesn't pass an RDNS check, it's fake. If a QC bot were to use an inappropriate UA, many sites would give it a 403 or a Captcha.
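The RDNS check mentioned here is usually done as a forward-confirmed reverse lookup: resolve the IP to a hostname, check that the hostname belongs to the domain the UA claims, then resolve that hostname back and confirm it returns the same IP. A rough Python sketch of the idea follows. The function name, the `allowed_suffixes` parameter and the injectable `reverse`/`forward` resolvers are assumptions made for illustration, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket


def forward_confirmed_rdns(ip, allowed_suffixes,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the verified hostname for ip, or None if verification fails."""
    try:
        hostname = reverse(ip)[0]  # step 1: IP -> hostname (PTR lookup)
    except OSError:
        return None
    host = hostname.rstrip(".").lower()
    # Step 2: the hostname must be the claimed domain or a subdomain of it.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return None
    try:
        addresses = forward(host)[2]  # step 3: hostname -> IPv4 addresses
    except OSError:
        return None
    # Step 4: the forward lookup must include the original IP.
    return host if ip in addresses else None
```

The suffix list has to come from whatever the bot operator publishes. A bot that runs from unpredictable hosts, as described earlier in this thread, cannot be verified this way at all.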

Regards
 

motsa

Curlie Admin
Joined
Sep 18, 2002
Messages
13,294
motsa, I have no problem with that, but please note that my URL was removed from the log entry. It wasn't an attempt at URL posting, and that information is freely available using "grep" on a log file.
That wasn't why I removed it. I removed it because it was unnecessary.
No Jim, I'm not asking for the IPs and UAs of the QC bots. The major SEs have stated that if a UA doesn't pass an RDNS check, it's fake. If a QC bot were to use an inappropriate UA, many sites would give it a 403 or a Captcha.
Even if a QC tool received an incorrect error, sites are then eventually manually checked by real live human beings.
 

jimnoble

DMOZ Meta
Joined
Mar 26, 2002
Messages
18,915
Location
Southern England
It's a repeat of the IP story really. We don't and can't control what UAs our editors' browsers and home-brewed QC tools report. Nor are we about to ban our editors from using AVG.

Most editors aren't techies (and neither am I much of one for that matter). They are ordinary folk with a passion for category building. They use the same enormous variety of computer systems and tools that the ordinary surfer in the street uses.

If you're blocking some of our human editors, you're likely also blocking some of your customers too.
 
This site has been archived and is no longer accepting new content.