Posted

Is there any way of finding out if Robozilla/1.0 has a problem accessing my site?

 

grep Robozilla/1.0 /home/web/tech-httpd-access.log = 0 hits

 

grep dmoz /home/web/tech-httpd-access.log = util-n02.dmoz.aol.com
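A plain grep shows whether a visitor appears in the log, but not whether it was refused. A rough sketch of how the same search could also tally HTTP status codes follows. It assumes the log is in Apache "combined" format; the sample lines, the `ExampleBot/0.1` agent and the `status_counts` helper are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Hypothetical sample lines in Apache "combined" format; a real log will differ.
SAMPLE_LOG = """\
util-n02.dmoz.aol.com - - [10/Oct/2008:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 "-" "ExampleBot/0.1"
192.0.2.10 - - [10/Oct/2008:13:56:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
util-n02.dmoz.aol.com - - [10/Oct/2008:13:57:12 -0700] "GET /b.html HTTP/1.0" 403 199 "-" "ExampleBot/0.1"
"""

# host ident user [time] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def status_counts(lines, needle):
    """Count status codes for lines whose host or user agent contains needle."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and (needle in m.group("host") or needle in m.group("agent")):
            counts[m.group("status")] += 1
    return counts


if __name__ == "__main__":
    print(dict(status_counts(SAMPLE_LOG.splitlines(), "dmoz")))
    print(dict(status_counts(SAMPLE_LOG.splitlines(), "Robozilla/1.0")))
```

Any 403 in the tally for a visitor you want to allow is a hint that a blocking rule is catching it. To run it against a real file, pass `open(path)` instead of the sample lines.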

 

The site has been accessed by editors.dmoz.org and 3 urls have been added. However, for some reason only 1 url is currently listed. I use IP blocking software for known problem data centers and rogue bots. It is possible that I've banned the IP of an editor who may have been checking the site. If I have done so, can a url be removed for that reason?

 

Thanks

Posted

If editors or robots find a dysfunctional listed website, they usually move it into the pool of websites awaiting evaluation by a human. After a period of time, if the site continues to fail and an alternative URL can't be found, it's usually removed.

 

We have lots of editors around the world and we have quite a few QC robots, some of which are hosted on editors' servers, not AOL's. Their IP addresses aren't predictable.

Posted

It's Me

 

Thanks Jim.

 

Very recent log entries show that the software I'm using to combat abuse is blocking dmoz. Why, I don't know, and that is what I'm trying to discover. The software is all open source and widely used. I've emailed the log files to the "maintainers" to try to make sure this doesn't continue to happen. If I can solve this problem it should save everyone involved a lot of time in the future.

 

<recent log entries removed>

Posted
I've removed the log entries from your last post. Please don't include that kind of information in any future posts you make here. Thanks.
Posted

motsa, I have no problem with that, but please note that my url was removed from the log entry. It wasn't an attempt at url posting and that information is freely available using "grep" on a log file.

 

I'm using open source software that is used by many sites to combat abuse, comment spam and scraping. I'd be very surprised if I'm the only user who is having these problems. Solving this could save all involved a lot of time in the future.

 

Regards

Posted

Just to be clear, we can't provide a list of IPs used by our editors and robots. They aren't centrally controlled and they aren't predictable.

 

If you're going to exclude some IPs, it wouldn't be surprising if they included some being used by us.

Posted

Jim, I did block dmoz from checking my site. My fault, I can live with that and I'm taking steps to try and ensure it doesn't happen again. There are much larger problems that may cause issues for many of us.

 

Mod_Security denies access to "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)". The reason is that there is no space after the last semicolon, before the number 1813. The ";1813" token comes from the AVG link checker and is added to the UA of almost anyone who uses it. If an editor is using AVG, a very popular open source Apache module will deliver a 403 response.
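To see how many visitors are affected, the pattern described above can be searched for directly. This is only a sketch of the idea, not the actual Mod_Security rule: the `has_unspaced_numeric_token` helper is hypothetical, and it assumes the trigger is simply a semicolon immediately followed by a digit.

```python
import re

# Assumption: the offending pattern is a semicolon directly followed by a
# digit, as in "Windows NT 5.1;1813". The real rule may be broader or narrower.
UNSPACED_TOKEN = re.compile(r";\d")


def has_unspaced_numeric_token(user_agent: str) -> bool:
    """Return True if the UA contains ';' immediately followed by a digit."""
    return bool(UNSPACED_TOKEN.search(user_agent))


if __name__ == "__main__":
    avg_style = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"
    plain = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    print(has_unspaced_numeric_token(avg_style))
    print(has_unspaced_numeric_token(plain))
```

Running this over the user agents of requests that received a 403 would show how many ordinary browsers are being turned away for this reason.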

 

No Jim, I'm not asking for the IPs and UAs of the QC bots. The major SEs have stated that if a UA doesn't pass an RDNS check it's fake. If a QC bot were to use an inappropriate UA, many sites would give it a 403 or a Captcha.
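For reference, the RDNS check mentioned above is usually done as a forward-confirmed reverse lookup: resolve the IP to a hostname, check the hostname's domain, then resolve the hostname back and confirm it returns the same IP. A minimal sketch follows. The `verify_crawler` name and its parameters are hypothetical, it handles IPv4 only, and it only helps for crawlers whose operator publishes the domains it crawls from.

```python
import socket


def verify_crawler(ip, allowed_suffixes, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check for a claimed crawler IP (IPv4).

    reverse and forward can be replaced with stand-in resolvers for testing;
    by default they use the standard socket lookups.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse(ip)  # step 1: IP -> hostname
    except (OSError, KeyError):
        return False

    host = host.rstrip(".").lower()
    # step 2: hostname must belong to one of the expected domains
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False

    try:
        return ip in forward(host)  # step 3: hostname -> IPs must include ip
    except (OSError, KeyError):
        return False
```

The domain check uses a leading dot so that a name like `example.com.evil.test` does not pass for `example.com`.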

 

Regards

Posted
motsa, I have no problem with that, but please note that my url was removed from the log entry. It wasn't an attempt at url posting and that information is freely available using "grep" on a log file.
That wasn't why I removed it. I removed it because it was unnecessary.

No Jim, I'm not asking for the IPs and UAs of the QC bots. The major SEs have stated that if a UA doesn't pass an RDNS check it's fake. If a QC bot were to use an inappropriate UA, many sites would give it a 403 or a C
Even if a QC tool received an incorrect error, sites are then eventually manually checked by real live human beings.
Posted

It's a repeat of the IP story really. We don't and can't control what UAs our editors' browsers and home-brewed QC tools report. Neither are we about to ban our editors from using AVG.

 

Most editors aren't techies (and neither am I much of one for that matter). They are ordinary folk with a passion for category building. They use the same enormous variety of computer systems and tools that the ordinary surfer in the street uses.

 

If you're blocking some of our human editors, you're likely blocking some of your customers too.
