
HELP - Banned from dmoz.org



Posted
Hello. First let me say thank you for all the hard work that you do. I've been using DMOZ data on my site (see my profile) with full attribution for over a year now. Today, all my connections to dmoz.org get 403's. I didn't receive an email and I have no idea why I was banned. I've been operating well within the license agreement. Could somebody please help? Thank you in advance.
  • Meta
Posted

As you have probably noticed, it is working again. At least it is when I browse your site.

 

AFAIK the server might have been down for a hardware replacement; something like this had been planned for some time.

Curlie Meta/kMeta Editor windharp

 


Posted

Thank you for responding so quickly but unfortunately it's still going on and I still get 403's. Here is a link to a tool to view the HTML source of webpages from ilectric:

 

http://ilectric.com/find.cgi?e=046&s=dmoz.org

 

Compare that to a similar tool offered by SamSpade:

 

http://samspade.org/t/safe?u=http%3A%2F%2Fdmoz.org

 

It appears that my whole netblock is banned...

In the meantime, I'm using some older cached copies of the major categories but would very much like to get this resolved.

  • Meta
Posted

I'm not a specialist in this issue, but I can imagine three situations in which you really could not access the site. Be assured that I did not check whether any of the three applies:

 

1) Your request is not "identifying correctly" in some form [I don't know how that has to be done; this is only a guess]. The only thing I know is that last year I tried a minibrowser written in Delphi that was unable to connect to DMOZ, too. It had no error handling implemented, so I don't actually know why.

 

2) Extremely heavy traffic was generated from your site, especially search requests. If that was the case, you should switch to an RDF-based directory instead of feeding from live data: download the RDF data once and serve the categories from your local server. That's the way it normally should be for any downstream user. Of course there is one temporary problem: RDFs are normally generated weekly, but due to technical difficulties the last successful RDF dump was made in September.

 

3) If you violated the ODP terms of service in the past, you might have been blocked for that. I can't spot anything likely to violate ODP guidelines on your site now, but I can't look into the past :-)
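For what it's worth, here is what "identifying correctly" from point 1 could look like in practice. This is only a minimal sketch in Python: whether dmoz.org actually checks the User-Agent header is a guess, and the bot name and contact URL below are made-up placeholders.

```python
import urllib.request

# Hypothetical values: the bot name and the contact URL are placeholders,
# not anything dmoz.org is known to require.
USER_AGENT = "ExampleDirectoryMirror/1.0 (+http://example.com/bot-info)"


def build_request(url: str) -> urllib.request.Request:
    """Return a request object that carries an identifying User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


# The request is only constructed here, not sent.
req = build_request("http://dmoz.org/")
```

Passing `req` to `urllib.request.urlopen` would send it with that header, so a server admin reading the logs can at least see who is asking and where to complain.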

 

[EDIT AGAIN]

 

Oh, well, I see a possible reason for 3). Please have a look at http://dmoz.org/become_an_editor/ again. I can't spot a link to http://dmoz.org/about.html on your site.

Curlie Meta/kMeta Editor windharp

 


Posted
I've been using the live data, but I cache it for about 25 days, so I doubt my traffic would be too heavy. I'd happily increase the cache time or do whatever is necessary to lift this ban. I have nothing but respect and gratitude for the ODP and would like to see this through. Please pass this on to any relevant person within the ODP. Thank you.
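To make the caching described above concrete, here is a rough sketch in Python of a time-based cache. It is not the code running on the site; the class name `PageCache` and the `fetch` callback are hypothetical, and only the 25-day lifetime comes from this post.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 25 * 24 * 60 * 60  # about 25 days, as mentioned in the post


class PageCache:
    """Keep fetched pages in memory and refetch only after the TTL expires."""

    def __init__(
        self,
        fetch: Callable[[str], str],          # hypothetical fetch function
        ttl: float = CACHE_TTL_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                   # still fresh: no remote request
        body = self._fetch(url)               # missing or expired: one request
        self._entries[url] = (now, body)
        return body
```

With a cache like this, each category page costs at most one remote request per TTL window, no matter how many visitors view it.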
Posted
Oops. I did leave out that about link... I'm very sorry. I changed the attribution at the bottom of my directory pages to have all the required parts again and also a link to the update page. If there is anything else just say the word. Thank you.
  • Meta
Posted

I don't see the problem. Both the ilectric and the SamSpade tools return code 200 and show the dmoz.org home page, and both look complete. What is missing?

 

It does sound like you were overclocking a spider. IIRC, dmoz.org asks that you keep the page request rate down to one every second or two. This is basic courtesy for bots, and if that were the problem, I very much doubt that anyone would think of trying to send e-mail to the spider (assuming it identifies itself to that extent) before squashing it like a bug.
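For a spider, keeping to that kind of rate can be as simple as the sketch below (Python). The two-second default is my reading of "one every second or two", not an official dmoz.org figure, and the `Throttle` class is just an illustrative name.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float = 2.0,            # assumed value, in seconds
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None    # time the last request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Call `wait()` right before every page fetch; the first call returns immediately and later calls sleep just long enough to keep the spacing.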

 

I will say that dmoz.org has sustained a DDOS attack fairly recently, and you can probably expect anything that looks like packet piling-on to be treated the same way the last DDOS attack was. (Whether it's staff perusing logs whenever the meta-editors complain about the quake game bogging down, or something a trifle more automated, dmoz.org obviously has -- and needs -- some sort of protection.)

Posted
Woohoo! My ban was lifted. Many thanks to everyone who helped.
