My IP Blocked By DMOZ

pfasia

Member
Joined
Sep 2, 2005
Messages
6
I have been using a CGI script to add DMOZ data to my website for a few years now and never had a problem. A few days ago, however, I noticed that the category pages were producing the following error:

Couldn't connect to dmoz.org:80 : IO::Socket::INET: Timeout

At first I thought my web host's firewall was blocking the connection to DMOZ, so I contacted them and they allowed the connection in the firewall, but I still got the same error. Finally, my web host investigated the problem and said that DMOZ was blocking my IP. (By the way, I uploaded the script to another web server and it works fine there.)
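For anyone trying to narrow down the same kind of timeout, a minimal sketch of a plain TCP check is shown here. It only tells you whether a connection can be opened from the machine it runs on; it cannot say who is dropping the packets. The function name `can_connect` and the 5-second default are my own assumptions, not part of the original script.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; a False result covers timeouts,
    refused connections, and DNS failures alike.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.timeout, ConnectionRefusedError and socket.gaierror
        # are all subclasses of OSError.
        return False
```

Running `can_connect("dmoz.org", 80)` from both the affected server and a second server would show whether the failure is specific to one IP, which is what the comparison above suggests.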

I'm pretty sure I'm not breaking any rules that would cause DMOZ to block my IP (and if I am doing something wrong, I will gladly correct it). I would like to know how to communicate with DMOZ to ask them to unblock my IP.

If anyone can help, I would be very grateful.
 

windharp

Meta/kMeta
Curlie Meta
Joined
Apr 30, 2002
Messages
9,204
You didn't tell us which one of the many ODP users your site is, so I can just give general advice.

- Make sure you fulfill the attribution requirements listed on our site.
- Make sure your script does not generate lots of unneeded CGI calls on our servers.
- Make sure that spiders grabbing your site do not run into CGI scripts that redirect to the ODP server. Check http://dmoz.org/robots.txt for URLs you should exclude robots from on your implementation. Spiders calling CGI scripts made our server nearly inaccessible lately.
- If your site has a considerable amount of traffic, please switch to downloading the RDF and serving your users from your own server.
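One way to cut down on unneeded calls, short of switching to the RDF dump, is to cache each category page locally for a while. The following is a minimal sketch of that idea, not a description of any existing ODP script: the class name `CategoryCache`, the one-day default lifetime, and the `fetch` callback are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class CategoryCache:
    """Tiny in-memory cache with a time-to-live, keyed by category path.

    Hypothetical helper: `fetch` is whatever function actually retrieves
    a category page from the upstream server.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 86400.0,  # assumed lifetime: one day
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.upstream_calls = 0  # how often the upstream server was hit

    def get(self, category: str) -> str:
        """Return the cached page if it is still fresh, else fetch and store it."""
        now = self._clock()
        cached = self._entries.get(category)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        self.upstream_calls += 1
        page = self._fetch(category)
        self._entries[category] = (now, page)
        return page
```

With a cache like this, a spider that requests the same category a hundred times in a day causes one upstream request instead of a hundred. A real deployment would probably want the cache on disk so that it survives between CGI invocations.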
 

pfasia

Member
Joined
Sep 2, 2005
Messages
6
I Have Made the Corrections You Suggested...

Many thanks for your quick reply. My website address is in my profile. I didn't post it in the forum as most forums don't like that.

I have made the following corrections as per your suggestions.

I added the required attribution to the pages that are retrieved from DMOZ. I recently made some changes to the template files and forgot to include the copyright statement, but I've fixed that now.

I think I was getting a lot of hits from search engines spidering the CGI script that retrieves the pages from DMOZ, so I have placed the following robots.txt in my root directory:

User-agent: *
Disallow: /cgi-bin/odp/
Disallow: /editors/
Disallow: /World/.m

Apart from that, I don't think my website generates enough traffic to be a problem. I am interested in downloading and using the RDF files, though, if you have any suggestions on good scripts to use.

The big question for me now is: Is there someone I need to contact to ask to unblock my website from accessing the ODP data?

Many thanks.
 

giz

Member
Joined
May 26, 2002
Messages
3,112
Does the "add" link and the "editor" link go to an actual ODP category, or have you missed out putting a real category path there (it will say "$cat" or something if you forgot that action)?
 

pfasia

Member
Joined
Sep 2, 2005
Messages
6
Yes, the "add" link and the "editor" link go to the proper ODP category.

Every category on my website that is retrieved from DMOZ contains the proper attributes and links to the correct places on DMOZ.

Pages which are not retrieved from DMOZ do not contain any DMOZ attributes. These are pages which I have written myself.

At the moment you can't see that because my website is still being blocked from using DMOZ data. The only categories you can see right now are the ones which I wrote myself and do not need to retrieve data from DMOZ.
 

AgentEccks

Member
Joined
Sep 22, 2005
Messages
2
Same Thing Here...

I'm not sure what the problem is, but I HAVE added the specifications to the robots.txt file.

As for the attribution, I am using the phpODP project, which does have the proper attribution included.

I have been using this script for a few years now, and all of a sudden it's not working.

How do I get it unblocked?
 

ishtar

Member
Joined
Oct 2, 2002
Messages
688
Maybe staff has finally blocked all those scraper scripts. One can only hope.
 
This site has been archived and is no longer accepting new content.