
Posted

I would think that DMOZ has considerable bandwidth to work with. I'm curious why there are so many redirects and dead links that go undetected. The crawler cannot detect certain types, but it should identify most of the redirects and, certainly, the dead links. For example:

 

[dch@traif ~]# wget --spider --force-html
--23:44:15--
           => `opt-in-list-systems'
Resolving list-business.com... 216.235.79.13
Connecting to list-business.com|216.235.79.13|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: [following]
--23:44:15--
           => `index.html'
Resolving emailuniverse.com... 216.235.79.13
Connecting to emailuniverse.com|216.235.79.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
200 OK
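
The same kind of check can be scripted over a whole list of URLs. Here is a minimal sketch, assuming a urls.txt file with one URL per line (the file name and the loop are mine, not how the editors' tools actually work):

#!/bin/sh
# Check each URL without downloading it; --max-redirect=0 makes wget stop at
# the first hop, so a 301/302 is reported as a failure instead of being followed.
while read -r url; do
    if wget --spider --max-redirect=0 -q "$url"; then
        echo "OK      $url"
    else
        echo "CHECK   $url"    # redirect, dead link, or other error
    fi
done < urls.txt

Anything flagged CHECK would then need a manual look to see whether it is a harmless redirect or a genuinely dead listing.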

Posted
Many redirects are OK, as we do not want to list the subpage-of-the-day that the base domain redirects to. However, with the improved QC efforts you should find the count of outdated URLs going down. :)
Posted
Note that the report also says "the other tools are used in shorter periods or continuously". Link checking is much more than just Robozilla :)

 

That MIGHT necessitate some robots.txt changes. What is the user agent string?
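
If the checker's user-agent has to be allowed explicitly, the robots.txt change would be small. A minimal sketch, assuming the token is "Robozilla" (substitute whatever user-agent string the link checker actually sends):

# Append a rule allowing the link checker everywhere (empty Disallow = allow all).
cat >> robots.txt <<'EOF'
User-agent: Robozilla
Disallow:
EOF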
