eyecon Posted March 28, 2006

I would think that DMOZ has considerable bandwidth to work with. I'm curious why there are so many redirects and dead links that are not detected. The crawler cannot detect certain types but it should identify most of the redirects and, certainly, the dead links. For example:

    [dch@traif ~]# wget --spider --force-html http://list-business.com/opt-in-list-systems
    --23:44:15--  http://list-business.com/opt-in-list-systems
               => `opt-in-list-systems'
    Resolving list-business.com... 216.235.79.13
    Connecting to list-business.com|216.235.79.13|:80... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: http://emailuniverse.com/ [following]
    --23:44:15--  http://emailuniverse.com/
               => `index.html'
    Resolving emailuniverse.com... 216.235.79.13
    Connecting to emailuniverse.com|216.235.79.13|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    200 OK
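For readers who want to reproduce the kind of check shown in the wget output without following the redirect, here is a minimal Python sketch. It is not how Robozilla or any DMOZ tool works; the names `classify` and `check`, and the choice to treat every 4xx/5xx response as "dead", are assumptions made for illustration only.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so the 3xx response itself is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


def classify(status, location=None):
    """Map an HTTP status (or None for no response) to a coarse label.

    Assumption: anything that is not 2xx or 3xx counts as dead. A real
    checker would probably retry 5xx responses before giving up.
    """
    if status is None:
        return "dead"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect -> " + location if location else "redirect"
    return "dead"


def check(url, timeout=10):
    """Send one HEAD request and classify the result."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx all arrive here because redirects are not followed.
        return classify(err.code, err.headers.get("Location"))
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout and similar.
        return classify(None)
```

Calling `check("http://list-business.com/opt-in-list-systems")` against a server that answers as in the transcript would report the redirect and its `Location` target rather than silently landing on the final page.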
sfromis Posted March 28, 2006

Many redirects are OK, as we do not want to list a subpage-of-the-day that the base domain redirects to. However, improved QC efforts mean that you should find the count of outdated URLs going down.
jeanmanco Posted March 29, 2006

Our link checker does not run constantly. If you take a look at http://research.dmoz.org/publish/chris2001/odp_reports/report_2005.htm you will see that it only ran twice last year. There are other QC efforts going on in between Robozilla runs, but it is an enormous task.
sfromis Posted March 30, 2006

Note that the report also says "the other tools are used in shorter periods or continuously". Link checking is much more than just Robozilla.
eyecon Posted March 30, 2006

> Note that the report also says "the other tools are used in shorter periods or continuously". Link checking is much more than just Robozilla

That MIGHT necessitate some robots.txt changes. What is the user agent string?
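The thread does not state the actual user agent string, so the following is only a sketch of how a site owner could test a robots.txt change once the string is known. `ExampleLinkChecker` is a hypothetical placeholder, not the name of any DMOZ tool, and the rules and `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: let one named link checker in (except /private/)
# while keeping every other robot out.
ROBOTS_TXT = """\
User-agent: ExampleLinkChecker
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Swapping the placeholder for the real user agent string and the rules for the site's own file gives a quick way to confirm that the checker is admitted before its next run.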