Crawler

eyecon · Member · Joined: Jan 6, 2005 · Messages: 118
I would think that DMOZ has considerable bandwidth to work with. I'm curious why there are so many redirects and dead links that go undetected. The crawler cannot detect certain types of redirects, such as meta-refresh or JavaScript redirects, but it should identify most HTTP redirects and, certainly, the dead links. For example:

[dch@traif ~]# wget --spider --force-html http://list-business.com/opt-in-list-systems
--23:44:15-- http://list-business.com/opt-in-list-systems
=> `opt-in-list-systems'
Resolving list-business.com... 216.235.79.13
Connecting to list-business.com|216.235.79.13|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://emailuniverse.com/ [following]
--23:44:15-- http://emailuniverse.com/
=> `index.html'
Resolving emailuniverse.com... 216.235.79.13
Connecting to emailuniverse.com|216.235.79.13|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
200 OK
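
For scripted checks, wget's exit status is enough to flag a dead link. A minimal sketch, assuming a reasonably recent wget (which documents exit status 8 for a 4xx/5xx server response); the URL is a placeholder:

wget --spider -q http://example.com/dead-page
status=$?
if [ "$status" -ne 0 ]; then
    # Non-zero covers DNS failures, timeouts, and HTTP errors alike;
    # recent wget uses 8 specifically for server error responses.
    echo "dead or unreachable (wget exit status $status)"
fi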

sfromis · Member · Joined: Mar 25, 2002 · Messages: 202
Many redirects are OK, since we do not want to list the subpage-of-the-day that a base domain happens to redirect to. However, with improved QC efforts you should find the count of outdated URLs going down. :)
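
One way to tell a harmless same-site redirect from a genuine move is to look at the Location header. A minimal sketch, assuming curl is available; example.com is a placeholder:

# -s silences progress output, -I sends a HEAD request; the Location
# header, if present, shows where the listed URL redirects to, so a
# same-domain subpage can be told apart from a move to another domain.
curl -sI http://example.com/ | grep -i '^location:'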

sfromis · Member · Joined: Mar 25, 2002 · Messages: 202
Note that the report also says "the other tools are used in shorter periods or continuously". Link checking is much more than just Robozilla :)

eyecon · Member · Joined: Jan 6, 2005 · Messages: 118
sfromis said:
Note that the report also says "the other tools are used in shorter periods or continuously". Link checking is much more than just Robozilla :)

That MIGHT necessitate some robots.txt changes. What is the user agent string?
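
If the tools honour robots.txt, allowing them takes only one stanza once the token is known. A hypothetical sketch, since the actual user agent string is exactly what is being asked here:

# Hypothetical token; replace Robozilla with whatever is confirmed.
User-agent: Robozilla
Disallow:

An empty Disallow field grants that agent access to the whole site.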