Don't know where else to put this. I trawled through the 4+ million links in DMOZ over the weekend. About 7.5% return something other than a 200 OK response code. The actual URLs in each error category are available at http://devel3.smartsurf.org/dmoz/ if this is of any use to anyone.
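For anyone wanting to repeat this kind of check, here is a minimal Python sketch. It is not the crawler used for the numbers above. The function names, the HEAD request, the timeout, and the `0` placeholder for "no HTTP response at all" are all assumptions made for illustration. Only the one-file-per-status-code layout with a `dmoz-` prefix mirrors the file names in the listing.

```python
import http.client
import os
import urllib.error
import urllib.request
from collections import defaultdict


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or 0 if no HTTP response arrived."""
    # Assumption: a HEAD request is enough; some servers answer HEAD and GET differently.
    request = urllib.request.Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so this reports the status of the final URL.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return 0  # hypothetical placeholder: DNS failure, timeout, malformed URL


def bucket_by_status(results):
    """Group (url, status) pairs by status code, dropping the 200 OK ones."""
    buckets = defaultdict(list)
    for url, status in results:
        if status != 200:
            buckets[status].append(url)
    return dict(buckets)


def write_buckets(buckets, directory, prefix="dmoz-"):
    """Write one file per status code, one URL per line."""
    for status, urls in buckets.items():
        path = os.path.join(directory, f"{prefix}{status}")
        with open(path, "w", encoding="utf-8") as handle:
            handle.writelines(url + "\n" for url in urls)
```

Usage would look like `write_buckets(bucket_by_status((u, fetch_status(u)) for u in urls), ".")`. For millions of links you would want concurrency and rate limiting, which this sketch leaves out.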
I don't know if there is any form of QA, but pulling the 404 list, matching each category to its editor, and emailing them a note that links foo, bar, and baz are 404 might be worthwhile.
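A rough Python sketch of that reconciliation step, under loud assumptions: the two lookup tables (`url_to_category` and `category_to_editor`) are hypothetical inputs that would have to come from the DMOZ data itself, and the note wording is made up. It only builds the text of one note per editor and does not send any mail.

```python
from collections import defaultdict


def notes_per_editor(dead_urls, url_to_category, category_to_editor):
    """Build one plain-text note per editor listing the dead links in their categories."""
    by_editor = defaultdict(lambda: defaultdict(list))
    for url in dead_urls:
        category = url_to_category.get(url)
        editor = category_to_editor.get(category)
        if editor is None:
            continue  # no known category or editor for this link, so nobody to notify
        by_editor[editor][category].append(url)

    notes = {}
    for editor, categories in by_editor.items():
        lines = ["The following links in your categories returned 404:"]
        for category in sorted(categories):
            lines.append(f"{category}:")
            lines.extend(f"  {url}" for url in sorted(categories[category]))
        notes[editor] = "\n".join(lines)
    return notes
```

Grouping by editor first keeps it to one email per person rather than one per dead link.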
BTW, are any god-level editors interested in a list of DMOZ sites outside Top/Adult that are, at > 99.5% sensitivity/specificity, currently rendering porn?
Code:
[root@devel3 dmoz]# wc dmoz-*
1 1 35 dmoz-10
1 1 32 dmoz-100
1 1 21 dmoz-201
1 1 26 dmoz-202
6 6 218 dmoz-204
4 4 87 dmoz-205
1 1 18 dmoz-299
859 859 38420 dmoz-401
16 16 520 dmoz-402
7656 7656 257221 dmoz-403
123068 123068 7049452 dmoz-404
1 1 23 dmoz-406
3 3 102 dmoz-407
1 1 25 dmoz-408
327 327 10717 dmoz-410
18602 18602 1091169 dmoz-414
1 1 83 dmoz-415
3 3 103 dmoz-418
4 4 135 dmoz-419
1 1 24 dmoz-420
40 40 1588 dmoz-423
7 7 178 dmoz-449
109927 109929 4064789 dmoz-500
84 84 2932 dmoz-501
103 103 5403 dmoz-502
873 873 38363 dmoz-503
1 1 25 dmoz-5030
27718 27718 1059829 dmoz-504
2 2 48 dmoz-507
247 247 10874 dmoz-508
15891 15891 721798 dmoz-510
1 1 34 dmoz-550
20 20 997 dmoz-999
305471 305473 14355289 total
[root@devel3 dmoz]#
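To turn the listing above into per-code shares, something like the following Python sketch would do. It assumes the text is pasted `wc` output with the usual four columns (lines, words, bytes, file name) and one URL per line in each file. The function names are made up for illustration.

```python
def parse_wc(text, prefix="dmoz-"):
    """Map status code (as a string) to line count from pasted `wc` output."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only "<lines> <words> <bytes> <prefix><code>" rows; this skips
        # the shell prompt lines and the "total" row.
        if len(parts) != 4 or not parts[0].isdigit() or not parts[3].startswith(prefix):
            continue
        counts[parts[3][len(prefix):]] = int(parts[0])
    return counts


def shares(counts):
    """Return each code's percentage of the summed counts."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * count / total for code, count in counts.items()}
```

Run over the full listing, the counts show 404 and 500 together making up roughly three quarters of the 305471 failing links.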