JOE3656
Human-only editing has reached its limit.
Many submitters vent frustration at the dmoz, not realizing that editing still follows the same algorithmic laws that limit every computer program based on trees. Human editing of the web is a losing proposition, even under the following highly dmoz-favorable assumptions. Organizing or reorganizing just 500,000 sites takes at least millions of node visits, following roughly an n*log(n) visitation bound (not exactly right, since the structure is closer to a b*-tree, but close enough). A subcat with 4000 sites requires at least 12,000 visits to remain somewhat organized (less than a year out of date).
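To make that concrete, here is a minimal sketch of the visit estimate under the n*log(n) model above; the log base is my assumption, since the post doesn't name one.

```python
import math

def visits(n, base=10):
    # Rough lower bound on node visits to (re)organize n listed sites,
    # following the n*log(n) model above; the log base is an assumption.
    return n * math.log(n, base)

print(round(visits(4_000)))     # ~14,400 -- same order as the 12,000 cited for a 4,000-site subcat
print(round(visits(500_000)))   # ~2,850,000 -- the "millions of node visits" for 500,000 sites
```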
Assume that 5% of the listed websites become dead links every year. Assume also that 4 minutes per site per year is spent managing or adding each site. 500,000 sites is 2 million minutes, over 33,000 hours per year at a minimum, and as the average depth of the tree reaches a fourth or fifth level and the tree widens, that per-site time only grows. Reach a million sites and the demand for time more than doubles. Reach the 4 million sites currently listed and you need roughly 266,000 hours of yearly edits just to eliminate and replace 200,000 dead sites. That is before growth is even mentioned.
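Spelling out that arithmetic under the stated assumptions (4 minutes of editor time per site per year, 5% link rot), before the extra cost of a deeper, wider tree:

```python
MINUTES_PER_SITE = 4      # editor minutes per listed site per year (stated assumption)
DEAD_LINK_RATE = 0.05     # fraction of listed sites that go dead each year (stated assumption)

for sites in (500_000, 1_000_000, 4_000_000):
    hours = sites * MINUTES_PER_SITE / 60
    dead = int(sites * DEAD_LINK_RATE)
    print(f"{sites:>9,} sites -> ~{hours:,.0f} editor-hours/year, ~{dead:,} dead links/year")
```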
Will 65,000 editors put in the 4 hours per year just to keep the directory at its current level? Not likely. Are their choices always the best and most interesting sites? No; that is not in the design of dmoz.
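That 4-hour figure is just the maintenance total above spread over the editor base:

```python
# ~266,000 maintenance hours divided across ~65,000 editors (figures from above)
print(f"{266_000 / 65_000:.1f} hours per editor per year")   # roughly 4
```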
A category like Recreational Boating (dmoz lists only 4000 sites) can barely keep up with the tens of thousands of useful sites out there (yes, I am aware of the submission process). For example, there are 1000+ yacht clubs and roughly 1000 marinas in the US, almost all of which have websites of interest, so the category should have far more than the 4000 sites listed. A promised reorg by the boating editor could not be completed on his stated schedule, because reorganizing the sites by region would have taken more than six weeks of dedicated effort. Editors have families and commitments.
The dmoz approach of human-only editing is algorithmically naive and needs to be reexamined. The dmoz must use Google-like Bayesian categorization tools as the primary selector, with suggestions and exception handling provided by human intervention. Any other approach is a sure loser, a pyramid scheme. By the way, it's not a lack of understanding of the dmoz that prompts me to write this, so re-explaining the dmoz won't change my mind.
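As a rough illustration of what Bayesian categorization as a primary selector could look like, here is a minimal naive Bayes sketch (using scikit-learn); the categories and training text are invented for the example, and a real system would train on the descriptions already in the directory.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data standing in for existing directory listings (invented for this example).
training_text = [
    "yacht club racing sailing regatta",
    "marina slips boat storage fuel dock",
    "bed and breakfast lakefront rooms inn",
]
training_category = [
    "Recreation/Boating/Clubs",
    "Recreation/Boating/Marinas",
    "Travel/Lodging",
]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(training_text, training_category)

# A new submission gets a suggested category; a human editor confirms or overrides it.
submission = "city marina with 200 slips and a fuel dock"
print(model.predict([submission])[0])
```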
So what is dmoz really doing? Not much, really. It isn't steering users to paid "sponsored" sites the way Google does, but it isn't keeping up either.
Dmoz's approach must be realistically grounded in algorithmic laws, and that means using automated methods (assuming it wishes to continue with any level of success). Editors must have tools to search the InterNIC databases and to select useful sites, along the lines of the sketch below.
By success, I mean being the first choice of end users for finding the open and free information they seek without hidden steering to "sponsors".
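As one example of the automated tooling editors would need, here is a minimal sketch that flags listed sites whose hostnames no longer resolve, a cheap first pass at the yearly dead-link sweep; the URLs are placeholders, not real directory entries.

```python
import socket
from urllib.parse import urlparse

# Placeholder URLs standing in for a category's listings.
listed_sites = [
    "http://example.org/",
    "http://defunct-yacht-club.example/",
]

for url in listed_sites:
    host = urlparse(url).hostname
    try:
        socket.gethostbyname(host)        # does the domain still resolve?
        status = "resolves"
    except socket.gaierror:
        status = "flag for editor review (possible dead link)"
    print(f"{url:40} {status}")
```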
Thanks