Here's the way the ODP improvement process works. Editors, based on their experience with identified quality problems, get (or borrow) an idea. Then they implement it as a pilot project, and show other editors the result. Then, if the community thinks the result is worth two diddleys or a squat, it's placed in "editor tools". And if it becomes widely enough used, or often enough enhanced, then it is incorporated into the dmoz.org process itself. Occasionally, VERY occasionally, something becomes a mandatory part of the process.
There's no reason why editing experience has to be on the ODP itself. Obviously, knowing what KIND of spam we face is pretty important. But the fact is, the same jerks who generate the drivel-made-for-adsense doorway pages that clutter up Google search results submit the same kinds of pages to us -- so, really, anyone can get a very good idea of the spam we face by just looking at the top 200 Google search results and discarding the informational sites. (Whether that's two sites discarded or two dozen, you'll still have lots of good examples of online spam.)
Anyone will, for instance, quickly notice that 100% of all successful spammers (and 99.9999% of the unsuccessful ones) are VERY good at filling their home page with keywords both relevant and irrelevant. That fact alone should tell you that your idea for a filter is at best totally worthless. But is it that good? No, not really, it's worse, because there are some high-information-content websites whose webmasters DON'T fill the main page with keywords, whether relevant or irrelevant. So based on those facts, this filter, as proposed, is not only guaranteed to let all conceivable spam through, it is also guaranteed to block some of the most authoritative, most informative sites on the web.
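To make that concrete, here is a rough sketch of the failure mode. I'm assuming the proposed filter works roughly like the hypothetical `keyword_filter` below (accept a suggestion only if its home page mentions enough of the category's keywords); the function name, the `min_hits` threshold, the keyword list, and both sample pages are made up for illustration, not taken from any real tool.

```python
import re

def keyword_filter(page_text, category_keywords, min_hits=3):
    """Hypothetical filter: accept a page only if it mentions at least
    min_hits of the category's keywords (an assumed design, not a real tool)."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    hits = sum(1 for kw in category_keywords if kw in words)
    return hits >= min_hits

# Invented keyword list for an entomology category.
category_keywords = {"insects", "entomology", "beetles", "ants", "butterflies"}

# A keyword-stuffed doorway page: relevant and irrelevant terms alike.
spam_page = ("cheap insects beetles ants butterflies entomology "
             "casino loans pills buy now")

# A sparse but genuinely informative page that never says the magic words.
authority_page = "Field notes from my thesis work on Coleoptera morphology."

spam_accepted = keyword_filter(spam_page, category_keywords)
authority_accepted = keyword_filter(authority_page, category_keywords)
print("spam accepted:", spam_accepted)
print("authority accepted:", authority_accepted)
```

Under those assumptions the stuffed page sails through and the informative one is rejected, which is exactly backwards.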
I've looked at enough websites and enough site suggestions that I'm pretty confident of this analysis.
Still, if you actually implement this on your site, we'd be happy to hear an analysis of the first 100 suggestions rejected by the filter. But note: it would probably be even more harmful for the ODP than for your site -- after all, how many sites of the quality of "Smithsonian" or "BBC Special Reports", or even of the quality of "graduate student John Doe's extraterrestrial entomology page", are being submitted to your directory? I suspect just about all you get is madfads ("more anonymous drivel made for adsense"). But ... if you think I'm wrong here, you can test it. Go through all your suggestions for a week, and see if you can find two or three genuine authority suggestions that aren't already listed in the ODP. If you have lots of candidates, your quality standards are probably WAY below ours, but pick the best two or three, and post them here.
We have about 600,000 categories. We'd also be interested in knowing how long it takes you to write out the keywords (say, 100,000 of them) for just the first 1000 categories, and what automated processes you found useful for that.
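For a sense of scale, here is the back-of-the-envelope arithmetic. The category and keyword counts are the ones above; the 10 seconds per keyword is purely my assumption, so plug in your own measured rate once you have one.

```python
TOTAL_CATEGORIES = 600_000   # approximate ODP category count, from above
SAMPLE_CATEGORIES = 1_000    # the "first 1000 categories"
SAMPLE_KEYWORDS = 100_000    # keywords for that sample, from above
SECONDS_PER_KEYWORD = 10     # ASSUMPTION: replace with a measured rate

# 100,000 keywords over 1000 categories is 100 keywords per category.
keywords_per_category = SAMPLE_KEYWORDS // SAMPLE_CATEGORIES

# Extrapolate the same density to the whole directory.
total_keywords = keywords_per_category * TOTAL_CATEGORIES

sample_hours = SAMPLE_KEYWORDS * SECONDS_PER_KEYWORD / 3600
total_hours = total_keywords * SECONDS_PER_KEYWORD / 3600

print(f"{keywords_per_category} keywords per category")
print(f"{total_keywords:,} keywords for the whole directory")
print(f"about {sample_hours:,.0f} hours for the sample")
print(f"about {total_hours:,.0f} hours for everything")
```

At that assumed rate the sample alone is a few hundred hours of typing, and the full directory is several hundred times that.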
Well, I'll stop here. I think I've probably put way more thought into the ideas than you did, but if you still think them worthwhile, I've given you several things to work on. If your money isn't where your mouth is, it's obvious nobody else should take the ideas seriously. If your money IS there, then, succeed or fail, we'd like to hear the results in a month or three -- or a year or three; we're all on volunteer time here!