JOE3656 (Guest), posted January 27, 2004
Re: Reorganization and why dmoz human editing fail
I don't think I answered spectre's question fully, so here is more about government, corporations, and funding policy.
It's not unusual for groups like corporations or governments to develop software architectures as freeware. You may know that DARPA funded the development of the Internet (as ARPANET) and TCP/IP, and much of supercomputing and AI research. Corporations and universities develop technologies for the overall market, and telecoms such as AT&T have developed many of the standards and much of the software we use as freeware. Dmoz itself is Netscape supported. Who funded RFCs like 1521 and implemented the standards? Who funds ECMA or W3C? (The W3C donor page is very telling.)
Why do they fund? Because advancing the state of the art and the technologies that everyone uses pays off for them in the long run, and being involved early provides a significant competitive edge.
Consider: development of more powerful dmoz software could lead to improvements in Active Directory (benefiting Microsoft and MS users), X.400/X.500 and SMTP (benefiting telecoms and email users), UDDI (benefiting press agencies and content providers), or LDAP. While any of this is only speculatively possible, it's the research that makes the difference.
SourceForge is also a possible development approach, but funding an effort through to a definite conclusion is an issue with SourceForge projects. For a sustained effort, many orgs are funded by a number of groups, so that contributions are less dependent on a single donor. (Actually, I think dmoz is the only org I know of that has a single resource donor.)
I expect to be 3 months from a mini demo and will post again soon. Still waiting to see to what level a site can be trusted to provide a description to dmoz. (It's a significant issue.)
Best.
Alucard, posted January 27, 2004
OK, to address the issue of which sites can be trusted to provide a description that can just be published without editor intervention.
Based on the number of edits I have done, I would say roughly 0.01% of the submissions that I see, and those are not from anyone I would have expected to provide something that fits the guidelines. All it meant was that the submitter took the trouble to read the ODP guidelines about titles and descriptions, managed to interpret them correctly (no mean feat), and then submitted a listing.
But other editors' experiences may be different...
JOE3656 (Guest), posted January 27, 2004
Your answer impels me to ask: do you have sites that you might trust after a first submission (and factors that make any subsequent submissions more or less trustworthy)? (It's very germane to an approach.)
Factors that might apply are 1) whether the submitter has correctly supplied the information for one or more sites, 2) whether the category is more or less trustworthy, 3) whether the URL is more or less trustworthy, or 4) anything else. Could you set factors for a category with some certainty?
Thanks for the reply, that's the stuff I need.
lissa (Editall/Catmv), posted January 27, 2004
Quoting JOE3656: "Still waiting to see to what level a site can be trusted to provide a description to dmoz. (It's a significant issue)."
Unfortunately, they can't. Take a look at meta tags on websites to see the kinds of stuff we get. Once in a while we get a description submitted by someone who obviously took the time to look at the guidelines and what was already listed, and who could write, spell, and use proper grammar in the appropriate language of the category. A fair number of descriptions are somewhat reasonably written (basic understandable sentence structure), although they don't conform to our guidelines and need rewriting. But at least 50% of what I review is pure cr*p: lists of keywords, keyword-stuffed titles, unintelligible phrases, marketroid-speak that contains no informational content, etc.
Now, I would imagine that there may be some AI linguistic analysis that could be applied to submitted titles and descriptions to at least make sure they are sentences, and it might be possible to analyse the actual content of a website and compare it to the submitted description to screen it some. However, in the long run, an actual editor needs to look at the site and at least verify the title and description.
In a perfect world where everyone is altruistic and spends the time and energy to try to do things right, accepting submitted descriptions might work. Unfortunately, in reality there are far too many people only interested in taking advantage of other people's efforts for that to have a chance.
I think it is an interesting approach to go to a large agency with a proposal. I had figured the only way new software for running a large directory would get developed would be through the open source movement. I'm looking forward to your mini demo.
bobrat, posted January 27, 2004
Of the several thousand edits that I have done, I've had a few that were close to being correct, that is, they needed maybe a one-word change. Probably fewer than 100. I think I had two that were accepted as is.
I've had a significant number where the submitter could not spell his web site name correctly, or his company name, or the primary service that his site provides.
For example, I have one right now from a major university that misspelled their web site name. That is followed by another that is fairly good, except they can't spell and it is slightly too much hype. And another where they fail to capitalize the start of a sentence, and for some reason capitalize two words in the middle of a sentence.
A very large number of submissions have a lot of the right words for the description, but the submitters think that a description is a title and capitalize every word. So I have to go through retyping half the letters and cutting and pasting to fix it up.
lissa (Editall/Catmv), posted January 27, 2004
A couple of posts snuck in while I was typing.
One method that might give a good first start for a description is to analyse the meta tag on a site. If it is a full sentence and not either a string of keywords or a paragraph, it could form the basis for the first part of a good description (what does the business or organization do / what is the site about).
The second thing we include in descriptions is a summary of the site contents. This is usually reflected by the site's main navigation to some extent, although we try to dig out really unique stuff that may be several layers down.
Different areas have different styles of descriptions. To some extent the logic for creating a description is based on the category it is in. For example, the Shopping/ branch by definition is for sites that offer products that can be obtained without going to a store. All the sites have to contain products, prices, and a method for ordering. Because this is true for all sites, we don't include it in any of the descriptions. Since most sites only include that information, most descriptions only include the products offered. In other categories, the category may be so narrow that it isn't necessary to describe what the business does at all, and the description only contains site contents.
As far as trusting submissions goes, it might be possible to set up a method for creating an "approved submitter" list. We've actually discussed the concept before; however, it doesn't work well with our current culture and software. Really what we would be talking about is creating a different kind of editing permission. Right now, an editor with permissions in a category can add, delete, and modify any listing in the reviewed (listed) and unreviewed (waiting) areas. What I think the new permissions could do would be to allow only adding listings to a category. It might be possible to set up some way for people to apply or train for this permission, and then get the permission across a broad area of interest, or maybe for categories at least X levels deep.
Enough rambling. I'm not sure any of that made sense, but hopefully it will stir ideas.
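For illustration, the meta-tag check described in this post (full sentence versus string of keywords versus paragraph) could be sketched roughly as follows. This is a minimal Python sketch, not anything ODP actually runs: the function name `classify_meta_description`, the labels it returns, and the thresholds are all assumptions made up for the example, and the heuristic will misjudge some legitimate sentences that contain short comma-separated lists.

```python
import re


def classify_meta_description(text: str, max_words: int = 30) -> str:
    """Return 'empty', 'keywords', 'paragraph', or 'sentence' (hypothetical labels)."""
    text = text.strip()
    if not text:
        return "empty"
    words = text.split()

    # Several short comma-separated chunks suggest a keyword list.
    chunks = [c for c in text.split(",") if c.strip()]
    if len(chunks) >= 3 and len(words) / len(chunks) <= 3:
        return "keywords"

    # More than one sentence, or a very long one, is treated as a paragraph.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    if len(sentences) > 1 or len(words) > max_words:
        return "paragraph"

    return "sentence"
```

Only a result of "sentence" would be a candidate starting point for a description; everything else would still need to be written from scratch by an editor.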
Alucard, posted January 27, 2004
Quoting JOE3656: "Do you have sites that you might trust after a first submission (and factors that make any subsequent submissions more or less trustworthy)? (It's very germane to an approach)."
Most sites, once submitted and listed, never ever submit again. If they do, then the new submission is usually not listable. So I don't think that is an approach.
Quoting JOE3656: "Factors that might apply are 1) if the submitter has correctly supplied the information for 1 or more sites, or 2) the category or 3) URL is more or less trustworthy or 4) anything else. Could you set factors for a category with some certainty?"
No, I don't believe that we can, based on my experience. Right now you'd track a submitter using an email address. Those are wayyy too easy to spoof: when word got out that fred@mysite.ord.uk was a trusted submitter, EVERYONE would submit under that email address. If you're going to have a login for a submitter, well, then they may as well be an editor, right? (We have a type of editor called a Greenbuster, who can essentially add sites and review unreviewed submissions, but whose work has to be approved by an editor with full privs on the category.)
hutcheson (Meta), posted January 27, 2004
Most sites that follow the guidelines submit once. If they never submit again, then we could have trusted them. This seems pretty straightforward logic, but I don't see how it advances whatever scheme you have in mind.
The few (and I DO mean FEW: less than one in one HUNDRED THOUSAND sites) sites that have been trusted are called "PCPs". IIRC, precisely ONE of them was a single-person effort. A significant minority of the original PCPs are now seen as sites-that-should-not-have-been-trusted.
I don't know where you're trying to go with this, but I pretty well know you're going there without us.
As for "trusted submitters", we call them "editors."
spectregunner, posted January 27, 2004
Might I suggest that trying to get to the point where an AI can pre-identify a trusted submitter might be the wrong approach.
What might be more useful would be developing a tunable AI (or a series of AIs, since different sections of the directory have different guidelines) that could score a submission.
Some of the best value would be in pre-identifying the spam, duplicate submissions, and garbage, so that the human editors can be more efficient.
The ability to look at a pool of unreviewed sites and have them arranged according to score has tremendous appeal to me. Especially if the AI got good enough that I knew that in, say, Shopping, I could quickly kill off anything with a score below n.
I'm not sure I'd ever completely trust an AI to add a site in a cat where I edit, but I'd love to rely on one to help me speed through the dross.
bobrat, posted January 27, 2004
I really like that idea. I end up doing that manually to some extent, e.g. bad descriptions cause me to drop the site back in unreviewed, whereas pretty good ones make me want to make some minor changes and list them. If that could be semi-automated, that would be nice.
lissa (Editall/Catmv), posted January 28, 2004
Quoting spectregunner: "score a submission"
Great idea! Criteria that could be considered:
- title and description spelling, grammar, capitalizations
- format and style compared to existing listings in the category
- title and meta tags on the site itself
- based on description, likelihood of belonging in the submitted category
- keyword/marketing hype analysis
It would definitely help prioritize things.
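To make the scoring idea a little more concrete, here is a rough Python sketch covering only a few of the criteria listed above (capitalization, marketing hype, and keyword stuffing). Everything in it is an assumption for illustration: the `Submission` fields, the `HYPE_WORDS` list, the penalty weights, and the function names are invented here, and spelling, category fit, and comparison with existing listings are not attempted.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical hype vocabulary; a real list would be tuned per category.
HYPE_WORDS = {"best", "cheap", "cheapest", "leading", "premier", "unbeatable"}


@dataclass
class Submission:
    url: str
    title: str
    description: str


def score_submission(sub: Submission) -> float:
    """Return a score in [0, 1]; higher means closer to listable as submitted."""
    desc = sub.description.strip()
    words = re.findall(r"[A-Za-z0-9']+", desc)
    if not words:
        return 0.0
    score = 1.0
    # A description should read as a sentence: capital first, period last.
    if not desc[0].isupper():
        score -= 0.2
    if not desc.endswith("."):
        score -= 0.2
    # Descriptions written like titles capitalize nearly every word.
    capitalised = sum(1 for w in words if w[0].isupper())
    if len(words) > 3 and capitalised / len(words) > 0.6:
        score -= 0.3
    # Marketing hype: small penalty per hype word, capped.
    hype = sum(1 for w in words if w.lower() in HYPE_WORDS)
    score -= min(0.3, 0.1 * hype)
    # Keyword stuffing: one longer word dominating the description.
    counts = Counter(w.lower() for w in words if len(w) > 3)
    if counts:
        top = counts.most_common(1)[0][1]
        if top >= 3 and top / len(words) > 0.3:
            score -= 0.3
    return max(0.0, score)


def rank_queue(subs):
    """Return (score, submission) pairs, best first, for an unreviewed pool."""
    scored = [(score_submission(s), s) for s in subs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

An editor could then work the queue from the top, or look at everything below some cutoff n in one batch. The score only orders the pool; it does not decide anything on its own.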
lisahinely, posted January 28, 2004
If you're going for a heuristic (and I sure wouldn't recommend the Paul Revere algorithmic AI approach for anything), I think it would be helpful to offer suggestions for where (geographically) the site goes. That is, cull any addresses on the site, and if they're incomplete, compare them with other geographic clues, etc.
spectregunner, posted January 28, 2004
Along with that, have it parse the site and generate an intelligent list of the links and redirects off of the first level of pages. Something that says:
www.foo.com - 4 links
www.bar.com - 2 redirects
etc.
That would save us from having to click all the links and look at all the source code, and would give us a better idea as to whether the site is real or a redirecting mirror of some sort.
Similarly, it would help if the AI also looked up the domain registration and gave us the ownership, saving us many, many steps.
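A per-host tally like the one above can be sketched with nothing but the Python standard library. This is only a sketch under stated assumptions: the names `LinkCollector` and `summarize_external_hosts` are made up for the example, the page HTML is assumed to have been fetched already, and only `<meta http-equiv="refresh">` redirects are detected (HTTP 3xx and JavaScript redirects, and the registration lookup, are out of scope).

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect <a href> targets and meta-refresh targets from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.meta_redirects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # content typically looks like "0; url=http://www.bar.com/"
            _, _, target = (attrs.get("content") or "").partition("=")
            if target.strip():
                self.meta_redirects.append(urljoin(self.base_url, target.strip()))


def summarize_external_hosts(base_url, html):
    """Count links and meta-refresh redirects per external host."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    own_host = urlparse(base_url).netloc

    def tally(urls):
        hosts = (urlparse(u).netloc for u in urls)
        # Skip the site's own host and host-less targets such as mailto:.
        return dict(Counter(h for h in hosts if h and h != own_host))

    return {"links": tally(parser.links), "redirects": tally(parser.meta_redirects)}
```

Running this over each first-level page and merging the counts would give the editor the kind of summary described, without clicking through every link.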
bobrat, posted January 28, 2004
We could also do with a language scan that could point out non-English and multilingual sites. I get an excessive number of non-English sites submitted with an English description, and I like to weed those out sooner rather than later, so they can be sent to World rather than sitting around.
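A very crude version of such a language scan can be sketched by checking how many of a page's words are common English function words. This is an assumption-heavy illustration, not a real language detector: the `ENGLISH_STOPWORDS` list, the 0.15 threshold, and the function names are invented for the example, and short or keyword-only pages will be misjudged.

```python
import re

# Small, hypothetical list of common English function words.
ENGLISH_STOPWORDS = frozenset({
    "the", "and", "of", "to", "is", "for", "with", "that",
    "this", "are", "on", "our", "we", "you", "from", "by",
})


def english_stopword_ratio(text: str) -> float:
    """Fraction of alphabetic tokens that are common English function words."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in ENGLISH_STOPWORDS for t in tokens) / len(tokens)


def looks_english(text: str, threshold: float = 0.15) -> bool:
    """Guess whether page text is mostly English (threshold is an assumption)."""
    return english_stopword_ratio(text) >= threshold
```

Comparing the result for the submitted description with the result for the page text itself would flag the case described here, an English description on a non-English site, for routing to World.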
spectregunner, posted January 28, 2004
I see that this discussion is taking a significant turn in direction, and I think it is one for the better.
We are now discussing how to use technology to make the existing (and future) human editors better and more efficient, which is something that most editors will enthusiastically support, rather than going down the path of most resistance by trying to discuss how to replace editors.