Reorganization and why dmoz human editing fails


katapult

Re: Reorganization and why dmoz human editing fail

Okay... I'll have a stab - I can't stand the current design but I'll keep any offerings in a similar mold (i.e. I guess it has to be green) and fast loading as requested.

I'll post it in a new thread - thanks for the replies.

:: Katapult ::
 

donaldb

Member
Joined
Mar 25, 2002
Messages
5,146
Re: Reorganization and why dmoz human editing fail

I was kind of being sarcastic :) I can't imagine how anyone could come up with anything less graphics-intensive than our current site. We only have 3 IMG tags on the whole index page. Anything less and the mozilla guy will have to be done in ASCII text {moz}

The ODP really isn't about the design of the site. We're not trying to sell anything or entice people to come to the site. Most of our work is being done for the downstream users who take our RDF files and the few people who actually come to the site to find info. The heaviest users that we have are probably the editors who are buzzing around from category to category. That's the reason that the site's design is clean, simple, and usable. We want people to be able to navigate quickly to the information that they are looking for. We don't need them to have a pretty interface to do that. Build for your audience, not for the designer.

I think we all would agree that the design of the site is not the most aesthetically pleasing, but for the most part it does work well :)
 

hutcheson

Curlie Meta
Joined
Mar 23, 2002
Messages
19,136
Re: Reorganization and why dmoz human editing fail

Yes, this is DEFINITELY an example of "if it ain't broke, don't fix it." And the pages WORK. They VALIDATE -- closer to standard HTML than any other portal site. The current project is to go fully Unicode, which will make our foreign-language categories WORK better. LOOK good or bad, frankly, is not a concern: it looks good enough for our purposes.
 

jswafford

Member
Joined
Aug 7, 2003
Messages
668
Re: Reorganization and why dmoz human editing fail

Google isn't exactly all that cute either. {moz}
 

JOE3656

Re: Reorganization and why dmoz human editing fail

Status: Still considering
Questions
- Would the editor community be averse to UDDI feeds from defined or trusted organizations, still managed by dmoz editors? (Example: the US Dept of Education's list of accredited schools contains web addresses.)

I define a feed as a set of site and URL data from a trusted source that has an editor's permission to be applied to fill, or partly fill, a category.

I am currently reviewing LDAP and UDDI technologies as an enriched basis for a test.

Can a web site be trusted to attribute itself? I know keyword abuses in the past have left many bad impressions.

Finally, I wish to note that I am following these directions in approach
1) Develop an enriched directory structure that is more efficient.
2) Define an integrated environment and tool set that supports the dmoz structure above in a System Engineering friendly architecture (a Plug-in IEE).
3) Review AI and bot technologies for an editor-enhancing capability that reduces the effort to maintain a larger dmoz.
4) At a level of maturity seek funding, or WP the project to a DARPA-like group. (Yeah, they do this stuff all the time).
5) Develop the structure if possible using LDAP and UDDI standardized and free solutions.
6) Develop a test site - prototype of the structure.


While the UI (User Interface) questions may be valid, they do not help solve the HE (Human Editing and data gathering) issues or improve back-end functionality. They should be threaded elsewhere.

----------------------------------------------------
Brain Teaser for tree search
Tom claims to be descended from Paul Revere (who incidentally had 16 children). Should you verify this from Paul down to Tom (forward search) or from Tom back to Paul Revere (backward search)? Which is faster: forward search, determining whether Tom is a descendant of Paul, or backward search, determining whether Paul is an ancestor of Tom? Why?

Answer: posted next time.
 

lisahinely

Member
Joined
Jul 30, 2003
Messages
246
Re: Reorganization and why dmoz human editing fail

If you're serious and realize knowledge mining is nontrivial, don't neglect Cyc (http://www.cyc.com/).
 

hutcheson

Curlie Meta
Joined
Mar 23, 2002
Messages
19,136
Re: Reorganization and why dmoz human editing fail

In the past, the ODP staff has considered semi-automatic feeds. As standards rose, fewer and fewer sites have been deemed eligible. In fact, it's probably been three years since any were accepted.

On the other hand, there are some that almost surely COULD still be considered eligible. (e.g., Project Gutenberg.)

You'd have to take that up with staff.
 

spectregunner

Member
Joined
Jan 23, 2003
Messages
8,768
Re: Reorganization and why dmoz human editing fail

4) At a level of maturity seek funding, or WP the project to a DARPA-like group. (Yeah, they do this stuff all the time).

The idea of trying to turn ODP over to (or get funding from) any branch of the US government would probably be fundamentally offensive to a great part of the world community (and a not insignificant portion of the US community). Add DARPA to the mix and the reaction could be unpredictable.

I would be/am emotionally opposed to this entire concept, as it would violate the social contract and breach the agreement that I made with the Project (and it made with me) when I became an editor. Who is to say that some unnamed/undetermined funding source is going to have the same social commitment that is in place today? If they do not, what happens? Do they drive off all the current editors and replace them with sock-puppets of their own choosing?

One thing that you have failed to measure in all of this is the commitment that the core of active editors have. They don't do this for money, or glory, or a competitive edge. There are no "industry forums" or "advisory councils" bending the editorial direction one way or the other. No IEEE 123.456 committee decides things. A change of parties in the Congress or the White House has no impact on the day-to-day editing or direction of the project.

Politicians have no impact on us, the media has no impact on us, business does not control or influence us, religion (organized or otherwise) does not change our compass. This is, frankly, what disturbs me about this entire thread. I'm somewhat certain that you are well-meaning, but you are not an editor, you have never been an editor, yet as an outsider you are proposing wholesale changes to the work product of the editorial community. Why? Why do you tilt at this particular windmill?

Why do I have this sinking feeling that this is all because the Boating reorganization is not going fast enough to suit you? Please correct me if I am wrong.
 

lissa

Member
Joined
Mar 25, 2002
Messages
918
Re: Reorganization and why dmoz human editing fail

The idea of trying to turn ODP over to (or get funding from) any branch of the US government

:confused: Huh? I didn't read that into his process outline at all. If someone wants to invest time and effort into investigating how to build a better directory, there are plenty of resources available to request research money from, DARPA included. If ODP were eventually to use the final product, this doesn't mean that somehow ODP has been turned over to the government.

I think it's been made pretty clear that the only way the ODP would consider shifting to totally new software would be if it were provided complete and free, no strings attached. I don't see any harm in brainstorming ideas to help make this happen. You never know, maybe we'll get new digs in a couple of years. :cool:
 

JOE3656

Re: Reorganization and why dmoz human editing fail

Reasonable enough to ask, but no; this is an area in which I have some professional knowledge. The approach taken gave me serious doubts about the reorg being completable in the six weeks, and the answers from the dmoz editors led to this approach.


-------- Answer to the Poser -----
The 16 children of Paul Revere
Tom claims to be a descendant of Paul Revere. Which would be the easier way to verify Tom's claim: by showing that Revere is one of Tom's ancestors (backward search) or by showing that Tom is one of Revere's descendants (forward search)?

This is a graph search problem, with each person as a node linked to his/her parents and to all his/her children. I'll assume about 12 generations at about 20 years each.
Generally the choice depends on the branching factor of the graph. A person can only be born to 2 parents, so backward search has a branching factor of 2, or 2^12 = 4,096 nodes.
Assuming even an average family size of 3 children per married couple, forward search is not recommended: with 3 children per couple we have 3^12 = 531,441 nodes to traverse.

BTW: Old Paul Revere is an exceptional example of fatherhood. The average number of children over the 12 generations is a factor, and the known historical fecundity of the Paul Revere clan (Revere himself had 16 children, was born the eldest of 12, and wore out 2 wives) makes forward searching of the graph expensive. If we assume that 14 was an average Revere family size, we get a whopping 14^12 = 56,693,912,375,296 possible nodes to search, which is far larger than the past and present human population of the planet Earth.

This is a classic AI tree problem
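The branching-factor arithmetic above is easy to check with a few lines of Python. A minimal sketch, using the post's own assumptions (12 generations, family sizes of 2, 3, and 14):

```python
# Worst-case node counts for a 12-generation genealogy search,
# as a function of the branching factor: 2 parents going backward,
# N children per couple going forward.
GENERATIONS = 12

def nodes_searched(branching_factor: int, generations: int = GENERATIONS) -> int:
    """Worst-case number of nodes visited in a tree search of the given depth."""
    return branching_factor ** generations

print(nodes_searched(2))   # backward search (2 parents):      4096
print(nodes_searched(3))   # forward, 3 children per couple:   531441
print(nodes_searched(14))  # forward, Revere-sized families:   56693912375296
```

The asymmetry is the whole point: depth is fixed, so the search cost is dominated by the branching factor raised to that depth, and 2^12 is tiny next to 14^12.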
 

JOE3656

Re: Reorganization and why dmoz human editing fail

I don't think I answered spectre's question fully, so
about the government, corporations, and funding policy.

It's not unusual for groups like corporations or governments to develop software architectures as freeware. You may know that DARPA funded the development of the Internet (as ARPANET) and TCP/IP, and much of supercomputing and AI research. Corporations and universities develop technologies for the overall market, and telecoms such as AT&T have developed much of the standards and software we use as freeware. Dmoz itself is Netscape-supported. Who funded RFCs like 1521 and implemented the standards? Who funds ECMA or the W3C? (The W3C donor page is very telling.)

Why do they fund? Because advancing the state of the art and the technologies that everyone uses pays off for them in the long run, and being involved early provides a significant competitive edge.

Consider: Development of more powerful dmoz software could lead to improvements in Active Directory (Benefitting Microsoft and MS users), X400/X500 and SMTP (Benefitting Telcoms and email users), UDDI (Benefitting Press agencies and content providers), or LDAP.

While any of this is speculatively possible, it's the research that makes the difference. SourceForge is also a possible development approach, but funding through to a conclusion is an issue with SourceForge efforts. For a sustained effort, many orgs are funded by a number of groups, with the contributions from donors less monopolistic in nature. (Actually, I think dmoz is the only org I know of with a single resource donor.)

I expect to be 3 months from a mini demo, will post again soon.

Still waiting to see to what level a site can be trusted to provide a description to dmoz. (It's a significant issue).

Best.
 

Alucard

Member
Joined
Mar 25, 2002
Messages
5,920
Re: Reorganization and why dmoz human editing fail

OK, to address the issue of which sites can be trusted to provide a description which can just be published without editor intervention.

Based on the number of edits I have done, I would say roughly 0.01% of the submissions that I see - and those were not from anyone I would have expected to provide something which fits the guidelines. All it meant was that the submitter took the trouble to read the ODP guidelines about titles and descriptions, managed to interpret them correctly (no mean feat), and then submitted a listing.

But other editors' experiences may be different....
 

JOE3656

Re: Reorganization and why dmoz human editing fail

Your answer impels me to ask:

Do you have sites that you might trust after a first submission (and factors that make any subsequent submissions more or less trustworthy)? (It's very germane to an approach.)

Factors that might apply are 1) whether the submitter has correctly supplied the information for 1 or more sites, or 2) the category, or 3) whether the URL is more or less trustworthy, or 4) anything else. Could you set factors for a category with some certainty?

Thanks for the reply, that's the stuff I need.
 

lissa

Member
Joined
Mar 25, 2002
Messages
918
Re: Reorganization and why dmoz human editing fail

Still waiting to see to what level a site can be trusted to provide a description to dmoz. (It's a significant issue).

Unfortunately, they can't. :( Take a look at meta-tags on websites to see the kinds of stuff we get. Once in a while we get a description submitted by someone who obviously took the time to look at the guidelines and what was already listed, and who could write, spell, and use proper grammar in the appropriate language of the category. A fair number of descriptions are somewhat reasonably written (basic understandable sentence structure), although they don't conform to our guidelines and need rewriting. But at least 50% of what I review is pure cr*p - lists of keywords, keyword-stuffed titles, unintelligible phrases, marketroid-speak that contains no informational content, etc.

Now, I would imagine that there may be some AI linguistic analysis that could be applied to submitted titles and descriptions to at least make sure they are sentences, and it might be possible to analyse the actual content of a website and compare it to the description submitted to screen it some. However, in the long run, an actual editor needs to look at the site and at least verify the title and description.
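A crude version of that second check - comparing a submitted description against the site's actual text - might just measure how much of the description's vocabulary appears on the page. This is purely illustrative (real linguistic analysis would need far more; the word-length cutoff is an arbitrary assumption):

```python
import re

def description_overlap(description: str, page_text: str) -> float:
    """Fraction of distinctive description words (>3 letters) found in the page text.
    A low score suggests the description doesn't match the site's content."""
    def words(text: str) -> set[str]:
        return {w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3}
    desc_words = words(description)
    if not desc_words:
        return 0.0
    return len(desc_words & words(page_text)) / len(desc_words)

page = "We sell handmade oak tables and maple chairs, shipped from Vermont."
print(description_overlap("Handmade oak tables and chairs.", page))        # 1.0
print(description_overlap("Discount pharmacy and casino bonuses.", page))  # 0.0
```

Even a screen this simple would flag the worst mismatches for an editor's attention, though it could never replace the final human review the post calls for.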

In a perfect world where everyone is altruistic and spends the time and energy to try to do things right, accepting submitted descriptions might work. Unfortunately in reality, there are far too many people only interested in taking advantage of other people's efforts for that to have a chance. :crazy:

I think it is an interesting approach to go to a large agency with a proposal. I had figured the only way new software for running a large directory would get developed would be through the open source movement.

I'm looking forward to your mini demo. :cool:
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
Re: Reorganization and why dmoz human editing fail

Of the several thousand edits that I have done, I've had a few that were close to being correct - that is, they needed maybe a one-word change. Probably fewer than 100. I think I had two that were accepted as-is. I've had a significant number where the submitter could not spell his web site name correctly, or his company name, or the primary service that his site provides.

For example, I have one right now from a major university that misspelled their web site name, that is followed by another that is fairly good, except they can't spell, and is slightly too much hype. And another where they fail to capitalize the start of a sentence, and for some reason capitalize two words in the middle of a sentence.

A very large number of submissions have a lot of the right words for the description, but treat the description as a title and capitalize every word. So I have to go through retyping half the letters and cutting and pasting to fix it up.
 

lissa

Member
Joined
Mar 25, 2002
Messages
918
Re: Reorganization and why dmoz human editing fail

A couple posts snuck in while I was typing. ;)

One method that might give a good first start for a description is to analyse the meta tag on a site. If it is a full sentence and not either a string of keywords or a paragraph, it could form the basis for the first part of a good description (what does the business or organization do/ what is the site about).
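A rough sketch of that heuristic - accept one full sentence, reject keyword strings and whole paragraphs - could look like this (the thresholds and tests here are illustrative guesses, not ODP policy):

```python
import re

def looks_like_sentence(meta_description: str) -> bool:
    """Crude filter for a usable meta description: one capitalized,
    modest-length sentence. Thresholds are illustrative assumptions."""
    text = meta_description.strip()
    if not text:
        return False
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return False
    # A comma-heavy, short-worded string is probably a keyword list.
    if text.count(",") >= len(words) / 3:
        return False
    # More than two sentence terminators suggests a paragraph, not a summary.
    if len(re.findall(r"[.!?]", text)) > 2:
        return False
    # Expect a capitalized start and a modest word count.
    return text[0].isupper() and 4 <= len(words) <= 40

print(looks_like_sentence("Handmade oak furniture from a family workshop in Vermont."))  # True
print(looks_like_sentence("furniture, oak, cheap, discount, best, buy, sale, table"))    # False
```

A filter like this would only provide the raw material for the first clause of a description; the site-contents summary would still need an editor.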

The second thing we include in descriptions is a summary of the site contents. This is usually reflected by the site's main navigation to some extent, although we try to dig out really unique stuff that may be several layers down.

Different areas have different styles of descriptions. To some extent the logic for creating a description is based on the category it is in. For example, the Shopping/ branch by definition is for sites that offer products that can be obtained without going to a store. All the sites have to contain products, prices, and a method for ordering. Because this is true for all sites, we don't include it in any of the descriptions. Since most sites only include that information, most descriptions only include the products offered. In other categories it may be that the category is so narrow that it isn't necessary to describe what the business does at all, and instead the description only contains site contents.

As far as trusting submissions go, it might be possible to set up a method for creating an "approved submitter" list. We've actually discussed the concept before, however it doesn't work well with our current culture and software. Really what we would be talking about is creating a different kind of editing permission. Right now, an editor with permissions in a category can add, delete, and modify any listing in the reviewed (listed) and unreviewed (waiting) areas. What I think the new permissions could do would be to allow only adding listings to a category. It might be possible to set up some way for people to apply or train for this permission, and then get the permission across a broad area of interest, or maybe for categories at least X levels deep.

Enough rambling. :p I'm not sure any of that made sense, but hopefully it will stir ideas. :D
 

Alucard

Member
Joined
Mar 25, 2002
Messages
5,920
Re: Reorganization and why dmoz human editing fail

Do you have sites that you might trust after a first submission (and factors that make any subsequent submissions more or less trustworthy)? (It's very germane to an approach.)
Most sites, once submitted and listed once, never ever submit again. If they do, then it's usually not listable. So I don't think that is an approach.

Factors that might apply are 1) if the submitter has correctly supplied the information for 1 or more sites, or 2) the category or 3) URL is more or less trustworthy or 4)anything else. Could you set factors for a category with some certainty?
No, I don't believe that we can, based on my experience. Right now you'd track a submitter using an email address. Those are wayyy too easy to spoof - when word got out that fred@mysite.ord.uk was a trusted submitter, EVERYONE would submit under that email address. If you're going to have a login for a submitter, well, then they may as well be an editor, right? (We have a type of editor called a Greenbuster, who can essentially add sites and review unreviewed submissions, subject to approval by an editor with full privs on the category.)
 

hutcheson

Curlie Meta
Joined
Mar 23, 2002
Messages
19,136
Re: Reorganization and why dmoz human editing fail

Most sites that follow the guidelines submit once. If they never submit again, then we could have trusted them. This seems pretty straightforward logic, but I don't see how it advances whatever scheme you have in mind.

The few (and I DO mean FEW: less than one in one HUNDRED THOUSAND sites) sites that have been trusted are called "PCP's". IIRC, precisely ONE of them was a single-person effort. A significant minority of the original PCPs are now seen as sites-that-should-not-have-been-trusted. I don't know where you're trying to go with this, but I can be pretty sure you're going there without us.

As for "trusted submitters", we call them "editors."
 

spectregunner

Member
Joined
Jan 23, 2003
Messages
8,768
Re: Reorganization and why dmoz human editing fail

Might I suggest that trying to get to the point where an AI can pre-identify a trusted submitter might be the wrong approach.

What might be more useful would be developing a tunable AI (or a series of AIs, since different sections of the directory have different guidelines) that could score a submission.

Some of the best value would be in pre-identifying the spam, duplicate submissions and garbage, so that the human editors can be more efficient.

The ability to look at a pool of unreviewed sites and have them arranged according to score has tremendous appeal to me. Especially if the AI got good enough that I knew, in, say, Shopping, I could quickly kill off anything with a score below n. I'm not sure I'd ever completely trust an AI to add a site in a cat where I edit, but I'd love to rely on one to help me speed through the dross.
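That workflow - score the unreviewed pool, sort it, and bulk-reject anything below a threshold - could be sketched like this. The scoring itself is the hard AI problem; here the score is just a placeholder field, and all names and data are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    url: str
    title: str
    description: str
    score: float = 0.0  # would be filled in by the hypothetical AI scorer

def triage(pool: list[Submission], reject_below: float) -> tuple[list[Submission], list[Submission]]:
    """Sort the unreviewed pool by score (highest first), then split it into
    a review queue and a bulk-reject pile below the threshold."""
    ranked = sorted(pool, key=lambda s: s.score, reverse=True)
    review = [s for s in ranked if s.score >= reject_below]
    rejected = [s for s in ranked if s.score < reject_below]
    return review, rejected

pool = [
    Submission("http://example.com/a", "Oak Tables", "Handmade oak tables.", score=0.9),
    Submission("http://example.com/b", "BEST CHEAP DEALS!!!", "buy buy buy", score=0.1),
    Submission("http://example.com/c", "Maple Chairs", "Chairs and benches.", score=0.6),
]
review, rejected = triage(pool, reject_below=0.5)
print([s.url for s in review])  # highest-scoring submissions first
print(len(rejected))            # 1
```

The design matches the post's intent: the AI never adds a listing itself, it only orders the queue so the human editor's time goes to the most promising submissions.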
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
Re: Reorganization and why dmoz human editing fail

I really like that idea. I end up doing that manually to some extent - e.g. bad descriptions cause me to drop the site back in unreviewed, whereas pretty good ones make me want to make some minor changes and list them. If that could be semi-automated, that would be nice.
 
This site has been archived and is no longer accepting new content.