Categorizing the DMOZ categories

John Nagle

Member
Joined
Apr 12, 2007
Messages
6
I'm trying to determine whether a given DMOZ category has certain properties.
For example, does presence in that category indicate that a site is commercial? Obviously, anything under "Shopping" or "Business" can be considered commercial, but what about the categories under "Computers/Internet" or "Recreation/Travel"? Is there any data useful in categorizing the categories?

Another categorization is single-authorship sites vs. aggregation/hosting sites. Is there any way to identify hosting services, blogs, social-networking sites, forum systems, and other sites where the content is user-provided? Such sites appear in dozens of different DMOZ categories, and there's no easy way to find them.
 

spectregunner

Member
Joined
Jan 23, 2003
Messages
8,768
I'll take a first pass at this, but I am sure someone smarter than I will come along and straighten out what I mess up.

We categorize websites by content, language, or geography. Within each language, we categorize by content or geography.

Every category has posting guidelines that describe what we are looking for in terms of content for that specific category (or for the category that is one level higher in the directory tree.)

We don't differentiate between commercial and non-commercial sites, and even some categories that one would expect to be entirely commercial can contain non-commercial or governmental sites if the content matches. With 700,000 categories and 4.8 million listings, we have learned very well that there are few hard and fast rules. We constantly tell newer and experienced editors alike that they have to be guided by the content. This is precisely why we have editing guidelines and not editing rules.

While it is invisible to the general public, our internal fora are filled with questions along the lines of: what would you do with this site? Some of the discussions get "interesting."

I'm not sure that there is any automated way to get what you are looking for, but you can probably begin to get a good idea of the content by reading all of the category guidelines for each of the major ODP categories and one or two levels of subcategories.

What is probably making your wishes more difficult is what makes being an editor more interesting. We don't just file and sort and process suggestions, we look at the sites and try to determine the best place for them, based on their content, the guidelines and our best judgement.

Good luck.
 

sfromis

Member
Joined
Mar 25, 2002
Messages
202
Within each language, the core categorization criterion is topic. When many sites are listed for a certain topic, they may be subdivided by some of the "site type" categorizations you're looking for. Thus, a subcategory may be created (within a topic) for, e.g., Chats_and_Forums, but not until the amount of content warrants it. While the vast majority of sites in Business/Shopping are indeed commercial, some types of topic-relevant informational sites may also be listed in some subcategories.

Some leaf category names do imply a categorization of the types you're looking for, but you cannot reason that sites in other categories are not of these types.
 

chaos127

Curlie Admin
Joined
Nov 13, 2003
Messages
1,344
It may be of interest to know that attempts are made to standardise leaf-category names in two specific situations:

1/ When dividing a topic by the type of site. See: Preferred Terms

2/ When breaking down categories in Regional into topical (i.e. non-geographic) categories. See: Subject Subcategories Template

However, adherence to these guidelines is not perfect, through both intentional and accidental means. (Also, in case 1, you may find some of those names used for categories about the actual topic the name implies, rather than as a sub-categorisation of a higher topic. For example: Computers/Internet/On_the_Web/Weblogs/ is about weblogs, rather than being for weblogs about "On the Web".)
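If you want to turn the standardised names into a rough automated signal, a minimal sketch might look like the following. It assumes you already have category paths as slash-separated strings. `SITE_TYPE_TERMS` is a hypothetical, illustrative subset (only Chats_and_Forums and Weblogs are mentioned in this thread; the real Preferred Terms list is longer), and `TOPICAL_EXCEPTIONS` is a hypothetical hand-maintained list of paths where the name is the topic itself, as in the Weblogs example above. Because adherence is imperfect and other categories can contain these site types too, a `None` result means "no hint", not "not this type".

```python
from typing import Optional

# Hypothetical, illustrative subset of standardised site-type names.
# The real "Preferred Terms" list in the ODP guidelines is longer.
SITE_TYPE_TERMS = {"Chats_and_Forums", "Weblogs"}

# Hypothetical, hand-maintained paths where a site-type name is the topic
# itself rather than a subdivision of the parent topic.
TOPICAL_EXCEPTIONS = {"Computers/Internet/On_the_Web/Weblogs"}


def site_type_hint(category_path: str) -> Optional[str]:
    """Return a site-type term implied by the category path, or None.

    None means "no hint", not "not of this type": sites of any type
    may also be listed in ordinary topical categories.
    """
    # Split the path into segments, ignoring leading/trailing slashes.
    parts = [p for p in category_path.strip("/").split("/") if p]
    for i, segment in enumerate(parts):
        if segment in SITE_TYPE_TERMS:
            prefix = "/".join(parts[: i + 1])
            if prefix in TOPICAL_EXCEPTIONS:
                continue  # the category is *about* this site type
            return segment
    return None
```

For example, with these hypothetical lists, a path such as `Arts/Music/Chats_and_Forums` yields `"Chats_and_Forums"`, while `Computers/Internet/On_the_Web/Weblogs/` and `Recreation/Travel` yield `None`. Any real use would need the exceptions list maintained by hand, since nothing in the category name itself distinguishes the two usages.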

Personally, I would like to see better meta-data added to categories to provide details of what sort of categorisation they represent. That may or may not be part of any future direction taken by the ODP...
 
This site has been archived and is no longer accepting new content.