A three-dimensional classification


vicentelles

I think the “World” directory is very difficult to work with, because it mixes three dimensions (language, subject and region) into a one-dimensional classification.

It is very hard to collect a list of sites for a given country and category. They are scattered across many directories of that category inside different regions, provinces and villages. I will explain it with an example.

I was looking for the best auto web pages in Spain, so I looked for the “Economía y negocios > Compras” category. I found it at five different levels of depth:

1.- World > Español > Compras > Vehículos

2.- World > Español > Países > España > Economía y negocios > Compras > Vehículos

3.- World > Español > Países > España > Comunidades Autónomas > Comunidad Valenciana > Economía y negocios > Compras

4.- World > Español > Países > España > Comunidades Autónomas > Comunidad Valenciana > Valencia > Economía y negocios > Compras

5.- World > Español > Países > España > Comunidades Autónomas > Comunidad Valenciana > Valencia > Valencia > Economía y negocios

There are 17 territories in Spain. Suppose an average of 4 provinces in each territory and, perhaps, 10 villages classified in each province. This makes 17*4*10 = 680 regional directories in Spain (more or less).

Can you imagine the task of collecting the sites on one specific subject spread across 680 territorial directories? (Of course I have not done it.)

The enormous effort put into classifying millions of web pages could be a lot more useful (without any extra manual effort) with a 3D classification: Subject, Language and Territory. In the USA only the first dimension is needed, but when you extend it to the whole world you can’t treat language or territory as just another subject: it doesn’t work.

By a 3D classification I mean the following:
1.- Subject: Economía y negocios > compras
2.- Territory: España > Comunidad Valenciana > Valencia > Valencia
3.- Language: Español
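
As a rough sketch of the idea (not how the directory stores anything today), each site could carry three independent values, and a search could then filter on any combination of them. Everything here is hypothetical: the `Listing` class, the `find` function and the example.com URLs are only for illustration, and I assume a site may have more than one language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    """One site described along three independent dimensions (hypothetical model)."""
    url: str
    subject: tuple        # path in the subject tree
    territory: tuple      # path in the territory tree
    languages: frozenset  # assumption: a site may be in several languages


def under(path, prefix):
    """True if `path` is `prefix` itself or one of its descendants."""
    return path[:len(prefix)] == prefix


def find(listings, subject=(), territory=(), language=None):
    """Filter on any combination of the three dimensions."""
    return [
        item for item in listings
        if under(item.subject, subject)
        and under(item.territory, territory)
        and (language is None or language in item.languages)
    ]


# Made-up listings, only to exercise the filter.
listings = [
    Listing("http://example.com/coches-valencia",
            ("Economía y negocios", "Compras", "Vehículos"),
            ("España", "Comunidad Valenciana", "Valencia", "Valencia"),
            frozenset({"Español"})),
    Listing("http://example.com/cars-madrid",
            ("Economía y negocios", "Compras", "Vehículos"),
            ("España", "Comunidad de Madrid"),
            frozenset({"Español", "English"})),
    Listing("http://example.com/libros-mexico",
            ("Economía y negocios", "Compras", "Libros"),
            ("México",),
            frozenset({"Español"})),
]

# All Spanish-language vehicle shopping sites anywhere in Spain, in one query.
spain_cars = find(
    listings,
    subject=("Economía y negocios", "Compras", "Vehículos"),
    territory=("España",),
    language="Español",
)
print([item.url for item in spain_cars])
```

With this shape, "every vehicle site in Spain" is one filter on two prefixes, instead of a walk through hundreds of regional directories.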

This change would need a big technical effort, but all the manual work already done could be converted automatically. The benefit for non-USA users would be high, and even USA users could benefit from a territorial classification.
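
To illustrate why I think the conversion could be automatic, here is a minimal sketch that splits one of the existing category paths into the three dimensions. The names `SUBJECT_ROOTS`, `CONTAINERS` and `split_path` are my own assumptions, hand-picked to fit the example paths above; a real conversion would need complete lists for every language and would have to handle places whose names collide with subject names.

```python
# Hypothetical lookup tables, filled in only for the example paths in this post.
SUBJECT_ROOTS = {"Economía y negocios", "Compras"}  # names that start a subject path
REGION_MARKER = "Países"                            # segment that opens the regional branch
CONTAINERS = {"Comunidades Autónomas"}              # grouping levels that are not places


def split_path(path):
    """Split 'World > language > ...' into language, territory and subject."""
    parts = [segment.strip() for segment in path.split(">")]
    if len(parts) < 2 or parts[0] != "World":
        raise ValueError("expected a path of the form 'World > language > ...'")
    language, rest = parts[1], parts[2:]

    # Purely topical path: everything after the language is the subject.
    if not rest or rest[0] != REGION_MARKER:
        return {"language": language, "territory": [], "subject": rest}

    # Regional path: places run until the first known subject root.
    rest = rest[1:]
    cut = next((i for i, name in enumerate(rest) if name in SUBJECT_ROOTS), len(rest))
    territory = [name for name in rest[:cut] if name not in CONTAINERS]
    return {"language": language, "territory": territory, "subject": rest[cut:]}


example = split_path(
    "World > Español > Países > España > Comunidades Autónomas > "
    "Comunidad Valenciana > Valencia > Economía y negocios > Compras"
)
print(example)
```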


Sorry for my poor English and the Spanish examples
 

xixtas01

Member
Joined
Jun 16, 2003
Messages
624
This is an excellent idea. When the next generation dmoz organization structure is contemplated and decided upon, I'd be willing to bet it will be much more 3 dimensional than it is now.

I think that we editors are generally aware that sites, and even whole categories, frequently belong in more than one area of the directory, and we try to mitigate this problem by judicious use of @links and related-category links.

Perhaps improving our search algorithm or building an advanced search applet (one that takes into account information such as category paths, @links, related categories, and descriptions of other listings) might make results more relevant, and help satisfy web searchers looking for results scattered throughout categories.

I believe that improvements in the search are a lot more likely to happen in the next couple years because it wouldn't require a full-on restructuring of the directory as your suggestion would.

Still, I think this idea could be the seed of a really great one. Allowing users to navigate the data using various organizational hierarchies would be great!
 

lissa

Member
Joined
Mar 25, 2002
Messages
918
When the next generation dmoz organization structure is contemplated

Actually, it is my understanding that something along this line has been in contemplation for more than a year. However, the significant technical problems we had this past year put that effort on hold. Hopefully staff will be able to continue work in this area soon.

The method used currently is to double list sites - once in "topical" and once in "Regional", if they have a dual nature. So for example, if you wanted to find all of the Episcopal Churches in California, they're grouped together by topic. Or if you wanted to find all the Churches in San Diego, they're grouped by region.
 

vicentelles

I am glad to see that the regional question is taken into consideration.

But don’t forget the third dimension: the language. One example: there is one classification for the English sites from Spain in the “Regional” structure and a different one for the Spanish sites from Spain in the “World” structure. A site can be multilingual, and so associated with several languages, while belonging to only one region and one subject. It is difficult to solve this problem in a 2D classification.


A 3D classification could facilitate the development of powerful searching tools.

I see another problem with the tree classification. As there are sites in every node, it is difficult to collect all the sites of any non-terminal subject, because they are distributed across that node and all of its descendants. This difficulty applies to the subject tree and also to the Regional tree.
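
What I have in mind is something like the following minimal sketch, which gathers the sites of a node together with those of every descendant. The `Category` class, the `collect_sites` function and the example.com URLs are hypothetical; I don't know how any real site using the data stores its tree.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A node of the tree: it holds its own sites and its subcategories."""
    name: str
    sites: list = field(default_factory=list)
    children: list = field(default_factory=list)


def collect_sites(node):
    """Return the sites listed in `node`, then those of all its descendants."""
    found = list(node.sites)
    for child in node.children:
        found.extend(collect_sites(child))
    return found


# Made-up three-level branch, only to exercise the function.
motos = Category("Motos", sites=["http://example.com/motos"])
vehiculos = Category("Vehículos", sites=["http://example.com/coches"], children=[motos])
compras = Category("Compras", sites=["http://example.com/general"], children=[vehiculos])

print(collect_sites(compras))
```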

Does anybody know of an existing site using ODP data where it is possible, when searching, to get (if desired) the list of every site included in a node and all of its descendants?
 

lachenm

Meta/kMeta
Curlie Admin
Joined
Aug 2, 2002
Messages
1,610
The language question is also taken into consideration, albeit in a less-elegant way than you have proposed.

Just as, in the main (English-language) section, a site may be listed in Topical and Regional branch categories, it may also gain a separate listing for each language it has, in the appropriate World/ branch category. As I understand the guidelines (though I don't edit much in World/ , since I'm only fluent in English), in each World/ branch language-specific category, it could also get listings in Topical and Regional branches as appropriate.

Considering Spanish, for example, a Spanish-language site about a store with locations in Spain, and shipping worldwide, could get a listing at the appropriate Regional level under http://dmoz.org/World/Espa%f1ol/Pa%edses/Espa%f1a/ in an Econom%eda_y_negocios/Compras/ subcategory, and another listing in an appropriate topical category in http://dmoz.org/World/Espa%f1ol/Compras/ . If it had English content as well, it could also be listed in an appropriate topical category in http://dmoz.org/Shopping/ and in http://dmoz.org/Regional/Europe/Spain/ in a Business_and_Economy/Shopping subcategory at an appropriate regional level.

Not technically pretty, perhaps, and it requires extra editor (and submitter) effort, but the major questions of classification have been addressed at some level.
 

lisahinely

Member
Joined
Jul 30, 2003
Messages
246
... and the corresponding categories in different languages are linked directly to each other. (Or should be, we're typing as fast as we can... :) )
 

lissa

Member
Joined
Mar 25, 2002
Messages
918
there is one classification for the English sites from Spain in the Regional structure and a different one for the Spanish sites from Spain in the World structure.

You may be misconstruing something about the directory structure. As long as a site is in English, it may be listed in both an English topical category and an English Regional category (assuming it meets listing criteria) regardless of where it is from. So an English-language site from Spain is perfectly qualified to be listed in topical, and an English-language site from the US or UK is qualified to be listed under Regional/Europe/Spain.

Generally, each of the World languages follows the English language pattern of topical and regional, with two differences.

- First, the categories will be developed to the extent that there are sites for them. So if a topic in English has 1000 sites, but the same topic in Spanish only has 50, there are likely to be a lot more subcats developed in the English category.

- Second, there will be some differences in the structure simply due to the way different concepts are handled in different languages. We try to keep the categories in different languages as parallel as possible so that they can be linked between using the altlang links, but sometimes this just can't be done.

:cool:
 
This site has been archived and is no longer accepting new content.