Category structure and URL stability

What is the ODP policy on the stability of the category structure, and therefore of category URLs?
Since the ODP taxonomy is the largest one freely available on the Web, one could consider using category URLs as "Published Subject Indicators" (PSIs), as defined in the Topic Maps specifications.
More on the PSI question at:
www.oasis-open.org/committees/tm-pubsubj/

BTW, this is an OASIS Technical Committee I'm chairing, and (also as a former editor) I was wondering whether there is any awareness of, or interest in, this matter in the ODP community.
 

hutcheson (Curlie Meta, joined Mar 23, 2002)
How static does a PSI have to be? Obviously, things like the Dewey Decimal System and Library of Congress topic lists are always in a state of flux: they print new editions every few years, and pile up thousands of changes in between. My suspicion is that the ODP is more fluid, but surely, even so, you (anyone) could periodically grab a copy of the RDF and anoint it "version xxx.xxx".
 

Thanks, hutcheson, for your first reply. Some more details:

Yes, a PSI needs to be as stable as possible (whatever that means in the Web universe), at least until next week...
That means the *URI* has to be stable, because topic maps and/or any kind of semantic agent will use it as a subject identifier. The resource the URI resolves to (the subject indicator) can have some "fluidity", as you call it, in its content, as the conception of the subject evolves. Unfortunately, in the Web universe, resources (and the subjects they describe) are very often more stable than their URIs. Subjects can be moved within the taxonomy without really losing their identity, but when URIs encode the taxonomy path in their very syntax, every such move changes the URI.

If, e.g., the category http://dmoz.org/Science/Astronomy/Solar_System/Sun
is moved to http://dmoz.org/Science/Astronomy/Stars/Sun

this does not really change the subject of the category (the Sun is the only star in the Solar System, right?), so the indexed resources will be the same; but the URI has changed, so the PSI is effectively broken. We now have two PSIs for the same subject, in different versions...

Please note that there is no standard, nor even a sustainable ready-made solution, for this at the moment. It's a core issue we're discussing in the TC, which is why we are sort of auditing the managers of existing taxonomies to see how they feel about it. We would be very happy to see the ODP community jump into the debate.

My personal view is that categories whose subject seems well identified, and which are declared stable (say, at meta-editor level), could be given, e.g., a simplified PURL that redirects to the current category. For example, the identifier of the category quoted above could be something like http://purl.org/dmoz/astronomy/sun
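A minimal sketch of the PURL idea, assuming a hypothetical alias table maintained on the publisher's side: the short identifier under http://purl.org/dmoz/ never changes, and only the table entry is updated when a category moves. The table, `resolve_purl` and `move_category` are illustrative names, not an existing ODP or PURL service API; a real PURL server would answer with an HTTP redirect to the computed target.

```python
from typing import Dict

# Hypothetical prefix for stable identifiers, taken from the example above.
PURL_BASE = "http://purl.org/dmoz/"

# Hypothetical alias table: stable identifier -> current category URL.
aliases: Dict[str, str] = {
    "astronomy/sun": "http://dmoz.org/Science/Astronomy/Solar_System/Sun",
}

def resolve_purl(purl: str) -> str:
    """Return the current category URL that a stable PURL should redirect to."""
    if not purl.startswith(PURL_BASE):
        raise ValueError(f"not a dmoz PURL: {purl}")
    key = purl[len(PURL_BASE):].strip("/").lower()
    return aliases[key]  # KeyError: identifier was never declared stable

def move_category(key: str, new_url: str) -> None:
    """Record a category move: the target changes, the identifier does not."""
    aliases[key] = new_url

psi = "http://purl.org/dmoz/astronomy/sun"
print(resolve_purl(psi))
move_category("astronomy/sun", "http://dmoz.org/Science/Astronomy/Stars/Sun")
print(resolve_purl(psi))  # same PSI, now pointing at the moved category
```

The design point is that topic maps would store only the PURL, so a move costs one table update instead of breaking every reference.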

Concerning other classifications: as you point out, DDC and LoC are also evolving, but they handle versioning themselves; they don't rely on anyone else to publish versions. OTOH, so far they have no addressable identifiers (URIs), although some people at LoC at least are aware of the issue and are thinking about publishing some subjects with URI identifiers.
 

>>If e.g. category http://dmoz.org/Science/Astronomy/Solar_System/Sun
is moved to http://dmoz.org/Science/Astronomy/Stars/Sun

In this case the system sets up an automatic redirect from the original (old) URL to the current (new) URL. That's why we have an internal function called "catmv", which only a selected group of more experienced editors can perform.

You can try it by clicking on http://dmoz.org/World/Italiano/Regionale/Europa/Italia/Geografia/, which was recently renamed to http://dmoz.org/World/Italiano/Regionale/Europa/Italia/Mappe_e_Vedute; you will be automatically redirected to the current (new) category.

I guess this partially answers your question?
 

Hello Ettore

The redirect feature is great. I did not know about it; I was an editor for a while, but that was 2 years ago. Either I forgot, or it was not available at the time.

Some more questions, to dig deeper:

1. Does it work through more than one change: that is, if the category moves again, are all the old URLs still redirecting to the current one?

What about categories that are simply killed or dismembered? Do their URLs go to the Dead Cats 404 Graveyard?
Or do they resolve to some page saying what happened to the inhabitants of the dead cat, and at what point in time?
 

yklaw (Curlie Meta, joined Feb 28, 2002)
>>1. Does it work through more than one change: that is, if the category moves again, are all the old URLs still redirecting to the current one?

In theory, yes, if the catmv mechanism is used.

>>What about categories simply killed or dismembered? Do their URLs go to the Dead Cats 404 Graveyard?
They go 404. However, we try our best not to delete categories, and the 404 page links to the parent category, where links to the sites' new home are likely to be found.
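The "link to the parent category" fallback can be derived from the URL alone, since the path encodes the hierarchy. A minimal sketch, assuming a hypothetical `live` set of existing category paths (e.g. extracted from an RDF dump); it is not ODP's actual 404 handler, and it also walks further up when the parent itself is gone.

```python
from typing import Optional, Set
from urllib.parse import urlsplit, urlunsplit

def category_path(url: str) -> str:
    """'http://dmoz.org/Science/Astronomy/' -> 'Science/Astronomy'."""
    return urlsplit(url).path.strip("/")

def parent_category_url(url: str) -> Optional[str]:
    """URL of the parent category, or None for the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        return None
    parent = "/" + "".join(s + "/" for s in segments[:-1])
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))

def nearest_live_ancestor(url: str, live: Set[str]) -> Optional[str]:
    """Walk up from a dead category until an existing one is found."""
    current = parent_category_url(url)
    while current is not None:
        if category_path(current) in live:
            return current
        current = parent_category_url(current)
    return None

# Hypothetical snapshot of existing categories, for illustration only.
live = {"Science", "Science/Astronomy"}
print(nearest_live_ancestor("http://dmoz.org/Science/Astronomy/Solar_System/Sun", live))
```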
 

hello pubsubj,

>> 1. Does it work through more than one change

Yes, it does.
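Redirects surviving several moves amount to following a chain of old-to-new entries until a live category is reached. A minimal sketch, assuming a hypothetical `redirects` dict built from successive catmv operations and a `live` set of current category paths; the second "Sun" move below is invented for illustration, and the cycle guard is a precaution, not a claim about how catmv stores its data.

```python
from typing import Dict, Optional, Set

def resolve_category(path: str, redirects: Dict[str, str], live: Set[str]) -> Optional[str]:
    """Follow old -> new redirects until a live category is reached.

    Returns None if the chain dead-ends (a deleted category, i.e. a 404)
    or loops back on itself.
    """
    seen: Set[str] = set()
    while path not in live:
        if path in seen or path not in redirects:
            return None
        seen.add(path)
        path = redirects[path]
    return path

redirects = {
    "Arts/Anime": "Arts/Animation/Anime",
    # Hypothetical two-step history, for illustration only:
    "Science/Astronomy/Solar_System/Sun": "Science/Astronomy/Stars/Sun",
    "Science/Astronomy/Stars/Sun": "Science/Astronomy/Stars/The_Sun",
}
live = {"Arts/Animation/Anime", "Science/Astronomy/Stars/The_Sun"}
print(resolve_category("Science/Astronomy/Solar_System/Sun", redirects, live))
# -> Science/Astronomy/Stars/The_Sun
```

A server could instead rewrite every older entry to point straight at the newest target whenever a move is recorded, so each lookup is a single hop; either way, every old URL ends up at the current category.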

>> What about categories simply killed or dismembered?

Deleting categories is something that happens far down the hierarchy, and not very often. When two categories are merged, or a category is disbanded with sites moved to different cats, we always try to perform what is called a "circular move". That is, the "killed" category ends up renamed with the name of one of the existing target categories, thus leaving the redirect.

We can therefore say that most of the ODP hierarchy is somewhat "stable" (in its original or modified-but-redirected version), and of course the higher the level, the fewer categories are deleted without a redirect to a new one. Exceptions may occur in newly created subtrees, e.g., new World/[Language] categories, where the structure is still under constant reorganization and improvement.

You can also download the category-move history to see what happened, or to track how category names evolved, since it is freely available as part of the RDF dump: on this page you will find the link to the file catmv.log.gz (8.9 MB, enjoy!)

>>Or do they resolve to some page saying what happened to the inhabitants of the dead cat, and at what point in the time?

No. There's no way to retrieve information about a category that has been "killed" without a redirect. Aside from the info in the catmv.log file, only editors have a few more tools for tracking moves/deletes. BTW, a category cannot be deleted unless it's empty.

<added> yklaw types faster </added>
 

Thanks, ettore and yklaw. That helps a lot.

Let me try to sum up, going back to the original issue:

Stability of categories-subjects can be considered a general ODP guideline.
The higher/older the cats, the more likely to be stable.

So, coming back to my previous suggestion: would it be silly to imagine something like a PSI "stamp" on categories that deserve it, at least in the more stable part of the directory, meaning that:

1. This category is here to stay, with the same subject.
2. If ever it is moved in the classification scheme - which is anyway very unlikely in a foreseeable future - the present URL will still resolve to the same subject-category, even if it is classified differently.
3. The current URL will keep resolving to a resource containing enough content to figure out what it is about.

Such a PSI stamp would allow those URLs to be used reliably as something very close to what PSIs are actually supposed to be.
What do you think, both of the idea itself and of the opportunity/possibility to push such a proposal into the ODP process?
 

kujanomiko

>The higher/older the cats, the more likely to be stable.

Not true. Most of the older editors will remember the Arts/Anime move to Arts/Animation/Anime a few years ago (I'm not sure when; it was before I became an editor). So even higher-level cats can move. I think in this case, the move was so long ago, that the redirecting thing probably wasn't implemented. But then again, if it was so long ago...

It's scary when you make a post and you're not sure what the point of it is...;)
 

hutcheson
Well, ODP certainly does its own versioning: every week, barring accidents. As for a fixed URL (or, I suppose, URI in this case): no, we really don't have that. Some categories do simply vanish; each week several do (but that is out of 300,000+ categories). Is that stable enough? My LOC topic list includes "deprecated" topic names that shouldn't be used for future books, but they map them onto alphanumeric codes (AA9999 or whatever) that, I presume(?), don't change. ODP also has topic tokens (integers), which MAY be really-and-truly fixed.
 

>> the Arts/Anime move to Arts/Animation/Anime
>> the move was so long ago, that the redirecting thing probably wasn't implemented.

It was! Try http://dmoz.org/Arts/Anime/ and you'll end up at Arts/Animation/Anime.
 

All that said about stability, and I'm glad to hear it has been a concern for a long time... my question now is whether there is enough interest "up there" in ODP in a declaration of stability for well-established categories, or whether it is up to external users to figure it out and bet: "this one has been there for 3 years now, I guess it will not move any more, so I'll nail it down as an identifier".
What we figure, in our TC discussions, is that for PSIs to be effective, something has to be nailed down and declared on the publisher's side.

Now this is more an ODP process issue than a technical one, and maybe this forum is not the place to discuss it; I'm not sure. Anyway, if any meta-editor wants to push the question further, I'm open to private discussion, and the OASIS TC forum is also open:

http://lists.oasis-open.org/archives/tm-pubsubj-comment/
 

lissa (Member, joined Mar 25, 2002)
I don't think the ODP will ever draw a line in the sand and say "This is fixed, and it's not going to change", with the possible exception of the 1st tier categories. Some things that appear to be stable now may just be underdeveloped or unnoticed. New fields or philosophies about existing ones may influence category structure. A simple enhancement in software (e.g. tagging sites with geographical data) could change the need to organize in certain ways and allow flexibility in presenting the data.

I think editors would love to have the perfect directory structure defined and "fixed", but I don't think we can define that until we've cataloged everything in existence. And I think that's gonna take a while.
 

lissa
> I think editors would love to have the perfect directory structure defined and "fixed" ...

I don't think they should. Structure can change; that's not the point. But the *subjects* of categories can have a kind of stability. You are editing

Science: Institutions: Zoos and Aquariums

I suppose you consider this subject to have a persistent definition, like "institutions where animals are displayed for entertainment and education". This subject and its definition are completely independent of the directory structure.
If you have redirection, as ettore explained, you can say that the current URL of that cat will indicate this subject for a while, no? That is the kind of persistence PSIs are about.

BTW ... I found your editing category in your editor profile http://dmoz.org/profiles/lissa.html
Those are very stable subject indicators, even when the editor has not been an editor for almost two years, e.g.:

http://dmoz.org/profiles/universimmedia.html

My own has-been editor profile is there, cast in stone for eternity... I can't change it any more, and it seems nobody cares to remove it... but that's maybe another thread.
 

Looks like this thread is dying out... no more comments on the subject?
Or would I have been better off not mentioning an old ghost in my previous post? I think I'll open that new thread about the persistence of has-been editors' profiles after all...
 

Hi,
Generally, it's a very good question.
I don't think it will get a judicious answer here.
Have a look at the dmoz taxonomy ("taxonomie" in French), which the "great dictionary" defines as a big structure that organizes... everything you want... For example, have a look at the "dmoz taxonomy" as used in lots of directories.
I hope you'll find it there, because it's easier.
For example, in France or in Italy, you can see that all the sites are duplicated in the country category and in the regional category. That should be the same everywhere if the "taxonomy" works. In other words, the same site is better listed twice (than by editors or humans)... or more... if the taxonomy calls for it.
All the best.
Yann
 

hutcheson
No, the profile is not a sticking point. The official answer is that the profiles contain "anyone who contributed to the Open Directory", whether or not currently active. (Reinstatement happens sometimes!)

The issue is, I think, that the ODP is not yet willing to consider any but the largest few hundred categories (and some Regional "template-based" categories) fixed. And in particular, sites' location in categories is certainly not fixed -- as a category grows, it is subdivided, and sites moved into the subcategories.

The weekly RDF dumps do provide versioning; but I don't think "moved categories" are tracked there. So if you want more, ODP staff (or perhaps editor-hackers) will have to be involved. At this point, there doesn't seem to be any interest in publishing the "category-move tracking" that you would need.
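If the integer topic tokens mentioned earlier in the thread really are fixed (hutcheson only says they MAY be), moves could be recovered by comparing two weekly dumps keyed on those ids. A minimal sketch, assuming each dump has already been reduced to a hypothetical `{topic_id: category_path}` dict (RDF parsing omitted; the ids below are made up). If a move ever assigned a new id, it would show up as one removal plus one addition rather than as a move, which is why the stable-id assumption matters.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[int, str]  # hypothetical: topic id -> category path

def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Compare two weekly dumps keyed by topic id.

    Returns (moved, removed, added):
      moved   -- (old_path, new_path) for ids whose path changed
      removed -- paths whose id no longer appears
      added   -- paths whose id is new this week
    """
    moved = [(old[i], new[i]) for i in sorted(old.keys() & new.keys()) if old[i] != new[i]]
    removed = [old[i] for i in sorted(old.keys() - new.keys())]
    added = [new[i] for i in sorted(new.keys() - old.keys())]
    return moved, removed, added

week1 = {1: "Arts/Anime", 2: "Science/Astronomy/Solar_System/Sun", 3: "Science/Old_Cat"}
week2 = {1: "Arts/Animation/Anime", 2: "Science/Astronomy/Solar_System/Sun", 4: "Science/New_Cat"}
moved, removed, added = diff_snapshots(week1, week2)
print(moved)    # [('Arts/Anime', 'Arts/Animation/Anime')]
print(removed)  # ['Science/Old_Cat']
print(added)    # ['Science/New_Cat']
```

The `moved` pairs are exactly what a redirect table or a published "category-move tracking" file would need.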
 

hutcheson

> The issue is, I think, that the ODP is not yet willing to consider any but the largest few hundred categories (and some Regional "template-based" categories) fixed.

Well, knowing those few hundred would be better than nothing. I know it's very difficult to "draw a line in the sand", as lissa said. I'm involved in standards organizations, and I know what it means to say "this is fixed": everything you do afterward has to be backward-compatible, so you'd better be careful.

But it's worth thinking about. What I've found, from working on Knowledge Organisation for big companies, is that there is always a core of very stable categories surrounded by a cloud of more volatile ones. Defining this core is a costly process, but it's worth it.

> And in particular, sites' location in categories is certainly not fixed -- as a category grows, it is subdivided, and sites moved into the subcategories.

Yes, of course. I was only speaking about the stability of classes (categories), not the attachment of instances (sites).
 
This site has been archived and is no longer accepting new content.