Linking categories ...


Posted

While importing the categories into MySQL from the RDF dump, I ran into the following problem:

 

The category

Top/Arts/Movies/Titles/P/Piano_Teacher,_The

lists as its alternate-language category

Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La

which doesn't exist.

 

Looking at the dmoz site, on the page

http://dmoz.org/Arts/Movies/Titles/P/Piano_Teacher,_The/

the Italian link points to

http://dmoz.org/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La/

as it should be, but clicking it takes us to

http://dmoz.org/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista/

which also exists in the RDF dump, but I couldn't find any relation between Pianista,_La and La_Pianista in the RDF file.

How does the dmoz site know to forward me to La_Pianista when I click on Pianista,_La? Any ideas?

 

PS. That was just one example; I have at least two dozen similar links ...
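A minimal sketch of how such dangling links could be listed, assuming the category paths and the alternate-language pairs have already been extracted from the dump into plain Python collections. The function name and the sample data are hypothetical; nothing here reads the RDF file itself.

```python
def find_dangling_altlangs(categories, altlang_links):
    """Return (source, target) pairs whose target is not a known category."""
    known = set(categories)  # set lookup keeps the check fast on large dumps
    return [(src, dst) for src, dst in altlang_links if dst not in known]


# Hypothetical sample data mirroring the example in this post.
categories = [
    "Top/Arts/Movies/Titles/P/Piano_Teacher,_The",
    "Top/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista",
]
altlang_links = [
    ("Top/Arts/Movies/Titles/P/Piano_Teacher,_The",
     "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La"),
]

dangling = find_dangling_altlangs(categories, altlang_links)
for src, dst in dangling:
    print(f"{src} -> {dst} (missing)")
```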

 

PS2. Looking at the preview of my message: don't click on the links in it, the forum breaks them.

  • Meta
Posted

That is a problem related to the structure changes we make. Let me explain a bit:

 

If we move a category (or rename it, which is the same thing to our software), the move leaves a redirect from the old place to the new one. With the ODP software, if you try to visit the old category, you are redirected to the new location. [Edit: So there is no need for the software to change those links ASAP. They are in fact changed automatically when certain actions are performed, but there is no way to be certain that no outdated links remain.]

 

In theory you could use the downloadable CatMove log to see which redirects should be in place. In the real world, I know of some problems with the catmv log generation, and I don't know whether they affect the downloadable file as well. See http://rdf.dmoz.org/rdf/ for the file and its date; 12-Nov-2003 at least looks as if it was generated during the last run.
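A rough sketch of how such redirects could be applied on the importing side, assuming the move log has already been parsed into an old-path to new-path mapping. The log's actual format is not described in this thread, so the `moves` dictionary, its single entry, and the function name are hypothetical.

```python
def resolve_redirect(path, moves, max_hops=20):
    """Follow old -> new moves until the path is no longer an old name.

    Stops on a cycle or after max_hops steps so a bad log cannot loop forever.
    """
    seen = set()
    while path in moves and path not in seen and len(seen) < max_hops:
        seen.add(path)
        path = moves[path]
    return path


# Hypothetical entry: what the log would need to contain for the example above.
moves = {
    "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La":
        "Top/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista",
}

resolved = resolve_redirect(
    "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La", moves
)
print(resolved)
```

This only handles exact path matches. Whether a moved category's subcategories also need rewriting depends on how the log records moves, which is not covered here.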

 

If you try it, it would be nice if you could report back afterwards.

Curlie Meta/kMeta Editor windharp

 


Posted
Strange, but there's no trace of that redirect in either catmv.log or redirect.rdf.u8. By "that redirect" I am referring to the example I just gave. Any other ideas? :)
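When no redirect is logged, one fallback is a naming heuristic. The sketch below is only a guess based on the single example in this thread (a trailing ",_La" becoming a leading "La_"); it is not how the dmoz site resolves the link, and any match it finds would need checking by hand. All names in it are hypothetical.

```python
def article_variants(path):
    """Yield the path with a trailing ',_Article' moved to the front of the leaf."""
    parent, _, leaf = path.rpartition("/")
    name, sep, article = leaf.rpartition(",_")
    if sep and name and article:
        prefix = parent + "/" if parent else ""
        yield f"{prefix}{article}_{name}"


def guess_target(missing, known):
    """Return the first variant of a missing path that is a known category, else None."""
    for candidate in article_variants(missing):
        if candidate in known:
            return candidate
    return None


# Hypothetical set of category paths taken from the import.
known = {"Top/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista"}
guess = guess_target(
    "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La", known
)
print(guess)
```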
  • Meta
Posted

As I said, there was (and maybe still is?) a bug which made the system "forget" to write the catmv log properly. It seems to me that this affected the downloadable file, too. Sorry to say, but I don't think we can offer a solution at the moment :-(

 

Maybe someone else knows more, though.

 

Anyway: it sounds like you have a tool that extracts those bad links from the RDF? If the number is reasonable, I might find the time to fix them, if I had the data. (That would not remove the general problem, of course.)

Curlie Meta/kMeta Editor windharp

 


Posted

I wrote the tool. However, it doesn't check against the catmv.log file (yet) or redirect.rdf.u8. (By the way, what is this file for? Given its name I assumed it was about category aliases, but I couldn't find any documentation about what it's supposed to do, and the same goes for the other files in the rdf directory.) I'm also using a slightly older structure file (the parse/import into MySQL took 3.9 days :( )

 

Edit: the older structure file is from November 3rd.

Posted

AWB wrote: (the parse/import into mysql took 3.9 days )

My goodness, I am currently doing a similar load into a Java object database and I am in the range of 45 minutes.

Cheers

 

Hans
