Guest Awb
Posted November 18, 2003

While importing the categories into MySQL from the RDF dump, I encountered the following problem: the category Top/Arts/Movies/Titles/P/Piano_Teacher,_The lists as its alternate language Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La, which doesn't exist in the dump.

On the dmoz site, at the page http://dmoz.org/Arts/Movies/Titles/P/Piano_Teacher,_The/, the Italian link points to http://dmoz.org/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La/ as it should, but clicking it takes us to http://dmoz.org/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista/. That category also exists in the RDF dump, but I couldn't find any relation between Pianista,_La and La_Pianista in the RDF file. How does the dmoz site know to forward me to La_Pianista when I click on Pianista,_La? Any ideas?

PS. This is just one example; I have at least two dozen similar links.
PS2. Looking at the preview of my message: don't click on the links in the message, the forum breaks them.
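The check described in this post can be sketched roughly as follows. This is only an illustration, assuming the category paths and the alternate-language links have already been extracted from the dump into plain Python collections; the function name `find_dangling_altlang` and the data layout are hypothetical and not part of any ODP tool.

```python
def find_dangling_altlang(categories, altlang_links):
    """Return the (source, target) pairs whose target is not a known category.

    categories:    iterable of category paths found in the dump (assumed layout)
    altlang_links: iterable of (source_path, target_path) pairs (assumed layout)
    """
    known = set(categories)
    return [(src, dst) for src, dst in altlang_links if dst not in known]


# Sample data taken from the example in the post above.
categories = [
    "Top/Arts/Movies/Titles/P/Piano_Teacher,_The",
    "Top/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista",
]
altlang_links = [
    (
        "Top/Arts/Movies/Titles/P/Piano_Teacher,_The",
        "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La",
    ),
]

dangling = find_dangling_altlang(categories, altlang_links)
for src, dst in dangling:
    print(f"{src} -> {dst} (target missing)")
```

Building the set of known paths once keeps each lookup cheap, which matters when the dump contains a very large number of categories.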
Meta windharp
Posted November 18, 2003

That is a problem related to the structure changes we make. Let me explain a bit: if we move a category (or rename it, which is the same thing to our software), it leaves a redirect from the old place to the new one. With the ODP software, if you try to visit the old category, it redirects you to the new location. [Edit: So there is no need for the software to change those links ASAP. They are in fact changed automatically when certain actions are performed, but there is no way to be certain that no outdated links remain.]

In theory you could use the downloadable CatMove log to see which redirects should be in place. In the real world I know of some problems with the catmv log generation, and I don't know whether these problems affect the downloadable file as well. See http://rdf.dmoz.org/rdf/ for the file and its date; 12-Nov-2003 at least looks as if it was generated on the last run. If you try it, it would be nice if you could report back afterwards.

Curlie Meta/kMeta Editor windharp
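If a usable old-path to new-path mapping can be recovered (from the CatMove log or elsewhere), applying it on the import side might look like the sketch below. The `redirects` dictionary and the function `resolve_category` are assumptions made for illustration; the actual format of the log file is not described in this thread and is not modelled here.

```python
def resolve_category(path, redirects, max_hops=10):
    """Follow old -> new redirects until a path with no further redirect is reached.

    redirects: dict mapping an old category path to its new path (assumed layout)
    max_hops:  safety limit so that a redirect loop cannot run forever
    """
    seen = set()
    while path in redirects and path not in seen and len(seen) < max_hops:
        seen.add(path)
        path = redirects[path]
    return path


# The one redirect observed on the site in the first post.
redirects = {
    "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La":
        "Top/World/Italiano/Arte/Cinema/Titoli/P/La_Pianista",
}

resolved = resolve_category(
    "Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La", redirects
)
print(resolved)
```

Following the chain rather than doing a single lookup covers the case where a category was moved more than once; the `seen` set guards against loops in the data.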
Guest Awb
Posted November 18, 2003

Strange, but there's no trace of that redirect in either catmv.log or redirect.rdf.u8. By "that redirect" I am referring to the example I just gave. Any other ideas?
Meta windharp
Posted November 18, 2003

As I said, there was (maybe still is?) a bug that made the system "forget" to write the catmv log properly. It seems to me that this affected the downloadable file, too. Sorry to say, but I don't think we can offer a solution at the moment :-( Maybe someone else knows more, though.

Anyway: it sounds like you have a tool that extracts those bad links from the RDF? If the number is reasonable, I might find the time to fix them if I had the data. (That would not remove the general problem, of course.)

Curlie Meta/kMeta Editor windharp
Guest Awb
Posted November 18, 2003

I wrote the tool. However, it doesn't check against the catmv.log file (yet) or redirect.rdf.u8. By the way, what is the purpose of that file? Considering its name, I assumed it was about category aliases, but I couldn't find any documentation on what it is supposed to do, nor on the other files in the rdf directory.

I'm also using a somewhat older structure file (the parse/import into MySQL took 3.9 days).

Edit: the older structure file is from November 3rd.
sfromis
Posted November 21, 2003

The starting point for the ODP RDF dump, including documentation, is here: http://rdf.dmoz.org/

One of the pages briefly describes the redirect file: http://rdf.dmoz.org/rdf/Changes.html (see the entry for 1999-08-24).
hmf
Posted November 25, 2003

Awb wrote: (the parse/import into MySQL took 3.9 days)

My goodness, I am currently doing a similar load into a Java object database and I am in the range of 45 minutes.

Cheers
Hans
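The thread does not say why the MySQL import was so slow. One common cause of slow bulk loads in general is inserting and committing one row at a time, so a batched import is sketched below as a possible starting point. It uses Python's standard `sqlite3` module purely as a stand-in for MySQL, and the table name, column name, and helper functions are all hypothetical.

```python
import sqlite3
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def import_categories(conn, paths, batch_size=1000):
    """Insert category paths in batches, one transaction per batch."""
    conn.execute("CREATE TABLE IF NOT EXISTS category (path TEXT PRIMARY KEY)")
    total = 0
    for chunk in batched(paths, batch_size):
        with conn:  # commits once per batch instead of once per row
            conn.executemany(
                "INSERT OR IGNORE INTO category (path) VALUES (?)",
                [(p,) for p in chunk],
            )
        total += len(chunk)
    return total


# Synthetic paths stand in for the categories parsed from the dump.
conn = sqlite3.connect(":memory:")
count = import_categories(conn, (f"Top/Test/Category_{i}" for i in range(2500)))
print(count)
```

Because `paths` is consumed lazily, the same function can be fed from a streaming parser without holding the whole dump in memory.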