Corruption in content.rdf

JasonTimmins

Member
Joined
Feb 21, 2009
Messages
18
Hi There,

I'm not sure if this is the right place but I thought I'd mention it anyway.

My import routine for the DMOZ data is blowing up with an XML schema failure around line 26946609 of this week's (2008/02/18) content.rdf file.

It's to do with an external link to portaljove .com in Top/World/Español/Regional/Europa/España/Comunidades_Autónomas/Comunidad_Valenciana/Educación. The record seems to have two descriptions (one of which is missing its closing tag) and a second title tag inside one of the descriptions. Anyway, it's a bit of a mess. Can an editor take a look at it?
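
For anyone trying to pin down a failure like this, here is a minimal Python sketch that reports where the first well-formedness error occurs. It uses the standard library's expat bindings; the function name `first_xml_error` and the tiny sample record are made up for illustration and are not taken from the real dump.

```python
import io
import xml.parsers.expat


def first_xml_error(fileobj):
    """Return (line, column, message) for the first well-formedness
    error in a binary file object, or None if the input parses cleanly."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # ParseFile reads the input in pieces, so a large dump
        # does not have to be loaded into memory at once.
        parser.ParseFile(fileobj)
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None


# Hypothetical record with an unclosed first description,
# similar in shape to the problem described above.
broken = io.BytesIO(
    b"<RDF>\n"
    b'<ExternalPage about="http://example.invalid/">\n'
    b"<Title>t</Title>\n"
    b"<Description>first\n"
    b"<Description>second</Description>\n"
    b"</ExternalPage>\n"
    b"</RDF>\n"
)
error = first_xml_error(broken)
```

The parser stops at the first error, so the reported line is where the damage becomes detectable, which may be a line or two after the tag that is actually missing.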

Cheers
Jason
<URL deleted>
 

dermotz

Member
Joined
Mar 18, 2004
Messages
112
The DMOZ rdf dump has always contained errors.

You need to change your software to cope with the corrupted data.
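
One way to do that, as a rough sketch only: parse each record on its own and set aside the ones that fail, so a single bad record does not abort the whole import. The helper name `iter_records` is hypothetical, and the code assumes every record starts on a line beginning with `<ExternalPage` and ends on a line containing `</ExternalPage>`. It does not repair anything, and a record that is cut off badly enough to break that assumption will still confuse it. If the records use namespace prefixes declared on the root element, each chunk would also need wrapping in a root that declares them before parsing.

```python
import xml.etree.ElementTree as ET


def iter_records(lines, tag="ExternalPage"):
    """Yield (element, None) for each record that parses and
    (None, raw_text) for each record that does not."""
    open_tag = "<" + tag
    close_tag = "</" + tag + ">"
    chunk = []
    for line in lines:
        if line.lstrip().startswith(open_tag):
            if chunk:
                # The previous record never closed: report it as broken.
                yield None, "".join(chunk)
            chunk = [line]
        elif chunk:
            chunk.append(line)
        # Lines outside any record (header, root tags) are ignored.
        if chunk and close_tag in line:
            text = "".join(chunk)
            chunk = []
            try:
                yield ET.fromstring(text), None
            except ET.ParseError:
                yield None, text
    if chunk:
        # Input ended in the middle of a record.
        yield None, "".join(chunk)
```

The broken chunks can then be logged or written to a separate file for reporting, while the good records are imported as usual.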
 

hansfn

Curlie Meta
Joined
Aug 4, 2005
Messages
26
dermotz, you shouldn't reply to a thread about a topic you clearly don't have any real knowledge about. The current content.rdf is seriously broken, with incomplete/broken/cut-off/mixed elements which no software can fix. I have reported the problem in the internal DMOZ/ODP bugs forum. Let's hope that the sysadmin/developers can fix this quickly.

PS! I have found one more in World/Norsk/Kunst_og_kultur/Litteratur/Forfattere/Ø/
 

JasonTimmins

Member
Joined
Feb 21, 2009
Messages
18
Hi There,

Thanks for the update. This week's content file is much worse than the previous week's. I found seven errors before I gave up fixing them and admitted defeat.

Let's hope the DMOZ people get it together soon.

Bye for now
Jason.

PS. If it helps, I have the seven broken XML chunks on file.
 