JasonTimmins Posted February 21, 2009
Hi There, I'm not sure if this is the right place, but I thought I'd mention it anyway. My import routine for the DMOZ data is blowing up with an XML schema failure around line 26946609 of this week's (2008/02/18) content.rdf file. It has to do with an external link to portaljove .com in Top/World/Español/Regional/Europa/España/Comunidades_Autónomas/Comunidad_Valenciana/Educación. The record seems to have two descriptions (one of which is missing its closing tag) and a second title tag inside one of the descriptions. Anyway, it's a bit of a mess. Can an editor take a look at it? Cheers, Jason <URL deleted>
dermotz Posted February 26, 2009
The DMOZ RDF dump has always contained errors. You need to change your software to cope with the corrupted data.
hansfn (Meta) Posted February 27, 2009
dermotz, you shouldn't reply to a thread about a topic you clearly don't have any real knowledge about. The current content.rdf is seriously broken, with incomplete, cut-off, and mixed-up elements that no software can fix. I have reported the problem in the internal DMOZ/ODP bugs forum. Let's hope that the sysadmins/developers can fix this quickly. PS: I have found one more in World/Norsk/Kunst_og_kultur/Litteratur/Forfattere/Ø/
JasonTimmins (Author) Posted February 28, 2009
Hi There, Thanks for the update. This week's content file is much worse than the previous week's. I found seven errors before I gave up fixing them and admitted defeat. Let's hope the DMOZ people get it together soon. Bye for now, Jason. PS. If it helps, I have the seven broken XML chunks on file.
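For anyone who needs to find where a dump breaks, here is a minimal sketch of how the first well-formedness error could be located with Python's standard `xml.parsers.expat` module. It is not the import routine discussed in this thread. The function names `first_xml_error` and `read_chunks` are made up for illustration, and `SAMPLE` is a hypothetical, simplified record modelled only on the description above (two descriptions, one unclosed, with a stray title inside), not a real entry from content.rdf.

```python
import xml.parsers.expat


def first_xml_error(chunks):
    """Feed byte chunks to expat and return (line, column, message)
    for the first well-formedness error, or None if parsing succeeds."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        for chunk in chunks:
            parser.Parse(chunk, False)
        parser.Parse(b"", True)  # signal end of input
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


def read_chunks(path, size=1 << 20):
    """Yield a file in fixed-size byte blocks so a large dump is never
    held in memory all at once."""
    with open(path, "rb") as handle:
        while True:
            block = handle.read(size)
            if not block:
                break
            yield block


# Hypothetical, simplified record: the first <Description> is never closed,
# so the parser fails when it reaches </ExternalPage>.
SAMPLE = b"""<RDF>
<ExternalPage about="http://example.invalid/">
<Title>Example</Title>
<Description>First description
<Title>Stray title</Title>
<Description>Second description</Description>
</ExternalPage>
</RDF>
"""

print(first_xml_error([SAMPLE]))
```

For a real file, the call would look like `first_xml_error(read_chunks("content.rdf"))`. Note that expat stops at the first error, so this only reports where the dump breaks. It does not repair the record, and each broken chunk would have to be fixed or skipped before the next one could be found.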