Parsing ODP RDF data as XML


Guest slovenac
Posted

I am trying to parse the ODP data to make some use of it. I downloaded all of it from the "old" page dmoz.org/rdf/. It seems to be bad XML; when I try to parse it, I get this:

 

<pre>
Fatal Error at file E:\DMOZ RDF\structure.rdf.u8, line 221251, char 2
Message: Invalid character (Unicode: 0x1)
</pre>

 

It seems the strange language encodings (Chinese) are causing the problems.
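One way to check whether the parser is tripping over characters that XML 1.0 simply forbids (such as the 0x1 in the error above) is to scan the file for them before parsing. The following is only a minimal sketch: the function names `find_invalid_xml_chars` and `strip_invalid_xml_chars` are made up for illustration, and it assumes the file can be read as text, with undecodable bytes replaced.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(lines):
    """Yield (line_number, column, code_point) for each forbidden character.

    Line numbers and columns are 1-based and counted in characters.
    """
    for line_number, line in enumerate(lines, start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            yield line_number, match.start() + 1, ord(match.group())


def strip_invalid_xml_chars(text):
    """Return text with every character forbidden by XML 1.0 removed."""
    return INVALID_XML_CHARS.sub("", text)
```

It could be run over a dump with something like `open(path, encoding="utf-8", errors="replace")` and the results compared against the line number the parser reports. Whether stripping such characters is acceptable depends on what the data is used for.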

 

Am I doing something wrong, or is it not me?

 

I would appreciate it if anyone experienced would comment on the way they parse the data, which parsers they use, and how they structure their databases.

 

Dejan

Guest slovenac
Posted

I downloaded the new Kids and Teens RDFs.

Can anybody comment on these specific errors:

 

kt-structure.rdf.u8:21975 - The closing " and > are missing, eaten by the strange encoding.

<pre>
<altlang r:resource="Kinesiska (traditionell):Top/Kids_and_Teens/International/Chinese_Traditional/¥ؿ�

<altlang r:resource="Nederländska:Top/Kids_and_Teens/International/Nederlands/Zoeken_op_het_Net"/>
</pre>
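If the truncated attribute above comes from bytes that are not valid UTF-8 (which is only a guess based on how the line displays), the affected lines can be located by decoding the file line by line in binary mode. This is a rough sketch; `find_invalid_utf8_lines` is a hypothetical helper name, not part of any ODP tooling.

```python
def find_invalid_utf8_lines(raw_lines):
    """Yield (line_number, byte_offset, bad_bytes) for lines that fail to decode.

    raw_lines is any iterable of bytes objects, for example a file opened
    with mode "rb". Only the first bad sequence on each line is reported.
    """
    for line_number, raw in enumerate(raw_lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start and err.end delimit the offending bytes in the line.
            yield line_number, err.start, raw[err.start:err.end]
```

For example, `with open("kt-structure.rdf.u8", "rb") as f: print(list(find_invalid_utf8_lines(f)))` would list the suspect lines, which can then be inspected or repaired by hand before the file is handed to an XML parser.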

 

kt-content.rdf.u8:37875 - Strange priority string.

<pre>
<priority>217.135.36.61 via proxy 195.92.168.177</priority>
</pre>

 

Slovenac

 

[edited to insert line break - apeuro]
