I'm attempting to parse the RDF dumps, but I keep running into illegal UTF-8 characters. Is anything being done to correct this? What workarounds are possible to get an XML parser to work with these files?
I read the Errata and saw the filtering change made on 2003-03-12, but something is apparently still not working today, because even the link provided in that update reports an overwhelming number of UTF-8 errors:
http://rodan.ncc.com/rdf/
I'm using the Python XML libraries to parse the files, and I can't keep them from raising exceptions. Expat's test utilities report errors as well.
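To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind: decode the dump in chunks, replace invalid UTF-8 byte sequences with U+FFFD, drop characters that XML 1.0 does not allow, and feed the cleaned bytes to Expat. The names is_xml_char, sanitized_chunks, and count_elements are my own hypothetical helpers, not anything shipped with the dumps, and I'm assuming the files declare UTF-8. Note that this alters the data wherever a bad byte occurs, so it may not be acceptable for every use.

```python
import codecs
import io
from xml.parsers import expat


def is_xml_char(ch):
    """Return True if ch is in the XML 1.0 'Char' range."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def sanitized_chunks(stream, chunk_size=64 * 1024):
    """Yield cleaned UTF-8 byte chunks from a binary stream (hypothetical helper).

    Invalid byte sequences become U+FFFD; characters outside the XML 1.0
    range are dropped. The incremental decoder copes with multi-byte
    characters that happen to be split across chunk boundaries.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    while True:
        chunk = stream.read(chunk_size)
        text = decoder.decode(chunk, final=not chunk)
        cleaned = "".join(ch for ch in text if is_xml_char(ch))
        if cleaned:
            yield cleaned.encode("utf-8")
        if not chunk:
            break


def count_elements(stream):
    """Parse the cleaned stream with Expat and count start tags by name."""
    counts = {}

    def start(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser = expat.ParserCreate("UTF-8")
    parser.StartElementHandler = start
    for data in sanitized_chunks(stream):
        parser.Parse(data, False)
    parser.Parse(b"", True)  # signal end of input
    return counts


if __name__ == "__main__":
    # Tiny stand-in for a dump: one stray Latin-1 byte and one control char.
    sample = b'<?xml version="1.0" encoding="UTF-8"?><r><t>caf\xe9 \x01ok</t></r>'
    print(count_elements(io.BytesIO(sample)))
```

For a real dump I would open the file in binary mode and pass the file object in place of the io.BytesIO sample. I don't know whether this is what people parsing these files actually do, which is part of why I'm asking.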
Could someone who is currently parsing these files share a bit of wisdom?
Thanks!
-Dan