Posted

Hi,

I downloaded http://rdf.dmoz.org/rdf/structure.rdf.u8.gz and http://rdf.dmoz.org/rdf/categories.txt (and other files that contain DMOZ categories), but every non-ASCII character in them is replaced by one or two question marks.

Here is an example of what I get:

<altlang r:resource="French:Top/World/Fran??ais/Arts/Audiovisuel/Animation"></altlang>

where I expect

<altlang r:resource="French:Top/World/Français/Arts/Audiovisuel/Animation"></altlang>
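For readers wondering how "one or two question marks" can come about, here is a minimal Python sketch of two lossy conversions that would produce exactly that pattern. It is only an illustration: the helper name `ascii_with_replacement` is made up for this example, and nothing here is known about the actual DMOZ export process.

```python
# Illustration only: two lossy conversions that turn "ç" into question marks.
# Assumption: nothing here is taken from the real DMOZ export code.

def ascii_with_replacement(text: str) -> bytes:
    """Encode to ASCII, substituting '?' for every character ASCII lacks."""
    return text.encode("ascii", errors="replace")

original = "Top/World/Français"

# Case 1: the text is converted to ASCII directly: one '?' per character.
one_mark = ascii_with_replacement(original)

# Case 2: the UTF-8 bytes are first misread as Latin-1 (so "ç", stored as
# the two bytes C3 A7, becomes two characters), then converted to ASCII.
misread = original.encode("utf-8").decode("latin-1")
two_marks = ascii_with_replacement(misread)

print(one_mark)   # b'Top/World/Fran?ais'
print(two_marks)  # b'Top/World/Fran??ais'
```

In both cases the replacement is irreversible: once the bytes are 0x3F, no choice of decoder on the reading side can bring the accented letter back.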

I inspected the raw bytes of the file, and it really contains the byte 0x3F (an ASCII question mark) wherever a question mark appears. So I guess this is not a matter of which encoding I use to read the file: the original characters are no longer there.
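The byte-level check described above can be sketched in a few lines of Python. The helper `hex_context` is hypothetical (written for this example), and the sample strings are the ones quoted in this thread rather than bytes read from the real dump.

```python
# Sketch: show the raw bytes around each '?' (0x3F) in a byte string.
# `hex_context` is a hypothetical helper; the sample line is from this thread.

def hex_context(data: bytes, width: int = 3) -> list[str]:
    """Return a hex dump of the bytes surrounding every 0x3F in data."""
    dumps = []
    for pos, byte in enumerate(data):
        if byte == 0x3F:
            window = data[max(0, pos - width): pos + width + 1]
            dumps.append(f"{pos}: {window.hex(' ')}")
    return dumps

broken = b"Top/World/Fran??ais"
intact = "Top/World/Français".encode("utf-8")

for line in hex_context(broken):
    print(line)

print(hex_context(intact))  # [] : the intact bytes hold c3 a7, not 3f
```

If the file were merely being displayed with the wrong encoding, the dump would show bytes such as `c3 a7` at those positions; seeing `3f 3f` instead means the substitution happened before the file was published.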

 

This problem does not exist in the sample at http://www.dmoz.org/docs/en/rdf/structure.example.txt.

As I am new to ODP data, I may have misunderstood something. Please help me sort this out.

 

Jean-Luc

Posted
Yes, we're already aware of this problem. It's been reported to AOL, but unfortunately we don't yet have any estimated time for a fix to be deployed.
Posted

Thank you for your answer.

I noticed that the version dated September 26 (the one where I discovered the problem) has been replaced by a version dated October 3, but the international characters are still broken. :(

 

Jean-Luc

Posted


International characters are still broken in the release dated October 10. It is hard to understand why a company like AOL lets such a basic problem persist from release to release.

 

Jean-Luc

  • RZ Admin
Posted

For similar reasons that cars get recalled I expect ;)

The latest (15 Oct) RDF is supposedly fixed regarding the character encoding issue. Let us know if you find anything wrong :)

elper {moz}

All opinions expressed are my own, and do not necessarily represent the official point of view of the administration of either this forum or the directory.

Posted
Thank you.

The latest content.rdf.u8.gz I see in http://rdf.dmoz.org/rdf/ is dated October 17, and it still contains Fran??ais and M??t??o where I expect Français and Météo. :(
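Checking each new release by hand gets tedious, so here is a rough Python sketch for scanning a downloaded dump without unpacking it to disk. The function names and the regular expression are assumptions made for this example; the pattern is only a heuristic and will also flag legitimate text such as URLs with a query string (`page?id`), so treat the count as a hint rather than proof.

```python
import gzip
import io
import re

# '?' directly between two letters, as in "Fran??ais" or "M??t??o".
# Heuristic assumption: legitimate question marks rarely sit inside a word.
SUSPECT = re.compile(rb"[A-Za-z]\?+[A-Za-z]")

def count_suspect_lines(stream) -> int:
    """Count lines of a binary stream that contain a suspect '?' run."""
    return sum(1 for line in stream if SUSPECT.search(line))

def scan_gzip(path: str) -> int:
    """Stream a .gz file line by line without decompressing it to disk."""
    with gzip.open(path, "rb") as stream:
        return count_suspect_lines(stream)

# Small in-memory demonstration using strings quoted in this thread.
sample = io.BytesIO(
    b"Top/World/Fran??ais/Arts\n"
    b"M??t??o\n"
    + "Top/World/Français/Arts\n".encode("utf-8")
)
print(count_suspect_lines(sample))  # 2
```

On a real download one would call `scan_gzip("content.rdf.u8.gz")` (the local path is whatever the file was saved as) and compare the count between releases.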

 

Jean-Luc

  • RZ Admin
Posted
Oh ratz :( I'll see that this gets back to staff...


  • RZ Admin
Posted
A new RDF dump (supposedly free of the UTF-8 issue) was published on 19 October. :)

