Are foreign characters broken in categories?

JeanLucDmoz

Active Member
Joined
Sep 29, 2010
Messages
31
Hi,

I downloaded http://rdf.dmoz.org/rdf/structure.rdf.u8.gz and http://rdf.dmoz.org/rdf/categories.txt (and other files that contain DMOZ categories), but all foreign characters are replaced by one or two question marks.

Here is an example of what I get:
Code:
<altlang r:resource="French:Top/World/Fran??ais/Arts/Audiovisuel/Animation"></altlang>
where I expect:
Code:
<altlang r:resource="French:Top/World/Français/Arts/Audiovisuel/Animation"></altlang>
I inspected the raw bytes of the file, and it really does contain hexadecimal 3F (the ASCII code for "?") wherever a question mark appears. So I don't think this is a matter of reading the file with the wrong encoding: the question marks are in the file itself.
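A minimal Python sketch of the byte-level check described above, assuming the dump has been downloaded locally. The file name and the helper names `read_gz` and `hex_after` are illustrative only and are not part of any DMOZ tooling. In correct UTF-8, "ç" is stored as the two bytes `c3 a7`. In the broken file, those two bytes show up as `3f 3f`.

```python
import gzip

def read_gz(path: str) -> bytes:
    """Decompress a .gz dump and return its raw bytes (no text decoding)."""
    with gzip.open(path, "rb") as f:
        return f.read()  # reads the whole file into memory

def hex_after(data: bytes, prefix: bytes, length: int = 4) -> list[str]:
    """Hex-dump the `length` bytes that follow each occurrence of `prefix`."""
    dumps = []
    pos = data.find(prefix)
    while pos != -1:
        start = pos + len(prefix)
        dumps.append(data[start:start + length].hex(" "))  # bytes.hex(sep) needs Python 3.8+
        pos = data.find(prefix, pos + 1)
    return dumps

if __name__ == "__main__":
    # With a real download (hypothetical local path): data = read_gz("structure.rdf.u8.gz")
    broken = b"Top/World/Fran??ais/Arts"
    expected = "Top/World/Français/Arts".encode("utf-8")
    print(hex_after(broken, b"Fran"))    # ['3f 3f 61 69']
    print(hex_after(expected, b"Fran"))  # ['c3 a7 61 69']
```

Reading the whole dump at once is the simplest approach. For the larger RDF files, a streaming search would be kinder to memory.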

This problem does not occur in the sample file at http://www.dmoz.org/docs/en/rdf/structure.example.txt.

As I am new to ODP data, I may have misunderstood something. Please help me sort this out.

Jean-Luc
 

JeanLucDmoz

I downloaded http://rdf.dmoz.org/rdf/archive/2010-09-02/categories.txt, and this archived version does not have the problem with international characters.

So the bug seems to have been introduced in the latest release. Any idea when a fix can be expected?

Thank you.

Jean-Luc
 

chaos127

Curlie Admin
Joined
Nov 13, 2003
Messages
1,344
Yes, we're already aware of this problem. It's been reported to AOL, but unfortunately we don't yet have any estimated time for a fix to be deployed.
 

JeanLucDmoz

Thank you for your answer.

I noted that the version dated September 26 (the one where I discovered the problem) has been replaced by a version dated October 3, but the international characters are still broken. :(

Jean-Luc
 

JeanLucDmoz

chaos127 said: "It's been reported to AOL, but unfortunately we don't yet have any estimated time for a fix to be deployed."
International characters are still broken in the release dated October 10. It is hard to understand why a company like AOL lets such a basic problem persist from release to release.

Jean-Luc
 

Elper

Curlie Admin
RZ Admin
Joined
Sep 15, 2004
Messages
2,899
JeanLucDmoz said: "International characters are still broken in the release dated October 10. It is hard to understand why a company like AOL lets such a basic problem persist from release to release."
For the same reasons that cars get recalled, I expect ;)
The latest (15 Oct) RDF is supposedly fixed regarding the character encoding issue. Let us know if you find anything wrong :)
 

JeanLucDmoz

Elper said: "The latest (15 Oct) RDF is supposedly fixed regarding the character encoding issue. Let us know if you find anything wrong :)"
Thank you.

The latest content.rdf.u8.gz I see in http://rdf.dmoz.org/rdf/ is dated October 17, and it still contains Fran??ais and M??t??o where I expect Français and Météo. :(
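This is a hypothesis, not a confirmed account of what the DMOZ/AOL export does. One conversion chain produces exactly this pattern: the UTF-8 bytes of each accented letter are misread as single-byte Latin-1 characters and then forced into ASCII, so each of the two bytes becomes its own "?". Forcing correctly decoded text straight into ASCII leaves only one "?" per letter, which could explain the "one or two question marks" mentioned at the start of the thread. The function names `double_mangle` and `single_mangle` are purely illustrative.

```python
# Hypothetical reproduction of the corruption; NOT the actual export code,
# just one sequence of conversions that yields the same output.

def double_mangle(text: str) -> str:
    """UTF-8 bytes misread as Latin-1, then forced into ASCII: two '?' per accented letter."""
    misread = text.encode("utf-8").decode("latin-1")  # 'ç' (c3 a7) becomes 'Ã§'
    return misread.encode("ascii", errors="replace").decode("ascii")

def single_mangle(text: str) -> str:
    """Correctly decoded text forced straight into ASCII: one '?' per accented letter."""
    return text.encode("ascii", errors="replace").decode("ascii")

if __name__ == "__main__":
    print(double_mangle("Top/World/Français"))  # Top/World/Fran??ais
    print(double_mangle("Météo"))               # M??t??o
    print(single_mangle("Français"))            # Fran?ais
```

In either case, the replacement throws the original bytes away. The question marks cannot be turned back into accented letters from the broken file alone, so an unaffected copy (such as the 2010-09-02 archive) is needed.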

Jean-Luc
 

Elper

Oh ratz :( I'll see that this gets back to staff...
 

Elper

A new RDF (supposedly free of the UTF-8 issue) has been published (19 October). :)
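Before relying on a new dump, a quick sanity check along these lines can flag both symptoms discussed in this thread: lines that are not valid UTF-8, and lines with runs of "?" inside words, as in "Fran??ais". This is a sketch assuming Python 3 and a locally downloaded .gz file. The helper names `scan_lines` and `scan_release` are hypothetical. The pattern is a rough heuristic (two or more "?" between word characters), not an official DMOZ check. It skips single "?" so that URL query strings like `index.php?id=3` are not flagged, and as a result it will miss single-"?" damage.

```python
import gzip
import re

# Heuristic: two or more '?' sandwiched between word characters, e.g. "Fran??ais", "M??t??o".
SUSPECT = re.compile(r"\w\?\?+\w")

def scan_lines(lines) -> dict:
    """Count lines that fail strict UTF-8 decoding or look like '?'-mangled text."""
    stats = {"lines": 0, "bad_utf8": 0, "suspect": 0}
    for raw in lines:
        stats["lines"] += 1
        try:
            text = raw.decode("utf-8")  # strict: raises on invalid byte sequences
        except UnicodeDecodeError:
            stats["bad_utf8"] += 1
            continue
        if SUSPECT.search(text):
            stats["suspect"] += 1
    return stats

def scan_release(path: str) -> dict:
    """Stream a gzipped dump line by line (as bytes) instead of loading it all."""
    with gzip.open(path, "rb") as f:
        return scan_lines(f)

if __name__ == "__main__":
    sample = [
        b'<altlang r:resource="French:Top/World/Fran??ais/Arts"/>\n',
        "Top/World/Français\n".encode("utf-8"),
        b"http://example.com/index.php?id=3\n",
        b"\xc3\n",  # truncated UTF-8 sequence
    ]
    print(scan_lines(sample))  # {'lines': 4, 'bad_utf8': 1, 'suspect': 1}
```

For a real release, something like `scan_release("content.rdf.u8.gz")` (hypothetical local path) would be the call. A nonzero `suspect` count is only a prompt to look closer, since legitimate text can occasionally match.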
 