JeanLucDmoz Posted September 29, 2010

Hi,

I downloaded http://rdf.dmoz.org/rdf/structure.rdf.u8.gz and http://rdf.dmoz.org/rdf/categories.txt (and other files that contain DMOZ categories), but all foreign characters are replaced by one or two question marks.

Here is an example of what I get:

<altlang r:resource="French:Top/World/Fran??ais/Arts/Audiovisuel/Animation"></altlang>

where I expect:

<altlang r:resource="French:Top/World/Français/Arts/Audiovisuel/Animation"></altlang>

I inspected the binary content of the file and it really contains hexadecimal 3F wherever a question mark appears, so I guess this is not a matter of which encoding is used to read the file. The problem does not exist in the sample at http://www.dmoz.org/docs/en/rdf/structure.example.txt .

As I am new to ODP data, I may have misunderstood something. Please help me sort this out.

Jean-Luc
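The byte-level check described in the post above can be sketched roughly as follows. This is only an illustration, not the tool Jean-Luc used: the function name `classify` and its three labels are hypothetical. It assumes the damage comes from every non-ASCII byte of the UTF-8 text being replaced by `?` (0x3F), which is consistent with "Français" turning into "Fran??ais" because "ç" occupies two bytes in UTF-8.

```python
def classify(raw: bytes, expected: str) -> str:
    """Compare bytes taken from the dump with the text we expected to find.

    Returns one of three hypothetical labels:
      "intact"          - raw is the correct UTF-8 encoding of expected
      "replaced-by-3F"  - every non-ASCII byte was replaced by '?' (0x3F)
      "other"           - something else (different encoding, other damage)
    """
    good = expected.encode("utf-8")
    if raw == good:
        return "intact"

    # Simulate the suspected damage: keep ASCII bytes, turn the rest into 0x3F.
    lossy = bytes(b if b < 0x80 else 0x3F for b in good)
    if raw == lossy:
        return "replaced-by-3F"

    return "other"


if __name__ == "__main__":
    print(classify(b"Fran??ais", "Fran\u00e7ais"))
    print(classify("Fran\u00e7ais".encode("utf-8"), "Fran\u00e7ais"))
```

If the result is "replaced-by-3F", the original bytes are gone from the file itself, so no choice of decoder on the reading side can bring the characters back.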
JeanLucDmoz (Author) Posted September 30, 2010

I have downloaded http://rdf.dmoz.org/rdf/archive/2010-09-02/categories.txt . This archived version does not have the above problem with international characters, so there is a bug in the latest release.

Any idea when a solution can be expected?

Thank you.

Jean-Luc
chaos127 Posted October 1, 2010

Yes, we're already aware of this problem. It's been reported to AOL, but unfortunately we don't yet have any estimated time for a fix to be deployed.
JeanLucDmoz (Author) Posted October 6, 2010

Thank you for your answer.

I noticed that the version dated September 26 (the one where I discovered the problem) has been replaced by a version dated October 3, but the international characters are still broken.

Jean-Luc
JeanLucDmoz (Author) Posted October 14, 2010

Quote (chaos127): "It's been reported to AOL, but unfortunately we don't yet have any estimated time for a fix to be deployed."

International characters are still broken in the release dated October 10. It is hard to understand why a company like AOL lets such a basic problem persist from release to release.

Jean-Luc
Elper (RZ Admin) Posted October 16, 2010

Quote (JeanLucDmoz): "International characters are still broken in the release dated October 10. It is hard to understand why a company like AOL lets such a basic problem persist from release to release."

For similar reasons that cars get recalled, I expect.

The latest (15 Oct) RDF is supposedly fixed regarding the character encoding issue. Let us know if you find anything wrong.

elper {moz}
All opinions expressed are my own, and do not necessarily represent the official point of view of the administration of either this forum or the directory.
JeanLucDmoz (Author) Posted October 18, 2010

Quote (Elper): "The latest (15 Oct) RDF is supposedly fixed regarding the character encoding issue. Let us know if you find anything wrong."

Thank you.

The latest content.rdf.u8.gz I see in http://rdf.dmoz.org/rdf/ is dated October 17, and it still contains Fran??ais and M??t??o where I expect Français and Météo.

Jean-Luc
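A quick way to re-check each new release for the symptom reported above might look like the sketch below. It is a heuristic under stated assumptions, not an official ODP tool: the names `SUSPECT`, `suspicious_lines` and `scan_dump` are hypothetical, and the pattern only looks for two or more `?` bytes sitting directly between ASCII letters (as in Fran??ais or M??t??o). A character replaced by a single `?` would be missed, and an occasional legitimate `??` between letters would be reported as a false positive.

```python
import gzip
import re
from typing import Iterable, Iterator, List, Tuple

# Heuristic: two or more '?' (0x3F) directly between ASCII letters.
SUSPECT = re.compile(rb"[A-Za-z]\?{2,}[A-Za-z]")


def suspicious_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
    """Yield (line number, line) for every line matching the heuristic."""
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            yield number, line.rstrip(b"\r\n")


def scan_dump(path: str, limit: int = 10) -> List[Tuple[int, bytes]]:
    """Scan a gzip-compressed dump and return at most `limit` suspicious lines."""
    hits: List[Tuple[int, bytes]] = []
    # Read raw bytes so that no decoding step can hide or alter the damage.
    with gzip.open(path, "rb") as handle:
        for hit in suspicious_lines(handle):
            hits.append(hit)
            if len(hits) >= limit:
                break
    return hits
```

For example, `scan_dump("content.rdf.u8.gz")` on a locally downloaded copy would return the first few affected lines of a broken release, and an empty list would suggest (but not prove) that the release is clean.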
Elper (RZ Admin) Posted October 18, 2010

Oh ratz. I'll see that this gets back to staff...
Elper (RZ Admin) Posted October 19, 2010

A new RDF (supposedly free of the UTF-8 issue) has been published (October 19).
JeanLucDmoz (Author) Posted October 20, 2010

That's much better now! Thanks, Elper.

Jean-Luc