Odd Characters in World Categories from RDF Dump

dataferret

Member
Joined
Jan 7, 2005
Messages
10
Hi All

I have spent the past four weeks grappling with the huge RDF dumps and have managed to parse the ODP data, load it into a MySQL database, convert it into a decent schema and generally clean up the data. I have been working with structure.rdf because it is the smaller of the two files.

Now that I have the data in a database, I find there are strange characters in the World sub-categories. These are characters from different languages which MySQL does not seem to understand; it replaces them with ?* instead. How do I overcome this problem? Does anyone have any thoughts?
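
One way to see where the characters get lost is to check which character sets the server and the client connection are actually using. This is only a minimal diagnostic sketch, assuming MySQL 4.1 or later (the `character_set%` variables may not all exist on older servers):

```sql
-- Show the server version; per-database/table/column character set
-- support was introduced in MySQL 4.1
SELECT VERSION();

-- List the character sets used by the client, connection, database,
-- query results and server
SHOW VARIABLES LIKE 'character_set%';
```

If the client, connection or results character set is something like latin1 while the RDF data is UTF-8, characters with no latin1 equivalent are likely to be stored or displayed as ?.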

Thanks
 

dataferret

Thanks for responding to my post and for the information. I have looked over the material at the link you suggested. I am not sure what it all means, but it does look like it will solve the problem. Now I just need to figure out what the hell they are talking about :D
 

dataferret

I have tried setting the database to utf8, and the columns to utf8 too, but I am still getting problems with invalid characters. Depending on which computer I use, the following commands either work or return an error:

# Create the database - use utf8
CREATE DATABASE dmoz CHARACTER SET utf8;

# if database is already created alter to use utf8
ALTER DATABASE dmoz DEFAULT CHARACTER SET utf8;

Can anyone point me in the right direction?
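
For what it is worth, changing the database default is usually not enough on its own, because it only applies to tables created afterwards and says nothing about the client connection. A rough sketch of the remaining steps, assuming MySQL 4.1 or later and a hypothetical table named `categories` (the real table names are not given in this thread):

```sql
-- Tell the server that this client connection sends and expects UTF-8
SET NAMES utf8;

-- The database default only affects tables created later, so convert
-- an existing table and its text columns (hypothetical table name)
ALTER TABLE categories CONVERT TO CHARACTER SET utf8;

-- Check that the table definition now reports utf8
SHOW CREATE TABLE categories;
```

Characters that were already stored as ? cannot be recovered by the conversion, so the RDF data would need to be re-imported after `SET NAMES utf8` has been issued on the importing connection. If the `CHARACTER SET` clause itself is rejected on one machine, it may be running a server older than 4.1.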
 

lonuncavisto

Member
Joined
Apr 21, 2005
Messages
2
I have the same problem. I have tried everything, but I cannot get the Russian categories to display correctly.
Does anyone know the solution?

Thank you in advance
 