Dmoz data, MySQL PK error...

heroine (Member, joined Apr 1, 2007, 14 messages)
Hi All,

I have successfully parsed the DMOZ dump files into MySQL.
There are four tables: structure, content_description, content_links and datatypes.

The structure table has the fields catid, name, title, ...
I set catid as the primary key and it works fine.

When I try to do the same for the other tables, such as datatypes (fields: catid, type, resource), I always get this error:

SQL query:

ALTER TABLE `datatypes` ADD PRIMARY KEY ( `catid` ) ;

MySQL said:
#1062 - Duplicate entry '1' for key 1

The same happens with the other tables (e.g. content_links, with fields catid, topic, type, resource).
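To see why MySQL rejects the statement, here is a small reproduction. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server), and the sample rows are invented: in datatypes one category can have several rows, so catid repeats and cannot be a primary key on its own. The GROUP BY ... HAVING query that lists the offending values is plain SQL and should work the same way in MySQL.

```python
import sqlite3

def find_duplicate_catids(conn):
    """Return (catid, row_count) pairs for catids that appear more than once."""
    return conn.execute(
        "SELECT catid, COUNT(*) FROM datatypes "
        "GROUP BY catid HAVING COUNT(*) > 1 ORDER BY catid"
    ).fetchall()

conn = sqlite3.connect(":memory:")
# Same three columns as the datatypes table in the post; no key yet.
conn.execute("CREATE TABLE datatypes (catid INTEGER, type TEXT, resource TEXT)")
# Hypothetical sample rows: catid 1 is linked to two different resources.
conn.executemany(
    "INSERT INTO datatypes VALUES (?, ?, ?)",
    [(1, "narrow", "Top/Arts/Movies"),
     (1, "symbolic", "Top/Arts/Film"),
     (2, "narrow", "Top/Business")],
)
print(find_duplicate_catids(conn))  # [(1, 2)]

# A copy keyed on catid alone refuses the second row for catid 1; this is
# SQLite's counterpart of MySQL's "#1062 - Duplicate entry '1'".
conn.execute("CREATE TABLE datatypes_pk (catid INTEGER PRIMARY KEY, type TEXT, resource TEXT)")
try:
    conn.execute("INSERT INTO datatypes_pk SELECT catid, type, resource FROM datatypes")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```

In MySQL the same HAVING query run against the real table shows which catid values would have to become unique before ADD PRIMARY KEY (`catid`) can succeed.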


Another problem: the third table (content_description) has the fields externalpage, title, description, ages, mediadate, priority.

Which of these fields should be the primary key?
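One way to answer this empirically is to test each candidate column before running ALTER TABLE: a column can only be the primary key if it has no NULLs and COUNT(DISTINCT col) equals the number of rows. The sketch below does this with sqlite3 standing in for MySQL; the two rows are invented, so the real answer depends on the dump itself (for example, whether the same URL ever appears twice). The helper name unique_columns is hypothetical.

```python
import sqlite3

def unique_columns(conn, table, columns):
    """Return the columns whose values are non-NULL and distinct in every row.

    Table and column names are interpolated directly, so only pass trusted
    identifiers (here they are hard-coded).
    """
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    result = []
    for col in columns:
        # COUNT(col) skips NULLs, so both counts must equal the row total.
        non_null, distinct = conn.execute(
            f"SELECT COUNT({col}), COUNT(DISTINCT {col}) FROM {table}"
        ).fetchone()
        if non_null == total and distinct == total:
            result.append(col)
    return result

cols = ["externalpage", "title", "description", "ages", "mediadate", "priority"]
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE content_description ({', '.join(cols)})")
# Invented rows: two pages sharing title and description, other fields empty.
conn.executemany(
    "INSERT INTO content_description VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("http://example.com/a", "Example", "Sample site", None, None, None),
        ("http://example.com/b", "Example", "Sample site", None, None, None),
    ],
)
print(unique_columns(conn, "content_description", cols))  # ['externalpage']
```

A column that passes this check is only unique for the data loaded so far; if none passes on the full dump, one of the three options discussed in the reply applies here too.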

Hoping to get your feedback.
Thanks a lot.
 

windharp (Meta/kMeta, Curlie Meta, joined Apr 30, 2002, 9,204 messages)
A primary key must be unique, so it can't be a field whose value repeats across rows. None of the fields catid, type or resource qualifies on its own. I see three solutions:

a) You don't use a primary key. That might cause performance issues, but it would be the easiest thing to do.
b) You add an extra field containing a unique numerical value. This would be the most straightforward option in my eyes.
c) You could try to generate a unique new field by combining the three fields you already have. While none of them is unique on its own, the combination of all three should be, at least if there are no inconsistencies in the RDF file. But I would prefer either of the previous options over this one.
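Option c) does not necessarily need a new concatenated column: SQL allows a primary key to span several columns, which enforces uniqueness of the combination directly. This is a swapped-in technique (a composite key instead of literal concatenation). In MySQL it would look like ALTER TABLE `datatypes` ADD PRIMARY KEY (`catid`, `type`, `resource`); with the caveats that key columns cannot contain NULLs, a TEXT column needs an index prefix length, and the statement still fails with #1062 if two rows share all three values. The runnable sketch below checks the idea in sqlite3 (standing in for MySQL) with invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: no single column has to be unique, only the triple.
conn.execute(
    "CREATE TABLE datatypes ("
    " catid INTEGER NOT NULL, type TEXT NOT NULL, resource TEXT NOT NULL,"
    " PRIMARY KEY (catid, type, resource))"
)
rows = [  # hypothetical rows: catid 1 repeats, but each triple is distinct
    (1, "narrow", "Top/Arts/Movies"),
    (1, "symbolic", "Top/Arts/Film"),
    (2, "narrow", "Top/Business"),
]
conn.executemany("INSERT INTO datatypes VALUES (?, ?, ?)", rows)

# An exact duplicate triple is still rejected; this is the
# "inconsistencies in the RDF file" case mentioned above.
try:
    conn.execute("INSERT INTO datatypes VALUES (1, 'narrow', 'Top/Arts/Movies')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM datatypes").fetchone()[0]
print(count, duplicate_rejected)  # 3 True
```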
 

heroine (Member, joined Apr 1, 2007, 14 messages)
Thanks for the reply.

I tried solution a) by removing the primary key from the structure table, but it doesn't work: the mediator I use to query MySQL reports that there is no primary key, so no export is possible. I also tried setting a primary key on each of the other tables separately, but got the same duplicate-entry error.

As for solution b), I'm not sure I understood it fully. Should the additional field containing a unique numerical value be added to all four tables? What would it look like? Please elaborate.

Thanks once again for your time.
Cheers!
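A minimal sketch of option b), assuming the mediator only needs every table to have some primary key: structure can keep catid, and each table whose own columns repeat gets an extra integer column that the database numbers automatically. In MySQL this is typically a single statement such as ALTER TABLE `datatypes` ADD `id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST; (the column name id is hypothetical). The runnable version below uses sqlite3 instead, where an INTEGER PRIMARY KEY column plays the role of AUTO_INCREMENT; the helper add_surrogate_key and the rows are invented for illustration.

```python
import sqlite3

def add_surrogate_key(conn, table, columns):
    """Rebuild `table` with an auto-numbered `id` primary key in front.

    SQLite cannot add a primary key with ALTER TABLE, so the rows are copied
    into a new table; in MySQL one ALTER TABLE ... AUTO_INCREMENT does this.
    """
    col_list = ", ".join(columns)
    conn.execute(f"CREATE TABLE {table}_new (id INTEGER PRIMARY KEY, {col_list})")
    # Leaving id out of the INSERT lets SQLite assign 1, 2, 3, ... itself.
    conn.execute(f"INSERT INTO {table}_new ({col_list}) SELECT {col_list} FROM {table}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {table}_new RENAME TO {table}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE content_links (catid, topic, type, resource)")
conn.executemany(
    "INSERT INTO content_links VALUES (?, ?, ?, ?)",
    [  # invented rows: the same catid and topic appear twice
        (5, "Top/Arts", "link", "http://example.com/a"),
        (5, "Top/Arts", "link", "http://example.com/b"),
    ],
)
add_surrogate_key(conn, "content_links", ["catid", "topic", "type", "resource"])
print(conn.execute("SELECT id, catid FROM content_links ORDER BY id").fetchall())
# [(1, 5), (2, 5)]
```

The id values carry no meaning from the RDF; they only give each row a unique handle, which is why this approach works regardless of how often catid repeats.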
 

sdang (Member, joined Jan 16, 2008, 6 messages)
parsing odp rdf

Hi, would you mind sharing how you successfully parsed the data? I have tried several PHP scripts and am running into a lot of problems. I'm trying to parse the ChefMoz RDF. Thanks
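The thread never says which parser was used, so the following is only a sketch of one common approach, written in Python rather than PHP: stream the RDF with xml.etree.ElementTree.iterparse and discard each finished Topic, so a multi-gigabyte dump never has to fit in memory. The element and namespace names (Topic, catid, r:id and the two URIs) are assumptions modelled on the DMOZ structure.rdf layout and should be checked against the header of the actual file; the ChefMoz RDF may use different elements. The helper iter_topics is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify them against the real dump.
DMOZ = "{http://dmoz.org/rdf/}"
RDF = "{http://www.w3.org/TR/RDF/}"

def iter_topics(source):
    """Yield (catid, topic_path) for each Topic element, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == DMOZ + "Topic":
            yield int(elem.findtext(DMOZ + "catid")), elem.get(RDF + "id")
            root.clear()  # drop finished Topics so memory use stays bounded

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top"><catid>1</catid></Topic>
  <Topic r:id="Top/Arts"><catid>2</catid></Topic>
</RDF>"""

for catid, path in iter_topics(io.BytesIO(SAMPLE)):
    print(catid, path)
```

Each yielded pair could then be written to the structure table with a parameterized INSERT, batching rows and committing periodically rather than once per row.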
 
This site has been archived and is no longer accepting new content.