How to get related categories?

logoin

Member
Joined
Nov 28, 2005
Messages
10
Hi,
I use software called "wamp" to parse ODP into MySQL.
The "content_links" table contains 460286 records in total.
In this table, every link maps to a category. When I query for a link, it always returns either one record or none.

For example:
SELECT * FROM content_links c
WHERE c.resource = 'http://daliarchives.com/';
returns "Top/Arts/Art_History/Artists/D/Dali,_Salvador"

However, as far as I know, this website belongs to more than one category. How can I retrieve its related categories, such as:
Arts/Art_History/Periods_and_Movements/Surrealism
and Arts/Movies/Titles/U/Un_Chien_Andalou

I don't see any table that contains this information (a mapping from one category to its related categories).

Also, some of the websites I queried don't exist in the database, but they do show up in ODP. Is that because I didn't parse ODP completely? If so, does anyone know of a better tool for parsing the full ODP data?

Thank you for your patience.
 

sfromis

Member
Joined
Mar 25, 2002
Messages
202
The structure.rdf.u8.gz RDF dump file contains the information you need.
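
For anyone who does end up scripting this, here is a minimal Python sketch of pulling the related-category pairs out of the structure dump. It assumes the dump stores each category as a Topic element with an r:id attribute and lists related categories as child related elements with an r:resource attribute. The SAMPLE text is built from the category names in this thread, not copied from the real dump, and the function name related_categories is made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hand-written sample in the layout the dump is assumed to use.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Arts/Art_History/Artists/D/Dali,_Salvador">
    <d:Title>Dali, Salvador</d:Title>
    <related r:resource="Top/Arts/Art_History/Periods_and_Movements/Surrealism"/>
    <related r:resource="Top/Arts/Movies/Titles/U/Un_Chien_Andalou"/>
  </Topic>
</RDF>"""


def local(name):
    """Drop a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def related_categories(source):
    """Yield (category, related_category) pairs from a structure dump.

    `source` is a filename or a binary file object.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "Topic":
            continue
        attrs = {local(k): v for k, v in elem.attrib.items()}
        topic = attrs.get("id")
        for child in elem:
            if local(child.tag) != "related":
                continue
            child_attrs = {local(k): v for k, v in child.attrib.items()}
            target = child_attrs.get("resource")
            if topic and target:
                yield topic, target
        elem.clear()  # drop the finished Topic's children to limit memory use


if __name__ == "__main__":
    for topic, target in related_categories(io.BytesIO(SAMPLE)):
        print(topic, "->", target)
```

For the real file, passing gzip.open("structure.rdf.u8.gz", "rb") as the source should work the same way, provided the dump is well-formed XML. The pairs could then be loaded into an extra table (for example with columns category and related_category) next to content_links.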
 

logoin

Member
Joined
Nov 28, 2005
Messages
10
As mentioned earlier, I used software called "wamp" to parse ODP into MySQL.
(the method is described in this thread: http://www.resource-zone.com/forum/index.php?showtopic=22263&highlight=database)
In the database, I don't see any related category information, even though it does exist in the structure.rdf.u8.gz RDF dump. So I guess the parser I used didn't convert all of the data into the database.

Has anyone used other tools that also capture related category info when parsing ODP? (without my having to write my own script)

Another issue I've run into is efficiency. To get the category of a webpage, I have to compare the URL I have against a string field in a database table. It usually takes more than 5 seconds to retrieve the category of a specific webpage. Has anyone done anything to shorten the search time, and how did it turn out? (For example, using an algorithm to convert each URL to a unique integer, since matching integers should be a lot faster than matching strings, or switching from MySQL to another database.)

Any suggestions?

Thanks in advance.
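
A rough sketch of the "convert the URL to an integer" idea, in Python with an in-memory SQLite database standing in for MySQL. The resource_key and category column names, the index name, and the helper functions are all made up for illustration; only content_links and resource come from the posts above. CRC32 keys are not unique, so the query still compares the full URL string after the indexed integer match. A slow lookup like this usually means the table is being scanned row by row, so a plain index on the resource column may already be enough without the extra key.

```python
import sqlite3
import zlib


def url_key(url):
    """Map a URL to a 32-bit integer; collisions are possible."""
    return zlib.crc32(url.encode("utf-8"))


def build_demo_db():
    """Create a small stand-in for the content_links table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_links ("
        "resource TEXT, resource_key INTEGER, category TEXT)"
    )
    # The index is what turns the lookup from a full scan into a keyed search.
    conn.execute(
        "CREATE INDEX idx_resource_key ON content_links (resource_key)"
    )
    return conn


def add_link(conn, url, category):
    """Insert one link together with its integer key."""
    conn.execute(
        "INSERT INTO content_links VALUES (?, ?, ?)",
        (url, url_key(url), category),
    )


def find_category(conn, url):
    """Look up by integer key first, then confirm the exact URL."""
    row = conn.execute(
        "SELECT category FROM content_links "
        "WHERE resource_key = ? AND resource = ?",
        (url_key(url), url),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    add_link(
        conn,
        "http://daliarchives.com/",
        "Top/Arts/Art_History/Artists/D/Dali,_Salvador",
    )
    print(find_category(conn, "http://daliarchives.com/"))
```

The same pattern should carry over to MySQL by adding an integer column, filling it once for the existing rows, and indexing it.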
 
This site has been archived and is no longer accepting new content.