Posted

Hi

I use software called "wamp" to parse the ODP data into MySQL.

I got a total of 460286 records in the "content_links" table.

In this table, every link maps to a category. When I run a query, it always returns one record or none.

 

For example:

SELECT * FROM content_links c
WHERE c.resource = 'http://daliarchives.com/';

returns "Top/Arts/Art_History/Artists/D/Dali,_Salvador".

 

However, as far as I know, this website belongs to more than one category. How can I retrieve its related categories?

such as: Arts/Art_History/Periods_and_Movements/Surrealism

and Arts/Movies/Titles/U/Un_Chien_Andalou

 

I don't see any table that contains this information (a mapping from one category to its related categories).

 

Also, some of the websites I queried don't exist in the database, but they do show up in ODP. Is that because I didn't parse ODP completely? If so, does anyone know of a better tool for parsing the full ODP data?

 

Thank you for your patience.

Posted

As mentioned earlier, I used software called "wamp" to parse ODP into MySQL.

(the method is in thread: http://www.resource-zone.com/forum/index.php?showtopic=22263&highlight=database)

In the database, I didn't see any related-category information, even though it does exist in the structure.rdf.u8.gz RDF dump. So I guess the parser I used didn't convert all of the data into the database.
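For anyone who ends up scripting this after all, here is a minimal sketch of pulling the related-category pairs straight out of the structure dump. It assumes that each category opens with a line like <Topic r:id="..."> and that related categories appear inside it as <related r:resource="..."/> elements; please check a few lines of your own dump before relying on that. The function name related_categories is made up for this example, and XML entities such as &amp; are left undecoded.

```python
import re

# Assumed tag shapes in the structure dump (verify against your own file):
#   <Topic r:id="Top/...">            opens a category block
#   <related r:resource="Top/..."/>   names a related category
TOPIC_RE = re.compile(r'<Topic r:id="([^"]*)"')
RELATED_RE = re.compile(r'<related r:resource="([^"]*)"')


def related_categories(lines):
    """Yield (category, related_category) pairs from an iterable of text lines."""
    current = None  # id of the Topic block we are currently inside, if any
    for line in lines:
        topic = TOPIC_RE.search(line)
        if topic:
            current = topic.group(1)
            continue
        if "</Topic>" in line:
            current = None
            continue
        related = RELATED_RE.search(line)
        # Ignore <related> lines that appear outside any Topic block.
        if related and current is not None:
            yield current, related.group(1)
```

To run it on the real file, something like gzip.open("structure.rdf.u8.gz", "rt", encoding="utf-8", errors="replace") can be passed in as the lines argument, and the resulting pairs inserted into a new two-column table next to "content_links". A line scanner is used instead of a full XML parser only because it is short and streams the file; a proper parser would be more robust.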

 

Has anyone used other tools that also capture related category info when parsing ODP (without my having to write my own script)?

 

Another issue I've run into is efficiency. To get the category of a web page, I have to compare the URL I have against a string field in a database table. It usually takes more than 5 seconds to retrieve the category of a specific web page. I'm wondering whether anyone has done anything to shorten the search time, and how it turned out (e.g. using an algorithm to convert each URL to a unique integer, since matching integers should be a lot faster than matching strings, or switching from MySQL to another database).
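To make the integer idea concrete, here is a rough sketch. It stores a CRC32 of each URL in an extra indexed integer column and looks rows up by that number, re-checking the string because CRC32 values are not unique. The column names topic and resource_crc and the helper names are invented for the example, and sqlite3 is used only as a stand-in so the snippet runs anywhere; MySQL has its own CRC32() function that could fill such a column. A multi-second exact-match lookup usually suggests the "resource" column has no index at all, so adding a plain index on it may be worth trying before anything like this.

```python
import sqlite3
import zlib


def url_key(url):
    """Map a URL to an unsigned 32-bit integer (CRC32 of its UTF-8 bytes)."""
    return zlib.crc32(url.encode("utf-8"))


def build_demo_db(rows):
    """Create an in-memory table with a hypothetical indexed resource_crc column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_links (resource TEXT, topic TEXT, resource_crc INTEGER)"
    )
    conn.executemany(
        "INSERT INTO content_links VALUES (?, ?, ?)",
        [(url, topic, url_key(url)) for url, topic in rows],
    )
    # The index on the integer column is what makes the lookup fast.
    conn.execute("CREATE INDEX idx_resource_crc ON content_links (resource_crc)")
    return conn


def lookup(conn, url):
    """Return all topics for a URL: narrow by integer key, then confirm the string."""
    cur = conn.execute(
        "SELECT topic FROM content_links WHERE resource_crc = ? AND resource = ?",
        (url_key(url), url),
    )
    return [row[0] for row in cur.fetchall()]
```

The second condition in the WHERE clause matters: two different URLs can share the same CRC32, so the integer only narrows the search and the string comparison confirms the match.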

 

Any suggestions?

 

Thanks in advance.
