logoin Posted January 3, 2006

Hi,

I use a software package called "wamp" to parse the ODP data into MySQL. I ended up with 460286 records in the "content_links" table, where each link maps to a category. When I run a query, it always returns either one record or none. For example:

SELECT * FROM content_links c WHERE c.resource = 'http://daliarchives.com/';

returns "Top/Arts/Art_History/Artists/D/Dali,_Salvador".

However, as far as I know, this website does not belong to only one category. How can I retrieve its related categories, such as:

Arts/Art_History/Periods_and_Movements/Surrealism
Arts/Movies/Titles/U/Un_Chien_Andalou

I don't see any table that contains this information (a mapping from one category to its related categories).

Also, some of the websites I queried don't exist in the database, although they do show up in ODP. Is that because I didn't parse the ODP data completely? If so, does anyone know of a better tool for parsing the full ODP data?

Thank you for your patience.
sfromis Posted January 25, 2006

The structure.rdf.u8.gz RDF dump file contains the information you need.
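
For anyone who wants to pull the related-category links out of that dump directly, here is a minimal sketch in Python. It assumes that the structure dump lists each category as a <Topic r:id="..."> element with one <related r:resource="..."/> child per related category, each on its own line. The function names and the sample category names are hypothetical, so check the tag names against your copy of the dump before relying on it. It scans line by line with regular expressions instead of using a strict XML parser, so the whole file never has to be held in memory.

```python
import gzip
import re
from collections import defaultdict

# Assumed tag layout of the structure dump; verify against your own file.
TOPIC_RE = re.compile(r'<Topic r:id="([^"]*)">')
RELATED_RE = re.compile(r'<related r:resource="([^"]*)"\s*/>')

# Tiny hypothetical sample in the assumed layout (category names are made up).
SAMPLE = '''\
<Topic r:id="Top/Arts/Example_A">
  <catid>1</catid>
  <related r:resource="Top/Arts/Example_B"/>
  <related r:resource="Top/Arts/Example_C"/>
  <narrow r:resource="Top/Arts/Example_A/Child"/>
</Topic>
<Topic r:id="Top/Arts/Example_B">
  <catid>2</catid>
</Topic>
'''


def related_categories(lines):
    """Map each category id to the categories it lists as related."""
    related = defaultdict(list)
    current = None  # id of the <Topic> block we are currently inside
    for line in lines:
        match = TOPIC_RE.search(line)
        if match:
            current = match.group(1)
            continue
        if "</Topic>" in line:
            current = None
            continue
        match = RELATED_RE.search(line)
        if match and current is not None:
            related[current].append(match.group(1))
    return dict(related)


def load_related(path):
    """Read a gzipped structure dump (e.g. structure.rdf.u8.gz) line by line."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return related_categories(handle)


if __name__ == "__main__":
    print(related_categories(SAMPLE.splitlines()))
```

The resulting dictionary could then be loaded into an extra two-column table (category, related category) next to "content_links", so that related categories can be fetched with a join on the category a link maps to.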
logoin Posted February 2, 2006 Author

As mentioned earlier, I used a software package called "wamp" to parse the ODP data into MySQL (the method is described in this thread: http://www.resource-zone.com/forum/index.php?showtopic=22263&highlight=database).

In the database I didn't see any related-category information, even though it does exist in the structure.rdf.u8.gz RDF dump. So I guess the parser I used didn't convert all of the data into the database. Has anyone used another tool that also captures related-category info when parsing ODP (without my having to write my own script)?

Another issue I've run into is efficiency. To get the category of a web page, I have to compare the URL I have against a string field in a database table. It usually takes more than 5 seconds to retrieve the category of a specific web page. I'm wondering whether anyone has done anything to shorten the search time, and how it turned out (e.g. using an algorithm to convert each URL to a unique integer, since matching integers would be a lot faster than matching strings, or switching from MySQL to another database).

Any suggestions? Thanks in advance.
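
The URL-to-integer idea can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module so that it runs standalone, not MySQL, and the "topic" and "resource_hash" columns and the index name are hypothetical additions to the "content_links" table from the posts. A 32-bit CRC is not unique, so the query filters on the indexed integer first and then re-checks the full URL string. A multi-second lookup on a table of this size may also just mean that "resource" has no index at all, in which case a plain index on that column might already be enough.

```python
import sqlite3
import zlib


def url_hash(url):
    """32-bit CRC of the URL. Collisions are possible, so never match on this alone."""
    return zlib.crc32(url.encode("utf-8"))


def build_db(rows):
    """Create an in-memory table of (resource, topic) rows plus an indexed hash column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_links "
        "(resource TEXT, topic TEXT, resource_hash INTEGER)"
    )
    conn.executemany(
        "INSERT INTO content_links VALUES (?, ?, ?)",
        [(url, topic, url_hash(url)) for url, topic in rows],
    )
    # The index is what makes the lookup fast; without it every query scans all rows.
    conn.execute(
        "CREATE INDEX idx_resource_hash ON content_links (resource_hash)"
    )
    return conn


def categories_for(conn, url):
    """Return every category stored for the URL, using the integer index first."""
    cursor = conn.execute(
        "SELECT topic FROM content_links "
        "WHERE resource_hash = ? AND resource = ? ORDER BY topic",
        (url_hash(url), url),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    conn = build_db([
        ("http://daliarchives.com/",
         "Top/Arts/Art_History/Artists/D/Dali,_Salvador"),
        ("http://example.org/", "Top/Hypothetical/Category"),
    ])
    print(categories_for(conn, "http://daliarchives.com/"))
```

In MySQL the same layout could be built with an extra integer column filled by the built-in CRC32() function and an index on that column; whether it beats a direct index on the URL column is something to measure on the real data.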