userlite
Posted March 11, 2008

Hi,

I would like to use the DMOZ data (the content and structure files) to find the best possible category for a term. For example, when I enter "bmw" in the DMOZ online search, the first result is:

# Recreation: Autos: Makes and Models: BMW

which is the most relevant category for the term "bmw", compared to others such as:

# Recreation: Motorcycles: Makes and Models: BMW (95)
# World: Deutsch: Freizeit: Auto: Marken: BMW (72)
# Business: Automotive: Motorcycles: Makes and Models: Retailers: BMW (31)
# Home: Consumer Information: Automobiles: Purchasing: By Make: BMW (11)

How can I achieve this using the RDF files? Out of all the possible categories, how do I pick the right one automatically?

Any help is appreciated. Thanks.

chaos127
Posted March 11, 2008

I guess you would need to design a search algorithm that looks at the contents of each category (category name, path, titles of listed sites, descriptions of listed sites) and gives it a score based on occurrences of the search term(s) entered. Then you just return the category with the highest score. That's pretty much what the search function on http://www.dmoz.org/ does to come up with the category choices.
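
A minimal Python sketch of the scoring idea described above, for illustration only. It assumes the RDF dump has already been parsed into simple records; the `Category` class, the `WEIGHTS` values, and the sample data are all hypothetical and are not taken from DMOZ or its actual search code. Each field (category name, parent path, site titles, site descriptions) contributes its count of matching words times a weight, and the category with the highest total wins.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Category:
    # Hypothetical record for one category after parsing the dump.
    path: str                                        # slash-separated category path
    site_titles: list = field(default_factory=list)
    site_descriptions: list = field(default_factory=list)


# Assumed weights: a hit in the category name counts most, a hit in a
# site description least. These numbers are guesses and need tuning.
WEIGHTS = {"name": 10.0, "path": 3.0, "title": 2.0, "description": 1.0}


def tokens(text):
    """Lowercase the text and split it into words (underscores separate words)."""
    return re.findall(r"[^\W_]+", text.lower())


def score(category, query):
    """Weighted count of query-term occurrences across the category's fields."""
    terms = tokens(query)
    segments = category.path.split("/")
    parts = [
        ("name", segments[-1]),
        ("path", " ".join(segments[:-1])),
        ("title", " ".join(category.site_titles)),
        ("description", " ".join(category.site_descriptions)),
    ]
    total = 0.0
    for label, text in parts:
        words = tokens(text)
        hits = sum(words.count(term) for term in terms)
        total += WEIGHTS[label] * hits
    return total


def best_category(categories, query):
    """Return the category with the highest score for the query."""
    return max(categories, key=lambda c: score(c, query))


# Invented sample data, only to show the call pattern.
sample = [
    Category(
        "Recreation/Autos/Makes_and_Models/BMW",
        ["BMW USA", "BMW Car Club"],
        ["Official site for BMW cars."],
    ),
    Category(
        "Recreation/Motorcycles/Makes_and_Models/BMW",
        ["BMW Motorrad"],
        ["Motorcycles and riding gear."],
    ),
]

print(best_category(sample, "bmw").path)
```

Since several categories share the name "BMW", the name match alone cannot separate them; in this sketch the tie is broken by how often the term appears in the listed sites' titles and descriptions. Other signals, such as the number of sites in a category or a penalty for deeper paths, could be added as extra weighted terms.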