Mickey11 Posted August 4, 2004 I am starting up a specialized pay-per-click search engine, and I'm wondering if there is a way to include the ODP results in my search engine. Since this will be a more specialized search, I don't need to download the entire directory, just certain parts of it. I didn't see any information on this. Can anyone help? Thank you. M11
rahul Posted October 10, 2004 Hi, I have the same query - I want to download the data from specific sub-categories to add to the articles on my web site, but I can't figure out a way apart from downloading the whole huge dump or manually taking the links and putting them on the pages. Rahul.
bobrat Posted October 10, 2004 Since the RDF is provided as a static file, not a "feed", you would have to download the whole file and write an extract program for the subset you need. It's not that difficult.
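A minimal sketch of the kind of extract program described above, not anyone's actual script. It assumes the content dump stores each listing as an `<ExternalPage>` block containing a `<topic>` line with the category path; the element names, the sample data, and the function names `extract_subset`, `topic_of` and `in_category` are all assumptions for illustration. It reads line by line, so the full file never has to fit in memory.

```python
import io

# Hypothetical sample in the assumed shape of a content dump.
SAMPLE = """\
<ExternalPage about="http://example.org/health">
  <d:Title>Health Example</d:Title>
  <topic>Top/Health/Fitness</topic>
</ExternalPage>
<ExternalPage about="http://example.org/art">
  <d:Title>Art Example</d:Title>
  <topic>Top/Arts/Design</topic>
</ExternalPage>
"""

def in_category(topic, category):
    """True if topic is the category itself or lies below it."""
    return topic == category or topic.startswith(category + "/")

def topic_of(block):
    """Return the category path from a block's <topic> line, or None."""
    for line in block:
        text = line.strip()
        if text.startswith("<topic>") and text.endswith("</topic>"):
            return text[len("<topic>"):-len("</topic>")]
    return None

def extract_subset(lines, category):
    """Yield each ExternalPage block filed under the given category."""
    block = None
    for line in lines:
        if line.lstrip().startswith("<ExternalPage"):
            block = [line]          # start collecting a new block
        elif block is not None:
            block.append(line)
            if line.strip() == "</ExternalPage>":
                topic = topic_of(block)
                if topic is not None and in_category(topic, category):
                    yield "".join(block)
                block = None        # block finished, kept or dropped

subset = list(extract_subset(io.StringIO(SAMPLE), "Top/Health"))
print(len(subset))
```

For a real dump you would pass an open file object instead of the `io.StringIO` sample and write the yielded blocks to a smaller output file.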
rahul Posted October 10, 2004 Thanks! Would you be able to tell me where I can find some pre-written programs for doing the same thing?
bobrat Posted October 10, 2004 I don't know of any - I wrote my own. My guess is that each person's need for a subset of ODP data is going to be unique and would require custom-written scripts/programs. In most cases, the RDF data is going to be moved to a database or some other format anyway, so the subset extract would be both getting a subset and reformatting.
jgwright Posted October 10, 2004 "In most cases, the RDF data is going to be moved to a database or some other format anyway, so the subset extract would be both getting a subset and reformatting." I imagine it would be easier to keep the whole lot and obtain your subset by way of a view or query design.
bobrat Posted October 10, 2004 Taking views of several million rows might be taxing unless you have really good computing power. IMO that is not the way to handle it. Better to dump what is not needed before loading into tables.
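A rough sketch of "dump what is not needed before loading", assuming the listings have already been parsed into (url, title, topic) tuples. The table layout, the sample records, and the function name `load_subset` are hypothetical; SQLite is used only because it ships with Python, not because anyone in this thread used it.

```python
import sqlite3

# Hypothetical already-parsed records: (url, title, topic).
RECORDS = [
    ("http://example.org/fit", "Fitness Example", "Top/Health/Fitness"),
    ("http://example.org/diet", "Diet Example", "Top/Health/Nutrition"),
    ("http://example.org/art", "Art Example", "Top/Arts/Design"),
]

def load_subset(records, category, conn):
    """Insert only the records under category; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages "
        "(url TEXT PRIMARY KEY, title TEXT, topic TEXT)"
    )
    # Filter first, so unwanted rows never reach the database.
    wanted = (
        r for r in records
        if r[2] == category or r[2].startswith(category + "/")
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO pages VALUES (?, ?, ?)", wanted
        )
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(load_subset(RECORDS, "Top/Health", conn))
```

Because the filter is a generator, the same function would work on a stream of records read from the full dump without holding them all in memory.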
rahul Posted October 10, 2004 Actually I am on a shared server and don't have much disk space on it, so I have to do the processing on my own computer and upload the result. But downloading the whole RDF dump is another problem. I wish we could get smaller feeds.
bobrat Posted October 10, 2004 Wait till you start uploading. I believe you will find that a major bottleneck, much greater than that of downloading a full RDF. As far as getting smaller RDF dumps [they are not feeds] - it's impossible, since everyone who needs a subset will have different requirements for what subset they need. Since they are not feeds but static files, we would have to custom-generate an RDF file for each person.
rahul Posted October 11, 2004 As far as customised dumps for individuals go, I agree that is not possible at all. But how about separate dumps for the top level categories like Regional, Health, Kids & Teens, etc.? Not only will it ease the job for some people who use only parts of the ODP, but I guess it would also mean lower bandwidth costs for the ODP.
bobrat Posted October 11, 2004 You really should spend time looking at an RDF dump and the documentation before proceeding. Kids and Teens is available separately. However, since categories cross-reference back and forth to categories under other top level cats, you would end up with references [for example] from within Art going to Business. Should these be left in or not? If you leave them out, you are creating an incomplete RDF dump; if you leave them in, you are not strictly within Art, and you cause bloat from repetition of information in each dump. The bottom line is that, in my opinion, it will not happen. There is already too much work involved in creating the current RDF dumps; creating one for each section would probably more than double that effort.
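To make the cross-reference problem concrete, here is a small sketch that lists the references leaving a chosen top-level category. It assumes the category links have already been pulled out of the dump into a dict mapping each category path to the paths it references; that dict, its contents, and the function names `within` and `cross_references` are hypothetical.

```python
# Hypothetical extracted links: category path -> referenced category paths.
LINKS = {
    "Top/Arts/Design": ["Top/Arts/Graphic_Design", "Top/Business/Design"],
    "Top/Arts/Music": ["Top/Arts/Music/Bands"],
    "Top/Business/Design": ["Top/Arts/Design"],
}

def within(path, root):
    """True if path is root itself or lies below it."""
    return path == root or path.startswith(root + "/")

def cross_references(links, root):
    """Return (source, target) pairs that start inside root and point outside it."""
    crossing = []
    for source, targets in links.items():
        if not within(source, root):
            continue  # only categories inside the subset matter
        for target in targets:
            if not within(target, root):
                crossing.append((source, target))
    return crossing

for source, target in cross_references(LINKS, "Top/Arts"):
    print(source, "->", target)
```

Each pair it reports is a case where a per-section dump would have to either drop the reference or carry data from another section.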