Downloading Specific Parts of the ODP

Mickey11

Member
Joined
Aug 4, 2004
Messages
2
I am starting up a specialized pay per click search engine, and I'm wondering if there is a way to include the ODP results in my search engine. Since this will be a more specialized search, I don't need to download the entire directory, just certain parts of it. I didn't see any information on this. Can anyone help? Thank you.

M11
 

rahul

Member
Joined
Oct 9, 2004
Messages
20
Hi

I have the same query - I want to download the data from specific sub-categories to add to the articles on my web site, but I can't figure out a way to do it apart from downloading the whole huge dump or manually copying the links onto the pages.

Rahul.
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
Since the RDF is provided as a static file, not a "feed", you would have to download the whole file and write an extract program for the subset you need. It's not that difficult.
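For anyone wondering what such an extract program might look like, here is a minimal Python sketch. It assumes the content dump stores each listing as an `<ExternalPage ...>` block containing a `<topic>` line with the category path; check that against the real file before relying on it. The function name `extract_subset` and the sample data are made up for illustration.

```python
import io
import re

# Assumed layout: each listing is an <ExternalPage> block with one <topic> child.
TOPIC_RE = re.compile(r"<topic>(.*?)</topic>")

def extract_subset(lines, prefix):
    """Yield the text of each ExternalPage block filed under `prefix`.

    Reads line by line, so the full dump never has to fit in memory.
    """
    block = []
    inside = False
    for line in lines:
        if "<ExternalPage" in line:
            inside = True
            block = []
        if inside:
            block.append(line)
            if "</ExternalPage>" in line:
                inside = False
                text = "".join(block)
                match = TOPIC_RE.search(text)
                # Match the category itself or anything below it,
                # but not a sibling that merely shares the first letters.
                if match and (match.group(1) == prefix
                              or match.group(1).startswith(prefix + "/")):
                    yield text

# Made-up sample standing in for the real dump.
SAMPLE = """<ExternalPage about="http://example.org/health">
  <d:Title>Health Example</d:Title>
  <d:Description>A made-up entry.</d:Description>
  <topic>Top/Health/Fitness</topic>
</ExternalPage>
<ExternalPage about="http://example.org/arts">
  <d:Title>Arts Example</d:Title>
  <d:Description>Another made-up entry.</d:Description>
  <topic>Top/Arts/Music</topic>
</ExternalPage>
"""

for page in extract_subset(io.StringIO(SAMPLE), "Top/Health"):
    print(page)
```

With the real dump you would pass an open file handle instead of the `io.StringIO` sample, for example one opened with `errors="replace"` in case the file contains stray bytes.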
 

rahul

Member
Joined
Oct 9, 2004
Messages
20
Thanks! Would you be able to tell me where I can find some pre-written programs for doing the same thing?
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
I don't know of any - I wrote my own. My guess is that each person's need for a subset of ODP data is going to be unique and would require custom-written scripts/programs.

In most cases, the RDF data is going to be moved to a database or some other format anyway, so the subset extract would be both getting a subset and reformatting.
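As a rough illustration of doing the subset and the reformat in one pass, the sketch below turns one listing block into a database row. It assumes the block uses an `about` attribute for the URL plus `d:Title`, `d:Description` and `topic` children; the table name `links`, the helper names and the SQLite target are all arbitrary choices for the example, not anything the ODP prescribes.

```python
import re
import sqlite3

# Assumed field layout of one listing block in the content dump.
FIELD_RES = {
    "url": re.compile(r'<ExternalPage about="(.*?)"'),
    "title": re.compile(r"<d:Title>(.*?)</d:Title>", re.S),
    "description": re.compile(r"<d:Description>(.*?)</d:Description>", re.S),
    "topic": re.compile(r"<topic>(.*?)</topic>"),
}
COLUMNS = ("url", "title", "description", "topic")

def block_to_row(block):
    """Pull the four fields out of one block; missing fields become ''."""
    row = []
    for name in COLUMNS:
        match = FIELD_RES[name].search(block)
        row.append(match.group(1).strip() if match else "")
    return tuple(row)

def load_blocks(conn, blocks):
    """Create the (hypothetical) links table and insert one row per block."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(url TEXT, title TEXT, description TEXT, topic TEXT)"
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?, ?, ?)",
        (block_to_row(b) for b in blocks),
    )
    conn.commit()

# Made-up block standing in for one entry that survived the subset filter.
BLOCK = """<ExternalPage about="http://example.org/health">
  <d:Title>Health Example</d:Title>
  <d:Description>A made-up entry.</d:Description>
  <topic>Top/Health/Fitness</topic>
</ExternalPage>
"""

conn = sqlite3.connect(":memory:")
load_blocks(conn, [BLOCK])
print(conn.execute("SELECT url, topic FROM links").fetchall())
```

Only rows that pass the subset filter ever reach the database, so the tables stay small. Note that the sketch stores text as it appears in the file, without decoding XML entities such as `&amp;`.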
 

jgwright

Member
Joined
Sep 1, 2004
Messages
256
bobrat said:
In most cases, the RDF data is going to be moved to a database or some other format anyway, so the subset extract would be both getting a subset and reformatting.
I imagine it would be easier to keep the whole lot and obtain your subset by way of a view or query design.
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
Taking views of several million rows might be taxing unless you have really good computing power. Not, IMO, the way to handle it. Better to dump what is not needed before loading into tables.
 

rahul

Member
Joined
Oct 9, 2004
Messages
20
Actually I am on a shared server and don't have much disk space there, so I have to do it on my computer and upload it. But downloading the whole RDF dump is another problem. I wish we could get smaller feeds.
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
Wait till you start uploading. I believe you will find that a major bottleneck, much greater than that of downloading a full RDF.

As far as getting smaller RDF dumps [they are not feeds] - it's impossible, since everyone who needs a subset will have different requirements for what subset they need. Since they are not feeds but static files we would have to custom generate an RDF file for each person.
 

rahul

Member
Joined
Oct 9, 2004
Messages
20
As far as customised dumps for individuals go, I agree that is not possible at all. But how about separate dumps for the top level categories like Regional, Health, Kids & Teens, etc.?

Not only will it ease the job for some people who use only parts of the ODP, but I guess it would also mean lower bandwidth costs for the ODP.
 

bobrat

Member
Joined
Apr 15, 2003
Messages
11,061
You really should spend time looking at an RDF dump and the documentation before proceeding.

Kids and Teens is available separately.

However, since categories cross-reference back and forth to categories under different top-level cats, you would end up with references [for example] from within Art going to Business. Should these be left in or not? If you leave them out, you are creating an incomplete RDF dump; if you leave them in, you are not strictly within Art, and you cause bloat from repetition of information in each dump.
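To make the cross-reference problem concrete, here is a small Python sketch that sorts references into those that stay inside one top-level category and those that point elsewhere. The `(source, target)` pairs and the function name `split_references` are hypothetical; how references are actually encoded in the dump is something to check in the file itself.

```python
def split_references(refs, top):
    """Split (source, target) category pairs by whether the target
    stays under the top-level category `top`."""
    inside, outside = [], []
    for source, target in refs:
        if target == top or target.startswith(top + "/"):
            inside.append((source, target))
        else:
            outside.append((source, target))
    return inside, outside

# Made-up cross-references for illustration only.
REFS = [
    ("Top/Arts/Music", "Top/Arts/Performing_Arts"),
    ("Top/Arts/Design", "Top/Business/Arts_and_Entertainment"),
]

inside, outside = split_references(REFS, "Top/Arts")
print("kept within the section:", inside)
print("pointing elsewhere:", outside)
```

Whoever builds a per-section extract has to decide what to do with the second list: drop those references and accept an incomplete subset, or pull in the target categories and accept the duplication.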

The bottom line is that, in my opinion, it will not happen. There is already too much work involved in creating the current RDF dumps; creating one for each section would probably more than double that effort.
 
This site has been archived and is no longer accepting new content.