Posted

I am starting up a specialized pay-per-click search engine, and I'm wondering if there is a way to include the ODP results in it. Since this will be a more specialized search, I don't need to download the entire directory, just certain parts of it. I didn't see any information on this. Can anyone help? Thank you.

 

M11

  • 2 months later...
Posted

Hi

 

I have the same query - I want to download the data from specific sub-categories to add to the articles on my web site, but I can't figure out a way to do it short of downloading the whole huge dump or manually copying the links onto the pages.

 

Rahul.

Posted
Since the RDF is provided as a static file, not a "feed", you would have to download the whole file and write an extract program for the subset you need. It's not that difficult.
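
A minimal sketch of such an extract program, in Python. It streams the file rather than loading it whole, and keeps only listings filed under one category prefix. The element names (ExternalPage, topic, Title, Description) and the "about" attribute are assumptions about the dump layout, the sample data is invented for illustration, and extract_subset is a hypothetical helper name. A real dump may also need cleanup before a strict XML parser accepts it.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element name."""
    return tag.rsplit("}", 1)[-1]


def extract_subset(source, prefix):
    """Yield (topic, url, title, description) for listings under prefix.

    Assumes each listing is an <ExternalPage about="..."> element with
    Title, Description and topic children (an assumed layout).
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "ExternalPage":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        topic = fields.get("topic", "")
        if topic == prefix or topic.startswith(prefix + "/"):
            yield (topic, elem.get("about", ""),
                   fields.get("Title", ""), fields.get("Description", ""))
        root.clear()  # drop already-processed elements to keep memory flat


# Invented sample in the assumed layout; a real run would pass an open file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Health/Fitness"><catid>1</catid></Topic>
  <ExternalPage about="http://example.com/fit">
    <d:Title>Fitness Example</d:Title>
    <d:Description>A placeholder listing.</d:Description>
    <topic>Top/Health/Fitness</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/art">
    <d:Title>Art Example</d:Title>
    <d:Description>Another placeholder.</d:Description>
    <topic>Top/Arts/Painting</topic>
  </ExternalPage>
</RDF>"""

rows = list(extract_subset(io.BytesIO(SAMPLE), "Top/Health"))
```

With the sample above, rows holds only the Top/Health listing. Matching on prefix + "/" avoids accidentally picking up a sibling category whose name merely starts with the same letters.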
Posted

I don't know of any - I wrote my own. My guess is that each person's need for a subset of ODP data is going to be unique and would require custom-written scripts/programs.

 

In most cases, the RDF data is going to be moved to a database or some other format anyway, so the subset extract would be both getting a subset and reformatting.

Posted
In most cases, the RDF data is going to be moved to a database or some other format anyway, so the subset extract would be both getting a subset and reformatting.
I imagine it would be easier to keep the whole lot and obtain your subset by way of a view or query design.
Posted
Taking views of several million rows might be taxing unless you have really good computing power. Not IMO the way to handle it. Better to dump what is not needed before loading into tables.
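
A rough sketch of that "dump what is not needed before loading" approach, using Python's built-in sqlite3 module. The table name, column layout, sample rows and the load_subset helper are all assumptions made for illustration; the point is only that the filter runs before the insert, so unwanted rows never reach the table.

```python
import sqlite3


def load_subset(conn, rows, prefix):
    """Insert only rows whose topic falls under prefix; return the row count.

    rows is any iterable of (topic, url, title, description) tuples.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "topic TEXT, url TEXT PRIMARY KEY, title TEXT, description TEXT)"
    )
    # Filter lazily so the full data set is never held in memory or on disk.
    wanted = (r for r in rows
              if r[0] == prefix or r[0].startswith(prefix + "/"))
    with conn:  # commit once at the end, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)", wanted
        )
    return conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]


# Invented rows standing in for parsed dump entries.
ROWS = [
    ("Top/Health/Fitness", "http://example.com/fit", "Fitness Example", "Placeholder."),
    ("Top/Arts/Painting", "http://example.org/art", "Art Example", "Placeholder."),
]

conn = sqlite3.connect(":memory:")
count = load_subset(conn, ROWS, "Top/Health")
```

Here only one of the two sample rows is stored. Once the table holds just the subset, ordinary queries or views over it stay cheap, which is the trade-off being argued in the post above.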
Posted
Actually I am on a shared server and don't have much disk space there, so I have to do the processing on my own computer and upload the result. But downloading the whole RDF dump is another problem. I wish we could get smaller feeds.
Posted

Wait till you start uploading. I believe you will find that a major bottleneck, much greater than that of downloading a full RDF.

 

As far as getting smaller RDF dumps [they are not feeds] - it's impossible, since everyone who needs a subset will have different requirements for what subset they need. Since they are not feeds but static files we would have to custom generate an RDF file for each person.

Posted

As far as customised dumps for individuals go, I agree that is not possible at all. But how about separate dumps for the top level categories like Regional, Health, Kids & Teens, etc.?

 

Not only will it ease the job for some people who use only parts of the ODP, but I guess it would also mean lower bandwidth costs for the ODP.

Posted

You really should spend time looking at an RDF dump and the documentation before proceeding.

 

Kids and Teens is available separately.

 

However, categories cross-reference back and forth to categories under other top-level cats, so you would end up with references [for example] from within Arts going to Business. Should these be left in or not? If you leave them out, you are creating an incomplete RDF dump; if you leave them in, you are not strictly within Arts, and you cause bloat from repeating information in each dump.
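
To make the cross-reference problem concrete, here is a small Python sketch that splits category-to-category links for one top-level subset into those that stay inside it and those that point elsewhere. The category names and the split_links helper are hypothetical, chosen only to illustrate the point; they are not taken from an actual dump.

```python
def split_links(links, prefix):
    """Partition (source, target) links whose source is under prefix.

    Returns (internal, dangling): links whose target is also under prefix,
    and links whose target lies in some other part of the directory.
    """
    def inside(cat):
        return cat == prefix or cat.startswith(prefix + "/")

    internal, dangling = [], []
    for source, target in links:
        if not inside(source):
            continue  # link does not originate in this subset at all
        (internal if inside(target) else dangling).append((source, target))
    return internal, dangling


# Hypothetical cross-references between categories.
LINKS = [
    ("Top/Arts/Design", "Top/Arts/Graphic_Design"),
    ("Top/Arts/Design", "Top/Business/Design_Services"),
    ("Top/Business/Arts", "Top/Arts/Crafts"),
]

internal, dangling = split_links(LINKS, "Top/Arts")
```

Every entry in dangling is a decision someone has to make for a per-category dump: drop it and the subset is incomplete, or keep it and pull in data from outside the subset.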

 

The bottom line is that, in my opinion, it will not happen. There is already too much work involved in creating the current RDF dumps; creating one for each section would probably more than double that effort.
