Data for just a few categories

Hello,

First of all, sorry for my poor English, I'm French...

My question is simply to find out whether ODP data is available per category...

The reason is that the file for all categories at once is quite big and not very easy to work with... (1 GB once decompressed...).

Any Help appreciated,

Pol
 

vladd

Meta/kMeta
Curlie Meta
Joined
Mar 7, 2002
Messages
92
There aren't any official RDF files for a limited category. If you want to import only a small category by RDF, you'll have to download the whole RDF and select the right section.
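As a rough illustration of that "select the right section" step, here is a minimal PHP sketch that streams the dump and keeps only the blocks belonging to one branch. It assumes the usual content.rdf.u8 layout, where listings appear as &lt;Topic r:id="Top/..."&gt; and &lt;ExternalPage&gt; blocks, so check the tag names against your own copy of the file:

<pre>
// Rough sketch: copy only one branch of the ODP content dump to a smaller file.
// Assumes the usual content.rdf.u8 layout: <Topic r:id="Top/..."> and
// <ExternalPage ...> blocks, each mentioning the full category path.
function extract_branch($in_filename, $out_filename, $branch)
{
    $in  = fopen($in_filename, "r");
    $out = fopen($out_filename, "w");

    $buffer   = "";     // lines of the block currently being read
    $in_block = false;

    while (($line = fgets($in, 8192)) !== false) {
        if (strpos($line, "<Topic ") !== false || strpos($line, "<ExternalPage ") !== false) {
            $in_block = true;
            $buffer   = "";
        }
        if ($in_block) {
            $buffer .= $line;
        }
        if (strpos($line, "</Topic>") !== false || strpos($line, "</ExternalPage>") !== false) {
            // Keep the block only if it mentions the branch we want.
            if (strpos($buffer, $branch) !== false) {
                fwrite($out, $buffer);
            }
            $in_block = false;
        }
    }

    fclose($in);
    fclose($out);
}

// Example (hypothetical paths):
// extract_branch("c:/content.rdf.u8", "c:/aviation.rdf", "Top/Recreation/Aviation/Model_Aviation");
</pre>

Because it reads one line at a time and only buffers one small block, it should keep memory use low even on the full 1 GB file.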

However, for a small category, you could send a spider (a piece of software) to save the .html pages directly from the ODP (or you could save them manually), and then import that information into your website.

There are also scripts dedicated to helping you with this. A lot of methods are available; see:

Computers/Internet/Searching/Directories/Volunteer-Edited/DMOZ/Use_of_DMOZ_Data/Upload_Tools/

for details. :smile:
 

totalxsive

Member
Joined
Mar 25, 2002
Messages
2,348
Location
Yorkshire, UK
The Digital Windmill service at digitalwindmill.com will let you use live data based on a single category, by adding the following line to your web site:

<pre><script language="javascript" src="http://www.digitalwindmill.com/direct/directory.asp?odp.defaulttop=/Recreation/Aviation/Model_Aviation">
</script></pre>

Replace "/Recreation/Aviation/Model_Aviation" with your category.
 

Here are some ideas that might work for you ...

http://www.linkssuite.com/dmoz.htm is a DMOZ extraction package for small categories -- it's not free, but nor is it expensive.

Use one of the free or commercial web-whacking programs, such as those listed in http://dmoz.org/Computers/Software/Shareware/Windows/Internet/Offline_Browsing_Tools (and others in http://dmoz.org/Computers/Software/Internet/Clients/WWW/Browsers ) to download your categories. Then process them using Perl or a combination of grep, sed, awk, etc.
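If you go that route, the processing step can be as simple as pulling the outbound links back out of the saved pages. Here is a minimal sketch of that idea in PHP rather than Perl (the approach is the same); it assumes the plain <a href="http://...">Title</a> listing markup of the old category pages, so adjust the pattern to whatever your offline browser actually saved:

<pre>
// Rough sketch: pull the links and titles out of a saved ODP category page.
// Assumes the old plain-HTML listing markup (<a href="http://...">Title</a>);
// check the pattern against the pages your offline browser actually saved.
function extract_links($html_filename)
{
    $html  = file_get_contents($html_filename);
    $links = array();

    if (preg_match_all('/<a href="(http[^"]+)">([^<]+)<\/a>/i', $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = array("url" => $m[1], "title" => $m[2]);
        }
    }
    return $links;
}

// Example (hypothetical path):
// foreach (extract_links("saved/Model_Aviation.html") as $link) {
//     echo $link["title"] . " - " . $link["url"] . "\n";
// }
</pre>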

There are several free tools for integrating live ODP data listed in http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Use_of_ODP_Data/Upload_Tools -- they are all somewhat inefficient, but they could also be used to generate static pages that you could update occasionally (much more friendly to the ODP).
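For the static-page approach, a simple cache wrapper is usually enough: fetch the category once, save it to disk, and refetch only when the saved copy is older than, say, a day. A hedged PHP sketch, where the source URL is just a placeholder for whichever live-data tool you pick:

<pre>
// Rough sketch: serve a cached copy of a live category page and refresh it
// at most once a day, so the source is not hit on every single visit.
// $source_url is a placeholder -- point it at whichever live-data tool you use.
function cached_category($source_url, $cache_file, $max_age = 86400)
{
    if (!file_exists($cache_file) || (time() - filemtime($cache_file)) > $max_age) {
        $html = file_get_contents($source_url);
        if ($html !== false) {
            $fp = fopen($cache_file, "w");
            fwrite($fp, $html);
            fclose($fp);
        }
    }
    return file_get_contents($cache_file);
}

// Example (hypothetical URL and path):
// echo cached_category("http://example.com/odp.php?cat=Top/Recreation/Aviation", "cache/aviation.html");
</pre>

The 86400-second default limits you to at most one fetch per day per category, which is much friendlier to the ODP than a live include on every page view.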
 

Thanks all for your answers,
I now have to make my decision....

I find it amazing (or strange, if you prefer) how small the choice of FREE (GNU-like) scripts for working on ODP data is.
When I say free, I mean with no URL or image that you have to display on your page...

Maybe it's a bit early... and maybe it's up to me to do it...
 

hughprior

I know you have your solution by now, but for those with PHP, here is a little script which you can use to slice a big chunk out of one of the massive dump files. If nothing else, you can then have a peek at the smaller file using any old editor without crashing your PC or waiting a year for the file to open.

If you use it, or get inspired by it, just let me know and make my day! :grin:

<pre>
/**
 * Build file subset
 *
 * Parameters:
 * $in_filename  - name of input file
 * $out_filename - name of output file
 * $from         - start line
 * $to           - end line
 *
 * Example:
 * file_subset("c:/structure.rdf.u8", "c:/subset.txt", 4402000, 4500000);
 *
 * Author:
 * Hugh W Prior www.localpin.com
 */
function file_subset($in_filename, $out_filename, $from=1, $to=0)
{
    $in_fp  = fopen($in_filename, "r");
    $out_fp = fopen($out_filename, "w");

    // Ensure the to value is set
    if ($to == 0) {
        $to = $from + 10000000;
    }

    // Skip the lines before the start line
    $line = 1;
    while ($line < $from && fgets($in_fp, 4096) !== false) {
        $line++;
    }

    // Copy each line from the start line up to the end line
    while ($line <= $to && ($data = fgets($in_fp, 4096)) !== false) {
        fwrite($out_fp, $data);
        $line++;
    }

    fclose($in_fp);
    fclose($out_fp);
}
</pre>
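To work out which $from to pass, you could first scan the dump for the line where your category's <Topic> block begins. A small companion sketch in the same style (the r:id attribute is what the RDF dumps normally use, so verify it against your file; the paths in the example are hypothetical):

<pre>
// Rough companion sketch: find the line number where a given <Topic r:id="...">
// block starts, so you know roughly what $from to pass to file_subset().
function find_topic_line($in_filename, $topic)
{
    $fp   = fopen($in_filename, "r");
    $line = 1;

    while (($data = fgets($fp, 4096)) !== false) {
        if (strpos($data, '<Topic r:id="' . $topic . '"') !== false) {
            fclose($fp);
            return $line;
        }
        $line++;
    }

    fclose($fp);
    return 0; // not found
}

// Example (hypothetical paths):
// $from = find_topic_line("c:/structure.rdf.u8", "Top/Recreation/Aviation/Model_Aviation");
// file_subset("c:/structure.rdf.u8", "c:/subset.txt", $from, $from + 50000);
</pre>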
 