Data for just a few categories

Hello,

First of all, sorry for my poor English, I'm French...

My question is simply to find out whether ODP data is available per category...

The reason is that the file for all categories at once is quite big and not very easy to work with... (1 GB once decompressed...).

Any Help appreciated,

Pol
 

vladd

Meta/kMeta
Curlie Meta
Joined
Mar 7, 2002
Messages
92
There aren't any official RDF files for a limited category. If you want to import only a small category by RDF, you'll have to download the whole RDF and select the right section.
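As a rough illustration of that "select the right section" step, here is a minimal PHP sketch that streams the dump and keeps only the blocks belonging to one branch. It assumes the usual content.rdf.u8 layout, where listings appear as &lt;Topic r:id="Top/..."&gt; and &lt;ExternalPage&gt; blocks, so check the tag names against your own copy of the file:

<pre>
// Rough sketch: copy only one branch of the ODP content dump to a smaller file.
// Assumes the usual content.rdf.u8 layout: <Topic r:id="Top/..."> and
// <ExternalPage ...> blocks, each mentioning the full category path.
function extract_branch($in_filename, $out_filename, $branch)
{
    $in  = fopen($in_filename, "r");
    $out = fopen($out_filename, "w");

    $buffer   = "";     // lines of the block currently being read
    $in_block = false;

    while (($line = fgets($in, 8192)) !== false) {
        if (strpos($line, "<Topic ") !== false || strpos($line, "<ExternalPage ") !== false) {
            $in_block = true;
            $buffer   = "";
        }
        if ($in_block) {
            $buffer .= $line;
        }
        if (strpos($line, "</Topic>") !== false || strpos($line, "</ExternalPage>") !== false) {
            // Keep the block only if it mentions the branch we want.
            if (strpos($buffer, $branch) !== false) {
                fwrite($out, $buffer);
            }
            $in_block = false;
        }
    }

    fclose($in);
    fclose($out);
}

// Example (hypothetical paths):
// extract_branch("c:/content.rdf.u8", "c:/aviation.rdf", "Top/Recreation/Aviation/Model_Aviation");
</pre>

Because it reads one line at a time and only buffers one small block, it should keep memory use low even on the full 1 GB file.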

However, for a small category, you could send a spider (a piece of software) to save the .html pages directly from the ODP (or you could save them manually), and then import that information into your website.

There are also scripts dedicated to helping you with this. A lot of methods are available; see:

Computers/Internet/Searching/Directories/Volunteer-Edited/DMOZ/Use_of_DMOZ_Data/Upload_Tools/

for details. :smile:
 

totalxsive

Member
Joined
Mar 25, 2002
Messages
2,348
Location
Yorkshire, UK
The Digital Windmill service at digitalwindmill.com will let you use live data based on a single category, by adding the following line to your web site:

<pre><script language="javascript" src="http://www.digitalwindmill.com/direct/directory.asp?odp.defaulttop=/Recreation/Aviation/Model_Aviation">
</script></pre>

Replace "/Recreation/Aviation/Model_Aviation" with your category.
 

Here are some ideas that might work for you ...

http://www.linkssuite.com/dmoz.htm is a DMOZ extraction package for small categories -- it's not free, but nor is it expensive.

Use one of the free or commercial web-whacking programs, such as those listed in http://dmoz.org/Computers/Software/Shareware/Windows/Internet/Offline_Browsing_Tools (and others in http://dmoz.org/Computers/Software/Internet/Clients/WWW/Browsers ) to download your categories. Then process them using Perl or a combination of grep, sed, awk, etc.
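If you go that route, the processing step can be as simple as pulling the outbound links back out of the saved pages. Here is a minimal sketch of that idea in PHP rather than Perl (the approach is the same); it assumes the plain <a href="http://...">Title</a> listing markup of the old category pages, so adjust the pattern to whatever your offline browser actually saved:

<pre>
// Rough sketch: pull the links and titles out of a saved ODP category page.
// Assumes the old plain-HTML listing markup (<a href="http://...">Title</a>);
// check the pattern against the pages your offline browser actually saved.
function extract_links($html_filename)
{
    $html  = file_get_contents($html_filename);
    $links = array();

    if (preg_match_all('/<a href="(http[^"]+)">([^<]+)<\/a>/i', $html, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $links[] = array("url" => $m[1], "title" => $m[2]);
        }
    }
    return $links;
}

// Example (hypothetical path):
// foreach (extract_links("saved/Model_Aviation.html") as $link) {
//     echo $link["title"] . " - " . $link["url"] . "\n";
// }
</pre>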

There are several free tools for integrating live ODP data listed in http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Use_of_ODP_Data/Upload_Tools -- they are all somewhat inefficient, but they could also be used to generate static pages that you could update occasionally (much more friendly to the ODP).
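For the static-page approach, a simple cache wrapper is usually enough: fetch the category once, save it to disk, and refetch only when the saved copy is older than, say, a day. A hedged PHP sketch, where the source URL is just a placeholder for whichever live-data tool you pick:

<pre>
// Rough sketch: serve a cached copy of a live category page and refresh it
// at most once a day, so the source is not hit on every single visit.
// $source_url is a placeholder -- point it at whichever live-data tool you use.
function cached_category($source_url, $cache_file, $max_age = 86400)
{
    if (!file_exists($cache_file) || (time() - filemtime($cache_file)) > $max_age) {
        $html = file_get_contents($source_url);
        if ($html !== false) {
            $fp = fopen($cache_file, "w");
            fwrite($fp, $html);
            fclose($fp);
        }
    }
    return file_get_contents($cache_file);
}

// Example (hypothetical URL and path):
// echo cached_category("http://example.com/odp.php?cat=Top/Recreation/Aviation", "cache/aviation.html");
</pre>

The 86400-second default limits you to at most one fetch per day per category, which is much friendlier to the ODP than a live include on every page view.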
 

Thanks all for your answers,
I now have to make my decision....

I find it amazing (or strange, if you prefer) how small the choice of FREE (GNU-like) scripts for working on ODP data is.
When I say free, I mean with no URL or image that you have to display on your page...

Maybe it's a bit early... and maybe it's up to me to do it...
 

hughprior

I know you have your solution by now, but for those with PHP, here is a little script which you can use to slice a big chunk out of one of the massive dump files. If nothing else, you can then have a peek at the smaller file using any old editor without crashing your PC or waiting a year for the file to open.

If you use it, or get inspired by it, just let me know and make my day! :grin:

<pre>
/**
 * Build file subset
 *
 * Parameters:
 * $in_filename  - name of input file
 * $out_filename - name of output file
 * $from         - start line
 * $to           - end line
 *
 * Example:
 * file_subset("c:/structure.rdf.u8", "c:/subset.txt", 4402000, 4500000);
 *
 * Author:
 * Hugh W Prior www.localpin.com
 */
function file_subset($in_filename, $out_filename, $from=1, $to=0)
{
    $in_fp  = fopen($in_filename, "r");
    $out_fp = fopen($out_filename, "w");

    // Ensure the to value is set
    if ($to == 0) {
        $to = $from + 10000000;
    }

    // Skip the lines before the start line
    $line = 1;
    while ($line < $from && fgets($in_fp, 4096) !== false) {
        $line++;
    }

    // Copy each line from the start line up to the end line
    while ($line <= $to && ($data = fgets($in_fp, 4096)) !== false) {
        fwrite($out_fp, $data);
        $line++;
    }

    fclose($in_fp);
    fclose($out_fp);
}
</pre>
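To work out which $from to pass, you could first scan the dump for the line where your category's <Topic> block begins. A small companion sketch in the same style (the r:id attribute is what the RDF dumps normally use, so verify it against your file; the paths in the example are hypothetical):

<pre>
// Rough companion sketch: find the line number where a given <Topic r:id="...">
// block starts, so you know roughly what $from to pass to file_subset().
function find_topic_line($in_filename, $topic)
{
    $fp   = fopen($in_filename, "r");
    $line = 1;

    while (($data = fgets($fp, 4096)) !== false) {
        if (strpos($data, '<Topic r:id="' . $topic . '"') !== false) {
            fclose($fp);
            return $line;
        }
        $line++;
    }

    fclose($fp);
    return 0; // not found
}

// Example (hypothetical paths):
// $from = find_topic_line("c:/structure.rdf.u8", "Top/Recreation/Aviation/Model_Aviation");
// file_subset("c:/structure.rdf.u8", "c:/subset.txt", $from, $from + 50000);
</pre>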
 