Extracting content.rdf.u8

kunwarbs · Mar 5, 2005

I have downloaded the content.rdf.u8.gz file and unzipped it to get content.rdf.u8. Now how do I get the content.rdf file? I can't see any option in WinZip to extract it to get an .rdf file.

Please help.

addy · Mar 5, 2005

You don't have to extract content.rdf from content.rdf.u8, because the .u8 extension just indicates that it's an RDF file encoded in UTF-8.
RDF stands for Resource Description Framework and it's built on URI and XML technologies.
IMHO it's not obvious how to use the RDF data, not least because content.rdf.u8 is quite big (almost 2 GB), so you'll have to write a parser to extract the DMOZ data from it.
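Because of the size, a streaming parser is the way to go rather than loading the whole file into memory. Below is a minimal sketch using Python's standard xml.etree.ElementTree.iterparse. It assumes the usual DMOZ layout of top-level <ExternalPage about="URL"> elements with d:Title, d:Description and topic children; check that against your own copy of the file. The function names are just illustrative.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator, Union


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix that ElementTree adds to names."""
    return tag.rsplit("}", 1)[-1]


def iter_external_pages(source: Union[str, BinaryIO]) -> Iterator[Dict[str, str]]:
    """Stream one dict per <ExternalPage> from a DMOZ-style content.rdf.u8.

    Assumed layout (verify against your dump): <ExternalPage about="URL">
    holding d:Title, d:Description and topic children. Namespaces are
    ignored by comparing local names only.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end":
            continue
        name = local_name(elem.tag)
        if name == "ExternalPage":
            record = {"url": "", "title": "", "description": "", "topic": ""}
            # The page URL is stored in the 'about' attribute.
            for key, value in elem.attrib.items():
                if local_name(key) == "about":
                    record["url"] = value
            for child in elem:
                child_name = local_name(child.tag).lower()
                if child_name in ("title", "description", "topic"):
                    record[child_name] = (child.text or "").strip()
            yield record
        if name in ("ExternalPage", "Topic"):
            # Discard finished elements so memory stays flat on a ~2 GB file.
            root.clear()
```

You would then loop with something like for page in iter_external_pages("content.rdf.u8") and write each record wherever you need it. Clearing the root after each finished record is what keeps memory roughly constant instead of building the full tree.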
If you want to see what's in the file, you could run something like this from the command line (I guess you're using Windows, since you mentioned WinZip):
type content.rdf.u8 | more
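Note that type decodes the bytes using the console's code page, so accented characters in the UTF-8 data may show up garbled. A minimal Python alternative that reads the file explicitly as UTF-8 and returns only the first lines (the name head and the 20-line default are my own choices):

```python
from itertools import islice
from typing import List


def head(path: str, n: int = 20) -> List[str]:
    """Return the first n lines of a UTF-8 text file, decoded as UTF-8."""
    # errors="replace" substitutes any invalid byte sequences instead of raising.
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]
```

For example, for line in head("content.rdf.u8"): print(line) shows the start of the file without reading all 2 GB.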

If you don't want to write your own tool, there are many tools, both free and commercial, that can extract data from this file.
Either way, to parse content.rdf.u8 in a useful way you need structure.rdf.u8 too.
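content.rdf.u8 tells you which topic each page belongs to, while structure.rdf.u8 gives you the category tree itself. A minimal sketch that builds a topic-to-subtopics map, assuming each <Topic r:id="..."> lists its subcategories as narrow, narrow1 or narrow2 elements carrying an r:resource attribute (those element names are an assumption; check them against your copy of the file). The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, List, Union

# Assumed names of the child elements that link a topic to its subtopics.
NARROW_TAGS = ("narrow", "narrow1", "narrow2")


def local_name(tag: str) -> str:
    """Drop the '{namespace-uri}' prefix that ElementTree adds to names."""
    return tag.rsplit("}", 1)[-1]


def load_topic_children(source: Union[str, BinaryIO]) -> Dict[str, List[str]]:
    """Map each topic id in a DMOZ-style structure.rdf.u8 to its subtopic ids."""
    children: Dict[str, List[str]] = {}
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local_name(elem.tag) != "Topic":
            continue
        # The topic id is stored in the (namespaced) r:id attribute.
        topic_id = next(
            (v for k, v in elem.attrib.items() if local_name(k) == "id"), ""
        )
        kids = children.setdefault(topic_id, [])
        for child in elem:
            if local_name(child.tag) in NARROW_TAGS:
                for k, v in child.attrib.items():
                    if local_name(k) == "resource":
                        kids.append(v)
        # Discard finished topics so the parsed tree does not grow.
        root.clear()
    return children
```

Combined with the topic field of each page from content.rdf.u8, this map is enough to rebuild the directory browsing structure.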
You can check this page from Dmoz.org about such tools:
http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Use_of_ODP_Data/Upload_Tools/

A new tool not listed at the URL above is available on this page:
JODPIE
At the moment it's not the best for presenting data on web pages, but it's good for extracting data into an SQL database, and it's cheaper than the others.
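If you'd rather handle the SQL step yourself, loading already-parsed records into a database is not much code. A minimal sketch using Python's built-in sqlite3 module, assuming each record is a dict with url/title/description/topic keys; the table name pages and its columns are my own choices, not any standard schema.

```python
import sqlite3
from typing import Dict, Iterable


def load_pages(conn: sqlite3.Connection, pages: Iterable[Dict[str, str]]) -> None:
    """Insert page records into a 'pages' table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT, title TEXT, description TEXT, topic TEXT)"
    )
    # executemany accepts a generator, so records are streamed, not buffered.
    conn.executemany(
        "INSERT INTO pages (url, title, description, topic) VALUES (?, ?, ?, ?)",
        (
            (p.get("url", ""), p.get("title", ""),
             p.get("description", ""), p.get("topic", ""))
            for p in pages
        ),
    )
    conn.commit()
```

Passing in the connection rather than a file name lets you choose between an on-disk database and ":memory:" for testing.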
