Using Directory Data

How to get related categories?

January 3, 2006 by logoin

2 replies
3.4k views

Hi, I use software called "wamp" to parse ODP into MySQL. I got a total of 460286 records in the "content_links" table. In this table, every link maps to a category. When I run a query, it always returns one record or none, e.g. SELECT * FROM content_links c WHERE c.resource = 'http://daliarchives.com/'; returns "Top/Arts/Art_History/Artists/D/Dali,_Salvador". However, as far as I know, this website belongs to more than one category. How can I retrieve its related categories, such as Arts/Art_History/Periods_and_Movements/Surrealism and Arts/Movies/Titles/U/Un_Chien_Andalou? I don't see any table containing this information (one category maps to r…

Last reply by logoin, February 2, 2006

Dangling links to categories in ODP dump.

January 10, 2006 by IZh

- Editall/Catmv
9 replies
4.8k views

Hello! While importing ODP data I found that there are many links (related, symbolic, etc.) to categories which don't exist in structure.rdf.u8. Some links can be resolved by looking up the corresponding category in redirect.rdf.u8, but others are completely broken. I don't know how, but broken links are not displayed on the site (despite their presence in the ODP data). In any case, I think that having such links in the ODP data is not good, because structure.rdf.u8 and content.rdf.u8 alone are not enough to import the data. One has to try to resolve links to categories through redirect.rdf.u8 and remove the links that can't be resolved. It is not very convenient. If it is in…

Last reply by sfromis, January 25, 2006
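
The workaround described in this thread (look a missing category up in the redirect data, drop the link if that fails) can be sketched roughly as follows. This is only an illustration: the function name, the in-memory sets and dicts, and the sample category names are all hypothetical, and it assumes the category ids from structure.rdf.u8 and the redirect pairs from redirect.rdf.u8 have already been loaded by some other step.

```python
def resolve_category_links(links, known_categories, redirects, max_hops=10):
    """Split category links into resolved ones and dangling ones.

    links            -- iterable of category ids referenced by links
    known_categories -- set of category ids declared in the structure data
    redirects        -- dict mapping an old category id to its new id
    max_hops         -- guard against redirect loops (hypothetical limit)
    """
    resolved, dangling = {}, []
    for link in links:
        target = link
        hops = 0
        # Follow redirects until we reach a declared category or give up.
        while (target not in known_categories
               and target in redirects
               and hops < max_hops):
            target = redirects[target]
            hops += 1
        if target in known_categories:
            resolved[link] = target
        else:
            dangling.append(link)
    return resolved, dangling


# Made-up sample data, not taken from the real dump.
known = {"Top/Arts", "Top/Arts/Music"}
redirects = {"Top/Arts/Musik": "Top/Arts/Music"}
links = ["Top/Arts", "Top/Arts/Musik", "Top/Arts/Gone"]

resolved, dangling = resolve_category_links(links, known, redirects)
print(resolved)
print(dangling)
```

Links that end up in the dangling list are the ones the poster suggests removing during import.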

Wrong <type> in <ExternalPage>s.

December 25, 2005 by IZh

1 reply
3k views

Hello! I ran the "fgrep '<type>' content.rdf | sort | uniq" command to find out which types ExternalPages can have, and I see the following: <type>Atom</type> <type>News about David Letterman, collected from various sources on the web</type> <type>News about Enrique Iglesias, collected from various sources on the web</type> <type>News about Ice Cube, collected from various sources on the web</type> <type>News about Sheryl Crow, collected from various sources on the web</type> <type>News about Steve Martin, collected from various sources on the web</type> <type>News about Tom Ha…

Last reply by sfromis, January 25, 2006

Download speed 3k?

January 12, 2006 by Nightmare

0 replies
2.6k views

Is there a problem, or is this the max speed of the ODP download? It's impossible to download the 60 MB structure file and the 303 MB content file at 4k (now).

Last reply by Nightmare, January 12, 2006

Incorrect <ages>kids/mteen</ages>?

December 25, 2005 by IZh

0 replies
2.5k views

Hello! When I ran the "fgrep '<ages>' kt-content.rdf | sort | uniq -c" command to see which <ages> values exist and how many resources there are per age, I found this: 5814 <ages>kids</ages>, 32 <ages>kids/mteen</ages>, 5819 <ages>kids/teen</ages>, 7909 <ages>kids/teen/mteen</ages>, 1813 <ages>mteen</ages>, 1793 <ages>teen</ages>, 9412 <ages>teen/mteen</ages>. It seems that <ages>kids/mteen</ages> is incorrect, because if teens are older than kids and mteens are older than teens, then how can a resource be suitable for both kids and mteens but not for…

Last reply by IZh, December 25, 2005
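
For readers without fgrep at hand, the tally in this post can be approximated in Python, together with a check for the gap the poster points out. This is a sketch under assumptions: the helper names are made up, the age order kids < teen < mteen is taken from the post's own reasoning, and the sample lines are invented. Unlike "fgrep | sort | uniq -c", it counts tag occurrences rather than whole lines.

```python
import re
from collections import Counter

AGES_RE = re.compile(r"<ages>(.*?)</ages>")


def count_ages(lines):
    """Count how often each <ages> value occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(AGES_RE.findall(line))
    return counts


def is_contiguous(value, order=("kids", "teen", "mteen")):
    """True if the listed age groups form an unbroken run in `order`.

    Raises ValueError for a group name that is not in `order`.
    """
    positions = sorted(order.index(group) for group in value.split("/"))
    return positions == list(range(positions[0], positions[-1] + 1))


# Invented sample lines; the real input would be the kt-content.rdf file.
sample = [
    "<ages>kids</ages>",
    "<ages>kids/mteen</ages>",
    "<ages>kids/mteen</ages>",
    "<d:Title>x</d:Title>",
]
counts = count_ages(sample)
suspicious = [value for value in counts if not is_contiguous(value)]
print(counts)
print(suspicious)
```

To run it on the dump itself, an open file object (opened with encoding="utf-8") can be passed to count_ages in place of the sample list, since the function only iterates over lines.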

Wrong WINDOWS-1251 encoding is specified for a topic.

December 25, 2005 by IZh

0 replies
2.5k views

Hello! I have been looking at which <d:charset> values occur in structure.rdf and found that some topics have UTF-8 specified and one has WINDOWS-1251. Here is that topic: <Topic r:id="Top/World/Makedonski/Општество/Влада/Локална_самоуправа"> <catid>1216590</catid> <d:Title>Локална_самоуправа</d:Title> <d:charset>WINDOWS-1251</d:charset> <lastUpdate>2004-04-07 20:40:01</lastUpdate> </Topic> But where is the 1251 encoding here? Everything in the dump is in UTF-8 as normal. Shouldn't all <d:charset>s be removed from the dump? Thanks.

Last reply by IZh, December 25, 2005

<letterbar> not present in structure.rdf but present on dmoz.org web-site.

December 7, 2005 by IZh

4 replies
3.1k views

Hello! I have found that on some pages letter bars (in non-English languages) are present on the web site, e.g. http://dmoz.org/World/Russian/%d0%98%d1%81%d0%ba%d1%83%d1%81%d1%81%d1%82%d0%b2%d0%be/%d0%9c%d1%83%d0%b7%d1%8b%d0%ba%d0%b0/%d0%93%d1%80%d1%83%d0%bf%d0%bf%d1%8b_%d0%b8_%d0%b8%d1%81%d0%bf%d0%be%d0%bb%d0%bd%d0%b8%d1%82%d0%b5%d0%bb%d0%b8/ but are not found in structure.rdf. Is it a bug or a feature? Are letter bars generated automatically? I thought that every letter bar should be present in content.rdf. In which cases are letter bars described in structure.rdf? Does it depend on the language? Thanks!

Last reply by IZh, December 19, 2005

open source

December 2, 2005 by skynex66

4 replies
3.2k views

Is dmoz open source? I know I can use the data, but are the source files for the community also free to download?

Last reply by Jacob Mathai, December 3, 2005

Problem in understanding odp mysql tables

November 8, 2005 by musmanm80

1 reply
2.9k views

Hi, I downloaded the RDF dump of dmoz. I downloaded two files, "content.rdf.u8" and "structure.rdf.u8", and then stored their data in a MySQL database. Four tables were created, named structure, datatypes, content_links and content_description. Now I want to use the Shopping category, but I don't understand the purpose of these four tables and don't know how to use them. Please, somebody tell me how I can use these tables. Thanks, Usman

Last reply by chvsr, November 20, 2005

Games directory is not available

November 14, 2005 by musmanm80

- Meta
4 replies
3.1k views

Games directory is not available in DMOZ files. Hi, I have downloaded the "content.rdf.u8.gz" and "structure.rdf.u8.gz" files from dmoz.org. Actually I need the Games directory. Only the structure information about "Games" is available in the "structure.rdf.u8.gz" file; its links, titles and descriptions are not available in the "content.rdf.u8.gz" file. Please tell me how I can get the complete data file. Thanks, Usman

Last reply by windharp, November 14, 2005

I Use DMOZ data on This Site

September 19, 2005 by l008com2

3 replies
3.6k views

http://www.randomwebsite.net I just built a little custom parser in PHP, and I download and run through the data once a week and load up the URLs. It's sweet.

Last reply by otr, November 8, 2005

RDF Content Format

September 8, 2005 by rj365

2 replies
3k views

Hi! I looked at the pattern of the sample content file (content.example) available on the ODP site. From the point of view of extracting the hyperlinks, it appears that the format is such that each link is repeated exactly twice: once in the r:resource attribute of the <Topic>/<catid>/<link> tag and once in the <ExternalPage> tag. Can one safely assume that this repetition holds for the entire real dump and therefore ignore the links in one of the two places? Thanks! Rahul.

Last reply by addy, November 1, 2005
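
Rather than assuming the repetition holds, it can be checked against the dump directly. The sketch below is an illustration only: it assumes links appear as <link r:resource="..."> and pages as <ExternalPage about="...">, where the about attribute name is an assumption not stated in the post, and the function name and sample lines are made up. It scans line by line with regular expressions instead of parsing the RDF properly.

```python
import re

# Assumed attribute layout; adjust the patterns if the dump differs.
LINK_RE = re.compile(r'<link\s+r:resource="([^"]*)"')
PAGE_RE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def compare_link_sets(lines):
    """Return (URLs only in <link> tags, URLs only in <ExternalPage> tags)."""
    topic_links, external_pages = set(), set()
    for line in lines:
        topic_links.update(LINK_RE.findall(line))
        external_pages.update(PAGE_RE.findall(line))
    return topic_links - external_pages, external_pages - topic_links


# Invented sample lines standing in for the content dump.
sample = [
    '<Topic r:id="Top/Arts">',
    '  <link r:resource="http://example.com/a"/>',
    '  <link r:resource="http://example.com/b"/>',
    '</Topic>',
    '<ExternalPage about="http://example.com/a">',
    '</ExternalPage>',
    '<ExternalPage about="http://example.com/c">',
    '</ExternalPage>',
]
only_in_topics, only_in_pages = compare_link_sets(sample)
print(only_in_topics)
print(only_in_pages)
```

If both returned sets are empty for the real file, every URL occurs in both places and either one can be ignored; note that the two sets of URLs are held in memory, which may be sizeable for the full dump.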

how to use odp contents files "contents.rdf.u8" and "structure.rdf.u8"

October 21, 2005 by musmanm80

1 reply
2.9k views

Hi, I am parsing ODP data using an RSS parser. I don't know how to use these files because they are too large, and I don't know the purpose of "structure.rdf.u8" and "content.rdf.u8". Please, somebody tell me this and also how to parse these files. I have checked my parser on a small file and it works correctly. Thanks, Usman

Last reply by giz, October 31, 2005

how to split content.rdf.u8 and structure.rdf.u8

October 31, 2005 by musmanm80

1 reply
2.7k views

Hi, I want to use the Games and Sports categories. I have downloaded both files, "content.rdf.u8" and "structure.rdf.u8", but they take a lot of time to parse. Is there any way to split these files by category, i.e. the Games category in one file and Sports in another? How can I split these large files? Thanks, Usman

Last reply by addy, October 31, 2005
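
One possible approach to the splitting question is to stream the file line by line and keep only the blocks for one category, so the whole dump never has to be loaded at once. The sketch below is a minimal illustration, not a complete splitter: it handles only <Topic r:id="..."> blocks, assumes each block opens on a line containing <Topic r:id="..."> and ends on a line containing </Topic>, and uses a hypothetical function name and invented sample lines.

```python
import re

TOPIC_OPEN_RE = re.compile(r'<Topic r:id="([^"]*)">')


def extract_topics(lines, prefix):
    """Yield the lines of every <Topic> block whose r:id is under `prefix`."""
    keep = False
    for line in lines:
        match = TOPIC_OPEN_RE.search(line)
        if match:
            rid = match.group(1)
            # Match the category itself and its subcategories only,
            # so "Top/Games" does not also match "Top/Gamesmanship".
            keep = rid == prefix or rid.startswith(prefix + "/")
        if keep:
            yield line
        if "</Topic>" in line:
            keep = False


# Invented sample lines standing in for structure.rdf.u8.
sample = [
    '<Topic r:id="Top/Games">',
    '  <catid>1</catid>',
    '</Topic>',
    '<Topic r:id="Top/Sports/Golf">',
    '  <catid>2</catid>',
    '</Topic>',
    '<Topic r:id="Top/Games/Puzzles">',
    '  <catid>3</catid>',
    '</Topic>',
]
games = list(extract_topics(sample, "Top/Games"))
print(len(games))
```

Because the function is a generator, an open file object can be passed in and the yielded lines written straight to a per-category output file. The output contains bare blocks without the enclosing RDF root element, so it would need a wrapper before an XML parser accepts it.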

Rdf Tags, what do they mean?

October 21, 2005 by dane

1 reply
2.7k views

I'm trying to figure out how ODP tags work, I mean what their functions are. Is there a tutorial for that, or any documentation that you know of? For example, the external tag in content.rdf, or narrow2 and narrow3 (what is the difference between them)? I've read some RDF tutorials. I don't know if I remember it right, but they say that ODP does not conform to the newer W3C recommendations for RDFS (please correct me on this one). The bottom line is, I can't find any article that explains what these tags are for and why they exist. If you have some links, I would very much appreciate them. Thanks.

Last reply by dane, October 21, 2005

How to Use DMOZ Data

October 13, 2005 by musmanm80

0 replies
3.3k views

How to get data from DMOZ XML through an RSS feed. Hi, I want to get DMOZ XML data through an RSS feed. Please, somebody help me with the script. I only want initial help; the rest I shall do myself. I am trying an RSS parser but that is not working. Thanks, Usman

Last reply by musmanm80, October 13, 2005

ODP structure.rdf.u8.xml is incorrect

October 6, 2005 by arkadia

- Meta
3 replies
4.8k views

Hello, I am trying to parse the structure.rdf.u8.xml file and I have discovered a lot of mistakes in it. For example, none of the categories below is declared in structure.rdf.u8.xml as <Topic r:id="xxx">, but they are used as children of categories that are declared in the file (<altlang r:resource="Afrikaans:xxx"/> for example): Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/Adult_Check Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/Adult_Check/Gold Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/FreeNetPass Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/SexKey Top/Arts/Music/Instruments/Stringed/Guitar/Luthiers Top/Arts/Music/Styl…

Last reply by windharp, October 6, 2005

how to redirect in perl

September 26, 2005 by musmanm80

- Editall
1 reply
3.2k views

Hi, I want to redirect from one Perl script to another Perl script. Actually I have this scenario: login.html------->checklogin.pl-------------->add_sites.pl. I want to redirect from "checklogin.pl" to "add_sites.pl". Please, somebody tell me how I can do this. I shall be very thankful to you. Usman

Last reply by Callimachus, September 27, 2005

My IP Blocked By DMOZ

September 2, 2005 by pfasia

- Meta
6 replies
4.1k views

I have been using a cgi script to add DMOZ data to my website for a few years now, and never had a problem. A few days ago, however I noticed that the category pages were producing the following error: Couldn't connect to dmoz.org:80 : IO::Socket::INET: Timeout At first I thought it was my web host's firewall blocking the connection to DMOZ, so I contacted them and they allowed the connection in the firewall, but I still got the same error. Finally my web host investigated the problem and said that it was DMOZ blocking my IP. (By the way, I uploaded the script to another web server and it works fine.) I'm pretty sure I'm not breaking any rules or anything for DM…

Last reply by ishtar, September 22, 2005

ODP Data Script

September 19, 2005 by snareklutz

- Editall
3 replies
3.2k views

Is there a script that automatically uses ODP data, or any examples to build on? I can't seem to be able to find any!

Last reply by giz, September 19, 2005

Directory based on dmoz categories structure

August 3, 2005 by kaos25

- Meta
5 replies
7.2k views

Hi ! I read all the information I could find (license, FAQ, other threads...) about using dmoz data but I can't seem to find an answer to the following question. I'd like to build a new directory based solely on the dmoz categories structure. I will not use any data from dmoz (site, links, description...) in this new directory. I only wish to create a new directory based on the dmoz categories structure (for example, I'll have an Art section with some sub categories, I'll not have a World nor Regional sections...). Does a directory based on the dmoz categories structure (which will be modified, simplified, etc.) need to provide the applicable dmoz attribution knowin…

Last reply by windharp, September 15, 2005

Attributions, do they need to be shown, they dont seem to be sometimes.

September 15, 2005 by Jeffery2

- Meta
2 replies
2.8k views

Do I need to include the dmoz attribution on every page of a directory? I don't have a directory yet, but am thinking of using a script. I have seen at least one site (http://www.getblogs.com) that uses mainly dmoz data (at least tucked away on a page or two it says it is supplemented with dmoz data, and when it started it was almost all dmoz....) and they do not use any of the attribution notices at all. None on the listings pages, none on the submit pages; you have to look hard to find only a small notice, and even that is not the one required by the rules. Do they pay you for a separate licence to not display any notices on their listings, or can you just ignor…

Last reply by windharp, September 15, 2005

Proper credit to the ODP

July 8, 2005 by freetropolis

- Meta
5 replies
3.1k views

Hello all, I am the webmaster of http://www.freetropolis.com. It's a brand new site with no current website traffic. Before I "release" it to the public, I'd like to know what would be the proper way to give credit back to the ODP. I don't use ODP website directory data, I just used the categories. ODP's categories do a fantastic job of breaking down the world wide web into an easy to use hierarchy, and I've combined search results from various search engines into a directory search engine. Please advise me on the proper way to give credit. Thank you. -Webmaster Freetropolis.com

Last reply by windharp, September 15, 2005

How to Use ODP Data for Research

September 5, 2005 by rj365

- Meta
5 replies
3.5k views

Hi, I want to use the categorized ODP data, or more precisely the web pages that are listed in the ODP. For this I need to download pages from the links (category-wise) in the ODP listing. I have downloaded the RDF dump from the ODP web site. The problem is that the dump is too large: a single 1.85 GB file on disk. The question is: how should I go about processing it? There are parsers, but isn't the file too large? Is there a way to split the dump into categories, or at least into parts, to make it more manageable? Thanks! Rahul.

Last reply by giz, September 10, 2005

Will I also need to download "structure.rdf.u8.gz" and content.rdf.u8.gz

September 9, 2005 by musmanm80

- Meta
1 reply
3.2k views

Hi, actually I want to add the dmoz directory to my site <removed>, so please tell me: will I also need to download the two files mentioned above from http://rdf.dmoz.org/? Please tell me the complete procedure, and if anybody has a script for this, please share it. Thanks, Usman

Last reply by pvgool, September 9, 2005

262 topics in this forum
