Using Directory Data
Discussion of using directory data, including information on the license and attribution requirements.
262 topics in this forum
- 2 replies
- 3.4k views
Hi, I use a software package called "wamp" to parse ODP data into MySQL. I got a total of 460286 records in the "content_links" table. In this table, every link maps to a category. When I run a query, it always returns one record or none, e.g. SELECT * FROM content_links c WHERE c.resource = 'http://daliarchives.com/'; returns "Top/Arts/Art_History/Artists/D/Dali,_Salvador". However, as far as I know, this website belongs to more than one category. How can I retrieve its related categories, such as Arts/Art_History/Periods_and_Movements/Surrealism and Arts/Movies/Titles/U/Un_Chien_Andalou? I don't see any table that contains this information (one category maps to r…
Last reply by logoin, -
- Editall/Catmv
- 9 replies
- 4.7k views
Hello! While importing ODP data I found that there are many links (related, symbolic, etc.) to categories which don't exist in structure.rdf.u8. Some links can be resolved by looking up the corresponding category in redirect.rdf.u8, but others are completely broken. I don't know how, but broken links are not displayed on the site (despite their presence in the ODP data). In any case, I think that having such links in the ODP data is not good, because structure.rdf.u8 and content.rdf.u8 alone are not enough to import the data. One has to try to resolve links to categories through redirect.rdf.u8 and remove the links that can't be resolved. It is not very convenient. If it is in…
Last reply by sfromis, -
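The resolve-or-drop step described in the post above could be sketched roughly as follows. This is only an illustration: `known_topics` (the set of category ids found in structure.rdf.u8) and `redirects` (a mapping built from redirect.rdf.u8) are hypothetical in-memory structures, and how they get loaded from the dump is left out.

```python
def resolve_category(target, known_topics, redirects):
    """Return the existing category that `target` resolves to, or None.

    `known_topics` and `redirects` are assumed to be a set and a dict
    built beforehand from structure.rdf.u8 and redirect.rdf.u8.
    """
    seen = set()
    current = target
    while current not in seen:
        if current in known_topics:
            return current
        seen.add(current)  # remember visited ids so redirect loops terminate
        current = redirects.get(current)
        if current is None:
            return None  # no redirect entry: the link is broken
    return None  # redirect loop without reaching a known category


def clean_links(links, known_topics, redirects):
    """Keep only links that resolve, replacing each by its resolved category."""
    resolved = []
    for target in links:
        category = resolve_category(target, known_topics, redirects)
        if category is not None:
            resolved.append(category)
    return resolved
```

Following redirects in a loop (instead of a single lookup) is a design choice for the case where one redirect points at another; whether the dump actually contains such chains is not something the post states.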
- 1 reply
- 3k views
Hello! I ran the command "fgrep '<type>' content.rdf | sort | uniq" to find out what types an ExternalPage can have, and I see the following: <type>Atom</type> <type>News about David Letterman, collected from various sources on the web</type> <type>News about Enrique Iglesias, collected from various sources on the web</type> <type>News about Ice Cube, collected from various sources on the web</type> <type>News about Sheryl Crow, collected from various sources on the web</type> <type>News about Steve Martin, collected from various sources on the web</type> <type>News about Tom Ha…
Last reply by sfromis, -
- 0 replies
- 2.6k views
Is there a problem, or is this the maximum speed of the ODP download? It's impossible to download the 60 MB structure file and the 303 MB content file at 4k (now).
Last reply by Nightmare, -
- 0 replies
- 2.5k views
Hello! When I ran the command "fgrep '<ages>' kt-content.rdf | sort | uniq -c" to see what kinds of <ages> values exist and how many resources there are per age, I found this: 5814 <ages>kids</ages> 32 <ages>kids/mteen</ages> 5819 <ages>kids/teen</ages> 7909 <ages>kids/teen/mteen</ages> 1813 <ages>mteen</ages> 1793 <ages>teen</ages> 9412 <ages>teen/mteen</ages> It seems that <ages>kids/mteen</ages> is incorrect, because if teens are older than kids and mteens are older than teens, then how can a resource be suitable for both kids and mteens but not for…
Last reply by IZh, -
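The same tally as the fgrep pipeline in the post above can be sketched in Python, together with a check for the combination the poster questions. The ordering kids < teen < mteen is taken from the post, and the function names are made up for this example.

```python
import re
from collections import Counter

AGES_RE = re.compile(r"<ages>(.*?)</ages>")


def count_ages(lines):
    """Count each distinct <ages> value, like `fgrep | sort | uniq -c`."""
    counts = Counter()
    for line in lines:
        for value in AGES_RE.findall(line):
            counts[value] += 1
    return counts


def skips_teen(counts):
    """Return values that list kids and mteen but leave out teen."""
    flagged = []
    for value in counts:
        parts = set(value.split("/"))
        if {"kids", "mteen"} <= parts and "teen" not in parts:
            flagged.append(value)
    return flagged
```

To run it on the real file, one would pass an open file object, for example `count_ages(open("kt-content.rdf", encoding="utf-8"))`, since the function only needs an iterable of lines.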
- 0 replies
- 2.5k views
Hello! I have been looking at what kinds of <d:charset> values occur in structure.rdf and found that some topics have UTF-8 specified and one has WINDOWS-1251. Here is that topic: <Topic r:id="Top/World/Makedonski/Општество/Влада/Локална_самоуправа"> <catid>1216590</catid> <d:Title>Локална_самоуправа</d:Title> <d:charset>WINDOWS-1251</d:charset> <lastUpdate>2004-04-07 20:40:01</lastUpdate> </Topic> But where is the 1251 encoding here? Everything in the dump is in UTF-8 as normal. Shouldn't all <d:charset> elements be removed from the dump? Thanks.
Last reply by IZh, -
- 4 replies
- 3k views
Hello! I have found that on some pages letter bars (in non-English languages) are present on the web site, e.g. http://dmoz.org/World/Russian/%d0%98%d1%81%d0%ba%d1%83%d1%81%d1%81%d1%82%d0%b2%d0%be/%d0%9c%d1%83%d0%b7%d1%8b%d0%ba%d0%b0/%d0%93%d1%80%d1%83%d0%bf%d0%bf%d1%8b_%d0%b8_%d0%b8%d1%81%d0%bf%d0%be%d0%bb%d0%bd%d0%b8%d1%82%d0%b5%d0%bb%d0%b8/ but are not found in structure.rdf. Is it a bug or a feature? Are letter bars generated automatically? I thought that every letter bar should be present in content.rdf. In what cases are letter bars described in structure.rdf? Does it depend on the language? Thanks!
Last reply by IZh, -
- 4 replies
- 3.2k views
Is DMOZ open source? I know I can use the data, but are the source files for the community also free to download?
Last reply by Jacob Mathai, -
- 1 reply
- 2.9k views
Hi, I downloaded the RDF dump of DMOZ. I downloaded the two files "content.rdf.u8" and "structure.rdf.u8" and then stored the data from these files in a MySQL database. Four tables were created, named structure, datatypes, content_links and content_description. Now I want to use the Shopping category, but I do not understand the purpose of these four tables and do not know how to use them. Please, can somebody tell me how I can use these tables? Thanks, Usman
Last reply by chvsr, -
- Meta
- 4 replies
- 3.1k views
Games directory is not available in DMOZ files. Hi, I have downloaded the "content.rdf.u8.gz" and "structure.rdf.u8.gz" files from dmoz.org. I actually need the Games directory. Only the structure information about "Games" is available in the "structure.rdf.u8.gz" file, but the links, titles and descriptions are not available in the "content.rdf.u8.gz" file. Please tell me how I can get the complete data file. Thanks, Usman
Last reply by windharp , -
- 3 replies
- 3.6k views
http://www.randomwebsite.net I just built a little custom parser in PHP; I download and run through the data once a week and load up the URLs. It's sweet.
Last reply by otr, -
- 2 replies
- 2.9k views
Hi! I looked at the pattern of the sample content file (content.example) available on the ODP site. From the point of view of extracting the hyperlinks, it appears that the format of the content is such that each link is repeated exactly twice: once in the r:resource attribute of the <link> tag within the <Topic>/<catid>/<link> block, AND once in the <ExternalPage> tag. Can one safely assume that this repetition holds for the entire real dump, and thereby ignore the links in either one of the two places? Thanks! Rahul.
Last reply by addy, -
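One way to test that assumption instead of trusting it is to collect the URLs from both places and compare the two sets. The sketch below assumes the tag shapes `<link r:resource="...">` and `<ExternalPage about="...">` seen in the sample file; other link-like tags are not covered, and holding every URL in memory may be heavy for the full dump.

```python
import re

# Attribute layouts assumed from the sample content file.
LINK_RE = re.compile(r'<link\s+r:resource="([^"]*)"')
PAGE_RE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def compare_link_sets(lines):
    """Return (URLs only in <link> tags, URLs only in <ExternalPage> tags)."""
    topic_links = set()
    page_links = set()
    for line in lines:
        topic_links.update(LINK_RE.findall(line))
        page_links.update(PAGE_RE.findall(line))
    return topic_links - page_links, page_links - topic_links
```

If both returned sets come back empty for the real dump, reading the links from only one of the two places loses nothing.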
- 1 reply
- 2.9k views
Hi, I am parsing ODP data using an RSS parser. I don't know how I can use these files, because they are too large, and I do not know what the purpose of "structure.rdf.u8" and "content.rdf.u8" is. Please, can somebody explain this and also tell me how to parse these files? I have checked my parser on a small file, and it works correctly. Thanks, Usman
Last reply by giz, -
- 1 reply
- 2.7k views
Hi, I want to use the Games and Sports categories. I have downloaded both files, "content.rdf.u8" and "structure.rdf.u8", but they take a lot of time to parse. Is there any way to split these files by category, i.e. the Games category in one file and Sports in another? How can I break/split these large files? Thanks, Usman
Last reply by addy, -
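A line-based split by top-level category might look something like the sketch below. It assumes each `<Topic>` and `<ExternalPage>` block starts and ends on its own line and that an ExternalPage names its category in a `<topic>` child; the output file naming is made up, and the resulting files are fragments without a root element rather than complete RDF documents.

```python
import os
import re

TOPIC_ID_RE = re.compile(r'<Topic r:id="Top/([^/"]+)')
PAGE_TOPIC_RE = re.compile(r"<topic>Top/([^/<]+)")


def split_blocks(lines):
    """Yield (top_level_category, block_text) per Topic/ExternalPage block."""
    block = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(("<Topic ", "<ExternalPage ")):
            block = [line]  # start collecting a new block
        elif block:
            block.append(line)
        if block and stripped.endswith(("</Topic>", "</ExternalPage>")):
            text = "".join(block)
            match = TOPIC_ID_RE.search(text) or PAGE_TOPIC_RE.search(text)
            if match:  # blocks without a Top/<category> path are skipped
                yield match.group(1), text
            block = []


def write_split(lines, wanted, out_dir):
    """Append each wanted category's blocks to <out_dir>/<category>.rdf.part."""
    handles = {}
    try:
        for category, text in split_blocks(lines):
            if category not in wanted:
                continue
            if category not in handles:
                path = os.path.join(out_dir, category + ".rdf.part")
                handles[category] = open(path, "w", encoding="utf-8")
            handles[category].write(text)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Something like `write_split(open("content.rdf.u8", encoding="utf-8"), {"Games", "Sports"}, ".")` would then read the dump once, line by line, without loading it all into memory.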
- 1 reply
- 2.7k views
I'm trying to figure out how the ODP tags work, I mean what their functions are. Is there any tutorial for that, or just documentation that you know of? For example, the external tag in content.rdf, or narrow2 and narrow3 (what is the difference between them)? I've read some RDF tutorials. I don't know if I remember it right, but they say that ODP does not conform to the newer W3C recommendations for RDFS (please correct me on this one). The bottom line is, I can't find any article that explains what these tags are for and why they exist. If you have some links, I would very much appreciate them. Thanks.
Last reply by dane, -
- 0 replies
- 3.3k views
How to get data from DMOZ XML through an RSS feed. Hi, I want to get DMOZ XML data through an RSS feed. Please, can somebody help me with a script? I only need initial help; the rest I will do myself. I am trying an RSS parser, but that is not working. Thanks, Usman
Last reply by musmanm80, -
- Meta
- 3 replies
- 4.8k views
Hello, I am trying to parse the structure.rdf.u8.xml file and have discovered a lot of mistakes in it. For example, none of the categories below is declared in structure.rdf.u8.xml as <Topic r:id="xxx">, but they are used as children of some category that is declared in the structure.rdf.u8.xml file (<altlang r:resource="Afrikaans:xxx"/>, for example): Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/Adult_Check Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/Adult_Check/Gold Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/FreeNetPass Top/Adult/Image_Galleries/Ethnic/Ebony/Softcore/AVS/SexKey Top/Arts/Music/Instruments/Stringed/Guitar/Luthiers Top/Arts/Music/Styl…
Last reply by windharp , -
- Editall
- 1 reply
- 3.1k views
Hi, I want to redirect from one Perl script to another Perl script. I have this scenario: login.html -> checklogin.pl -> add_sites.pl. I want to redirect from "checklogin.pl" to "add_sites.pl". Please, can somebody tell me how I can do this? I will be very thankful. Usman
Last reply by Callimachus , -
- Meta
- 6 replies
- 4.1k views
I have been using a cgi script to add DMOZ data to my website for a few years now, and never had a problem. A few days ago, however I noticed that the category pages were producing the following error: Couldn't connect to dmoz.org:80 : IO::Socket::INET: Timeout At first I thought it was my web host's firewall blocking the connection to DMOZ, so I contacted them and they allowed the connection in the firewall, but I still got the same error. Finally my web host investigated the problem and said that it was DMOZ blocking my IP. (By the way, I uploaded the script to another web server and it works fine.) I'm pretty sure I'm not breaking any rules or anything for DM…
Last reply by ishtar, -
- Editall
- 3 replies
- 3.1k views
Is there a script that automatically uses ODP data, or any examples to build on? I can't seem to be able to find any!
Last reply by giz, -
- Meta
- 5 replies
- 7.2k views
Hi ! I read all the information I could find (license, FAQ, other threads...) about using dmoz data but I can't seem to find an answer to the following question. I'd like to build a new directory based solely on the dmoz categories structure. I will not use any data from dmoz (site, links, description...) in this new directory. I only wish to create a new directory based on the dmoz categories structure (for example, I'll have an Art section with some sub categories, I'll not have a World nor Regional sections...). Does a directory based on the dmoz categories structure (which will be modified, simplified, etc.) need to provide the applicable dmoz attribution knowin…
Last reply by windharp , -
- Meta
- 2 replies
- 2.8k views
Do I need to include the DMOZ attribution on every page of a directory? I don't have a directory yet, but am thinking of using a script. I have seen at least one site (http://www.getblogs.com) that uses mainly DMOZ data (at least, tucked away on a page or two, it says it is supplemented with DMOZ data, and when it started it was almost all DMOZ...), and they do not use any of the attribution notices at all. None on the listings pages, none on the submit pages; you have to look hard to find only a small notice, and even that is not the one required by the rules. Do they pay you for a separate licence to not display any notices on their listings, or can you just ignor…
Last reply by windharp , -
- Meta
- 5 replies
- 3.1k views
Hello all, I am the webmaster of http://www.freetropolis.com. It's a brand new site with no current website traffic. Before I "release" it to the public, I'd like to know what would be the proper way to give credit back to the ODP. I don't use ODP website directory data, I just used the categories. ODP's categories do a fantastic job of breaking down the world wide web into an easy to use hierarchy, and I've combined search results from various search engines into a directory search engine. Please advise me on the proper way to give credit. Thank you. -Webmaster Freetropolis.com
Last reply by windharp , -
- Meta
- 5 replies
- 3.5k views
Hi, I want to use the categorized ODP data, well... actually, the web pages which are listed in the ODP. For this I need to download pages from the links (category-wise) in the ODP listing. I have downloaded the RDF dump from the ODP web site. The problem is that the dump is too large: a single 1.85 GB file on disk. The question is: how should I go about processing it? There are parsers, but isn't the file too large? Is there a way to split the dump into categories, or at least into parts, to make it more manageable? Thanks! Rahul.
Last reply by giz, -
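For a single file of that size, one option is an incremental parser that hands over one element at a time, so the whole dump never has to sit in memory. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse`; the element and attribute names (`ExternalPage`, `about`, `topic`) are assumed from the dump format discussed in this forum, namespaces are stripped instead of hard-coded, and the namespace URIs in the sample are placeholders.

```python
import io
import xml.etree.ElementTree as ET


def local(name):
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def iter_external_pages(source):
    """Yield (url, topic) for each ExternalPage, one element at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        url = next((v for k, v in elem.attrib.items() if local(k) == "about"), None)
        topic = next((c.text for c in elem if local(c.tag) == "topic"), None)
        yield url, topic
        elem.clear()  # drop the element's children once it has been handled


# Tiny stand-in for the real dump; the namespace URIs here are placeholders.
SAMPLE = (
    b'<RDF xmlns:r="http://example.org/r#" xmlns:d="http://example.org/d#"'
    b' xmlns="http://example.org/odp#">'
    b'<Topic r:id="Top/Arts"><catid>2</catid>'
    b'<link r:resource="http://a.example/"/></Topic>'
    b'<ExternalPage about="http://a.example/"><d:Title>A</d:Title>'
    b'<topic>Top/Arts</topic></ExternalPage>'
    b'</RDF>'
)

if __name__ == "__main__":
    for url, topic in iter_external_pages(io.BytesIO(SAMPLE)):
        print(topic, url)
```

Passing a file path or an open binary file instead of the `io.BytesIO` sample works the same way. Note that `elem.clear()` only empties each handled element; the root still keeps the emptied shells, so memory grows slowly rather than staying perfectly flat.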
- Meta
- 1 reply
- 3.2k views
Hi, I want to add the DMOZ directory to my site <removed>, so please tell me: will I also need to download the two files mentioned above from http://rdf.dmoz.org/? Please tell me the complete procedure, and if anybody has a script for this, tell me about it; otherwise just tell me the procedure. Thanks, Usman
Last reply by pvgool , -