Using Directory Data
Also for information on the license and attribution requirements.
262 topics in this forum
Use of ODP data
by Guest BigMatthew- 4 replies
I am somewhat clear on the data use policy, but I would like to clarify. I'm working on a directory that would include sections of ODP, for instance the whole sections for Arts, News and Science, starting from the top level down. This is only partial ODP data, but full categories. Is this use acceptable, given that the ODP attribution requirement of adding the supplied HTML code is followed? Also, I will have a feature that lets users add their own listings to my directory, so there would be both ODP listings and my own submitted listings. Is this acceptable? Thank you for your help.
Last reply by nakulgoyal, -
- Meta
- 7 replies
I want to be able to download the current DMOZ data (or certain sections) and then edit it as my own directory. Am I allowed to do this? I have PHP and MySQL and about 3 GB of space. I have had a look at these sites, but I don't know which one does what I want. Basically, I want to download the data, upload it to my server, then allow people to submit links and edit the directory myself. Thanks
Last reply by nakulgoyal, -
- Meta
- 3 replies
Hmm, whenever I attempt to download (Save As...) the 'structure' and 'content' .tz files, they don't stop at their stated sizes. The 'structure' file reached 980 MB before I gave up. They can't be that big, surely? How do I download these files completely, please?
Last reply by theseeker,
- 0 replies
Could a guru who has grown up with Perl be so kind as to suggest a regular expression that would filter out of an RDF dump every site that does not start with the requested URL pattern, e.g. Top/Arts/Music/Metal, so that I can build a directory that is a subtree of the whole big dump? It probably just needs to evaluate everything between the start "<" and the beginning of the next "<", look at everything that starts with "Top" and ends with ", and replace it with an empty string if it is not matched.
Last reply by dermotz, -
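One way to approach the subtree filter asked about in this topic, sketched in Python rather than Perl and as a block-wise line filter rather than a single regular expression. It assumes the dump wraps each category in a `<Topic r:id="Top/...">` element and each listing in an `<ExternalPage>` element with a `<topic>Top/...</topic>` child; `filter_subtree` is a hypothetical helper name, and the element layout should be checked against the actual dump.

```python
import re

# Category path as it appears in r:id="Top/..." or <topic>Top/...</topic>
# (assumed layout of the dump).
PATH_RE = re.compile(r'(?:r:id="|<topic>)(Top[^"<]*)')


def filter_subtree(lines, prefix):
    """Yield only the Topic/ExternalPage blocks that belong to `prefix`."""
    block = []        # lines of the block currently being collected
    in_block = False
    for line in lines:
        if line.lstrip().startswith(("<Topic ", "<ExternalPage ")):
            in_block, block = True, []
        if not in_block:
            continue  # header, footer and other lines outside blocks are dropped
        block.append(line)
        if "</Topic>" in line or "</ExternalPage>" in line:
            in_block = False
            paths = PATH_RE.findall("".join(block))
            # Keep the category itself and anything below it, but not
            # siblings that merely share the same leading characters.
            if any(p == prefix or p.startswith(prefix + "/") for p in paths):
                yield from block
```

Because it works on any iterable of lines, the function can be fed an open file handle, so the full dump never has to fit in memory.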
ODP Attribution
by Guest BigMatthew- 1 reply
My question about the attribution code is this: my categories will not have the same directory structure as the ODP, nor will they probably have the same category names. For instance, my category might be called Arts and Entertainment, whereas the ODP's is just called Arts. I have a script that will put my category into the $cat field of the ODP code for me, according to what my categories are called. I would end up with a link that would not send a visitor to ODP's "submit a link" page for the Arts category; instead there is a redirect that sends them to /add.html. My question is, would it be acceptable on the Submit a Link to just have it go …
Last reply by samiam, -
- Meta
- 1 reply
Hello. I am new to this ODP area, and just recently heard about dmoz. It is obvious from reading all of the posts that people REALLY want to be listed on dmoz. What I am trying to locate (and cannot seem to find) is information on what being listed on dmoz actually does. What are the benefits? Who uses the listings? Can you provide some input or direct me to someplace that may help me understand? Thank you!
Last reply by pvgool,
breaking up category dump into manageable sub-cats
by Guest sperugin- 0 replies
Hello, Has anyone had any success breaking up the category RDF dump into sub-categories? The dump is just too big to work with directly. I need to extract certain information from the category dump, but the PHP script I use to do so is taking too long to run (~10 hours). Therefore, I wrote the following XSLT transformation to break the dump into sub-branches (e.g., Arts, Games, News) prior to running my PHP script. <?xml version="1.0"?> <xsl:stylesheet xmlns:xsl="" xmlns:r="" xmlns:d="" xmlns="" version="1.0"> <xsl:output met…
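A possible alternative to the XSLT approach for splitting the category dump, sketched in Python as a single streaming pass. It assumes every category is a `<Topic r:id="Top/Branch/...">` element closed by `</Topic>`; `split_by_branch` and its `open_branch` callback are hypothetical names. Anything outside a `<Topic>` block, and the root `Top` category itself, is skipped.

```python
import re

# Captures the top-level branch name, e.g. "Arts" in r:id="Top/Arts/Music".
TOPIC_ID = re.compile(r'<Topic r:id="Top/([^/"]+)')


def split_by_branch(lines, open_branch):
    """Copy each <Topic> block to the writer for its top-level branch.

    `open_branch(name)` must return an object with a write() method; it is
    called once per branch, the first time that branch is seen.
    """
    writers = {}    # branch name -> writer returned by open_branch
    current = None  # writer for the block being copied, None outside a block
    for line in lines:
        match = TOPIC_ID.search(line)
        if match:
            branch = match.group(1)
            if branch not in writers:
                writers[branch] = open_branch(branch)
            current = writers[branch]
        if current is not None:
            current.write(line)
            if "</Topic>" in line:
                current = None
    return writers
```

Passing `lambda name: open(name + ".rdf", "w", encoding="utf-8")` as `open_branch` would write one file per branch (the file names are illustrative); the caller is responsible for closing the returned writers.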
search into ODP data - other solutions than live?
by Guest Awb- 1 reply
Has anybody had significant success loading ODP data into a database other than MySQL? I know that most people here have done it via live search, but I was wondering whether another database would perform well without huge costs. Or are there any noticeable speed improvements to be had on MySQL?
Last reply by hmf, -
How to improve my site to get it listed?
by Guest jimcat- 7 replies
Is there anything wrong with the site that prevents it from being listed in ODP? How can I improve it? Please advise.
Last reply by giz, -
which software used to "make"
by Guest --
- Meta
- 7 replies
Hi, which software is used to "make" the site (the directory part, based on the RDF data)? Is it downloadable somewhere? Regards, sws
Last reply by donaldb, -
Linking categories ...
by Guest Awb-
- Meta
- 6 replies
Importing the categories into MySQL from the RDF dump, I encountered the following problem: the category Top/Arts/Movies/Titles/P/Piano_Teacher,_The lists as its alternate language Top/World/Italiano/Arte/Cinema/Titoli/P/Pianista,_La, which doesn't exist. Looking on the dmoz site, at the page,_The/ the Italian link points to,_La/ as it should, but clicking it takes us to which also exists in the RDF dump, but I couldn't find any relation between Pianista,_La and La_Pianista in th…
Last reply by hmf, -
Site Map of Developer Resources
by Guest tohotom-
- Meta
- 9 replies
Forgive me if I am not seeing the forest for the trees - but I was wondering if there was a Site Map of the developer resources... I'm not talking about a site map to the directory structure - rather a single location one might look to access the various documents related to development stored on the DMOZ servers... I have found many of the links spread out over several pages e.g. or or etc. - but I wanted to make sure that I wasn't missing any. Also I was wondering if there is any documentation on the flat file system used by dmoz. I'm fairly certain that the RDF is…
Last reply by theseeker,
collecting sequences of hyperlink labels
by Guest sperugin- 1 reply
Hello, I am interested in collecting sequences of hyperlink labels (the text anchoring the ahref) for various sub-branches of ODP. These sequences are essentially the breadcrumbs at the top of each page. The URL also happens to mirror the current sequence. For example, the News sub-branch contains the following selected sequences: News: Breaking News: Business and Economy ... News: Breaking News: Official Press Releases ... ... ... I however want to preserve crosslinks from one branch of the directory to another, which the breadcrumb and URL do not do. I'm interested in collecting such sequences on the order of thousands in selected sub-branche…
Last reply by senox, -
using ODP data
by Guest Awb-
- Meta
- 7 replies
Hello everybody, I am trying to put up a website in which ODP data is to be used. I know that credit must be given when using the data for commercial purposes. Could anyone please point me in the right direction about what is to be found in the dump file? I know I will have to convert it to fill a MySQL database, but I need a few guidelines about it. 1. What content is in the file, the entire ODP content? 2. How can I make future updates, do I have to take the entire file each time? 3. How often is a new file available? 4. What type of database can work with it? Is MySQL powerful enough? Thanks in advance for any suggestions!
- Meta
- 10 replies
I'm interested in making my own dmoz page from the ground up. I have no programming experience whatsoever, but I'm very experienced with HTML. I use Dreamweaver for one of the sites I started, but now I also want to make a directory similar to dmoz. Would it be possible to make a site similar to dmoz with just regular HTML (making each category and subcategory page by page)? Or is it simply impossible to make something of that scale? Thanks.
Last reply by giz, -
Some questions re access via ASP
by Guest Dare2-
- Meta
- 9 replies
Just realised what the project is, and I am rapt that this exists. Great idea. Okay, first the blush :o because some of this is probably obvious to all but me. I would like to offer directory listings on some of my sites. I understand that if I follow the terms of use and provide the correct attributions, etc, this is acceptable. Now the Qs. 1: Can I pull info from the open directory using XMLHTTP object with ASP, or is it required that the RDFs are downloaded on a regular basis? 2: If I can pull the pages, do I need to pull the html, or is there a way to just get the data, in XML or even in a simple (eg comma-delimited list) format? Thanks in advance.
All sites that use live ODP data are down now...
by Guest browser007-
- Meta
- 8 replies
... while you can easily access itself. What's happening? Are we not authorized to use odp data anymore? You can do the test by yourself, go to: and pick a site of your choice: unless they cache some pages on their own server, there's no information from
Purchase or lease ODP data feed?
by Guest jjohnstn-
- Meta
- 6 replies
Is it possible to purchase or lease the ODP data feed? Not live, but generated from the RDF dump. Ideally, this would have all categories, including non-English ones, in a high-speed feed that could be incorporated into a web site.
Last reply by giz, -
- 1 reply
The terms.rdf.u8 file supplied in the latest RDF dump is incorrect: basically, it lists every term under every category, a la: (repeat for each category. Angled brackets changed to braces for display purposes)
Last reply by giz, -
Blocked from using ODP data?
by Guest jjohnstn-
- Meta
- 6 replies
Overnight I found my server may have been blocked from using ODP data. The script does not work on my server, but works on a different server using a different IP address. I'm using phpODP and accessing the data live. I have all the proper attributions in place as explained on their "Using ODP data" page. Is anyone else experiencing this problem? What should I do?
Last reply by hutcheson,
Just a little question..
by Guest schoik- 1 reply
I'm wondering, how many ODP sites other than DMOZ do search engines really pull from? I mean, it'd be a great project to make your own search engine like Google and have an ODP with it. That would be nice! But just to make a standard ODP, I'd suggest peering with major search engines, or peering with an ISP/backbone and having their users support the directory by using their ISP's search engine and having the directory read off of it; then you would have an actual replica where people would add sites so that they could be on their ISP's search engine. Ah, the insane thoughts and questions I come up with... - Chris Schoik
Last reply by totalxsive, -
- 2 replies
I'm setting up a directory of all shopping sites in one specific region, and I want this site to synchronize with ODP. But I have no coding skills at all and don't really know how to solve this. I want to extract just the data for these regional shopping sites and import it into my existing MySQL database (with my existing table formats). Any suggestions?
Last reply by giz, -
RDF dump sizes
by Guest --
- Meta
- 7 replies
I was just wondering whether the staff are aware of the misstated sizes of the RDF dump. For example, the main content at the last update was listed as 179 MB. After downloading it, and waiting a very, very long time, it turns out that the file is in fact 931 MB. Quite a significant difference. In the past I have killed the RDF download under the assumption that the connection was simply broken (it was reporting 400 MB downloaded, even though only 160 MB were supposed to be there). That is when the connections didn't time out themselves, which seems to happen fairly often (I have a cable connection). As an aside, the first time that I successfully downloaded the RDF du…
Last reply by giz, -
Any tutorials for using the RDF?
by Guest wotg- 1 reply
Hi there. About a week ago I spent a few hours trying to take a slice out of the ODP RDF for a little site of mine, Vancouver Computer Shops. I failed miserably, I'm afraid. My question is this: is anyone aware of a step by step tutorial (suitable to an idiot like me) that describes the process of taking a small section of the ODP and putting it in a db? (Double points if it includes some php or other code to then render it. Triple points for a free solution.) I figure that the easiest solution would be to hot link to one of the ODP mirrors, but that just doesn't seem fair to the ODP... besides, I'd like to customize the display a bit, and maybe add my o…
Last reply by giz, -
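A rough sketch of the "slice of the ODP into a db" step asked about in this topic, in Python with SQLite rather than PHP and MySQL. It assumes listings appear as `<ExternalPage about="URL">` elements containing `<d:Title>`, `<d:Description>` and `<topic>` children; `load_slice`, `field` and the `listing` table are hypothetical names. It reads the text in one piece, so it is meant for an already reduced extract, not the full dump.

```python
import re
import sqlite3
from html import unescape

# One listing: the URL sits in the `about` attribute, the rest in the body.
PAGE_RE = re.compile(
    r'<ExternalPage about="(?P<url>[^"]*)">(?P<body>.*?)</ExternalPage>', re.S)


def field(body, tag):
    """Return the unescaped text of <tag>...</tag> inside a listing, or ''."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", body, re.S)
    return unescape(match.group(1).strip()) if match else ""


def load_slice(rdf_text, prefix, conn):
    """Insert every listing at or below `prefix`; return the number inserted."""
    conn.execute("CREATE TABLE IF NOT EXISTS listing "
                 "(url TEXT, title TEXT, description TEXT, topic TEXT)")
    rows = []
    for match in PAGE_RE.finditer(rdf_text):
        body = match.group("body")
        topic = field(body, "topic")
        if topic == prefix or topic.startswith(prefix + "/"):
            rows.append((unescape(match.group("url")), field(body, "d:Title"),
                         field(body, "d:Description"), topic))
    # Parameterized insert, so titles and URLs cannot break the SQL.
    conn.executemany("INSERT INTO listing VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Rendering a category page then becomes an ordinary query, for example selecting all rows whose `topic` equals the category being displayed. The attribution requirements discussed elsewhere in this forum still apply to data loaded this way.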
catids -> tree addresses (eg 1.4.7....)
by Guest DrBradford- 3 replies
Hello, I am a researcher at Bradford University. Here's my question. I map people's bookmarks to DMOZ to do some comparisons. Currently I map URLs to DMOZ catids like 468769 (for example, Top/Kids_and_Teens/Pre-School). However, what I would like to do is transform those ids into tree addresses, e.g. 1.4.7. (I made this one up, but it could stand for Top/Kids_and_Teens/Pre-School). I would like to do this to get a better idea of 'parent' / 'children' relationships in classes and people's tastes. What would be the easiest way of doing that? Has this been done before? I was thinking of reading out the catname like Top/Kids_and_Teens/Pre-School, but it would…
Last reply by giz,
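One way the tree addresses asked about in this topic could be derived from category names, sketched in Python. It assumes the catid has already been resolved to a path such as Top/Kids_and_Teens/Pre-School; `tree_addresses` is a hypothetical helper, and the numbering is arbitrary in that it depends on the set of paths supplied (siblings are numbered in order of first appearance after sorting).

```python
def tree_addresses(paths):
    """Map each category path to a dotted address such as '1.2.1'."""
    child_index = {}  # (parent path, child name) -> 1-based position among siblings
    child_count = {}  # parent path -> number of distinct children seen so far
    addresses = {}
    for path in sorted(paths):
        parent = ""
        address = []
        for name in path.split("/"):
            key = (parent, name)
            if key not in child_index:
                child_count[parent] = child_count.get(parent, 0) + 1
                child_index[key] = child_count[parent]
            address.append(str(child_index[key]))
            parent = f"{parent}/{name}" if parent else name
        addresses[path] = ".".join(address)
    return addresses


def is_ancestor(ancestor_address, address):
    """True if `ancestor_address` is a proper ancestor of `address`."""
    return address.startswith(ancestor_address + ".")
```

With addresses in this form, a parent's address is always a prefix of its children's, so parent/child questions reduce to the string test in `is_ancestor`.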