Using Directory Data
Also for information on the license and attribution requirements.
262 topics in this forum
- Meta
- 6 replies
The current dump (13 Oct 2006) predates the ODP crash. Now that things are mostly up and running, could a new dump be generated and posted? Best regards.
Last reply by jpnutch, -
- Meta
- 3 replies
What is the situation if you run a directory that uses 100% user-submitted listings and people submit listings that are word-for-word from ODP? I know when I'm doing SEO I submit the same name and description to every directory so that could cause some duplication too. Thanks.
Last reply by sfromis, -
- 1 reply
Hello, I came across a network of directory sites that have several listings that are word for word the same as DMOZ listings. The duplicate listings are mixed in with other listings and no credit is given to DMOZ. I know sometimes site owners will submit their sites to directories using the same description so maybe this is what has happened. And I know you are not supposed to copy the whole directory without giving credit to DMOZ, but does DMOZ care if just a few listings are copied? And if they do care, at what point would a site cross the line? Also, if this is unacceptable then where would a person go to make a report?
Last reply by jimnoble, -
- Meta
- 6 replies
Hi, I'd like to put DMOZ data into directory-style eBooks so people could save time looking for the information on the web. I have credited DMOZ on the last page, but I would like to know whether this is allowed. Here's a sample: From my understanding of the license it looks like it's possible. Thank you for your time, Reed Floren
Last reply by sally, -
- Editall/Catmv
- 1 reply
I want to add my site to DMOZ. Can I do that?
Last reply by makrhod,
- 5 replies
The RDF Dump (or at least the provided example) seems to adhere to some very old draft specification of RDF/XML. Are there plans to fix it to adhere to the final spec so that it can be read by standard RDF processors? The main problems seem to be a wrong namespace, the unqualified use of rdf:about and a missing namespace prefix on the root element (which thus appears in the dmoz-ns rather than rdf). Cheers, Reto References W3C validators result with DMOZ example content: http : // www . w3 . org/RDF/Validator/ARPServlet? …
Last reply by sfromis, -
- 1 reply
Isn't the BCZ directory listed in the following NOT conforming to the licence attribution requirements?
Last reply by chaos127,
- 2 replies
Hi everyone, I have run a dmoz2mysql parser on content.rdf (1.94 GB) and structure.rdf (630 MB). It filled the tables structure (8,020 records), datatypes (41,980 records), content_links (2,721,631 records) and content_description (2,728,369 records). My question is: did this parser fill these tables with all the DMOZ records? If anyone has used this parser before, please post your record counts so I can compare them with mine.
Last reply by sfromis, -
- Meta
- 3 replies
In the following links I see that there is a separation of regional info (like Arreton, Bembridge, ), from the subject info (like Arts&Ents, Business&Econ, ), but I can't see where such things are "marked" in the dmoz data. Which part of the data marks these things? Thanks.
Last reply by sfromis, -
- Meta
- 5 replies
Hi guys, Thanks for this forum, I downloaded and parsed ODP into a MySQL database easily. The next step is to make use of it. I basically need to know the category of a website (enter an address and return its category). I'm thinking of dividing the data into several tables according to the first letter, e.g. I'd only look in table "a", to save searching through the whole database. Has anyone done this before, or does anyone have a better idea? And how would I do it? I'm totally new to databases. I would really appreciate any good advice. Thank you very much.
Last reply by pvgool,
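On the lookup question above: a single table with an index on the host is one common alternative to splitting the data into per-letter tables. What follows is only a minimal sketch. The table and column names (`listings`, `host`, `topic`) are hypothetical, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_db(rows):
    """Create an in-memory table from (url, topic) rows, indexed on host."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE listings (host TEXT, url TEXT, topic TEXT)")
    db.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [(host_of(url), url, topic) for url, topic in rows],
    )
    # The index lets the lookup below avoid scanning the whole table.
    db.execute("CREATE INDEX idx_listings_host ON listings (host)")
    return db


def categories_for(db, url):
    """Return the sorted, de-duplicated topics listed for the URL's host."""
    cur = db.execute(
        "SELECT DISTINCT topic FROM listings WHERE host = ? ORDER BY topic",
        (host_of(url),),
    )
    return [topic for (topic,) in cur]
```

A host can appear under several topics, so the lookup returns a list rather than a single category. URLs passed in need a scheme (`http://...`), otherwise `urlsplit` finds no host.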
- Meta
- 4 replies
Hi, I would like to download the data related to only the "Recreation" category. How can I do this? The download link has more than 300 MB of data for all categories. I need only "Recreation". How can I do it? Thanks, JK
Last reply by sfromis, -
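If the full content dump is already on disk, one way to keep only a single branch is to stream it and filter on the topic path. This is a rough sketch, not a tested recipe for the real dump. The element names (`ExternalPage`, `topic`, the unqualified `about` attribute) follow the excerpts quoted elsewhere in this forum, while the namespace URIs and URLs in `SAMPLE` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the content dump; namespaces and URLs here are assumptions.
SAMPLE = b"""<RDF xmlns="http://dmoz.org/rdf/" xmlns:d="http://purl.org/dc/elements/1.0/">
<ExternalPage about="http://a.example/"><d:Title>A</d:Title><topic>Top/Recreation/Boating</topic></ExternalPage>
<ExternalPage about="http://b.example/"><d:Title>B</d:Title><topic>Top/Arts/Music</topic></ExternalPage>
</RDF>"""


def local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def pages_under(source, prefix):
    """Yield (url, topic) for each ExternalPage at or below the topic prefix."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        topic = next((c.text or "" for c in elem if local(c.tag) == "topic"), "")
        if topic == prefix or topic.startswith(prefix + "/"):
            yield elem.get("about"), topic
        elem.clear()  # drop this element's children and text once handled


if __name__ == "__main__":
    for url, topic in pages_under(io.BytesIO(SAMPLE), "Top/Recreation"):
        print(url, topic)
```

Matching on `prefix + "/"` rather than a bare prefix keeps a category such as `Top/Recreation_Other` (hypothetical) from being picked up by accident.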
- RZ Admin
- 1 reply
Hello, I downloaded the rdf dumps for the content and structure of I have a few questions and need help on how to use these. What I'm trying to do is put the data into my web directory but I do not know how to even use these files. I need to convert the files into MySQL schema (content.rdf.u8.sql) so I can insert the data into my database and I do not know how I can do this or if it is even possible. I really need help. Thank you and I look forward to any replies I get.
Last reply by photofox,
- Editall/Catmv
- 2 replies
Given a domain name, I'm trying to extract a site description from the content data. When the domain is listed under multiple ExternalPage elements, how do I identify a "primary" one? For example, has three: <ExternalPage about=""> <d:Title>Yahoo!</d:Title> <d:Description>Yahoo!'s webservers exclusively run FreeBSD. In addition, all the non-production servers and developer workstations run FreeBSD.</d:Description> <priority>1</priority> <topic>Top/Computers/Software/Operating_Systems/Unix/BSD/FreeBSD/Prominent_Users</topic> </ExternalPage> <ExternalPage about="ht…
Last reply by motsa,
- Meta
- Editall
- Editall/Catmv
- 45 replies
Hi, www-threeauthors-com is my site. Two months ago I submitted this site to both Google and Yahoo. Now Yahoo is listing my site, but Google is not. Why? Please help me. Thank you. -geetha
Last reply by ybailwal, -
- 3 replies
SmartODP is a script written in PHP+Smarty. It is easy to install and needs no database. Free for personal use, visit to find more.
Last reply by Sachti,
- Meta
- 4 replies
Hi, I noticed that some of the listings on the dmoz site have a little "star" next to them - check for an example. Is this data (whether a specific title/description listing is marked as "editor's choice" or not) available in the RDF dumps? Any guidance is appreciated! Thanks!
Last reply by sfromis, -
- 3 replies
Where can I find existing parsers for ODP RDF data? In particular I am interested in Java RDF parsers.
Last reply by shabda, -
- Meta
- 1 reply
Hi everyone! I was just interested to know what the most popular way of using the DMOZ directory is. Is it users actually conducting searches/browsing from within, or is it Google (and other search engines) using the directory somehow as a base in their own queries, or something else? Personally, I have used it because the directory is obviously more structured than query results from a search engine. For the purposes of tying this question into the correct topic, I have just applied the other day for an editor position. My aim is to not come back as a sore loser if/once I get rejected:) Rejection is always hard to take... Keep …
Last reply by windharp,
I read on this forum about RDF dumps. Does anyone know how to download these and how to interpret them?
Last reply by photofox,
- 9 replies
Hello! It's me again. ;-) Here is my review of what you can encounter in the <mediadate> tag in the content.rdf.u8 ODP dump file. As I understand it, the intended content of that tag is a date in the format "YYYY.MM.DD". First, separators. The intended separator is a dot, but I have found many dates with '-', '/', ',', '_' and ' ' characters. I extracted the contents of all <mediadate> tags from the dump with sed and used a simple PHP function to check whether a date is bad: function is_good_date ($d) { if (!ereg ("^([0-9]{4})[-./,_ ]([0-9]{1,2})[-./,_ ]([0-9]{1,2})$", $d, $regs) || $regs[1] > 2006 || $regs[1] < 1000 || !checkdate ($regs[2], $regs[3], $regs[1…
Last reply by sfromis, -
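For readers who do not use PHP, the visible part of the check quoted above can be sketched in Python. It mirrors only what the excerpt shows: the separator set, the 1000 to 2006 year range, and a calendar validity test. Whatever the truncated remainder of the original function did is not reproduced. `normalize` is a hypothetical helper added for illustration.

```python
import re
from datetime import date

# Same pattern as in the post: 4-digit year, 1-2 digit month and day,
# separated by any one of - . / , _ or a space.
DATE_RE = re.compile(r"([0-9]{4})[-./,_ ]([0-9]{1,2})[-./,_ ]([0-9]{1,2})")


def parse_date(text):
    """Return (year, month, day) if text passes the check, else None."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    if not 1000 <= year <= 2006:  # year bounds taken from the post
        return None
    try:
        date(year, month, day)  # raises ValueError for e.g. 30 February
    except ValueError:
        return None
    return year, month, day


def is_good_date(text):
    """True if text looks like a valid date under the rules above."""
    return parse_date(text) is not None


def normalize(text):
    """Rewrite an accepted date as 'YYYY.MM.DD'; return None if rejected."""
    parts = parse_date(text)
    if parts is None:
        return None
    return "%04d.%02d.%02d" % parts
```

The 2006 upper bound comes from the date of the post. Anyone reusing this against a later dump would need to raise it.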
- 1 reply
Hello everybody, I've downloaded the entire DMOZ data and parsed it with the dmoz2mysql PHP script. I've added IDs to the tables. Now, when I run queries like this one, they are really slow (60 sec) the first time; later runs are faster thanks to the MySQL cache: SELECT DISTINCT `content_links`.`topic`, `content_links`.`resource`, `content_description`.`title`, `content_description`.`description` FROM `content_links` , `content_description` WHERE `content_links`.`topic` LIKE "Top/Shopping/Publications/Books/Arts" AND `content_description`.`externalpage` = `content_links`.`resource` Is there a way to optimize my query/tables? Thanks a lot...
Last reply by Jacob Mathai, -
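A usual first thing to try for a query like the one above is an index on each column used for filtering and joining. The sketch below illustrates the idea with SQLite rather than MySQL, so the planner behaviour and timings will not match a real dmoz2mysql database. The index names and sample rows are hypothetical; the table and column names are those in the post. Since the pattern in the post has no wildcards, the sketch uses plain equality instead of LIKE.

```python
import sqlite3

QUERY = """
SELECT DISTINCT l.topic, l.resource, d.title, d.description
FROM content_links AS l
JOIN content_description AS d ON d.externalpage = l.resource
WHERE l.topic = ?
"""


def build_db():
    """Create the two tables from the post, with one index per lookup column."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE content_links (topic TEXT, resource TEXT);
        CREATE TABLE content_description
            (externalpage TEXT, title TEXT, description TEXT);
        CREATE INDEX idx_links_topic ON content_links (topic);
        CREATE INDEX idx_desc_page ON content_description (externalpage);
    """)
    # Hypothetical sample rows, just enough to run the query.
    db.executemany("INSERT INTO content_links VALUES (?, ?)", [
        ("Top/Shopping/Publications/Books/Arts", "http://books.example/"),
        ("Top/Arts/Music", "http://music.example/"),
    ])
    db.executemany("INSERT INTO content_description VALUES (?, ?, ?)", [
        ("http://books.example/", "Books", "Art books."),
        ("http://music.example/", "Music", "A music site."),
    ])
    return db


def listings(db, topic):
    """Run the join for a single topic and return all matching rows."""
    return db.execute(QUERY, (topic,)).fetchall()


def plan(db, topic):
    """Return SQLite's query-plan descriptions for the same query."""
    rows = db.execute("EXPLAIN QUERY PLAN " + QUERY, (topic,))
    return [row[-1] for row in rows]


if __name__ == "__main__":
    db = build_db()
    print(listings(db, "Top/Shopping/Publications/Books/Arts"))
    print("\n".join(plan(db, "Top/Shopping/Publications/Books/Arts")))
```

Printing the plan is a quick way to see whether the indexes are actually being used. In MySQL the rough equivalent is to prefix the query with `EXPLAIN`.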
- 1 reply
Hello! I have found that several newsGroup tags point to http:// resources, so you will find links like "news:http://...". I don't think that is correct. Some of these resources are web interfaces to newsgroups, others are internal dmoz pages that cannot be accessed without a password, and one is a regular dmoz category. I think that newsGroup tags should point only to Usenet resources. And in any case "news:http://..." is not a valid link. Here are the incorrect links: <Topic r:id="Top/Arts/Music/Bands_and_Artists/G/Gilbert,_Kevin"> ... <newsGroup r:resource="news:"/> <newsGroup r:resource="news:http://launch.…
Last reply by motsa,
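A check like the one described above can be automated with a simple pattern match over the structure dump text. This is a minimal sketch that assumes newsGroup entries are written exactly as in the excerpt (`<newsGroup r:resource="..."/>`). The function name and the sample values in the usage note are hypothetical.

```python
import re

# Matches self-closing newsGroup tags as they appear in the excerpt above.
NEWSGROUP_RE = re.compile(r'<newsGroup\s+r:resource="([^"]*)"\s*/>')

# A "news:" value whose remainder is itself a web URL.
WEB_URL_RE = re.compile(r"news:https?://", re.IGNORECASE)


def web_newsgroup_links(text):
    """Return newsGroup resource values of the form 'news:http://...'."""
    return [
        value
        for value in NEWSGROUP_RE.findall(text)
        if WEB_URL_RE.match(value)
    ]
```

For example, given a line with `news:alt.music.example` and another with `news:http://groups.example/foo`, only the second value is returned.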
- 2 replies
Hi, I just downloaded the structure and contents RDF dumps, but I can't seem to find any documentation anywhere on tags such as narrow, narrow1, narrow2, etc. Is there a schema somewhere for the tags used? If there is no schema, or at minimum some documentation, then I would be interested in building such documentation with the community's help. XML is only useful with described semantics. Thanks, Kevin
Last reply by ishtar, -
- Editall/Catmv
- 4 replies
This site is using dmoz without proper attribution
Last reply by mikeb1, -
- Meta
- 4 replies
First, I want to take a sec to thank you all for volunteering your time for this. I'm pretty surprised and disappointed to see so few seem to appreciate your efforts. My question is this. When I view my backward links from Google, my listing in dmoz does not show up. Why would this be? I have been listed for a few months now, so it should show up. If it helps, my web site address is Thank you for your help.
Last reply by brmehlman, -