Using Directory Data
Also for information on the license and attribution requirements.
262 topics in this forum
-
-
- Meta
- RZ Admin
- 4 replies
- 4.3k views
Hi, I've noticed that there is a newly updated set of ODP RDF dumps at http://rdf.dmoz.org/rdf/ as of 18-Feb-2011 17:07. I've found a number of sites that can be found using the search feature on DMOZ, yet they are not in the most recent RDF files. Additionally, when browsing to the category returned by the search, they are not listed. It appears that the category listings are up to date, yet the search function is using outdated data. Example (I hope this is not against the rules on posting links, as it is an internal DMOZ link): http://www.dmoz.org/search?q=u:torchlit.net returns 1 result. Yet the category: http://www.dmoz.org/Games/Video_Games/Roleplaying/T…
Last reply by windharp , -
-
-
- RZ Admin
- 5 replies
- 4.2k views
Hi, Looking at http://rdf.dmoz.org/rdf/, it seems that the RDF data has not been updated since October 25. Is this intentional? When can we expect to see the next update? Thank you. Jean-Luc
Last reply by photofox , -
-
-
- Meta
- 3 replies
- 3.6k views
I have a project where I'd like to have one category (about 2300 links) in a CSV file, and I'll write a program to search the CSV. I've been searching for days. I don't know how to use Perl or Java, which most of the tools are written in, and there's no documentation to say whether I can parse just one category with them. I found Extreme dmoz extractor, which looked great but is deprecated. What other options are there? Cheers, Bob
Last reply by thebob, -
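For the single-category CSV question above, one possible approach is a streaming parse in Python that keeps only the records under one category. This is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes the content dump stores each link as an `ExternalPage` element with an `about` attribute and child `Title` and `topic` elements. Tags are matched by local name so the sketch does not depend on exact namespace URIs.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace such as '{uri}Title' down to 'Title'."""
    return tag.rsplit("}", 1)[-1]


def category_rows(source, category):
    """Yield (url, title, topic) for links in `category` or below it.

    `source` is a filename or a binary file object. The element and
    attribute names used here are assumptions about the dump layout.
    """
    prefix = category.rstrip("/")
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        topic = fields.get("topic", "")
        if topic == prefix or topic.startswith(prefix + "/"):
            url = next(
                (v for k, v in elem.attrib.items() if local(k) == "about"), ""
            )
            yield url, fields.get("Title", ""), topic
        elem.clear()  # release the children of records already handled


def write_csv(rows, out):
    """Write the rows to an open text file with a header line."""
    writer = csv.writer(out)
    writer.writerow(["url", "title", "topic"])
    writer.writerows(rows)
```

For a real dump, `source` could be `gzip.open("content.rdf.u8.gz", "rb")`. Other posts in this forum report malformed records in the dump, so a strict XML parser like this one may stop with a `ParseError` partway through.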
-
-
- Meta
- RZ Admin
- 3 replies
- 6.9k views
I wrote a Python program that parses the structure.rdf.u8 file. Surprisingly, Python hasn't crashed on any string operation, as it should when no encoding is forced on the strings. Looking further into that, I discovered that there are no non-ASCII characters in that data dump, and there are over 50 thousand records (out of over 760000) that are supposed to have some Unicode characters (usually national letters, accents) but have question marks instead. At first I thought that there were some issues with how my Python program converted the data, but then I looked into the structure.rdf.u8 data dump and found question marks there as well. Here are few examples of how tha…
Last reply by photofox , -
-
-
- RZ Admin
- 9 replies
- 7.3k views
Hi, I downloaded http://rdf.dmoz.org/rdf/structure.rdf.u8.gz and http://rdf.dmoz.org/rdf/categories.txt (and other files that contain DMOZ categories), but all foreign characters are replaced by one or two question marks. Here is an example of what I get : <altlang r:resource="French:Top/World/Fran??ais/Arts/Audiovisuel/Animation"></altlang> where I expect <altlang r:resource="French:Top/World/Français/Arts/Audiovisuel/Animation"></altlang> I inspected the binary content of the file and it really contains hexadecimal 3F where there is a question mark. So I guess this is not a matter of encoding method. This problem does not exist wit…
Last reply by JeanLucDmoz, -
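The check described in the two threads above (are there any non-ASCII bytes at all, and how many records carry literal question marks) can be reproduced with a short byte-level scan. This is a rough sketch; `scan_dump` is a hypothetical helper name, and counting `?` is only a heuristic, since a question mark can also be legitimate data.

```python
def scan_dump(lines):
    """Return (lines_with_non_ascii, lines_with_question_mark).

    `lines` is any iterable of bytes objects, for example a file
    opened in binary mode. A byte above 0x7F means the line contains
    non-ASCII data; 0x3F is a literal '?'.
    """
    non_ascii = 0
    question = 0
    for raw in lines:
        if any(byte > 0x7F for byte in raw):
            non_ascii += 1
        if b"?" in raw:
            question += 1
    return non_ascii, question
```

A compressed dump can be scanned without unpacking it first, for example `scan_dump(gzip.open("structure.rdf.u8.gz", "rb"))`. A result of zero non-ASCII lines together with many question-mark lines would match what both posters describe.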
-
-
- RZ Admin
- 5 replies
- 7.8k views
Hi all, I'm doing a mini-project on automated URL classification. For this, I'd like to obtain about 10000 URLs from the DMOZ ODP, uniformly distributed across all categories. Is there any way to do this? Please help. Thanks in advance!
Last reply by Inspirovations, -
-
-
- Meta
- 6 replies
- 5.6k views
Who has the rights to the FTP links on DMOZ? I am referring to the hyperlinks only. Would it be OK if a person made a VB meta crawler to find the links and use them in a program? I just got some game development software, and I was thinking about a 3D RPG DMOZ search engine and wondering if that would be cool.
Last reply by DarkKilo, -
-
-
- Editall
- 3 replies
- 4.1k views
Some time back I wrote an article in the ODP section of the DigitalPoint forum asking the same question, "RDF data, who owns it?" http://forums.digitalpoint.com/index.php?showtopic=1769908 I couldn't get any sensible answers from ODP editors at DP, so I think bringing this subject to the horse's mouth may shed some light and give us the answer. Here is my article: According to DMOZ management they do, because their phony ToU says so. That would be fine if you submitted your site to DMOZ and requested inclusion, but what if your site is one of thousands of sites added to the DMOZ index without your consent by editors who crawl the Internet for good sites?…
Last reply by jimnoble, -
-
- 1 reply
- 3.3k views
I am interested in including DMOZ data in a website I have. However, I think it would be best if I only included categories relevant to the site's topic. Is it possible to use only specific DMOZ categories when using DMOZ data? Thank you.
Last reply by jmaresca, -
-
- Meta
- 6 replies
- 4.4k views
Just checked "content.rdf.u8.gz" (19-Nov-2009 09:01, 294M). It still contains GeoCities links, but dmoz.org has already removed them.
Last reply by tszming, -
-
-
- Meta
- 1 reply
- 3.3k views
Hi, Is there any way to use DMOZ data on Google's Blogspot site using the Google API? Thanks.
Last reply by pvgool , -
-
-
- Meta
- 5 replies
- 5.4k views
Dear Editors, I am currently working on a WiderNet project (http://www.widernet.org), and we have an eGranary digital library, which contains many whole websites (http://www.widernet.org/digitalLibrary/content/WhatsInside.asp). We need to gather the metadata for our websites, and we think DMOZ might have it in its database. So, is it the RDF that we need to download? What kinds of information are in it? Is there a sample record for a single item? What tools are available for transforming the data file into SQL files? Thank you!
Last reply by weddingeye, -
-
-
- Meta
- 3 replies
- 4.7k views
Hello, I need to have all the categories in memory, saved in a tree (a node = a category, and a node's child = a subcategory). I tried to parse the XML file, but I think it has a lot of errors, because my parser fails. I successfully parsed the Kids & Teens structure, but the complete structure fails. Any help?
Last reply by sharanyan, -
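The tree described in the question above (a node per category, children as subcategories) can be built from slash-separated category paths alone, independent of how the paths are extracted from the dump. A minimal sketch using nested dictionaries follows; `build_tree` is a hypothetical name and the sample paths are illustrative.

```python
def build_tree(paths):
    """Build a nested-dict tree from paths such as 'Top/Arts/Movies'.

    Each key is a category name and each value is the dict of its
    subcategories (empty for a leaf).
    """
    root = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            # Reuse the child if it exists, otherwise create it.
            node = node.setdefault(part, {})
    return root


tree = build_tree(["Top/Arts/Movies", "Top/Arts/Music", "Top/Kids_and_Teens"])
print(sorted(tree["Top"]["Arts"]))  # names of the subcategories of Top/Arts
```

Nested dictionaries keep the sketch short. A dedicated node class would be a reasonable alternative if each category also needs to carry its own data, such as a title or last-update date.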
-
-
- Meta
- 3 replies
- 4.3k views
Hi All, I need to download all ODP pages that contain links to English sites. After reading about ODP, I thought that I just needed to download all pages except those under World/* and Kids_And_Teens/International/* and then extract the links to all English sites from those pages. However, I found many exceptions such as http://www.dmoz.org/Business/Business_Services/Communications/Translation/Single_Language/Slovenian_and_English/ This path is neither in the World directory nor in Kids_And_Teens/International, but when I click on "Tomaž Metelko" on that page, I still reach a non-English site. Please let me know if I am doing something wrong. Also, please let me know if…
Last reply by informator , -
-
-
- Meta
- Editall/Catmv
- RZ Admin
- 13 replies
- 6.1k views
For doing a fine job. We host the ODP data on our site with some value adds. The data is always readily available and just plain works. A ton of info at DMOZ somehow manages to get compiled regularly and presented to the world. We appreciate all the hard work you do. Now if we could just get one of our six suggestions into DMOZ, we'd be smitten. The Faxo Team <url removed>
Last reply by makrhod , -
-
-
- Meta
- 9 replies
- 9.7k views
I hope that specifics like this are OK for this forum. I made a relatively simple PHP script for parsing XML files. I'm trying to parse the ChefMoz data into a MySQL database. But alas, it doesn't work. No matter how many ways I try to clean the data before passing it to the xml_parser, I still get invalid character and other errors well before I make it more than 10% of the way through the dump. I've tried everything I could think of, and everything I've been able to find on the web, with very little success. I'm hoping someone in here can help me get past this hurdle. I can post my script if you want to see it.
Last reply by Fluesse09, -
-
-
- Meta
- 6 replies
- 5.6k views
Is there someplace on DMOZ where I can see the amount of traffic it receives?
Last reply by hutcheson , -
-
-
- Meta
- RZ Admin
- 23 replies
- 11.5k views
According to http://rdf.dmoz.org/rdf/archive/, the last ODP database dump moved to the archive is dated 2009-04-07. According to http://rdf.dmoz.org/rdf, the last modification date of the ODP database is 2009-04-15. In all previous months these updates happened at the beginning of the month. Why such a long delay for May 2009? Does anybody know when the next ODP update will happen?
Last reply by pvgool , -
-
- 2 replies
- 3.9k views
The following malformed data is in the structure file. (The altlang and lastUpdate tags should not appear outside enclosing Topic tags. The </Topic> has no preceding <Topic> tag.) To whom should I report this? <Alias r:id="Cani:Top/World/Italiano/Società/Temi_e_Dibattiti/Bese:Top/World/Japanese/社会/問題・争点/犯罪と司法"/> <altlang r:resource="Svedese:Top/World/Svenska/Samhälle/Debattämnen/Brott_och_straff"/> <lastUpdate>2008-07-11 06:34:15</lastUpdate> </Topic>
Last reply by chaos127 , -
-
- Meta
- 3 replies
- 4.5k views
Hello, I had a look at the About page and I was wondering how many categories and subcategories there are on DMOZ. I didn't find the information in the sticky topics here either. I am currently running a script that retrieves categories from the RDF file, and so far it has extracted more than 30,000 categories. It's crazy. How many are there? Thanks,
Last reply by hansfn , -
-
-
- Meta
- 4 replies
- 4.6k views
Hi There, I'm not sure if this is the right place but I thought I'd mention it anyway. My import routine for the DMOZ data is blowing up with an XML scheme failure around line 26946609 of this week's (2008/02/18) content.rdf file. It's to do with an external link to portaljove .com in Top/World/Español/Regional/Europa/España/Comunidades_Autónomas/Comunidad_Valenciana/Educación. The record seems to have two descriptions (one of which is missing its closing tag) and a second title tag inside one of the descriptions. Anyway, it's a bit of a mess. Can an editor take a look at it? Cheers Jason <URL deleted>
Last reply by cmeerw, -
-
-
- Meta
- Editall
- Editall/Catmv
- 10 replies
- 7.8k views
Hi, I opened a topic today suggesting a partnership with your community. I have developed a search engine based on the ODP. I am crawling the pages of all websites registered in your database, using the category structure too. I was thinking of extending my project to a community of crawlers, because my bot can scan up to 150 000 pages per day from a low-cost computer. But it seems that you have deleted my previous post. My first question is: why? Maybe you do not want to have such a discussion? Best regards Luc Michalski
Last reply by lucasmd, -
-
-
- Meta
- 5 replies
- 4.8k views
phpOpen is multi-platform compatible. phpOpen is a free PHP script that grabs the contents of the Open Directory dynamically and formats them to make your own version of the Open Directory. This means that you can add an entire web directory to your own website.
Last reply by informator , -
-
- 2 replies
- 4k views
Hi all, I'm new here and wanted to check if I could ask something. What are your thoughts? How can I connect to or use the Open Directory RDF dump directly? Or should I just use the search tool/web front end they provide? Do I need to download it and make it available on my own server to be able to use it? I want to try to make my own search tool, independent of the big guns like Google etc., and wonder if it's possible and how much time you all think it'll take. Any advice would be welcome. Thanks Kim
Last reply by samual, -
-
- Meta
- 1 reply
- 4k views
Why doesn't Google adhere to the use license and display where they got the data from on each results page, like the license says? They should be banned from using ODP data.
Last reply by windharp , -