Dmoz - MySQL dump

mhale

Member
Joined
Apr 9, 2004
Messages
4
.sql ODP Dumps available free soon!

Hello. I will have the SQL dumps ready by tomorrow. I will be trying to get a steady system going here for at least biweekly updates of the SQL dumps.

But, in order to provide them to the public, I need someone to host them. At this time, I am not sure how big it is going to be. But it will be 2 zips. One for structure, one for content.

Anyone want to host it, or know someone that may? Thanks
 

reedfloren

Member
Joined
Apr 10, 2004
Messages
18
I can't wait to be able to get the data in .sql format. Thank you so much for doing this, mhale. Out of curiosity, how big are the files?
 

mhale

Member
Joined
Apr 9, 2004
Messages
4
Hey. I've got all the data imported, but I haven't exported it into .sql files just yet. I am waiting for someone to offer to host it. With all the data in MySQL, it's 1.7GB. It should be much smaller when I export it and zip it up.
 

reedfloren

Member
Joined
Apr 10, 2004
Messages
18
mhale said:
Hey. I've got all the data imported, but I haven't exported it into .sql files just yet. I am waiting for someone to offer to host it. With all the data in MySQL, it's 1.7GB. It should be much smaller when I export it and zip it up.

Hi mhale,

I wish DMOZ.org would create CDs with the data in .SQL format. It sure would be useful. In the meantime, would you consider making CDs of it if I covered CD and shipping expenses?

Thanks,

Reed
 

giz

Member
Joined
May 26, 2002
Messages
3,112
Over the last few months, all of the ODP data has been converted to UTF-8 encoding. There have been some errors in the data here and there. Be aware that there are still several thousand UTF-8 encoding errors (in 4 million) in the last RDF produced, most of which are now fixed in the ODP data itself.

These errors started out at a high number and have gradually been reduced. There are several internal projects to find and correct them all. Hopefully they will all be corrected soon, though maybe not quite in time for the next RDF dump.

Just a note if you are having any problems with the data.
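
For anyone who wants to check a dump for these errors themselves, here is a minimal Python sketch. It assumes the dump is a gzip-compressed, line-oriented file such as content.rdf.u8.gz. The function name count_invalid_utf8_lines is made up for illustration, and this is not necessarily how the internal ODP projects find the errors.

```python
import gzip


def count_invalid_utf8_lines(path):
    """Return (total_lines, bad_lines) for a gzip-compressed text file.

    A line counts as bad if its raw bytes do not decode as UTF-8.
    """
    total = 0
    bad = 0
    # Read raw bytes so that decoding errors can be caught line by line.
    with gzip.open(path, "rb") as handle:
        for raw_line in handle:
            total += 1
            try:
                raw_line.decode("utf-8")
            except UnicodeDecodeError:
                bad += 1
    return total, bad
```

Calling count_invalid_utf8_lines("content.rdf.u8.gz") on a local copy would give the number of lines read and the number that failed to decode. It counts lines, not individual errors, so a line with two bad byte sequences still counts once.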
 

nakulgoyal

Member
Joined
Nov 8, 2003
Messages
26
Re:

Great work you have done on the data. After zipping, the data should be less than 100MB, I am sure, and if more than 20,000 people download it, that should be around 2,000GB of bandwidth. Let's all try to find a sponsor/host.

I would love to receive the SQL files myself as well!
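
The estimate above is easy to check. A quick sketch, assuming a 100MB archive (the real zipped size was not known at this point) and decimal units where 1GB = 1000MB; the function name is only for illustration:

```python
def total_transfer_gb(file_size_mb, downloads):
    """Total transfer in GB, using decimal units (1 GB = 1000 MB)."""
    return file_size_mb * downloads / 1000


# 100 MB downloaded 20,000 times
print(total_transfer_gb(100, 20_000))  # 2000.0
```

If the zipped dump turns out larger or smaller, the total scales in direct proportion.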
 

roror

Member
Joined
Jul 9, 2004
Messages
4
Why not make a torrent out of it? That way we can effectively reduce the load on the main server.
 

Callimachus

Member
Joined
Mar 15, 2004
Messages
704
reedfloren said:
I wish DMOZ.org would create CDs with the data in .SQL format. It sure would be useful. In the meantime, would you consider making CDs of it if I covered CD and shipping expenses?

That would be an awfully frequent update cycle. By the time you received them by mail they'd likely be out of date.
 

giz

Member
Joined
May 26, 2002
Messages
3,112
Using a copy of the directory from only a few weeks ago would put you in the top 10% of ODP data users. Some only come back for updates once or twice per year, and many are still using 2, 3 or 4 year old copies of the data.


The RDF problems are now fixed. Each weekly dump in the last 6 weeks has had fewer than 10 UTF-8 errors in it (most weeks there were just 3 or 4 errors, one week had 1 error, and one week had 0 errors). The few that remain appear when a rogue edit slips through the net. Last week there was just one error again. A new RDF dropped today (status unknown at the moment).
 

cygnusx1

Member
Joined
May 5, 2004
Messages
6
Still looking for space for DMOZ as SQL files?

Hi "mhale",

Do you still pass by this forum? The trail is a little cold.

Stumbled across this thread and wondered if you managed to find somewhere to host your SQL files generated from the DMOZ RDF's?

If not, I have the space available on my hosting package, and I am willing to host them there.

I tried emailing you directly, but your profile doesn't allow it. Maybe a friendly administrator reading this in passing will put us in touch?
 

nadeem

Member
Joined
Jul 15, 2004
Messages
2
Which database to use on the back end?

To run the ODP data online, which database would you suggest? There are an estimated 4 million records, and the number grows day by day, so it needs a strong database that can handle the search mechanism and many users at the same time.
The platform is Linux with PHP, and the database ...?
 

motsa

Curlie Admin
Joined
Sep 18, 2002
Messages
13,294
I think it's safe to say that you should forget about this thread -- mhale has not visited or posted here in 3 years. Closing this thread so that people won't bump it any more.
 

weglobenet

Member
Joined
Oct 7, 2007
Messages
16

windharp

Meta/kMeta
Curlie Meta
Joined
Apr 30, 2002
Messages
9,204
I merged both threads together for ease of use.

Thanks for offering this conversion to our users! If you have the ability to check, it would be nice if you could report back in one or two months on whether, and how many, people downloaded the dump (of course only those that downloaded directly from your site).

One thing I noticed: You marked the dump with "september 2007", which is a little bit inaccurate. If our system is working as intended (which it currently is, AFAIK), it generates new dumps weekly. Even if you don't want to follow that schedule (which would be pretty understandable), I think it would be best to use the complete date tag in the future, so that people know which RDF you have actually been using.
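
A complete date tag could be as simple as putting the dump date into the file name. A small Python sketch of the idea; the function name dated_name and the date used below are only examples, not an existing naming scheme:

```python
from datetime import date


def dated_name(filename, dump_date):
    """Insert an ISO date (YYYY-MM-DD) before a trailing .gz, if there is one."""
    tag = dump_date.isoformat()
    if filename.endswith(".gz"):
        # Keep the compression suffix last so tools still recognise the file.
        return f"{filename[:-3]}.{tag}.gz"
    return f"{filename}.{tag}"


# Example date only
print(dated_name("content.rdf.u8.gz", date(2007, 10, 9)))
# content.rdf.u8.2007-10-09.gz
```

The same tag would work for the SQL archives, as long as it is the date of the RDF the conversion was built from and not the date of the conversion itself.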
 

weglobenet

Member
Joined
Oct 7, 2007
Messages
16
I can give it any name, but the dmoz archive is updated monthly, so I have no other way to identify the snapshot: the file names (content.rdf.u8.gz and structure.rdf.u8.gz) do not change. The only change is the folder name in the archive.
 

windharp

Meta/kMeta
Curlie Meta
Joined
Apr 30, 2002
Messages
9,204
Hmmm... That is interesting, you are right. I didn't realize that was changed, but it obviously was... *Back to the internal forums, searching for clarification*
 

brmehlman

Member
Joined
Nov 6, 2002
Messages
3,080
I know for sure that a new dump is generated every week, typically completed Tuesday evening US time, early Wednesday morning in Europe.

From the dates it looks like the first one generated in each calendar month is placed in the archive.

I'm nearly certain that the others are publicly available for the week following their generation but I can't say with absolute certainty because I use the internal copy.

To find out for sure, have a look at http://rdf.dmoz.org/rdf/ sometime Wednesday. By then, if I'm guessing right, it should be showing a date of 9 October for content.rdf.u8.gz.
 
This site has been archived and is no longer accepting new content.