Posted

.sql ODP Dumps available free soon!

 

Hello. I will have the SQL dumps ready by tomorrow. I am trying to get a steady system going here so that the SQL dumps are updated at least biweekly.

 

But in order to provide them to the public, I need someone to host them. At this time, I am not sure how big they are going to be, but there will be 2 zips: one for structure, one for content.

 

Anyone want to host it, or know someone that may? Thanks

Posted
Hey. I've got all the data imported, but I haven't exported it into .sql files just yet. I am waiting for someone to offer to host it. With all the data in MySQL, it's 1.7GB. It should be much smaller once I export it and zip it up.
Posted

Hi mhale,

 

I wish DMOZ.org would create CDs with the data in .SQL format. It sure would be useful. In the meantime, would you consider making CDs of it if I covered the CD and shipping expenses?

 

Thanks,

 

Reed

Posted

Over the last few months, all of the ODP data has been converted to UTF-8 encoding. There have been some errors in the data here and there. Be aware that there are still several thousand (in 4 million) UTF-8 encoding errors in the last RDF produced, most of which have since been fixed in the ODP data itself.

 

These errors started out at a high number and have gradually been reduced. There are several internal projects to find and correct them all. Hopefully they will all be corrected soon, though maybe not quite in time for the next RDF dump.

 

Just a note if you are having any problems with the data.
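
If you want to check a dump for these errors yourself, here is a rough sketch in Python. It is only an illustration, not one of the internal ODP tools. It assumes the dump is a gzip file such as content.rdf.u8.gz and simply counts the lines that fail to decode as UTF-8. The function names are made up for this example.

```python
import gzip


def count_utf8_errors(stream):
    """Return (bad_lines, total_lines) for a binary, line-iterable stream.

    A line counts as bad if its bytes are not valid UTF-8.
    """
    bad = 0
    total = 0
    for raw in stream:
        total += 1
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad += 1
    return bad, total


def count_utf8_errors_in_gz(path):
    """Same check for a gzip-compressed dump (path is an assumption)."""
    with gzip.open(path, "rb") as fh:
        return count_utf8_errors(fh)
```

Reading in binary mode and decoding line by line means one bad byte only flags its own line instead of aborting the whole scan.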

Posted

Re:

 

Great work on the data. After zipping, it should be less than 100MB, I am sure, and if 20,000 people download it, that would be around 2,000GB of bandwidth. Let's all try to find a sponsor/host.
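
For anyone who wants to redo that estimate with other numbers, here is a quick sketch in Python. The 100MB and 20,000 figures are just the guesses from above, and the function name is made up for illustration.

```python
def total_transfer_gb(file_size_mb, downloads, mb_per_gb=1000):
    """Total transfer in GB for `downloads` copies of a file of `file_size_mb` MB."""
    return file_size_mb * downloads / mb_per_gb


# The guess above: a 100MB zip downloaded 20,000 times.
print(total_transfer_gb(100, 20_000))  # 2000.0
```

Passing mb_per_gb=1024 gives the binary figure instead, which comes out a little lower.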

 

I would love to receive the SQL files myself as well!

  • 2 months later...
  • Editall
Posted
I wish DMOZ.org would create CDs with the data in .SQL format. It sure would be useful. In the meantime, would you consider making CDs of it if I covered the CD and shipping expenses?

 

That would be an awfully frequent update cycle. By the time you received them by mail, they'd likely be out of date.

ODP Editor callimachus

Any opinions expressed are my own, and do not represent an official opinion or communication from the ODP.

Private messages asking for submission status or preferential treatment will be ignored.

Posted

Using a copy of the directory from only a few weeks ago would put you in the top 10% of ODP data users. Some only come back for updates once or twice per year, and many are still using 2, 3 or 4 year old copies of the data.

 

 

The RDF problems are now fixed. Every weekly dump in the last 6 weeks has had fewer than 10 UTF-8 errors in it (most weeks there were just 3 or 4 errors, one week had 1 error, and one week had 0 errors), each caused by a rogue edit that slipped through the net. Last week there was just one error again. A new RDF dropped today (status unknown at the moment).

Posted

Still looking for space for DMOZ as SQL files?

 

Hi "mhale",

 

Do you still pass by this forum? The trail is a little cold.

 

I stumbled across this thread and wondered whether you managed to find somewhere to host your SQL files generated from the DMOZ RDFs.

 

If not, I have the space available on my hosting package, and I am willing to host them there.

 

I tried emailing you directly, but your profile doesn't allow it. Maybe a friendly administrator reading this in passing will put us in touch?

Posted

Which database to use on the back end?

 

To use the ODP data online, which database would you suggest? There are an estimated 4 million records, and that number is expected to grow day by day, so the database must be strong enough to handle the search mechanism and many users at the same time.

The platform is Linux with PHP. Which database should go with it?

  • 2 years later...
  • 5 months later...
Posted
I think it's safe to say that you should forget about this thread -- mhale has not visited or posted here in 3 years. Closing this thread so that people won't bump it any more.
  • 2 months later...
Posted

This thread continues the closed one:

http://www.resource-zone.com/forum/index.php?showtopic=13283

 

A loadable dump is available at

http://www.we-globe.net/WebLab/Download/DmozRdf2MySQL.html

But I'm not sure that the bandwidth is enough, so I've also put it on the eMule network. Please let me know whether it is accessible via eMule and whether the MD5 hash is valid after download.

If the trick succeeds and somebody finds it useful, I can update it monthly.

Thanks
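
For those unsure how to check the MD5 after downloading, here is a minimal sketch in Python using the standard hashlib module. The function names are made up for this example, and the path and published hash are whatever you downloaded and whatever is listed on the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare the file's digest with the published one (case-insensitive)."""
    return md5_of_file(path) == published_hex.strip().lower()
```

Reading in chunks matters here because the dump is large. A mismatch usually means a truncated or corrupted download, so fetching the file again is the first thing to try.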

  • Meta
Posted

I merged both threads together for ease of use.

 

Thanks for offering this conversion to our users! If you have the ability to check, it would be nice if you could report back in one or two months if and how many people downloaded the dump (of course only those that downloaded directly from your site).

 

One thing I noticed: You marked the dump with "september 2007", which is a little bit inaccurate. If our system is working as intended - which it currently is AFAIK - it generates new dumps weekly. Even if you don't want to follow that schedule (which would be pretty understandable), I think it would be best to use the complete date tag in the future, so that people know which RDF you have actually used.

Curlie Meta/kMeta Editor windharp

 


Posted
I can give it any name, but the dmoz archive is updated monthly, so I have no other way to identify the snapshot: the file names (content.rdf.u8.gz and structure.rdf.u8.gz) do not change. The only thing that changes is the folder name in the archive.
  • Meta
Posted
Hmmm... That is interesting, you are right. I didn't realize that was changed, but it obviously was... *Back to the internal forums searching for clarification*

Curlie Meta/kMeta Editor windharp

 


Posted

I know for sure that a new dump is generated every week, typically completed Tuesday evening US time, early Wednesday morning in Europe.

 

From the dates it looks like the first one generated in each calendar month is placed in the archive.

 

I'm nearly certain that the others are publicly available for the week following their generation but I can't say with absolute certainty because I use the internal copy.

 

To find out for sure, have a look at http://rdf.dmoz.org/rdf/ sometime Wednesday. By then, if I'm guessing right, it should be showing a date of 9 October for content.rdf.u8.gz.
