
Posted

I was just wondering whether the staff are aware that the sizes listed for the RDF dump are misstated. For example, at the last update the main content file was listed as 179 MB. After downloading it, and waiting a very, very long time, I found that the file is in fact 931 MB. Quite a significant difference.

 

In the past I have killed the RDF download on the assumption that the connection was simply broken (it was reporting 400 MB downloaded, even though the file was supposed to be only 160 MB). That was when the connection didn't time out on its own, which seems to happen fairly often (I have a cable connection).

 

As an aside, the first time that I successfully downloaded the RDF dump, I ended up with the version containing non-unique category ID numbers. Doh.

 

Anyway, the point is that over the last three months I have found the entire experience of trying to use the RDF dumps fairly tedious and painful. I presume that others suffer from the same problems that I did.

 

If you want to increase the use of the ODP data, then you might want to look at the RDF page. Just a thought.

Posted

The stated size of the download is accurate. The file is compressed in gzip format; when decompressed, it is, of course, much larger.
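To see both numbers for a dump you've already saved, here's a minimal Python sketch (standard library only; the function names are made up for illustration). `gzip_sizes` streams through the compressed file and counts the decompressed bytes. `gzip_trailer_size` is a quicker shortcut that reads ISIZE, the last four bytes of a gzip member, which RFC 1952 defines as the uncompressed length modulo 2^32; it is only reliable for a single-member file under 4 GB.

```python
import gzip
import os
import struct


def gzip_sizes(path):
    """Return (compressed_bytes, uncompressed_bytes) for a .gz file.

    The uncompressed size is found by actually decompressing the data as a
    stream, so it is correct even above 4 GB or for multi-member files.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    with gzip.open(path, "rb") as f:
        # Read 1 MiB at a time so the whole file never sits in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            uncompressed += len(chunk)
    return compressed, uncompressed


def gzip_trailer_size(path):
    """Fast estimate: the ISIZE field stored in the last 4 bytes of the file.

    ISIZE is the uncompressed length modulo 2**32, little-endian (RFC 1952).
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        return struct.unpack("<I", f.read(4))[0]
```

On a saved dump, the first number from `gzip_sizes` should match the size on the download page and the second should be the much larger figure you see after decompression.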

 

If you download with a standard browser, the browser will decompress the file on the fly. Use something like wget (http://wget.sunsite.dk/) instead. It's a command-line program that is much faster than any GUI browser.
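If you'd rather script the download than install wget, here's a minimal Python sketch of the same idea: fetch the file and save the bytes exactly as the server sends them, so nothing gets decompressed along the way. `download_raw` is a hypothetical helper name, and since this thread doesn't give the dump's URL you'd need to supply it yourself. Unlike wget, this sketch does no retrying or resuming.

```python
import shutil
import urllib.request


def download_raw(url, dest_path, chunk_size=1 << 20):
    """Save the response body to dest_path exactly as the server sends it.

    urllib does not decompress gzip data for you, and the Accept-Encoding
    header asks the server not to add any transfer compression of its own,
    so the saved .gz file should match the size listed on the download page.
    """
    request = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        # Copy in chunks so a large download never has to fit in memory.
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```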

Posted

OK, but in the end

the file is going to be 1 GB (or close to it)...

not so easy to work with. Whatever I do with it, my computer makes me wait for a while...

(I haven't received my Cray yet)

:)

So that's probably why so many solutions for working with ODP data rely on fetching the data from the website (by browsing it) and don't bother with the dump.

 

Pol
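On the point about a ~1 GB file being slow to work with: you don't necessarily have to decompress it to disk or load it all into memory. Here's a minimal Python sketch, assuming the dump is well-formed XML (if it isn't, the parser stops with a `ParseError`). It streams the .gz file through `xml.etree.ElementTree.iterparse` and discards each top-level element once it has been handled, so memory use stays roughly flat. `count_elements` is a hypothetical name, and which tag you count is up to you; any element name you pass in is your own assumption about the RDF schema, not something stated in this thread.

```python
import gzip
import xml.etree.ElementTree as ET


def count_elements(gz_path, local_name):
    """Count elements whose tag, ignoring any namespace, equals local_name."""
    count = 0
    depth = 0
    root = None
    with gzip.open(gz_path, "rb") as f:
        for event, elem in ET.iterparse(f, events=("start", "end")):
            if event == "start":
                if root is None:
                    root = elem  # the document element
                depth += 1
                continue
            depth -= 1
            # Namespaced tags look like "{uri}Name"; compare only "Name".
            if elem.tag.rsplit("}", 1)[-1] == local_name:
                count += 1
            if depth == 1:
                # A direct child of the root has been fully parsed; drop it
                # so the in-memory tree doesn't grow with the file size.
                root.clear()
    return count
```

The same loop is where you'd extract whatever fields you need instead of just counting.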

Posted

With all due respect, for many people downloading through their browsers, the figure is not "accurate". It may be the precise size of the file on the server side, but most people will not know that it is decompressed on the fly. I have been a fairly hardcore web professional for three years, and I have never come across this before.

 

I suggest that this should at least be addressed on the page, making it clear how large the file will be if downloaded through a browser. Otherwise people are going to keep having the same problems that I did: constant time-outs and confusion about the file size.

 

You could also consider adding a normal zipped version of the file that can be downloaded through a browser. Surely this is easier than expecting every visitor to go and get another piece of software just to be able to download the data?

Guest Ciaran
Posted

Abrexa, you probably don't need another piece of software. Although gzipped (.gz) files are most common on UNIX and its variants, WinZip can handle them just fine. :)
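If Python happens to be installed, its standard library can decompress a .gz file too. A minimal sketch (the `gunzip` name is just illustrative); it copies in chunks, so the ~1 GB output never has to fit in memory:

```python
import gzip
import shutil


def gunzip(src_path, dest_path):
    """Decompress a .gz file to dest_path, streaming rather than loading it all."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path
```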

 

The RDF page ( http://dmoz.org/rdf.html ) does actually warn about this problem:

 

>> Note that these fiels[sic] can be quite large. Your browser may have difficulty downloading these: it may try to uncompress it for you; it may try to interpret it for you. You may be reasonably confident that the problem is not on this end. <<

 

Add to this the fact that it generally is *not* a good idea to download huge files like these through a browser (not least because most browsers don't support resuming downloads), and my advice would be to get a special-purpose program meant for downloading files - for example, wget on Linux or GetRight on Windows.
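Resuming works because HTTP lets a client ask for just part of a file with a `Range` header, and a server that supports this answers `206 Partial Content`. Here's a minimal Python sketch of a resumable download, assuming the server honours byte ranges; if it doesn't, the code falls back to fetching the whole file again. The function names are hypothetical, and this only illustrates the general mechanism that tools like wget (`-c`) and GetRight rely on, not their actual implementation.

```python
import os
import shutil
import urllib.error
import urllib.request


def build_resume_request(url, dest_path):
    """Build a request asking only for the bytes not yet on disk."""
    have = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    request = urllib.request.Request(url, headers={"Accept-Encoding": "identity"})
    if have:
        # "bytes=N-" asks for everything from offset N to the end of the file.
        request.add_header("Range", f"bytes={have}-")
    return request, have


def resume_download(url, dest_path, chunk_size=1 << 20):
    """Continue a partial download, or start over if Range is not honoured."""
    request, have = build_resume_request(url, dest_path)
    try:
        response = urllib.request.urlopen(request)
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Range Not Satisfiable: often means the file is already complete.
            return dest_path
        raise
    with response:
        # Only a 206 reply means we received just the missing tail; any
        # other success means the full file is coming, so overwrite.
        status = getattr(response, "status", None)
        mode = "ab" if have and status == 206 else "wb"
        with open(dest_path, mode) as out:
            shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```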

Meta
Posted
I tend to download the RDFs with Opera. Quite often, Opera will even resume if interrupted. I have a DSL connection, so it doesn't take too long.
1 year later...
Guest ChrisMurray
Posted

Hey mods. I think this one could be a sticky as it is a problem quite a few people come up against.

 

Wget is a simple and effective solution, and making this thread a sticky would stop people from being put off downloading and using ODP data.
