Posted

Hello,

I had a look at the About page and was wondering how many categories and subcategories there are on DMOZ. I didn't find the information in the sticky topics here either. I am currently running a script that retrieves categories from the RDF file, and it has already extracted more than 30,000 categories. It's crazy!

How many are there?

Thanks,
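For anyone attempting the same count, here is a minimal Python sketch. It assumes that each category appears in structure.rdf.u8 as a <Topic> element, with or without a namespace prefix on the tag. It streams the file with xml.etree.ElementTree.iterparse so the dump never has to fit in memory, and it discards finished elements as it goes. The names count_topics and local_name are hypothetical helpers, not part of any ODP tool. If the dump contains characters that are not valid XML, the parser will stop with a ParseError.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{http://dmoz.org/rdf/}Topic' -> 'Topic'."""
    return tag.rsplit("}", 1)[-1]


def count_topics(source) -> int:
    """Count <Topic> elements in an ODP structure dump (path or file object)."""
    context = ET.iterparse(source, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    count = 0
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "Topic":
            count += 1
            # Detach finished children from the root so memory use stays flat.
            root.clear()
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python count_topics.py structure.rdf.u8  (file name is illustrative)
    print(count_topics(sys.argv[1]))
```

Counting only <Topic> elements means any other top-level records in the dump are skipped rather than counted as categories.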

Posted

The precise figure varies, as editors are regularly creating, merging and deleting categories. As an approximation, though, the front page (dmoz.org) states at the bottom: "4,581,289 sites ... over 590,000 categories".

Regards;

aeclark

Posted

Nope, it should take you less than an hour.

Stats from running the insert script included with phpODPWorld on an old machine with 256 MB RAM (!) and an AMD Athlon XP 1.7GHz CPU:

======================================================================
======================================================================
Inserting RDF into MySQL using phpODPWorld
======================================================================
======================================================================

CONTENT:
# time ./phpodpworld.pl content2db config-mysql-test.pl content.rdf.u8
Info: Loading content records
Info: Record 1000
Info: Record 2000
[...]
Info: Record 4825000
Info: Record 4826000
Info: 4826113 loaded

real    42m7.930s
user    34m26.488s
sys     0m33.846s

======================================================================

STRUCTURE:
# time ./phpodpworld.pl structure2db config-mysql-test.pl structure.rdf.u8
Info: Loading structure records
Info: Record 1000
Info: Record 2000
[...]
Info: Record 724000
Info: 724598 loaded

real    13m25.975s
user    10m7.791s
sys     0m16.187s

======================================================================
======================================================================
Inserting RDF into PostgreSQL using phpODPWorld
======================================================================
======================================================================

CONTENT:
# time ./phpodpworld.pl content2db config-mysql-test.pl content.rdf.u8
Info: Loading content records
Info: Record 1000
Info: Record 2000
[...]
Info: Record 4825000
Info: Record 4826000
Info: 4826113 loaded

real    42m28.023s
user    35m8.967s
sys     0m36.682s

======================================================================

STRUCTURE:
Info: Loading structure records
Info: Record 1000
Info: Record 2000
[...]
Info: Record 724000
Info: 724598 loaded

real    16m58.283s
user    10m15.220s
sys     0m19.324s
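As a quick check of the "less than an hour" estimate, the sketch below adds up the `real` (wall-clock) times from the logs above and derives records per second. The record counts and times are copied from those logs. The helper names are hypothetical, and to_seconds only handles the `XmY.YYYs` format that bash's `time` printed here.

```python
# (database, dump) -> (records loaded, `real` time), copied from the logs above
RUNS = {
    ("MySQL", "content"): (4826113, "42m7.930s"),
    ("MySQL", "structure"): (724598, "13m25.975s"),
    ("PostgreSQL", "content"): (4826113, "42m28.023s"),
    ("PostgreSQL", "structure"): (724598, "16m58.283s"),
}


def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '42m7.930s' to seconds."""
    minutes, rest = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(rest)


def total_seconds(runs) -> dict:
    """Sum the wall-clock time of the content and structure imports per database."""
    totals = {}
    for (db, _dump), (_records, real) in runs.items():
        totals[db] = totals.get(db, 0.0) + to_seconds(real)
    return totals


if __name__ == "__main__":
    for (db, dump), (records, real) in RUNS.items():
        print(f"{db:<10} {dump:<9} {records / to_seconds(real):6.0f} records/s")
    for db, secs in total_seconds(RUNS).items():
        print(f"{db:<10} total     {secs / 60:5.1f} min")
```

Run as a script, it reports roughly 1,900 content records per second for both databases. The totals come to about 55.6 minutes for MySQL and 59.4 minutes for PostgreSQL, so each full import stays just under an hour on that machine.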

PS: I'm planning to release a new version of phpODPWorld in the coming week that fixes some minor quirks.
