Downloading pages containing links to English pages

kmehta

Member
Joined
Mar 28, 2009
Messages
2
Hi All,
I need to download all ODP pages that contain links to English sites.
After reading about ODP, I thought that I just needed to download all pages
except those under World/* and Kids_And_Teens/International/*, and then extract
the links to all English sites from those pages.
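
The exclusion rule described above could be sketched roughly as follows. This is only an illustration: the function name `is_candidate_category` and the constant `EXCLUDED_PREFIXES` are made up for the example, and the comparison ignores case because the exact capitalisation of the category names is not confirmed here.

```python
from urllib.parse import urlparse, unquote

# Category trees to skip, as named in the post (hypothetical constant).
EXCLUDED_PREFIXES = ("World/", "Kids_And_Teens/International/")


def is_candidate_category(url: str) -> bool:
    """Return True if the category URL lies outside the excluded trees."""
    # Keep only the path part, e.g. "Business/" from "http://www.dmoz.org/Business/".
    path = unquote(urlparse(url).path).lstrip("/")
    # Add a trailing slash so "World" is treated like "World/",
    # while an unrelated name that merely starts with "World" is not excluded.
    if not path.endswith("/"):
        path += "/"
    lowered = tuple(prefix.lower() for prefix in EXCLUDED_PREFIXES)
    return not path.lower().startswith(lowered)
```

Note that this only filters by category path; it says nothing about the language of the sites listed inside a category.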

However, I found many exceptions such as

http://www.dmoz.org/Business/Busine...lation/Single_Language/Slovenian_and_English/

This path is neither in the World directory nor in Kids_And_Teens/International, but still
when I click on "Tomaž Metelko" on that page, I reach a non-English site.

Please let me know if I am doing something wrong.
Also, please let me know if you are aware of how to download ODP pages containing
links only to English sites.

Thanks,
kiran
 

jimnoble

DMOZ Meta
Joined
Mar 26, 2002
Messages
18,915
Location
Southern England
I've moved your thread to the correct forum. If you browse through its other threads, you'll probably find the answers you're looking for.

when I click on "Tomaž Metelko" on that page, I reach a non-English site.
Where a website has a prominent language selector, as in this case, we list the root URL. We reckon that most surfers would recognise a large UK flag with the words 'English version' underneath it :).
 

chardman

Member
Joined
Jan 13, 2010
Messages
6
Did you manage to find an answer, kmehta? I've been through a dozen pages in this forum area now and can't see anything on extracting English-language sites (for me, I just want everything excluding the dmoz.org/World/ category).
 

informator

kEditall/kCatmv
Curlie Meta
Joined
Aug 19, 2003
Messages
1,697
Location
Sweden
ODP has not previously provided separate RDFs for different category trees, but that may change with dmoz 2.0...
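
Since there is only one RDF dump for all category trees, one workaround is to filter it yourself. Below is a minimal sketch, not an official tool. It assumes each listed site appears as an `ExternalPage` element with an `about` attribute holding the URL and a `topic` child holding the category path (starting with `Top/`); please check this against the real dump before relying on it. The function name `iter_links`, the `SAMPLE` data and its namespace URIs are invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample that mimics the ASSUMED layout of the dump.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns="http://example.invalid/rdf/" xmlns:d="http://example.invalid/dc/">
  <ExternalPage about="http://example.com/">
    <d:Title>Example</d:Title>
    <topic>Top/Business/Example</topic>
  </ExternalPage>
  <ExternalPage about="http://example.org/">
    <d:Title>Beispiel</d:Title>
    <topic>Top/World/Deutsch/Beispiel</topic>
  </ExternalPage>
</RDF>"""


def local(tag):
    """Strip an XML namespace such as '{uri}topic' down to 'topic'."""
    return tag.rsplit("}", 1)[-1]


def iter_links(source, excluded=("Top/World/", "Top/Kids_and_Teens/International/")):
    """Yield (topic, url) for every listing outside the excluded trees."""
    # iterparse streams the file, so the whole dump is never parsed at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "ExternalPage":
            topic = next((c.text or "" for c in elem if local(c.tag) == "topic"), "")
            url = elem.get("about")
            # Trailing slash lets "Top/World" itself match "Top/World/".
            if url and not (topic + "/").startswith(excluded):
                yield topic, url
        if name in ("ExternalPage", "Topic"):
            elem.clear()  # drop the element's children and text once handled


for topic, url in iter_links(io.BytesIO(SAMPLE)):
    print(topic, url)
```

For a real dump you would pass an opened file instead of `io.BytesIO(SAMPLE)`. As noted earlier in the thread, filtering by category does not guarantee that every listed site opens in English.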
 