kmehta
Posted March 28, 2009

Hi All,

I need to download all ODP pages that contain links to English sites. After reading about ODP, I thought I just needed to download all pages except those under World/* and Kids_And_Teens/International/*, and then extract the links to English sites from those pages. However, I found many exceptions, such as http://www.dmoz.org/Business/Business_Services/Communications/Translation/Single_Language/Slovenian_and_English/

This path is in neither the World directory nor Kids_And_Teens/International, but when I click on "Tomaž Metelko" on that page, I still reach a non-English site.

Please let me know if I am doing something wrong. Also, please let me know if you are aware of a way to download ODP pages containing links only to English sites.

Thanks,
kiran
jimnoble
Posted March 29, 2009

I've moved your thread to the correct forum. If you browse through its other threads, you'll probably find the answers you're looking for.

> when I click on "Tomaž Metelko" on that page, I reach a non-english site.

Where a website has a prominent language selector, as in this case, we list the root URL. We reckon that most surfers would recognise a large UK flag with the words 'English version' underneath it.
chardman
Posted January 22, 2010

Did you manage to find an answer, kmehta? I've been through a dozen pages in this forum area now and can't see anything on extracting English-language sites (for me, I just want everything excluding the dmoz.org/World/ category).
informator (Meta)
Posted January 22, 2010

ODP has not previously provided separate RDFs for different category trees, but that may change with dmoz 2.0...

Curlie (Dmoz) Meta editor informator
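For anyone attempting the category-based approach discussed in this thread, here is a minimal sketch in Python of the exclusion step only. It assumes you already have a list of category paths or category URLs from whatever source you are using; the function names and the exclusion list are hypothetical, the optional "Top/" prefix is an assumption about how paths may appear in the data dump, and the comparison is case-insensitive because the capitalisation of Kids_And_Teens varies. As jimnoble's reply shows, this only drops the non-English category trees. It cannot guarantee that every remaining listing opens on an English page.

```python
# Hypothetical exclusion list, taken from the prefixes named in this thread.
EXCLUDED_PREFIXES = (
    "World/",
    "Kids_and_Teens/International/",
)


def category_path(url_or_path):
    """Strip the site prefix and an optional leading 'Top/' from a category."""
    path = url_or_path
    # Assumption: categories arrive either as dmoz.org URLs or as 'Top/...' paths.
    for prefix in ("http://www.dmoz.org/", "Top/"):
        if path.startswith(prefix):
            path = path[len(prefix):]
    return path.lstrip("/")


def is_outside_excluded_trees(url_or_path):
    """Return True if the category is not under any excluded tree."""
    path = category_path(url_or_path)
    if not path.endswith("/"):
        path += "/"  # so that a bare 'World' also matches 'World/'
    lowered = path.lower()
    return not lowered.startswith(tuple(p.lower() for p in EXCLUDED_PREFIXES))


if __name__ == "__main__":
    categories = [
        "http://www.dmoz.org/Business/Business_Services/Communications/"
        "Translation/Single_Language/Slovenian_and_English/",
        "Top/World/Deutsch/",
        "Kids_And_Teens/International/",
    ]
    for category in categories:
        print(is_outside_excluded_trees(category), category)
```

Note that the Slovenian_and_English category from the first post passes this filter, which is exactly the kind of exception kmehta ran into: a category can sit outside World/ and still list a site whose root page is not in English.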