

Posted

Hi All,

I need to download all ODP pages that contain links to English sites. After reading about ODP, I thought I would just need to download all pages except those under World/* and Kids_And_Teens/International/*, and then extract the links to all English sites from those pages.
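Here is a minimal Python sketch of the topic-based filter described above. It assumes the input is the ODP RDF content dump (content.rdf.u8) rather than crawled HTML pages. It also assumes each `ExternalPage` element carries its URL in an `about` attribute and its category in a `topic` child. The function names (`is_probably_english`, `english_links`) and the exclusion prefixes are hypothetical and are built from the two categories named in this post. The filter only approximates "English sites". As noted later in this thread, sites with a language selector are listed by their root URL wherever they are categorised, so a listing like the Slovenian_and_English translator still passes.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Sequence, Tuple

# Hypothetical exclusion prefixes, based on the categories named in the post.
# Matching is case-insensitive because the post writes "Kids_And_Teens"
# while the site itself used "Kids_and_Teens".
DEFAULT_EXCLUDED = ("Top/World/", "Top/Kids_and_Teens/International/")


def is_probably_english(topic: str, excluded: Sequence[str] = DEFAULT_EXCLUDED) -> bool:
    """Return True if the topic path lies outside every excluded subtree."""
    t = topic.lower() + "/"  # trailing slash so "Top/World" itself also matches
    return not any(t.startswith(p.lower()) for p in excluded)


def _local(name: str) -> str:
    """Strip an XML namespace: '{http://dmoz.org/rdf/}topic' -> 'topic'."""
    return name.rsplit("}", 1)[-1]


def english_links(rdf: IO[bytes],
                  excluded: Sequence[str] = DEFAULT_EXCLUDED) -> Iterator[Tuple[str, str]]:
    """Yield (url, topic) for each ExternalPage whose topic is not excluded."""
    for _event, elem in ET.iterparse(rdf, events=("end",)):
        if _local(elem.tag) != "ExternalPage":
            continue
        url = next((v for k, v in elem.attrib.items() if _local(k) == "about"), None)
        topic = next((c.text or "" for c in elem if _local(c.tag) == "topic"), "")
        if url and is_probably_english(topic, excluded):
            yield url, topic
        elem.clear()  # release the finished element; a full dump is very large


if __name__ == "__main__":
    # Tiny made-up sample in the assumed dump layout (example.* URLs are placeholders).
    sample = b"""<RDF xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<ExternalPage about="http://example.com/">
  <d:Title>Example</d:Title>
  <topic>Top/Business/Example</topic>
</ExternalPage>
<ExternalPage about="http://example.de/">
  <topic>Top/World/Deutsch/Example</topic>
</ExternalPage>
</RDF>"""
    print(list(english_links(io.BytesIO(sample))))
    # prints [('http://example.com/', 'Top/Business/Example')]
```

To exclude only the World category, as asked later in the thread, you can pass `excluded=("Top/World/",)`.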

However, I found many exceptions, such as:

http://www.dmoz.org/Business/Business_Services/Communications/Translation/Single_Language/Slovenian_and_English/

This path is in neither the World directory nor Kids_And_Teens/International, but when I click on "Tomaž Metelko" on that page, I still reach a non-English site.


Please let me know if I am doing something wrong.

Also, please let me know if you know how to download only the ODP pages that contain links to English sites.

Thanks,

kiran

Posted

I've moved your thread to the correct forum. If you browse through its other threads, you'll probably find the answers you're looking for.

> when I click on "Tomaž Metelko" on that page, I reach a non-english site.

Where a website has a prominent language selector, as in this case, we list the root URL. We reckon that most surfers would recognise a large UK flag with the words 'English version' underneath it :).
9 months later...
Posted
Did you manage to find an answer, kmehta? I've been through a dozen pages in this forum area and can't find anything on extracting English-language sites (I just want everything except the dmoz.org/World/ category).
