Posted

Hello;

 

I'm thinking about designing a search engine. I somehow need to get a list of domain names or URLs for the search engine to crawl and add to the index.

 

Does anyone know how I can get a listing of just the raw URLs in the dmoz database?

 

Thanks.

  • RZ Admin
Posted

Hi there,

 

There is actually a specific forum for questions like this; it can be found here. You might want to look through some of the past threads there to see whether your question has already been answered.

 

You may also want to take a look at http://rdf.dmoz.org/

These are the rules you must follow: http://dmoz.org/license.html

 

Again, if you search the forums, you should be able to find much of the information you need in order to set things up.

Curlie Admin photofox
Posted

The above is correct; however, there is currently a problem accessing http://dmoz.org/license.html and some of the other related information.

 

You can download a content RDF and extract URLs from that. Note that you would have to write code to eliminate duplicate entries, and not all URLs are the "home" pages of sites.
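As a rough illustration of the extract-and-deduplicate step, here is a minimal Python sketch. It assumes each listing in the content RDF appears as an `ExternalPage` element with the URL in an `about` attribute; check that against the file you actually download. The names `extract_urls` and `is_home_page` are made up for this example, and "home page" is taken here to mean a URL with no path, query, or fragment, which is only one possible definition.

```python
import html
import re
from urllib.parse import urlsplit

# Assumed listing format: <ExternalPage about="http://example.com/">
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def extract_urls(lines):
    """Yield each listed URL once, in first-seen order."""
    seen = set()
    for line in lines:
        for raw in EXTERNAL_PAGE.findall(line):
            url = html.unescape(raw)  # XML attributes escape & as &amp;
            if url not in seen:
                seen.add(url)
                yield url


def is_home_page(url):
    """True if the URL has no path beyond '/', no query, and no fragment."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query and not parts.fragment


# Tiny in-memory sample standing in for lines of the real dump.
sample = [
    '<ExternalPage about="http://example.com/">',
    '<ExternalPage about="http://example.com/docs/page.html">',
    '<ExternalPage about="http://example.com/">',
]
urls = list(extract_urls(sample))
home_pages = [u for u in urls if is_home_page(u)]
```

For the real file you would pass an open file object instead of `sample`. The `seen` set keeps every distinct URL in memory, which may or may not be acceptable depending on the size of the dump.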

  • 1 month later...
Posted

Hello volitics, I read your message.

You can get help from the DMOZ site itself, as it allows you to download the whole database in .gz form. Download it, then convert it into a MySQL database using third-party software.

You can then get the raw listings of the URLs in the DMOZ database.
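If you would rather not depend on third-party software, the download-and-load step can be sketched with the Python standard library alone. Note that this swaps in SQLite for MySQL purely to keep the example self-contained, and it again assumes listings appear as `ExternalPage` elements with an `about` attribute. The function name `load_urls` and the table name `urls` are invented for this sketch.

```python
import gzip
import re
import sqlite3

# Assumed listing format: <ExternalPage about="http://example.com/">
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]+)"')


def load_urls(dump_path, db_path):
    """Stream a gzip-compressed dump into a one-column table; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # The PRIMARY KEY lets the database reject duplicate URLs for us.
        conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
        # Read the .gz as text, line by line, without unpacking it to disk.
        with gzip.open(dump_path, "rt", encoding="utf-8", errors="replace") as dump:
            for line in dump:
                conn.executemany(
                    "INSERT OR IGNORE INTO urls (url) VALUES (?)",
                    ((url,) for url in EXTERNAL_PAGE.findall(line)),
                )
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
    finally:
        conn.close()
```

Once loaded, `SELECT url FROM urls` gives the raw listing. The same approach should carry over to MySQL with a suitable driver, but that part is not shown here.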

 

If you get my message, reply to me and I can give you further information on how the URLs can be extracted.

 

Regards,

shilpesh
