Volitics
Posted August 31, 2004
Hello, I'm thinking about designing a search engine. I need a list of domain names or URLs for the search engine to crawl and add to its index. Does anyone know how I can get a listing of just the raw URLs in the dmoz database? Thanks.
photofox (RZ Admin)
Posted August 31, 2004
Hi there, There is actually a specific forum for questions like this; it can be found here. You might want to take a look at some of the past threads there to see if your question has already been answered. You may also want to take a look at http://rdf.dmoz.org/ for the data itself. These are the rules you must follow: http://dmoz.org/license.html. Again, if you search the forums, you should be able to find much of the information you need in order to set things up. Curlie Admin photofox
bobrat
Posted August 31, 2004
The above is correct; however, there is currently a problem accessing http://dmoz.org/license.html and some of the other related information. You can download a content RDF and extract URLs from that. Note that you would have to write code to eliminate duplicate entries, and that not all URLs are the "home" pages of sites.
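To make the extraction and de-duplication step concrete, here is a minimal Python sketch. It assumes each listing in the content RDF appears on a line of the form `<ExternalPage about="...">`; check that against the dump you actually download. The names `extract_urls` and `site_root` are made up for this example, and `site_root` is just one possible way to collapse deep links down to a site's home page.

```python
import re
from html import unescape
from urllib.parse import urlsplit

# Assumed shape of a listing line in the content RDF dump:
#   <ExternalPage about="http://example.com/page.html">
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def extract_urls(lines):
    """Yield each listed URL once, in first-seen order."""
    seen = set()
    for line in lines:
        match = EXTERNAL_PAGE.search(line)
        if not match:
            continue
        # Attribute values are XML-escaped (for example &amp;), so decode them.
        url = unescape(match.group(1)).strip()
        if url and url not in seen:
            seen.add(url)
            yield url


def site_root(url):
    """Reduce a URL to scheme://host/ so deep links collapse to one site."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}/"


# Small made-up sample standing in for lines read from the real dump.
sample = [
    '<Topic r:id="Top/Arts">',
    '<ExternalPage about="http://example.com/a.html">',
    '<ExternalPage about="http://example.com/a.html">',
    '<ExternalPage about="http://Example.com/b?x=1&amp;y=2">',
    '<ExternalPage about="http://example.org/">',
]

urls = list(extract_urls(sample))
roots = sorted({site_root(u) for u in urls})
```

Reading the dump line by line keeps memory use low, apart from the `seen` set, which grows with the number of distinct URLs.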
motsa
Posted August 31, 2004
http://dmoz.org/license.html is working fine now.
shilpesh
Posted October 12, 2004
Hello Volitics, I read your message. You can get help from the dmoz site itself, as it allows you to download the whole database, which is in .gz form. You have to download it and then convert it into a MySQL database using third-party software; after that you can get the raw listings of the URLs in the dmoz database. If you get my message, reply to me, as I can give you further information on how the URLs can be extracted. Regards, shilpesh
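As a rough illustration of the "download the .gz and load it into a database" route, the sketch below streams a gzip-compressed dump and stores the URLs in a table. It swaps in SQLite from the Python standard library instead of MySQL and third-party software, purely so the example is self-contained. The function name `load_urls`, the table name `urls`, and the `<ExternalPage about="...">` line format are all assumptions for this example, not something confirmed in this thread.

```python
import gzip
import re
import sqlite3
from html import unescape

# Assumed shape of a listing line in the dump (verify against the real file).
EXTERNAL_PAGE = re.compile(r'<ExternalPage\s+about="([^"]*)"')


def load_urls(gz_path, db_path=":memory:"):
    """Stream a gzip-compressed dump into a SQLite table of unique URLs."""
    conn = sqlite3.connect(db_path)
    # PRIMARY KEY plus INSERT OR IGNORE drops duplicate URLs on insert.
    conn.execute("CREATE TABLE IF NOT EXISTS urls (url TEXT PRIMARY KEY)")
    with gzip.open(gz_path, "rt", encoding="utf-8", errors="replace") as dump:
        rows = (
            (unescape(match.group(1)),)
            for match in map(EXTERNAL_PAGE.search, dump)
            if match
        )
        conn.executemany("INSERT OR IGNORE INTO urls (url) VALUES (?)", rows)
    conn.commit()
    return conn
```

Once loaded, something like `conn.execute("SELECT url FROM urls")` returns the raw listing; the same table layout should carry over to MySQL if you prefer to use that.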