lucasmd, posted November 26, 2008

Hi,

I opened a topic today to suggest a partnership with your community. I have developed a search engine based on the ODP. I am crawling the pages of every website registered in your database, and I use the category structure too. I was thinking of extending my project to a community of crawlers, because my bot can scan up to 150,000 pages per day from a low-cost computer.

But it seems that you have deleted my previous post. My first question is: why? Maybe you do not want to have such a discussion?

Best regards,
Luc Michalski
makrhod (Editall/Catmv), posted November 26, 2008

Quote: "it seems that you have deleted my previous post. My first question is why?"

It was not deleted. The forum software automatically flagged it as needing moderation because of the number of URLs you included, which is against this forum's TOS. It is still awaiting moderation.

FAQ about becoming a volunteer ODP editor. I edit for the ODP and support those guidelines at all times, but my opinions are my own.
lucasmd (Author), posted November 26, 2008

Thank you for your reply! I was surprised... I hope you understand. :o
Callimachus (Editall), posted December 6, 2008

Hopefully your enthusiastic crawler obeys robots.txt and the related HTTP meta protocols.

ODP Editor callimachus. Any opinions expressed are my own, and do not represent an official opinion or communication from the ODP. Private messages asking for submission status or preferential treatment will be ignored.
lucasmd (Author), posted December 9, 2008

Hi,

Yes, of course: I am analyzing robots.txt and the robots meta information (NOINDEX, NOFOLLOW, ...). Currently, I am working on a function for grabbing sitemap.xml files.

I do not know how I should contact the ODP staff to suggest a partnership. Is there an interested administrator? :-)

Best regards,
Luc Michalski
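For readers unfamiliar with the meta-tag check mentioned in this post, here is a minimal Python sketch of reading NOINDEX and NOFOLLOW from a page. It is only an illustration, not the poster's code (his system is described elsewhere in the thread as PHP and MySQL). The names `RobotsMetaParser` and `robots_directives` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated, case-insensitive list.
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical sample page, for illustration only.
page = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'
directives = robots_directives(page)
may_index = "noindex" not in directives    # store the page in the index?
may_follow = "nofollow" not in directives  # queue the links found on it?
```

A crawler would typically run this check after fetching a page and before storing it or queuing its links.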
chaos127, posted December 9, 2008 (edited May 13, 2018 by Elper)

From what you've posted, I'm not sure if you're planning to get a list of the ODP-listed sites by crawling the pages on Curlie. But in case you are, https://curlie.org/robots.txt states:

    # Please do not crawl us faster than 1 hit/second.
    #
    # If you need to examine many dmoz pages, please download the rdf file from
    # http://rdf.dmoz.org/ instead of crawling us.
    #
    User-agent: *
    Crawl-Delay: 1
    Disallow: /cgi-bin/
    Disallow: /editors/
    Disallow: /World/.m

Failure to follow those instructions if/when you access Curlie may result in your IP being banned...
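As a rough sketch of how a crawler could honor the rules quoted in this post, Python's standard `urllib.robotparser` module can parse the file and report both the crawl delay and whether a URL is allowed. The rules text below is copied from the quote. The user agent `ExampleBot` and the test URLs are made up for illustration, and the sketch parses a local string instead of fetching anything.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the post above.
ROBOTS_TXT = """\
# Please do not crawl us faster than 1 hit/second.
#
# If you need to examine many dmoz pages, please download the rdf file from
# http://rdf.dmoz.org/ instead of crawling us.
#
User-agent: *
Crawl-Delay: 1
Disallow: /cgi-bin/
Disallow: /editors/
Disallow: /World/.m
"""

USER_AGENT = "ExampleBot"  # hypothetical crawler name

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds to wait between requests; fall back to 1 if no delay is declared.
delay = parser.crawl_delay(USER_AGENT) or 1


def allowed(url):
    """True if the parsed rules permit USER_AGENT to fetch this URL."""
    return parser.can_fetch(USER_AGENT, url)


editors_ok = allowed("https://curlie.org/editors/")      # under a Disallow rule
category_ok = allowed("https://curlie.org/Computers/")   # no rule matches
```

A real crawler would sleep for `delay` seconds between requests to the same host, and, as the file itself asks, would use the RDF download for bulk access.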
chaos127, posted December 9, 2008

I'm also not sure exactly what you're proposing, how it would benefit anyone in the "community" who may feel like helping you, and specifically how it might benefit the Open Directory Project. If you're interested in some sort of partnership with the ODP, then you'll presumably need to be offering something in return.

To be honest though, most outsiders' views of the problems facing the project are quite inaccurate. You'd get a much better view of things if you tried your hand at editing for a few months...
lucasmd (Author), posted December 9, 2008

Hi,

Before trying to explain our potential partnership, you need to know that:

- The first idea behind this search engine is/was educational.
- How does a search engine with a high volume of indexed pages work?
- How to create ranking criteria for optimizing searching and crawling (Server Rank, Link Rank, Site Rank)?
- How to crawl the web efficiently and quickly with a low-cost computer?
- How to keep the best performance with a MySQL and PHP system?
- Understand the technical limits of MySQL databases and find a flexible database layout.
- How to reduce/control the energy costs of a potential server farm?
- How to convert DMOZ's content file (RDF) into an SQL dump?

So the goal is not to be the best, but to try to build the same thing in order to understand how it is done. Well, I have no diploma <url removed>.

After finishing the first beta version of my search engine, I thought of linking an e-commerce portal to the search engine in order to give users more information (web results and products). I started to develop the DealGates Network 4 years ago...

About our potential partnership: why? I used most of your category structure and I have converted your list of websites into a MySQL directory... This technology can be an extension of your directory... Of course, you have Google, but the idea is to try, because it only costs time, and to find some passionate webmasters.

I can create an encrypted bot and share it with a community of webmasters using any LAMP server. A list of 500 or 1000 websites to crawl will be sent to each bot, and a system will send the dump files to a special FTP account. If we established a community of 10,000 webmasters, we could crawl a billion pages very quickly.

Maybe you can ask me some precise questions, if that would help you understand the project better? :o

PS. The main server is a little overloaded and I am doing my best to fix it... It is a personal project.

Best regards,
Luc Michalski
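To make the numbers in this proposal concrete, here is a small Python sketch of the two pieces of arithmetic involved: splitting a site list into batches of 500 for each bot, and estimating how long a billion pages would take. It is an illustration only. The function names and the placeholder site list are hypothetical, and the estimate assumes every bot sustains the 150,000 pages per day quoted in the first post, with no overlap, downtime, or politeness delays.

```python
from itertools import islice


def make_batches(sites, batch_size=500):
    """Yield consecutive lists of at most batch_size sites."""
    iterator = iter(sites)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def days_to_crawl(total_pages, bots, pages_per_bot_per_day):
    """Idealized crawl time in days, assuming perfectly even work sharing."""
    return total_pages / (bots * pages_per_bot_per_day)


# Placeholder site list, for illustration only.
sites = [f"site-{n}.example" for n in range(1200)]
batches = list(make_batches(sites, batch_size=500))

# Figures from the thread: 10,000 bots at up to 150,000 pages per day each.
estimate = days_to_crawl(1_000_000_000, bots=10_000, pages_per_bot_per_day=150_000)
```

Under those ideal assumptions the combined capacity is 1.5 billion pages per day, so a billion pages would take about two thirds of a day. Per-site crawl delays such as the 1 hit/second limit quoted earlier would make the real figure larger.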
lucasmd (Author), posted December 15, 2008

Hi,

I have added some pages explaining this project. You will see the links in the footer of each page of the search engine.

Best regards,
Luc Michalski
pvgool (Meta), posted December 15, 2008

Quote: "Hi, Before trying to explain our potential partnership, you need to know that:"

Most of the editors do not have knowledge of any of the points you mentioned. Technical development is done by our owners, AOL. You will have to contact them.

Quote: "About our potential partnership: why? I used most of your category structure and I have converted your list of websites into a MySQL directory..."

Many people have done this. That is why the DMOZ data is available for anybody to use. The only thing DMOZ asks is to honor our ownership. See http://www.dmoz.org/license.html

Editors will not be able to discuss this license. We not only lack the knowledge, but we are also prohibited from doing so by the AOL legal department.

I will not answer PMs or emails sent to me. If you have anything to ask, please use the forum.
lucasmd (Author), posted December 15, 2008

Hi,

I like answers like this because they are efficient and to the point... :-)

Thanks. I saw that somebody from AOL France checked my profile on viadeo.com yesterday, so maybe I can ask him some questions...

Best regards,
Luc Michalski