jjohnstn Posted October 4, 2004
I would like to download the RDF dump and generate static HTML pages (with customizable headers and footers). I have found only one program that claims to do this, iHierarchy ( http://simiax.com/ihier.html ); however, it costs $199 and no demo is available for testing. Are there any other applications that will do this? Also, does anyone know, or care to guesstimate, the size of the final static HTML output? I have a dedicated server with about 60 GB of free space, so hopefully there would be room for the output. I would need the program to maintain the exact ODP directory structure, e.g. http://www.domain.com/ODP/Recreation/Outdoors/
bobrat Posted October 4, 2004
Based on some work I'm currently doing on a subset, here is a rough idea. 45,000 categories need about 70 MB. I don't know what the actual full counts are, but the front page says 590,000 categories, so do the math. If you are displaying full site descriptions, you would need more than this, maybe 25% more. That will, however, depend on exactly what you generate for the HTML and what you are displaying.
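
For anyone who wants the math spelled out, here is a minimal sketch of the estimate in Python. It assumes output size scales linearly with the number of categories, which the post only implies; the function name and the `overhead` parameter are made up for illustration.

```python
def estimate_mb(total_categories, sample_categories=45_000, sample_mb=70.0, overhead=0.0):
    """Scale the measured subset size up to the full directory.

    Assumes size grows linearly with category count (an assumption,
    not something measured on the full dump).
    """
    return total_categories / sample_categories * sample_mb * (1 + overhead)


# Figures from the post: 45,000 categories -> ~70 MB, 590,000 categories in total.
base = estimate_mb(590_000)
# "Maybe 25%" more when full site descriptions are displayed.
with_descriptions = estimate_mb(590_000, overhead=0.25)

print(f"{base:.0f} MB without descriptions, {with_descriptions:.0f} MB with")
# prints: 918 MB without descriptions, 1147 MB with
```

Either way the result is on the order of 1 GB, well inside 60 GB of free space.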
windharp (Meta) Posted October 4, 2004
We collect those tools in http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Use_of_ODP_Data/Upload_Tools/ - maybe you can find something there.
Curlie Meta/kMeta Editor windharp
jjohnstn (Author) Posted October 4, 2004
Quoting bobrat: "Based on some work I'm currently doing on a subset, here is a rough idea. 45,000 categories need about 70 MB. I don't know what the actual full counts are, but the front page says 590,000 categories, so do the math. If you are displaying full site descriptions, you would need more than this, maybe 25% more. That will, however, depend on exactly what you generate for the HTML and what you are displaying."
So probably around 1 gig or so. Shouldn't be a problem, and thanks for the response.
jjohnstn (Author) Posted October 4, 2004
Quoting windharp: "We collect those tools in http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Use_of_ODP_Data/Upload_Tools/ - maybe you can find something there."
Thanks. I did check there, but all I saw (except for iHierarchy, mentioned in my original post) was for importing into MySQL, or screen scrapers. Still looking...
dizzlewizzle Posted October 7, 2004
Go here: http://www.cgiexpo.com/Scripts/36_CGI_Scripts_Search_Portals/20/ There are open source scripts there that do the exact same thing. The script you were looking at only parses the data; you don't actually host the data itself. You don't need to pay $200 for a script. Open source is always the way to go if you can. Besides, if I'm not mistaken, just from looking at his small demo code, it's based on some of the open source scripts found at the URL above. Good luck!
jjohnstn (Author) Posted October 7, 2004
Quoting dizzlewizzle: "Go here: http://www.cgiexpo.com/Scripts/36_CGI_Scripts_Search_Portals/20/ There are open source scripts there that do the exact same thing. The script you were looking at only parses the data; you don't actually host the data itself. You don't need to pay $200 for a script. Open source is always the way to go if you can. Besides, if I'm not mistaken, just from looking at his small demo code, it's based on some of the open source scripts found at the URL above. Good luck!"
Thanks, but the ones I saw there (except iHierarchy) were screen scrapers or did not use ODP data. The ODP keeps going down due to server issues, so I need a complete local copy of the ODP built from the RDF dump.
dizzlewizzle Posted October 8, 2004
Hi there again. If you need to have the actual content on your server and are having problems figuring it out, let me know and I will help you do it for free. I am currently working on the nutch.org code, revamping it, and have been able not only to parse the data but also to create SQL databases from it. Once the parse is done and the SQL database is built, the rest of the script is very easy: we can use the free version of the script you were looking at and simply add an admin interface, along with SQL support, so that all search requests are served from your server and database, not the ODP. Let me know if you would like my help. I could do it in about a day... free. Let me know, buddy.
jjohnstn (Author) Posted October 10, 2004
DizzleWizzle, many thanks for the offer, but can it be done without using MySQL? I would like to use completely static HTML pages on my server (no DB calls) to keep the server load down. I've looked at writing a Perl script to parse the RDF dump and create the HTML pages, but didn't want to "reinvent the wheel". Again, thank you for your generous offer!
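
For illustration, here is a rough sketch of the kind of script described above, written in Python rather than Perl. It is not an existing tool and makes several assumptions: that each site in the dump is an `ExternalPage` element with an `about` attribute and `Title`, `Description` and `topic` children, that topic paths begin with `Top/`, and that the namespace URIs do not matter (the ones in the sample are placeholders, and the sample entry is made up). The function names and the `header`/`footer` parameters are hypothetical.

```python
import html
import io
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(name):
    """Strip an XML namespace: '{uri}Title' -> 'Title'."""
    return name.rsplit("}", 1)[-1]


def read_sites(source):
    """Stream the dump and group (url, title, description) by topic path."""
    by_topic = defaultdict(list)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "ExternalPage":
            continue
        fields = {local(child.tag): (child.text or "").strip() for child in elem}
        url = next((v for k, v in elem.attrib.items() if local(k) == "about"), "")
        topic = fields.get("topic", "")
        if url and topic:
            by_topic[topic].append(
                (url, fields.get("Title", url), fields.get("Description", ""))
            )
        elem.clear()  # drop the record's children to limit memory use
    return by_topic


def write_pages(by_topic, out_root, header="<html><body>", footer="</body></html>"):
    """Write one index.html per topic, mirroring the topic path on disk."""
    written = []
    for topic, sites in by_topic.items():
        # "Top/Recreation/Outdoors" -> <out_root>/Recreation/Outdoors/index.html
        parts = [p for p in topic.split("/") if p not in ("", ".", "..")]
        if parts and parts[0] == "Top":
            parts = parts[1:]
        page_dir = os.path.join(out_root, *parts)
        os.makedirs(page_dir, exist_ok=True)
        items = "\n".join(
            '<li><a href="{}">{}</a> {}</li>'.format(
                html.escape(url), html.escape(title), html.escape(desc)
            )
            for url, title, desc in sites
        )
        path = os.path.join(page_dir, "index.html")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("{}\n<h1>{}</h1>\n<ul>\n{}\n</ul>\n{}\n".format(
                header, html.escape("/".join(parts)), items, footer))
        written.append(path)
    return written


# Made-up sample record; the namespace URIs are placeholders, not the real ones.
SAMPLE = """<RDF xmlns:d="http://example.org/dc#" xmlns="http://example.org/odp#">
<ExternalPage about="http://example.com/">
  <d:Title>Example Hiking Site</d:Title>
  <d:Description>Made-up entry for illustration.</d:Description>
  <topic>Top/Recreation/Outdoors</topic>
</ExternalPage>
</RDF>"""

if __name__ == "__main__":
    out_root = tempfile.mkdtemp()
    pages = write_pages(read_sites(io.BytesIO(SAMPLE.encode("utf-8"))), out_root)
    print(pages)
```

For the real dump you would pass an open file (or its path) to `read_sites` instead of the sample. Category pages that list subcategories, and categories with no sites of their own, are not handled here and would need the dump's structure data as well.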
bobrat Posted October 10, 2004
I have done this with completely static pages (as an experiment in progress): Cool Sites Project. However, there are many issues to consider when doing it this way as opposed to generating pages dynamically from SQL tables. For example, when the RDF data updates, unless you keep careful track of what changed, you would have to regenerate all pages every week.
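
One possible way to "keep careful track" is to store a fingerprint of each category's listings and regenerate only the categories whose fingerprint changed. The sketch below is an assumption about how that could be done, not a description of the Cool Sites Project; the function names and the manifest layout are hypothetical.

```python
import hashlib
import json


def topic_digest(sites):
    """Fingerprint one category's listings, independent of their order."""
    payload = json.dumps(sorted(sites), ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def plan_update(by_topic, old_manifest):
    """Compare this dump against the manifest saved after the previous run.

    Returns (topics to regenerate, topics to delete, manifest to save).
    """
    new_manifest = {topic: topic_digest(sites) for topic, sites in by_topic.items()}
    changed = sorted(t for t, d in new_manifest.items() if old_manifest.get(t) != d)
    removed = sorted(t for t in old_manifest if t not in new_manifest)
    return changed, removed, new_manifest


# Made-up data: two consecutive dumps, as {topic: [(url, title, description), ...]}.
first_dump = {
    "Top/A": [("http://a.example/", "A", "first")],
    "Top/B": [("http://b.example/", "B", "same")],
}
second_dump = {
    "Top/A": [("http://a.example/", "A", "edited")],
    "Top/C": [("http://c.example/", "C", "new")],
}

_, _, manifest = plan_update(first_dump, {})  # first run: everything is new
changed, removed, manifest = plan_update(second_dump, manifest)
print(changed, removed)
```

The manifest can be saved between runs with `json.dump`. Note that a change to the header or footer template would still require regenerating every page.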
Dardo Posted January 5, 2005
I'm also interested in this topic. Why did the discussion stop here? What is the answer?