Guest Posted November 8, 2002 I have a URL (Alamax Consulting) currently listed in a regional directory under dmoz.org, as Alamax Consulting Computer Services. When I type a simple phrase like "alamax" into a Google or AltaVista search box, my site doesn't come up. I thought those search engines used the ODP/DMOZ directory for their own listings. What is the nature of the relationship between the ODP and the popular search engines? Do I need to do something else to get them to pick up my listing?
uzs980 Posted November 8, 2002 Your site was listed in October. The data supplied to external users like Google is a bit older (from the end of September) and has not been updated yet because of technical problems, but staff are working on that. So you, and all of us, just have to be patient.
Guest Posted November 8, 2002 Thank you. I presume that means that Google, AltaVista and others are all somewhat linked via the ODP.
joeblakesley Posted November 8, 2002 No, there is no link as such. Anyone can use the ODP data under the terms of the free-use license at http://dmoz.org/license.html (e.g. by using the RDF dump at http://dmoz.org/rdf.html). A list of some of the data users is available at http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Sites_Using_ODP_Data/ . Joe
Meta hutcheson Posted November 8, 2002 Remember that a search engine doesn't have to ask permission either to spider dmoz.org or to download and parse the RDF. We know what Google is doing, because they publicize it (good publicity for them, they must have thought). We don't know what AltaVista is doing with ODP data; if they are doing anything special with it, they must think keeping it secret is a competitive advantage. Either way, it's their choice. Only DIRECTORIES (e.g. directory.google.com) have to include ODP attribution.
Guest richard123 Posted November 12, 2002 I think it may be a bit older than the end of September (??). My site was published this year on September 19, 2002. I am hoping for an early xmas present, but I'm not really that hopeful. If they can't fix it in 2 months, what's the chance they can do it in 3? Or even 4.... Who knows? It may require a major hardware upgrade, and that could easily take 6 months or more, I'd imagine. Still... I live in hope :)
Guest richard123 Posted November 12, 2002 I'd rather not say. It's adult oriented. But it appeared for the very first time on September 19, 2002 and has been there every day since.
dfy Posted November 12, 2002 >> I think it may be a bit older than end of September << You can check for yourself by visiting http://dmoz.org/rdf/?M=A and looking at the dates on the files.
Guest richard123 Posted November 13, 2002 Thanks for the link! It looks like the last successful dump was the one before Sep 22, because the one on Sep 22 wasn't complete. Maybe that's when they discovered they had a problem. So... that would have been (??) September 17, 2002. That's the most recent "content.rdf.u8.gz". Another thing is that my site still doesn't show up in "search" after all this time. I suppose that's also on the "to do" list :)
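For anyone who downloads a dump and wants to check whether their own site made it in, here is a minimal Python sketch that streams a gzipped dump such as content.rdf.u8.gz without loading the whole file into memory. It assumes (this is not confirmed anywhere in the thread) that each listing is an `ExternalPage` element whose `about` attribute holds the URL, with `Title` and `topic` child elements; namespace prefixes are ignored so the exact namespace URIs don't matter. The function names `find_listing` and `find_in_dump` are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace: '{http://...}Title' -> 'Title'."""
    return tag.rsplit("}", 1)[-1]


def find_listing(fileobj, url: str):
    """Stream an (uncompressed) RDF dump; return (title, topic) for url, or None.

    ASSUMPTION: listings look like <ExternalPage about="URL"> with child
    elements whose local names are 'Title' and 'topic'.
    """
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if local_name(elem.tag) != "ExternalPage":
            continue
        if elem.get("about") == url:
            fields = {local_name(child.tag): (child.text or "") for child in elem}
            return fields.get("Title"), fields.get("topic")
        elem.clear()  # drop the children of listings we don't need
    return None


def find_in_dump(path_or_file, url: str):
    """Search a gzipped dump; gzip.open accepts a filename or a binary file object."""
    with gzip.open(path_or_file, "rb") as f:
        return find_listing(f, url)


if __name__ == "__main__":
    # Tiny in-memory sample in the ASSUMED dump layout, for a quick check.
    sample = (b'<RDF xmlns:d="http://purl.org/dc/1.0/" xmlns="http://dmoz.org/rdf/">'
              b'<ExternalPage about="http://example.com/">'
              b'<d:Title>Example Site</d:Title><topic>Top/Regional/Example</topic>'
              b'</ExternalPage></RDF>')
    print(find_in_dump(io.BytesIO(gzip.compress(sample)), "http://example.com/"))
```

On a real dump this still reads the file sequentially, so it can take a while; clearing each non-matching listing keeps memory use far below the size of the file.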
beebware Posted November 13, 2002 Yep, the ODP search engine normally updates around 2 days after the RDF dump has been produced (I believe staff have got the search running off a copy of the dump to try and relieve a bit of pressure on the main server). However, this does mean that when the RDF dump is out of date, search is too. ODP staff members are more than aware of the issue and are working on resolving it as soon as possible (in fact, as I type, there is another attempt to produce the RDF dump going ahead - but it'll be a minimum of 24 hours before we know if the problem has been sorted).
Guest richard123 Posted November 15, 2002 This is what I don't really get... I think the ODP is a great resource, but it's 2 months out of date with no credible signs of being "fixed" anytime soon. Would it not be better to tell people how long it will be before the update happens? I have read in various places that the update is the "highest priority", and that really just gives credence to those criticizing the ODP for being slow to get things done. I mean: if the update, as a "high priority", takes over 2 months (and possibly 3 or 4??), then what hope have we got? I mean, really! (Of course I realise computers can be finicky things, but there are limits to just how poor time estimates are allowed to be :) )
Meta windharp Posted November 15, 2002 Some facts about the ODP you might not know:
--> The staff programming "team" is one person.
--> RDF dump generation takes about a week when it performs normally, and sometimes even longer if it crashes.
--> A task that takes that long clearly has to be optimized for speed. That means less debug information and so on.
--> The DMOZ link database contains almost every kind of foreign character (have you ever had to handle Latin-script languages and Japanese in one database?), lots of different encodings, and almost any kind of stupid stuff you could imagine.
Combine all of these and you will realize that tracking down bugs in RDF generation is a very time-consuming task, especially since the programmer did not write all of the software herself and so has to learn how it works first. We can't tell you how long it will take because we simply do not know when it will be fixed. Every rapidly growing software project, like the ODP, reaches a point where it discovers that its software has bugs that show up only under heavy load and/or in weird circumstances.
Curlie Meta/kMeta Editor windharp
Guest richard123 Posted November 15, 2002 Thanks for the very informative post. I knew some of that, but not the other things (especially that it takes a week to generate an RDF dump). My impression was that "running an update" took a couple of hours at most *blush* All the more reason for me not to hold my breath, and to keep wishing for an update before xmas!
stevesliva Posted November 15, 2002 It's "only" been running for three or four days now. We've got our fingers crossed. Here is an example of the character set nightmare referred to above, although I think it's gotten a bit worse. I've read a lot about transitioning to UTF-8, whatever that is.
Meta windharp Posted November 15, 2002 ... and now some very basic information about "Unicode UTF-8", which may sound more familiar to some :-) Unicode is a character standard that can handle all (uhhhh... let's say at least most ;) ) of the chaotic charsets used around the world, so it would make everything easier for communities like the ODP, if only it weren't a bit more complicated than the simple charsets everybody has used so far. :)
Some further reading:
--> Unicode Homepage
--> An example of how it might look (if your browser supports UTF): http://www.unicode.org/iuc/iuc10/x-utf8.html
--> A PDF about it: http://www.unicode.org/unicode/uni2book/u2.html
Curlie Meta/kMeta Editor windharp
stevesliva Posted November 15, 2002 Well, heck, if they meant Unicode, why didn't they say so? And why throw in -8 when the big deal about Unicode is the transition to 16-bit characters from 8?
Meta windharp Posted November 15, 2002 There is a so-called "UTF-8", which I think makes Unicode somehow work with 8-bit units. More about this can be found in the links I mentioned above ;) (And you could check the internal fora for more information about the ODP and Unicode if you like.) Curlie Meta/kMeta Editor windharp
Guest Posted January 8, 2003 There are UTF-8, which uses 8-bit units (one to four bytes per character, with plain ASCII staying at one byte), and UTF-16, which uses 16-bit units and suits the so-called "double-byte characters" from East Asia. :)
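To make the UTF-8/UTF-16 difference concrete, here is a small Python sketch (the helper names are illustrative and have nothing to do with the ODP's own code). `utf8_encode` hand-encodes a single code point using the standard UTF-8 bit layout and is checked against Python's built-in `str.encode`; `encoded_sizes` compares how many bytes a string needs in each encoding. To keep it short, surrogate code points are not rejected.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (surrogates not rejected)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp < 0x80:  # 0xxxxxxx: ASCII is unchanged, one byte
        return bytes([cp])
    if cp < 0x800:  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def encoded_sizes(text: str) -> dict:
    """Bytes needed for `text` in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {"utf-8": len(text.encode("utf-8")),
            "utf-16": len(text.encode("utf-16-le"))}


for ch in ["A", "é", "日", "😀"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "), encoded_sizes(ch))
```

For "日" (U+65E5) this gives 3 bytes in UTF-8 but 2 in UTF-16, while plain ASCII such as "A" stays at 1 byte in UTF-8. Characters outside the 16-bit range, like the emoji, take 4 bytes in both.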
Guest eqfan7v Posted February 1, 2003 windharp, very informative posts. But I still don't understand: ONE week to update a database the size of DMOZ's? When I remember that Google processes millions of searches daily, over a much bigger database, and returns *ranked* results in fractions of a second, I think I have reason to still be surprised, don't you agree? :?
Bluesplinter Posted February 1, 2003 >> But I still don't understand << Well, Google was funded by venture capital (I know of one round of $25 million; I don't know if there were other rounds or not), plus they make oodles of money from the search agreements they've signed, AdWords, etc. All that money allows them to buy a fair bit of processing horsepower. I know of 10 or 12 servers accessible to the public, and I don't doubt they have many times that for internal data massaging. The ODP has some very nice hardware, but nothing like that.
totalxsive Posted February 1, 2003 Indeed, the entire ODP runs on a mere 4 machines, if my memory serves me correctly. And at least one of those was only added in the past 2 weeks or so.
brmehlman Posted February 1, 2003 And Google doesn't produce their index in seconds. They produce it in at least days, perhaps weeks. We don't know and they aren't telling, but their spider crawls over my own web site about once a month. It's the searches within that index that the user sees, and those are indeed blazingly fast.
jtbell Posted February 1, 2003 >> But I still don't understand: ONE week to update a database the size of Dmoz's? When I remember that google processes millions of daily searches, over a much bigger database, and returns *ranked* results in fractions of a second, [...] << Google takes a long time to generate a complete update, too. They update their index towards the end of the month, based on data gathered during the "deep crawl" at the beginning of the month. On my server, the "deep crawler" showed up during 3-12 January. The actual update began around 26 January, so it took about two weeks to generate the new index.
Meta mcoupal Posted February 1, 2003 This is a good read: http://www.internetwk.com/lead/lead060100.htm Curlie: Been trying to give up the editing addiction since day 1. :moz: