Guest swisstony Posted July 4, 2003 I am putting together an ODP "mirror" site at the moment, using an XP machine to do all of the crunch work and then uploading the entire directory to the *nix server. It is English-language only. However, I came across three different types of category that are simply incompatible with the Windows XP file system (and, I presume, other versions of Windows). If there are any techies here who are interested...
1. There are two categories that end in '...' in the title. Trailing dots cannot be used in a Windows folder name; they are silently stripped, which actually creates a misnamed folder!
2. A bunch of educational categories have such ridiculously long category names that they breach the 255-character limit on a folder path.
3. The Internet/Content_Filtering category contains a category called 'pro' and a category called 'con'. Windows does not allow folders called 'con', because CON is a reserved device name.
I can come up with workarounds for each of these, of course, but if you are looking to set up some helpful guidelines for future reference, then these may be points to keep in mind.
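The three problem cases above can be detected before anything is written to disk. The sketch below is illustrative, not the poster's code: the function name `windows_problems` and the default root folder `C:\odp` are hypothetical. It uses the classic Win32 `MAX_PATH` of 260 characters for a full path (including drive letter and terminator), with 255 as the usual limit for a single name.

```python
import re

# Device names Windows reserves regardless of extension (case-insensitive).
RESERVED = {"CON", "PRN", "AUX", "NUL",
            *(f"COM{i}" for i in range(1, 10)),
            *(f"LPT{i}" for i in range(1, 10))}
# Characters Windows forbids in a file or folder name.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
MAX_PATH = 260  # classic Win32 limit on a full path, including drive and terminator


def windows_problems(category_path: str, root: str = "C:\\odp") -> list[str]:
    """Return reasons an ODP category path would not map cleanly to Windows folders.

    `root` is a hypothetical output folder used only to estimate full path length.
    """
    problems = []
    for segment in category_path.split("/"):
        if segment.endswith((".", " ")):
            problems.append(f"{segment!r}: trailing dot/space is silently stripped")
        if segment.split(".")[0].upper() in RESERVED:
            problems.append(f"{segment!r}: reserved device name")
        if INVALID_CHARS.search(segment):
            problems.append(f"{segment!r}: contains a character Windows forbids")
    full = root + "\\" + category_path.replace("/", "\\")
    if len(full) >= MAX_PATH:
        problems.append(f"path is {len(full)} characters, at or over MAX_PATH")
    return problems
```

Running every category path through a check like this gives a short exception list to handle by hand, which matches the "work around each case" approach described in the post.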
Alucard Posted July 5, 2003 Interesting stuff. Just shows what you get if you use an operating system which is so limited ;-) Seriously, though, I would question the design of relying on the naming of folders to mirror the content of the ODP: every operating system has its own naming conventions and restrictions, and ODP category names don't follow any of them. I think with that design you're always going to run into issues.
bobrat Posted July 6, 2003 Although you say you are limiting it to English, the ODP in fact contains World (non-English) categories with character sets that would give Windows XP a stomach ache if you tried to name folders the same way. I'm afraid you have a flawed design. Folder names should never reflect data, whether the data came from the ODP or not.
Guest swisstony Posted July 6, 2003 Well, yes, the non-English categories would be an absolute nightmare to deal with, so I haven't bothered with them for the moment. All the non-English categories are grouped into three main branches, which is very useful: World, Adult/World, and Kids and Teens/International.

Unfortunately, in my experience, when building a directory, folder names HAVE to reflect the data. Unless I want the server to do a file lookup and redirect for every request, I have to actually use the ODP file structure, just as I would think the ODP does. If you know of another way, do let me know; I would be genuinely interested. I have never seen the server-side implementation of the ODP, but then Unix doesn't have the same file restrictions that Windows does, so I doubt that any of these issues arise there.

As I said, there are in fact only three types of category within the English-language section that cause any issues at all. Of those, the only annoying one is the use of folder paths over 255 characters, which could easily be avoided. Anyway, the idea wasn't to cause any trouble, merely to state the compatibility issues.

The reason I am doing the processing on a different box is to reduce the strain on the server. The reason the box is running XP rather than Linux is that I can't stand Linux: I have tried installing it, but lack of familiarity and time merely made it a painful and very frustrating process. "Better the devil you know" and all that.
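Because the non-English categories sit under three branches, an English-only mirror can skip them with a simple prefix test. This is a minimal sketch; it assumes category paths are written without a leading "Top/" and use underscores (Kids_and_Teens), which depends on how the data dump is parsed.

```python
# Roots of the non-English branches named in the post. Assumes paths
# without a leading "Top/"; adjust if your parser keeps it.
NON_ENGLISH_ROOTS = ("World", "Adult/World", "Kids_and_Teens/International")


def is_english(category_path: str) -> bool:
    """True unless the path is one of the non-English roots or lies beneath one."""
    return not any(
        # Match the root itself or a true child ("World/..."), not "Worldwide".
        category_path == root or category_path.startswith(root + "/")
        for root in NON_ENGLISH_ROOTS
    )


def english_only(paths: list[str]) -> list[str]:
    """Keep only the categories that belong in an English-only mirror."""
    return [p for p in paths if is_english(p)]
```

Checking for the trailing "/" keeps an unrelated category whose name merely starts with "World" from being dropped.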
dfy Posted July 6, 2003 Hmmmm. Would you walk a mile to the bus stop to catch a bus, and then a mile from the bus terminal to your place of work every day, on the grounds that learning to drive a car was too difficult and the bus was 'the devil you knew'? I know I wouldn't. While learning Linux isn't easy, it would certainly be worth all the effort involved, because it would remove the problem you are currently looking at. It would probably remove a whole bunch of other problems too. Try one of the easier Linux versions, like SuSE, which can be set up to look almost exactly like Windows, but without the limitations. As Swiss Tony says: "You know, learning Linux is like making love to a beautiful woman. First you have to learn which buttons to press, then you have to learn which order to press them in, then finally, all the hard work pays off when you make her purr like a kitten and store your files just where you want them."
Alucard Posted July 6, 2003 In my opinion, the only real way to store data like the ODP's is in a database. The ODP uses some sort of custom-built flat files, as far as I know, but if they were doing it all over again, I'm sure they would use some sort of relational DB. That way you stand the best chance of not tying it to the quirks of any particular operating system, and it is much more scalable: foreign characters, lengths of category names, easier searches, etc. If you are really stuck on doing it with file folders, then you should be developing on your target operating system, which I think you said was UN*X. Then you need to look at the ODP naming conventions for categories and decide how to map those into directory names that are compatible with the OS. ...and you're not causing trouble - we've said many times that what users do with the ODP data is their business.
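The database approach can be sketched with Python's built-in sqlite3 module. The schema below is hypothetical (it is not the ODP's own storage format). Each category is keyed by its catid and stores its full path verbatim, so foreign characters and long names never have to fit a file system's rules.

```python
import sqlite3


def open_directory_db() -> sqlite3.Connection:
    """Create an in-memory store for categories, keyed by catid (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE category (
            catid  INTEGER PRIMARY KEY,
            path   TEXT NOT NULL UNIQUE,   -- full ODP path, stored verbatim (any Unicode)
            parent INTEGER REFERENCES category(catid)
        )""")
    return db


def add_category(db: sqlite3.Connection, catid: int, path: str) -> None:
    """Insert a category, linking it to its parent if the parent is already stored."""
    parent_path = path.rsplit("/", 1)[0] if "/" in path else None
    row = db.execute("SELECT catid FROM category WHERE path = ?", (parent_path,)).fetchone()
    db.execute("INSERT INTO category VALUES (?, ?, ?)",
               (catid, path, row[0] if row else None))


def children(db: sqlite3.Connection, path: str) -> list[str]:
    """Return the paths of a category's direct sub-categories, sorted."""
    return [r[0] for r in db.execute(
        "SELECT c.path FROM category c JOIN category p ON c.parent = p.catid "
        "WHERE p.path = ? ORDER BY c.path", (path,))]
```

Here names like 'con' are just strings in a table, so the Windows restrictions only matter at the point where something is finally written to disk.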
Guest swisstony Posted July 7, 2003 It was in fact SuSE that I tried for a couple of months on a second box... but I just couldn't get anywhere with it. I'm not yet convinced that the time I would have to spend learning a new OS is good time management. Yes, it will be useful, but in the short term that time is much better spent dealing with the few exceptions. I have of course used a database as the basis of the directory; that is how the pages are created in the first place. However, it would not be feasible to create all the directory pages on the fly from the database. Every Google crawl would generate over a million calls to the database... and it would significantly slow down page display for users. I prefer to take the load off the database so that it can be used solely for searching. Having static HTML pages dramatically reduces the load on the server. If the ODP switched to a completely DB-driven basis, it would die almost instantly. I imagine that the static files are the only reason it runs at all at the moment.
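The "build once from the database, then serve static files" approach can be sketched as follows. It assumes the category tree has already been read out of the database into a dict mapping each path to its sub-category paths. That shape and the function name `write_static_pages` are hypothetical, not the poster's actual setup. The category path becomes the folder name, which suits a Unix target but would run into the Windows issues from the first post.

```python
import html
import os
from urllib.parse import quote


def write_static_pages(categories: dict[str, list[str]], out_dir: str) -> int:
    """Write one index.html per category and return how many files were written.

    `categories` maps a category path (e.g. "Top/Arts") to its sub-category paths.
    """
    count = 0
    for path, subcats in categories.items():
        folder = os.path.join(out_dir, *path.split("/"))
        os.makedirs(folder, exist_ok=True)
        items = []
        for sub in subcats:
            name = sub.rsplit("/", 1)[-1]          # last path segment
            href = html.escape(quote(name)) + "/"  # percent-encode, then HTML-escape
            items.append(f'<li><a href="{href}">{html.escape(name)}</a></li>')
        title = html.escape(path)
        page = (f"<html><head><title>{title}</title></head>"
                f"<body><h1>{title}</h1><ul>\n" + "\n".join(items) + "\n</ul></body></html>")
        with open(os.path.join(folder, "index.html"), "w", encoding="utf-8") as f:
            f.write(page)
        count += 1
    return count
```

The database is only queried once per build. After that, crawlers and visitors receive plain files, which is where the load reduction described in the post comes from.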
sfromis Posted July 22, 2003 Well, you do not *HAVE* to make the directory names match the category path. You could work around the Windows limitations by assigning shorter names and also using them in the links. One possible way of deriving shorter names would be to use the catid instead of the category name. The ODP itself handles World/ language category names by URL-encoding them. Of course, this makes the names much longer, which would be problematic to shoehorn into Windows' limited path lengths. This can also be avoided by using the catid.
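The two naming options can be compared in a few lines. This sketch assumes UTF-8 percent-encoding through the standard `urllib.parse.quote`; the thread does not state the ODP's exact encoding scheme. The helper names are hypothetical.

```python
from urllib.parse import quote


def encoded_dirname(segment: str) -> str:
    """Percent-encode a category name (as UTF-8) so any script becomes plain ASCII."""
    return quote(segment, safe="")


def catid_dirname(catid: int) -> str:
    """Short, OS-neutral directory name derived only from the catid."""
    return str(catid)


def link_for(catid: int) -> str:
    """Link to a category page stored under its catid directory."""
    return f"/{catid_dirname(catid)}/index.html"
```

For example, "Español" becomes "Espa%C3%B1ol", and each CJK character expands to nine characters. A catid-based name stays short whatever language the category is in, and the same short name is used in the links.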
Guest schoik Posted August 14, 2003 Brawrrr, I suggest using "Mandrake Linux 9.1" if you're new to Linux, that or Redhat 9.0B... Don't use SuSE unless you have some idea of Linux.
totalxsive Posted August 15, 2003 Or wait a week or two and get Mandrake Linux 9.2.
giz Posted August 15, 2003 Linux versions, just like buses: none for ages, then three almost all at once. Tsk.
Meta hutcheson Posted August 16, 2003 Ya, if you want guaranteed daily updates for your OS, you just gotta go with the Beast from Redmond and its critical security patch program.
TheAbsorbant Posted January 16, 2005 Help!!! You guys seem to know what you are talking about. I found this thread while googling for a solution to my problem, which is this: I have a Creative mp3 player. It uses the Creative Media Source Organizer. I tried to transfer the Megadeth album "Killing Is My Business..." from the player to my hard drive, not realizing it would use invalid characters when creating the folder name (who would expect such a thing? Most Windows apps substitute the .'s with _'s or the like). Now I'm stuck with a malfunctioning folder named "Killing Is My Business..", which won't let me open it or delete it!! Does anyone know how to repair or delete that damn thing??!! I'm on WinXP, if you hadn't already figured that out from the desperation...
Meta pvgool Posted January 16, 2005 Sorry but this forum is only for questions related to ODP aka DMOZ
I will not answer PMs or emails sent to me. If you have anything to ask, please use the forum.
TheAbsorbant Posted January 16, 2005
Quote: "Sorry but this forum is only for questions related to ODP aka DMOZ"
The helpful and humane thing to do would be to help out anyway, rather than coldly come with a "You're off topic. Go away!" kind of remark.
motsa Posted January 16, 2005
Quote: "The helpful and humane thing to do would be to help out anyway, rather than coldly come with a "You're off topic. Go away!" kind of remark."
Except that this forum is not here to be helpful and humane about anything but the Open Directory Project. You need to sign up somewhere else to get help with your question.