l008com2 Posted June 21, 2007
I hope specifics like this are OK for this forum. I made a relatively simple PHP script for parsing XML files, and I'm trying to use it to load the ChefMoz data into a MySQL database. But alas, it doesn't work. No matter how many ways I try to clean the data before passing it to the XML parser, I still get "invalid character" and other errors well before I make it more than 10% of the way through the dump. I've tried everything I could think of, and everything I've been able to find on the web, with very little success. I'm hoping someone in here can help me get past this hurdle. I can post my script if you want to see it.
informator (Meta) Posted June 21, 2007
I'm afraid that we're not good at answering specific technical questions regarding the parsing of RDF files. The files are offered, but we don't have any tech support for them. From my understanding, it's not a trivial everyday exercise to try to use the files through PHP and MySQL.
Curlie (Dmoz) Meta editor informator
windharp (Meta) Posted June 21, 2007
Are you using XML-specific routines? The system Chefmoz uses is older than the RDF and XML standards, and the character encoding it uses has some errors (which has been a known bug for several years). I usually parse the Chefmoz RDFs manually, without using any ready-made components, and that works pretty well. Unfortunately I have no clue about PHP, so I am sorry I won't be of any help there. Since the structure of the RDF file is without errors, at least as far as I needed it, I assume your main problem lies in badly encoded data being rejected by MySQL. That should be fixable with an intelligent routine that makes sure every line you read is properly encoded before handing it to MySQL, either removing or encoding everything MySQL would reject. We do have a set of Chefmoz-related things in http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Tools_for_Editors/ChefMoz_Editors/ - maybe there is even one or another piece of software linked there that you can use; I didn't check.
Curlie Meta/kMeta Editor windharp
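A minimal sketch of the kind of per-line cleanup routine described above, written in PHP since that is what the original poster uses. It assumes the dump is meant to be UTF-8 and that any line failing validation is really ISO-8859-1; that guess about the encoding, the function name clean_line, and the reliance on PHP's mbstring extension are all assumptions for illustration, not something documented for the Chefmoz dump.

```php
<?php
// Hypothetical helper (not part of any Chefmoz tool): return a line that is
// guaranteed to be valid UTF-8 before it is handed to the database.
function clean_line(string $line): string
{
    // Lines that are already valid UTF-8 are passed through untouched.
    if (mb_check_encoding($line, 'UTF-8')) {
        return $line;
    }
    // Assumption: an invalid line is ISO-8859-1 (Latin-1), so re-encode it.
    // A line mixing valid UTF-8 with stray Latin-1 bytes would be
    // double-encoded by this shortcut and needs a smarter routine.
    return mb_convert_encoding($line, 'UTF-8', 'ISO-8859-1');
}
```

Each line read from the dump would go through clean_line() first. If deleting the bad bytes is preferable to guessing their encoding, the conversion step can be swapped for a routine that drops them.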
l008com2 (Author) Posted June 21, 2007
Quote (windharp): "Are you using XML-specific routines? [...] I assume your main problem lies in badly encoded data being rejected by MySQL."
Yes, I am using the built-in XML parsing functions. But my problems are not with MySQL at all. They come from the XML parser itself, which complains about random characters throughout the file. Here's a link to my code: http://pastebin.com/933659 As you can see, it's a pretty basic XML parser. I just can't figure out how to get the dump through it. Is there some way I can clean up all these illegal characters? I plan to make a similar parser for the main dmoz RDF dump, since right now I only have a string-matching-based parser that is easily broken. But I need to clean the data first. Even if illegal characters are just deleted, that would be a perfectly acceptable solution for my needs.
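Since simply deleting illegal characters is acceptable here, one possible approach is to filter every line before it reaches PHP's XML parser. The sketch below is not the script from the pastebin link; the function names strip_illegal_xml and parse_cleaned are made up for illustration, and it assumes the mbstring and xml extensions are available. It only removes invalid UTF-8 bytes and ASCII control characters; any other well-formedness problem in the dump would still need separate handling.

```php
<?php
// Hypothetical helper: delete bytes that an XML 1.0 parser would reject
// as invalid characters.
function strip_illegal_xml(string $text): string
{
    // Drop (rather than replace with "?") byte sequences that are not valid UTF-8.
    $previous = mb_substitute_character();
    mb_substitute_character('none');
    $text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    // Remove ASCII control characters other than tab, line feed and carriage return.
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $text);
}

// Feed a dump to the XML parser one cleaned line at a time.
// $onStart is called with ($parser, $name, $attributes) for every opening tag.
function parse_cleaned(string $path, callable $onStart): void
{
    $parser = xml_parser_create('UTF-8');
    xml_set_element_handler($parser, $onStart, function ($parser, $name) {
        // Closing tags are ignored in this sketch.
    });

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $ok = true;
    // Reading whole lines means a multi-byte character is never split across reads.
    while ($ok && ($line = fgets($handle)) !== false) {
        $ok = (bool) xml_parse($parser, strip_illegal_xml($line), false);
    }
    // Tell the parser that the input is complete.
    $ok = $ok && (bool) xml_parse($parser, '', true);
    fclose($handle);

    if (!$ok) {
        throw new RuntimeException(sprintf(
            'XML error "%s" at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
}
```

The exception message includes the parser's line number, which makes it easier to inspect whatever input still trips the parser after cleaning.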
l008com2 (Author) Posted June 24, 2007
Does anyone out there have any advice? I'm having no luck at all trying to parse the Chefmoz feed into a MySQL database. And as I said, my problem isn't with the MySQL side at all; it's with getting the PHP XML parser to not choke on the Chefmoz dump. Is there anywhere I might find more info on this?
sdang Posted January 17, 2008
parsing chefmoz rdf???
Hi l008com2, did you ever find a solution? I've tried 2 or 3 scripts from the list of 'tools' for parsing RDF and am having no luck, just getting random errors. The one script that has worked is suck_DMOZ in PHP, but the data it produces is not correct.
Rated Posted May 7, 2008
Quote (sdang): "Hi l008com2, did you ever find a solution? I've tried 2 or 3 scripts from the list of 'tools' for parsing RDF and am having no luck, just getting random errors. The one script that has worked is suck_DMOZ in PHP, but the data it produces is not correct."
There is one I found that worked for the dmoz RDF files, at http://phpodpworld.sourceforge.net/ If you use MySQL it will run into errors, but they can easily be fixed. It uses Perl instead of PHP to extract the RDF files and insert the data into the database. I'm also in the process of writing the same kind of script for Chefmoz. I will post it when I finish, if you're still looking.
hansfn (Meta) Posted May 8, 2008
Quote (Rated): "If you use MySQL it will run into errors, but they can easily be fixed."
Hm, I haven't gotten any reports about such problems (and I know it has worked with MySQL before), but please tell me about them so I can fix it. And if you modify phpODPWorld so it also works with the Chefmoz RDF, I would very much like to include the modifications in the next release. PS! The project isn't dead - I have just been busy with other projects. I do reply to e-mail and read the mailing list.
weglobenet Posted October 20, 2008
Quote (l008com2): "Does anyone out there have any advice? I'm having no luck at all trying to parse the Chefmoz feed into a MySQL database. [...] Is there anywhere I might find more info on this?"
There is a Chefmoz RDF converted to a MySQL dump at http://www.we-globe.net/WebLab/Download/DmozRdf2MySQL.html
Fluesse09 Posted December 9, 2009
What I am looking to do is this: say I have 5 PHP files on a website and they all use the same database and mysql_connect string.
Code: $db = "myDatabase"; $link = mysql_connect("localhost", "UserName", "Password") or die("Could not connect to server");
What I want is to put this in a separate file and just call it, so that if I ever need to change databases and/or location I only have to do it in one place. You can not just do the standard include: include("data/myConnection.php"); I just need to pass the 2 variables back to the main page so it will work. Any ideas?
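For what it's worth, variables assigned at the top level of an included PHP file become visible in the script that includes it, so a shared file may be all that is needed, provided the include is not executed inside a function (where the variables would be local to that function). The sketch below shows what such a shared file could look like. It reuses the path and variable names from the post, but swaps in mysqli_connect as an assumption, because the older mysql_connect function was removed in PHP 7.

```php
<?php
// data/myConnection.php - hypothetical shared file, path taken from the post.
// Assumption: mysqli is used here because mysql_connect no longer exists in PHP 7+.
$db = "myDatabase";

// On PHP 8.1 and later a failed connection throws mysqli_sql_exception by
// default; on older versions mysqli_connect returns false, hence the check.
$link = mysqli_connect("localhost", "UserName", "Password", $db);
if (!$link) {
    die("Could not connect to server");
}
```

Each of the five pages would then start with require_once "data/myConnection.php"; at the top level of the script, after which $db and $link can be used as if they had been defined in the page itself.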