
Posted
I hope that specifics like this are OK for this forum. I made a relatively simple PHP script for parsing XML files. I'm trying to parse the ChefMoz data into a MySQL database. But alas, it doesn't work. No matter how many ways I try to clean the data before passing it to the xml_parser, I still get "invalid character" and other errors well before I get even 10% of the way through the dump. I've tried everything I could think of, and everything I've been able to find on the web, with very little success. I'm hoping someone here can help me get past this hurdle. I can post my script if you want to see it.
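One way to narrow down the "invalid character" errors is to feed the dump to PHP's built-in XML parser in chunks and report the parser's own error message together with the line and column where it stopped. This is a minimal sketch, not the original script (which is only linked later in the thread); the helper name parse_xml_stream() and the 8192-byte chunk size are illustrative assumptions.

```php
<?php
// Hypothetical helper: stream an XML/RDF file through PHP's xml_parse() in
// chunks and report where (and why) the parser first fails.
// Returns null if the whole stream parsed, otherwise an error description.
function parse_xml_stream($stream, int $chunkSize = 8192): ?string
{
    $parser = xml_parser_create('UTF-8');
    // Keep element names in their original case instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            return 'read error';
        }
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($stream))) {
            return sprintf(
                '%s at line %d, column %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser),
                xml_get_current_column_number($parser)
            );
        }
    }
    return null;
}
```

Called as parse_xml_stream(fopen('dump.rdf', 'rb')) (the file name is a placeholder), it either returns null or tells you which line to inspect, so you can look at the exact bytes the parser objected to instead of guessing.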
  • Meta
Posted

I'm afraid we're not good at answering specific technical questions about parsing the RDF files. The files are offered, but we don't provide any tech support for them.

 

From my understanding, it's not a trivial everyday exercise to use the files through PHP and MySQL. Just my two cents.

Curlie (Dmoz) Meta editor informator
  • Meta
Posted

Are you using XML-specific routines? The system Chefmoz uses is older than the RDF and XML standards, and its character encoding has some errors (a known bug for several years).
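If encoding errors are the culprit, a quick way to confirm it is to list which lines of the dump are not valid UTF-8. A minimal sketch, assuming the dump is meant to be UTF-8 throughout; the helper name is hypothetical, and it relies only on PCRE, so the mbstring extension is not required.

```php
<?php
// Hypothetical helper: list the line numbers in a stream that are not
// well-formed UTF-8. Assumes the dump is supposed to be UTF-8 throughout.
function find_invalid_utf8_lines($stream): array
{
    $bad = [];
    $lineNo = 0;
    while (($line = fgets($stream)) !== false) {
        $lineNo++;
        // With the /u modifier, preg_match() returns false (not 0) when the
        // subject is not valid UTF-8, so an empty pattern works as a check.
        if (preg_match('//u', $line) === false) {
            $bad[] = $lineNo;
        }
    }
    return $bad;
}
```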

 

 

I usually parse Chefmoz RDFs manually, without any ready-made components, and that works pretty well. Unfortunately I have no clue about PHP, so I'm afraid I can't help there.

 

 

Since the structure of the RDF file is, at least as far as I have needed it, free of errors, I assume your main problem is badly encoded data being rejected by MySQL. That should be fixable with a routine that makes sure every line you read is properly encoded before handing it to MySQL, either removing or encoding everything MySQL would reject.
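One possible version of such a routine, as a sketch only: it assumes that any line which is not valid UTF-8 was written in ISO-8859-1 (Latin-1) and re-encodes it. That is an assumption about the dump, not something confirmed here, and the helper names are hypothetical.

```php
<?php
// Re-encode every byte 0x80-0xFF as its two-byte UTF-8 sequence.
// This is exact for ISO-8859-1 input (hypothetical helper).
function latin1_to_utf8(string $s): string
{
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        $b = ord($m[0]);
        return chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }, $s);
}

// ASSUMPTION: a line that is not valid UTF-8 is Latin-1 (hypothetical helper).
function ensure_utf8_line(string $line): string
{
    // preg_match() with /u returns false for malformed UTF-8.
    if (preg_match('//u', $line) === false) {
        return latin1_to_utf8($line);
    }
    return $line;
}
```

A line that mixes real UTF-8 with stray Latin-1 bytes would get its valid characters double-encoded, so dropping the bad bytes may be the safer choice for such lines. Either way, inserting the cleaned values through prepared statements (for example with mysqli) keeps quotes and backslashes in the data from breaking the SQL.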

 

 

We do have a set of Chefmoz-related resources at http://dmoz.org/Computers/Internet/Searching/Directories/Open_Directory_Project/Tools_for_Editors/ChefMoz_Editors/ . Maybe some of the software linked there is something you can use, though I didn't check.

Curlie Meta/kMeta Editor windharp

 


Posted
Yes, I am using the built-in XML parsing functions. But my problems are not with MySQL at all; they come from the XML parser itself, which complains about random characters throughout the file.

 

Here's a link to my code:

http://pastebin.com/933659

 

As you can see, it's a pretty basic XML parser. I just can't figure out how to get the file through it. Is there some way I can clean up all these illegal characters? I planned on making a similar parser for the main DMOZ RDF dump, since right now I just have a string-matching parser that breaks easily. But I need to clean the data first. Even just deleting the illegal characters would be a perfectly acceptable solution for my needs.
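Since deleting the illegal characters is acceptable here, a minimal sketch of that approach: each line is first stripped of bytes that do not form valid UTF-8, then of code points that XML 1.0 does not allow (such as most ASCII control characters). It assumes the dump is meant to be UTF-8; clean_xml_line() and the constant names are hypothetical.

```php
<?php
// One well-formed UTF-8 sequence (captured), or else any single byte.
// Replacing with '$1' keeps valid sequences and drops stray bytes.
const UTF8_SEQ_OR_BYTE = '/([\x00-\x7F]'
    . '|[\xC2-\xDF][\x80-\xBF]'
    . '|\xE0[\xA0-\xBF][\x80-\xBF]'
    . '|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}'
    . '|\xED[\x80-\x9F][\x80-\xBF]'
    . '|\xF0[\x90-\xBF][\x80-\xBF]{2}'
    . '|[\xF1-\xF3][\x80-\xBF]{3}'
    . '|\xF4[\x80-\x8F][\x80-\xBF]{2})|./s';

// Anything outside the XML 1.0 "Char" production.
const XML_ILLEGAL_CHARS = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

function clean_xml_line(string $line): string
{
    // Step 1: drop bytes that are not part of valid UTF-8 (only if needed).
    if (preg_match('//u', $line) === false) {
        $line = preg_replace(UTF8_SEQ_OR_BYTE, '$1', $line);
    }
    // Step 2: delete code points XML 1.0 forbids, e.g. \x01 or \x1B.
    return preg_replace(XML_ILLEGAL_CHARS, '', $line);
}
```

Reading with fgets(), passing each cleaned line to xml_parse($parser, $line, false) and finishing with xml_parse($parser, '', true) avoids cutting a multibyte character in half at a chunk boundary. This silently discards data, so it may be worth logging how many bytes each call removes.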

Posted
Does anyone out there have any advice? I'm having no luck at all trying to parse the ChefMoz feed into a MySQL database. As I said, my problem isn't with the MySQL side at all; it's with getting the PHP XML parser to not choke on the ChefMoz dump. Is there anywhere I might find more info on this?
  • 6 months later...
Posted

parsing chefmoz rdf???

 

Hi l008com2, did you ever find a solution? I've tried 2 or 3 scripts from the list of 'tools' for parsing RDF and had no luck; I keep getting random errors. The one script that has worked is suck_DMOZ in PHP, but the data is not correct.

  • 3 months later...
Posted
There is one I found that worked for the DMOZ RDF files at:

 

http://phpodpworld.sourceforge.net/

 

If you use MySQL it will run into errors, but they can easily be fixed. It uses Perl instead of PHP to extract the RDF files and insert the data into the database. I'm also in the process of writing the same script for ChefMoz. I will post it when I finish if you're still looking.

  • Meta
Posted
If you use mysql it will run into errors but they can easily be fixed.

Hm, I haven't gotten any reports about such problems (and I know it has worked with MySQL before), but please tell me about them so I can fix them. And if you modify phpODPWorld so it also works with the Chefmoz RDF, I would very much like to include the modifications in the next release.

 

PS: The project isn't dead; I have just been busy with other projects. I do reply to e-mail and read the mailing list.

  • 5 months later...
  • 1 year later...
Posted

What I am looking to do is this: say I have 5 PHP files on a website, and they all use the same database and mysql_connect string.

$db = "myDatabase";
$link = mysql_connect("localhost", "UserName", "Password") or die("Could not connect to server");

What I am looking to do is put this in a separate file and just call it, so if I ever need to change databases and/or location I only have to do it in one place.

 

You cannot just do the standard include:

include "data/myConnection.php";

 

I just need to pass the 2 variables back to the main page so it will work. Any ideas?
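For what it's worth, a plain include at the top level of a page does make $db and $link available to that page, because an included file inherits the variable scope of the line that includes it; the variables only seem to vanish when the include happens inside a function. An alternative that avoids relying on shared globals is to have the settings file return its values. A minimal sketch, assuming PHP 7 or later (the old mysql_* functions were removed in PHP 7.0, so mysqli is used in the usage comment); the file path comes from the post, while the array keys and load_db_config() are hypothetical.

```php
<?php
// The shared file (data/myConnection.php) holds only the settings and
// *returns* them. Hypothetical contents:
//
//   <?php
//   return [
//       'host' => 'localhost',
//       'user' => 'UserName',
//       'pass' => 'Password',
//       'db'   => 'myDatabase',
//   ];

// Hypothetical helper: load the settings array returned by the shared file.
function load_db_config(string $path): array
{
    // include evaluates the file and yields whatever it returns
    // (or false, with a warning, if the file cannot be included).
    $config = include $path;
    if (!is_array($config)) {
        throw new RuntimeException("$path did not return a settings array");
    }
    return $config;
}

// In each page:
// $cfg  = load_db_config(__DIR__ . '/data/myConnection.php');
// $link = mysqli_connect($cfg['host'], $cfg['user'], $cfg['pass'], $cfg['db']);
```

Changing databases or hosts then means editing only the one settings file.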
