
breaking up category dump into manageable sub-cats



Guest sperugin
Posted

Hello,

 

Has anyone had any success breaking up the category RDF dump into sub-categories? The dump is just too big to work with directly.

 

I need to extract certain information from the category dump, but the PHP script I use to do so takes too long to run (about 10 hours). I therefore wrote the following XSLT transformation to break the dump into sub-branches (e.g., Arts, Games, News) before running my PHP script. This version keeps only the Top/News branch.

 

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:r="http://www.w3.org/TR/RDF/"
                xmlns:d="http://purl.org/dc/elements/1.0/"
                xmlns="http://dmoz.org/rdf" version="1.0">

  <xsl:output method="xml"/>

  <!-- drops every Topic whose r:id is outside Top/News; name() is used
       because XSLT 1.0 match patterns ignore the default namespace -->
  <xsl:template match="node()[name() = 'Topic' and not(starts-with(@r:id, 'Top/News'))]"/>

  <!-- identity template: copies every other attribute and node as-is -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- recurses into the attributes and children of the copied node -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

 

However, when I execute this transformation I run out of memory. I am running it on an Apple PowerBook G4.
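
One possible way around the memory limit is a streaming pass, since an XSLT 1.0 processor typically builds the whole input tree in memory before applying any templates. The following is only a minimal sketch, written in Python rather than XSLT and not run against the full dump. It assumes each Topic is a direct child of the root element and reuses the namespace URIs from the stylesheet above; the function name topics_under and the sample data are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Namespace URIs copied from the stylesheet above.
RDF_NS = "http://www.w3.org/TR/RDF/"
DMOZ_NS = "http://dmoz.org/rdf"
TOPIC_TAG = "{%s}Topic" % DMOZ_NS
ID_ATTR = "{%s}id" % RDF_NS


def topics_under(source, branch):
    """Yield the Topic elements whose r:id starts with `branch`.

    `source` is a filename or a binary file object. Elements are parsed
    incrementally and discarded after use, so the whole dump is never
    held in memory at once.
    """
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first "start" event is the root element
    for event, elem in events:
        if event == "end" and elem.tag == TOPIC_TAG:
            if elem.get(ID_ATTR, "").startswith(branch):
                yield elem
            root.clear()  # drop already-processed children of the root


# Tiny stand-in for the real dump (hypothetical sample data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf">
  <Topic r:id="Top/Arts"><d:Title>Arts</d:Title></Topic>
  <Topic r:id="Top/News"><d:Title>News</d:Title></Topic>
  <Topic r:id="Top/News/Weather"><d:Title>Weather</d:Title></Topic>
</RDF>
"""

news_ids = [t.get(ID_ATTR) for t in topics_under(io.BytesIO(SAMPLE), "Top/News")]
print(news_ids)
```

Each yielded element could then be passed to the extraction step or written to a per-branch file. Like the stylesheet, the prefix test also matches any category whose name merely begins with the same characters, so a stricter check may be needed.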

 

Does anyone have a method for breaking up the category dump that they would be willing to share?

 

Thanks,

Saverio
