Guest sperugin
Posted November 6, 2003

Hello,

I am interested in collecting sequences of hyperlink labels (the anchor text of each a href link) for various sub-branches of ODP. These sequences are essentially the breadcrumbs at the top of each page, and the URL mirrors the same sequence. For example, the News sub-branch contains the following sequences, among others:

News: Breaking News: Business and Economy ...
News: Breaking News: Official Press Releases ...
...

However, I also want to preserve crosslinks from one branch of the directory to another, which the breadcrumb and URL do not capture. I'd like to collect thousands of such sequences in selected sub-branches.

I know that the ODP dump is available in RDF. What is the best method to collect such sequences in bulk from the available data (e.g., XSLT)?

Thank you,
Saverio
senox
Posted November 8, 2003

You know that there is also an RDF dump available at http://rdf.dmoz.org/ which contains only the category hierarchy information, don't you? It shouldn't be too difficult to parse that one and extract the information you're looking for.
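
As a rough sketch of what parsing that dump could look like, the Python below streams a structure file and, for each category under a chosen branch, yields the breadcrumb labels plus any links that point to other categories. The element and attribute names (Topic, id, symbolic, related, resource), the "Label:Top/..." form of symbolic links, the underscore-to-space label convention, and the sample XML are all assumptions about the dump layout rather than anything stated in this thread, so check them against the real file. The names collect_sequences, local, and attr are purely illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Assumed names of the elements that carry cross-branch links.
LINK_TAGS = {"symbolic", "related"}


def local(name):
    """Strip a '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def attr(elem, wanted):
    """Return the attribute whose local name is `wanted`, or None."""
    for key, value in elem.attrib.items():
        if local(key) == wanted:
            return value
    return None


def collect_sequences(source, branch):
    """Yield (breadcrumb, crosslinks) for each Topic at or below `branch`.

    `source` is a filename or a binary file object. Matching is done on
    local names only, so the exact namespace URIs do not matter.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "Topic":
            continue
        topic_id = attr(elem, "id") or ""
        if topic_id == branch or topic_id.startswith(branch + "/"):
            # Assumption: path segments use '_' where the label has a space.
            breadcrumb = [part.replace("_", " ") for part in topic_id.split("/")]
            crosslinks = []
            for child in elem:
                if local(child.tag) in LINK_TAGS:
                    target = attr(child, "resource") or ""
                    # Assumption: symbolic targets look like "Label:Top/Path".
                    crosslinks.append(target.split(":", 1)[-1])
            yield breadcrumb, crosslinks
        # Drop the processed Topic's children to keep memory use down.
        elem.clear()


# Hypothetical sample in the assumed layout, for demonstration only.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns="http://dmoz.org/rdf">
  <Topic r:id="Top/News/Breaking_News">
    <narrow r:resource="Top/News/Breaking_News/Business_and_Economy"/>
    <symbolic r:resource="Press Releases:Top/Business/Press_Releases"/>
  </Topic>
  <Topic r:id="Top/Arts">
    <related r:resource="Top/News/Media"/>
  </Topic>
</RDF>
"""

if __name__ == "__main__":
    data = io.BytesIO(SAMPLE.encode("utf-8"))
    for breadcrumb, crosslinks in collect_sequences(data, "Top/News"):
        print(": ".join(breadcrumb), "->", crosslinks)
```

Streaming with iterparse instead of loading the whole tree is a deliberate choice here, since the full dump is large; an XSLT stylesheet could express the same selection but would typically need the whole document in memory.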