Distinguishing duplicate ExternalPages

DrJ
Member | Joined: Jul 7, 2006 | Messages: 2
Given a domain name, I'm trying to extract a site description from the content data. When the domain is listed under multiple ExternalPage elements, how do I identify a "primary" one?

For example, yahoo.com has three:

<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<d:Description>Yahoo!'s webservers exclusively run FreeBSD. In addition, all the non-production servers and developer workstations run FreeBSD.</d:Description>
<priority>1</priority>
<topic>Top/Computers/Software/Operating_Systems/Unix/BSD/FreeBSD/Prominent_Users</topic>
</ExternalPage>

<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<d:Description>Personalized content and search options. Chatrooms, free e-mail, clubs, and pager.</d:Description>
<priority>1</priority>
<topic>Top/Computers/Internet/On_the_Web/Web_Portals</topic>
</ExternalPage>

<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<d:Description>The first large scale directory of the Internet, now a major portal offering search engine results, customizable content, chatrooms, free e-mail, clubs, and pager.</d:Description>
<priority>1</priority>
<topic>Top/Computers/Internet/Searching/Directories/Yahoo</topic>
</ExternalPage>

When I search for yahoo.com on dmoz.org, the third description comes out at the top, so I'm guessing something identifies this as better than the other two. How do I figure this out?
 

arubin
Editall/Catmv | Joined: Mar 8, 2004 | Messages: 5,093
I don't think we have a concept of a "primary listing". They're all just listings.
 

motsa
Curlie Admin | Joined: Sep 18, 2002 | Messages: 13,294
There's really no such thing as a "primary" listing.

DrJ said: "When I search for yahoo.com on dmoz.org, the third description comes out at the top, so I'm guessing something identifies this as better than the other two."
For all intents and purposes, the search results are in random order. In this case I believe they're ordered roughly by when each listing was added to the directory, but the date of listing does not denote importance.
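Since no listing is marked as primary, one way to handle the original lookup is to collect every ExternalPage for the domain and keep them all, for example keyed by topic. The Python sketch below assumes the dump is XML shaped like the excerpts in the question. The function names, the namespace URIs in the sample, the trimmed descriptions, the hypothetical example.com entry, and the rule that "yahoo.com" matches both yahoo.com and www.yahoo.com are all illustrative assumptions, not anything defined by the directory itself.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def local_name(tag):
    """Drop any '{namespace}' prefix so d:Title compares as plain 'Title'."""
    return tag.rsplit("}", 1)[-1]


def host_matches(url, domain):
    """Assumed rule: the URL's host is the domain itself or www.<domain>."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host == "www." + domain


def listings_for_domain(source, domain):
    """Return every ExternalPage listing whose URL belongs to `domain`.

    No listing is treated as primary: all matches are returned in file order.
    `source` may be a file path or a binary file-like object.
    """
    results = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "ExternalPage":
            continue
        url = elem.get("about", "")
        if host_matches(url, domain):
            fields = {local_name(child.tag): (child.text or "") for child in elem}
            results.append({
                "url": url,
                "title": fields.get("Title", ""),
                "description": fields.get("Description", ""),
                "topic": fields.get("topic", ""),
            })
        elem.clear()  # release the element's children; the real dump is large
    return results


# Sample input; namespace URIs are assumed, matching ignores them anyway.
SAMPLE = """<RDF xmlns="http://dmoz.org/rdf/" xmlns:d="http://purl.org/dc/elements/1.0/">
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<d:Description>Yahoo!'s webservers exclusively run FreeBSD.</d:Description>
<priority>1</priority>
<topic>Top/Computers/Software/Operating_Systems/Unix/BSD/FreeBSD/Prominent_Users</topic>
</ExternalPage>
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<d:Description>Personalized content and search options.</d:Description>
<priority>1</priority>
<topic>Top/Computers/Internet/On_the_Web/Web_Portals</topic>
</ExternalPage>
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<d:Description>The first large scale directory of the Internet.</d:Description>
<priority>1</priority>
<topic>Top/Computers/Internet/Searching/Directories/Yahoo</topic>
</ExternalPage>
<ExternalPage about="http://www.example.com/">
<d:Title>Example</d:Title>
<d:Description>Hypothetical non-matching entry.</d:Description>
<priority>1</priority>
<topic>Top/Reference</topic>
</ExternalPage>
</RDF>"""

if __name__ == "__main__":
    data = io.BytesIO(SAMPLE.encode("utf-8"))
    for listing in listings_for_domain(data, "yahoo.com"):
        print(listing["topic"], "->", listing["description"])
```

To run it against the real dump, pass the path of the uncompressed content file instead of the BytesIO object. Streaming with iterparse and clearing each ExternalPage keeps memory use down compared with loading the whole file into one tree. If a single description is still needed, picking one (by topic, say) is a choice on your side, not something the data encodes.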
 
This site has been archived and is no longer accepting new content.