Given a domain name, I'm trying to extract a site description from the content data. When the domain is listed under multiple ExternalPage elements, how do I identify a "primary" one?
For example, yahoo.com has three:
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<description>Yahoo!'s webservers exclusively run FreeBSD. In addition, all the non-production servers and developer workstations run FreeBSD.</description>
<priority>1</priority>
<topic>Top/Computers/Software/Operating_Systems/Unix/BSD/FreeBSD/Prominent_Users</topic>
</ExternalPage>
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<description>Personalized content and search options. Chatrooms, free e-mail, clubs, and pager.</description>
<priority>1</priority>
<topic>Top/Computers/Internet/On_the_Web/Web_Portals</topic>
</ExternalPage>
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<description>The first large scale directory of the Internet, now a major portal offering search engine results, customizable content, chatrooms, free e-mail, clubs, and pager.</description>
<priority>1</priority>
<topic>Top/Computers/Internet/Searching/Directories/Yahoo</topic>
</ExternalPage>
When I search for yahoo.com on dmoz.org, the third description comes out at the top, so I'm guessing something identifies this as better than the other two. How do I figure this out?
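To make the ambiguity concrete, here is a minimal Python sketch (an illustration, not code actually in use) that groups the entries by their `about` URL and prints the fields that might tell them apart. The wrapping `RDF` root element, the placeholder namespace URI bound to the `d:` prefix, and the helper names `local` and `entries_by_url` are all assumptions made so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical wrapper: the three yahoo.com entries from the question, placed in a
# root element that declares the d: prefix. The namespace URI is a placeholder.
SAMPLE = """<RDF xmlns:d="http://example.org/d">
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<description>Yahoo!'s webservers exclusively run FreeBSD. In addition, all the non-production servers and developer workstations run FreeBSD.</description>
<priority>1</priority>
<topic>Top/Computers/Software/Operating_Systems/Unix/BSD/FreeBSD/Prominent_Users</topic>
</ExternalPage>
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<description>Personalized content and search options. Chatrooms, free e-mail, clubs, and pager.</description>
<priority>1</priority>
<topic>Top/Computers/Internet/On_the_Web/Web_Portals</topic>
</ExternalPage>
<ExternalPage about="http://www.yahoo.com/">
<d:Title>Yahoo!</d:Title>
<description>The first large scale directory of the Internet, now a major portal offering search engine results, customizable content, chatrooms, free e-mail, clubs, and pager.</description>
<priority>1</priority>
<topic>Top/Computers/Internet/Searching/Directories/Yahoo</topic>
</ExternalPage>
</RDF>"""


def local(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def entries_by_url(xml_text):
    """Group ExternalPage entries by the URL in their 'about' attribute."""
    grouped = defaultdict(list)
    for page in ET.fromstring(xml_text):
        if local(page.tag) != "ExternalPage":
            continue
        # Map each child element's local name to its text content.
        fields = {local(child.tag): (child.text or "").strip() for child in page}
        grouped[page.get("about")].append(fields)
    return grouped


candidates = entries_by_url(SAMPLE)["http://www.yahoo.com/"]
for entry in candidates:
    print(entry["priority"], entry["topic"])
```

All three candidates come back with the same `about` URL, the same title, and a `priority` of 1, so the only fields that differ are `description` and `topic`. Nothing in the elements themselves obviously marks the third one as the primary entry.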