From: Michael JasonSmith
Date: Jul 02 05:13 UTC
I was thinking about search results when I read the following quote
from Scott Prevost, in an article about the Microsoft Bing search
engine. He was discussing the captions that appear in search results:
“One of the challenges in developing captions is
finding the right pieces of text on a page to
represent that link, so semantic processing really
helps. It helps pick the right sentences, sentences
that may have the right concepts but not necessarily
the keywords from [the user's query]. It helps us
pick the piece of the sentence that's most relevant
and not chop it off in places that makes it unreadable…”
http://www.theregister.co.uk/2009/07/01/powerset_and_bing/
GroupServer displays captions in the search results for posts:
http://groupserver.org/s?t=0&p=1
As Scott Prevost was saying, the sample caption is one that
contains the search term:
http://groupserver.org/s?s=search&t=0&p=1
However, that is only for posts. Keywords are returned on topic
searches:
http://groupserver.org/s
The keywords are words that appear frequently in a topic but not
frequently in other topics (the tried-and-true TF-IDF algorithm). Like the
captions, the keywords augment the subject, providing more
“information scent”. Unlike the captions, they reveal some of the
deeper meaning behind the document.
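To make the TF-IDF idea concrete, here is a minimal sketch in Python. It is
not the GroupServer code: the function names, the tokeniser, and the dictionary
of topics are all assumptions made for illustration. A word scores highly when
it is frequent in one topic (TF) and appears in few other topics (IDF).

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words (a naive tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def topic_keywords(topics, topic_id, count=3):
    """Return the `count` highest-scoring TF-IDF words for one topic.

    `topics` is a hypothetical mapping of topic identifier to topic text.
    """
    words_by_topic = {tid: tokenize(text) for tid, text in topics.items()}

    # Document frequency: the number of topics that each word appears in.
    doc_freq = Counter()
    for words in words_by_topic.values():
        doc_freq.update(set(words))

    words = words_by_topic[topic_id]
    term_freq = Counter(words)
    scores = {}
    for word, n in term_freq.items():
        tf = n / len(words)                            # frequent in this topic
        idf = math.log(len(topics) / doc_freq[word])   # rare in other topics
        scores[word] = tf * idf

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:count]
```

A word that appears in every topic gets an IDF of zero, so it never beats a
word that is particular to the topic, however often it is used.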
I was thinking that we could do something similar to Microsoft Bing:
calculate the keywords for a *post* and use those to select a
*sentence* that characterises the document. Currently we do not have
the infrastructure to do this, but Richard has plans for eventually
using a full-text retrieval system to support our search system. It
would be worth looking at this idea when that is integrated with
GroupServer.
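
As a rough sketch of the idea, assuming the keywords for a post have already
been calculated, the caption could be the sentence that contains the most
distinct keywords. The function names and the naive sentence splitter below
are hypothetical; a real full-text retrieval system would likely supply better
tools for both.

```python
import re


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def caption_sentence(post, keywords):
    """Return the sentence of `post` that contains the most distinct keywords.

    Ties go to the earliest sentence; an empty post gives an empty string.
    """
    wanted = {k.lower() for k in keywords}
    best, best_score = "", -1
    for sentence in split_sentences(post):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        score = len(words & wanted)
        if score > best_score:
            best, best_score = sentence, score
    return best
```

For example, given the keywords "search" and "captions", a post that reads
"Thanks for the update. The search captions should show the keywords. See you
later!" would be captioned with its second sentence.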