Search Prototype
From:
Michael JasonSmith
Date:
2007 May 09 01:18 UTC
I have been working on a new search interface for GroupServer. The new system
has enough features and enough stability to allow me to demonstrate the
prototype, even though it is far from finished. You can access it from
http://groupserver.org/groups/development/search_prototype
Currently, only topics are searched.
* The results are displayed in reverse-chronological order, with the
most recent item at the top, as is typical for GroupServer.
* The metadata displayed about each topic should be familiar:
subject, group, number of posts, and the date of the last
post. This is the same information as is displayed on the
Topics page for a group:
http://groupserver.org/groups/development/messages
* In addition to the standard metadata, six keywords for each
topic are displayed.
The keywords are extracted using the TF-IDF algorithm. This algorithm scores a
word highly if it appears frequently in one document (a topic, in this case)
but rarely in the other documents. This works surprisingly well, but it is not
perfect. Currently, the keywords are displayed with the best keyword first, but
this may change to alphabetical order in the future.
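To make the keyword scoring concrete, here is a minimal sketch of TF-IDF in Python. It is not the prototype's code: the function name `top_keywords`, the dictionary of pre-split topics, and the particular weighting (relative term frequency times the natural log of inverse document frequency) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def top_keywords(topics, topic_id, count=6):
    """Return the `count` best-scoring words for one topic, best first.

    `topics` maps a (hypothetical) topic id to the list of words in it.
    """
    # Document frequency: the number of topics each word appears in.
    doc_freq = Counter()
    for words in topics.values():
        doc_freq.update(set(words))

    term_freq = Counter(topics[topic_id])
    total = len(topics[topic_id])
    scores = {
        # High when the word is frequent here and rare elsewhere;
        # zero when the word occurs in every topic.
        word: (n / total) * math.log(len(topics) / doc_freq[word])
        for word, n in term_freq.items()
    }
    # Best score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:count]


topics = {
    "t1": "search search prototype topic".split(),
    "t2": "email post topic".split(),
    "t3": "email digest topic".split(),
}
print(top_keywords(topics, "t1", count=2))  # ['search', 'prototype']
```

The word "topic" occurs in every document, so it scores zero and is never preferred over a word that is specific to one topic.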
A search may be limited to a single group, which is not possible with the current
system. To do this, click on the disclosure button below the search result, and
click on the link “Only search the group…”. Clicking on “Search for [related]
topics” will display the results of a search that uses the topic-keywords as
the search text.
As is standard with searching, multiple keywords are allowed (separated by
spaces). The results are combined with an inclusive-or: a search for "email
post" will return all topics that contain the word "email", and all the topics
that contain the word "post". This causes quite a few items to be returned when
performing a related-topic search, so I may look at ordering the topics by
relevance in this situation. Only individual words, not phrases, can be
searched, but this will probably change in the future. However, phrase
searching will be *very* slow.
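The inclusive-or behaviour can be sketched as a union over a word index. The index layout (a word mapped to a set of topic ids) and the function name `search_topics` are assumptions for illustration, not the prototype's data model. The ordering by number of matched words is only one possible notion of relevance.

```python
from collections import Counter


def search_topics(index, query):
    """Return ids of topics containing at least one query word.

    Topics that match more of the query words are listed first.
    `index` is a hypothetical mapping of word -> set of topic ids.
    """
    hits = Counter()
    # Split on spaces; each distinct word contributes its topics (inclusive-or).
    for word in set(query.lower().split()):
        hits.update(index.get(word, ()))
    # Most matched words first; ties are broken by topic id.
    return sorted(hits, key=lambda topic: (-hits[topic], topic))


index = {
    "email": {"t2", "t3"},
    "post": {"t2", "t4"},
}
print(search_topics(index, "email post"))  # ['t2', 't3', 't4']
```

Topic "t2" contains both words and is listed first, while "t3" and "t4" each match one word and are still returned.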
Only the first six topics are returned at the moment, but this will change in
the future. In addition, I will add the ability to view the previous and next
chunks, as on the latest topics page and the latest posts page.
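As a rough sketch of how previous and next chunks could work, the following slices a result list by offset. The function name `chunk`, the offset-based scheme, and the default chunk size of six (taken from the current limit) are assumptions, not a description of the planned implementation.

```python
def chunk(results, offset=0, limit=6):
    """Return one chunk of results plus the offsets of its neighbours.

    The previous or next offset is None when there is no such chunk.
    """
    page = results[offset:offset + limit]
    prev_offset = max(offset - limit, 0) if offset > 0 else None
    next_offset = offset + limit if offset + limit < len(results) else None
    return page, prev_offset, next_offset


results = list(range(14))  # stand-in for 14 matching topics
print(chunk(results, offset=6))  # ([6, 7, 8, 9, 10, 11], 0, 12)
```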
My schedule of items to implement is as follows.
1. Searching files,
2. Searching individual posts,
3. Integration with the rest of GroupServer, and
4. Searching user-profiles.
The ultimate aim of the integration is for every topic, file, post and
user-list to be generated from the same search code. So the list of topics on
the site and group homepages will be a search, and clicking “View more” will
take you to a search-page.
It is *very* unlikely that GroupServer will support an “advanced search”, which
would allow the use of Boolean operators. Most users cannot write queries with
Boolean operators correctly, and those who can rarely find what they are
looking for when they do. However, I will publish the API, which can be used to
perform weird and wonderful searches that are not supported by the Web
interface.