  • On Ranking Functions in Full Text Retrieval Databases for the World Wide Web: a short paper I wrote about ranking web pages for my databases class. It is more of a general overview, introducing vector space models, latent semantic indexing, PageRank, ranking using user click-through data, and web communities.
  • Harvesta homepage: Knowledge Management and Information Retrieval Software. A new approach to knowledge management with a nifty information retrieval method. I've done some work on it (hence the link). The site includes links to a white paper.
  • Harvesta Whitepaper: the approach behind Harvesta. A short paper Frank and I wrote (October 2001).

    Abstract: In the last few years, both the need for and the research in text categorization have grown tremendously. This growth is a natural consequence of the vast amount of information available through the Internet, as well as of the competitive advantage that knowledge provides. In this paper we present the Harvesta System, an information management system specifically designed to fit the needs of corporate intelligence. It accesses natural language texts (unstructured information) by generating what we call descriptors, which represent the relevant information the texts contain.
    Unlike in established text categorization approaches, the descriptors for a given text are not fixed. Rather, they change over time based on context information extracted from additional texts.
    The Harvesta Server determines relevant topics and thereby helps users cope with information overload. Relevant information becomes accessible very quickly and without any administrative effort. Retrieving only the documents that contain relevant information saves the precious time otherwise spent sifting through uninteresting messages.
    Since changes in the descriptors reflect changes in the processed data, the descriptors are well suited to detecting trends, from weak signals to upcoming major trends.

