Information Retrieval: A Survey
30 November 2000
by
Ed Greengrass
Abstract
Information Retrieval (IR) is the discipline that deals with retrieval of unstructured
data, especially textual documents, in response to a query or topic statement, which
may itself be unstructured, e.g., a sentence or even another document, or which may
be structured, e.g., a Boolean expression. The need for effective methods of automated
IR has grown because of the tremendous explosion in the
amount of unstructured data, both in internal corporate document collections and in the
immense and growing number of document sources on the Internet. This report is a
tutorial and survey of the state of the art, both research and commercial, in this
dynamic field. The topics covered include: formulation of structured and unstructured
queries and topic statements; indexing (including term weighting) of document
collections; methods for computing the similarity of queries and documents;
classification and routing of documents in an incoming stream to users on the basis
of topic or need statements; clustering of document collections on the basis of language
or topic; and statistical, probabilistic, and semantic methods of analyzing and
retrieving documents.