A scaleable technique for best-match retrieval of sequential information using metrics-guided search.

Abstract

A new technique is described for retrieving information by finding the best match or matches between a textual `query' and a textual database. The technique uses principles of beam search with a measure of probability to guide the search and prune the search tree. Unlike many methods for comparing strings, the method gives a set of alternative matches, graded by the `quality' of the matching achieved. For any one sequence of hits between a query and a database, the probability measure is an estimate of the probability that the observed configuration, or better, could have occurred by chance. This probability is an inverse measure of the redundancy between the query and the database. The new technique is embodied in a software simulation called SP21 which runs on a conventional computer. Examples are presented showing best-match retrieval of information from a textual database. Analytic and empirical evidence is presented showing that, in a serial processing environment, the search technique has time complexity of O(Q.D) where Q is the size of the query in characters and D is the size of the database in characters. The technique lends itself well to parallel processing. In that kind of environment, the time complexity of the process should approach O(Q), depending on how well the parallelism is applied. The space complexity of the process is O(D). Planned developments of these ideas in the future are described and discussed. (A)

A scaleable technique for best-match retrieval of sequential information using metrics-guided search.

Request publication

Publication

Our collection