A scaleable technique for best-match retrieval of sequential information using metrics-guided search.

Auteur(s)
Wolff, G.
Jaar
Samenvatting

A new technique is described for retrieving information by finding the best match or matches between a textual `query' and a textual database. The technique uses principles of beam search with a measure of probability to guide the search and prune the search tree. Unlike many methods for comparing strings, the method gives a set of alternative matches, graded by the `quality' of the matching achieved. For any one sequence of hits between a query and a database, the probability measure is an estimate of the probability that the observed configuration, or better, could have occurred by chance. This probability is an inverse measure of the redundancy between the query and the database. The new technique is embodied in a software simulation called SP21 which runs on a conventional computer. Examples are presented showing best-match retrieval of information from a textual database. Analytic and empirical evidence is presented showing that, in a serial processing environment, the search technique has time complexity of O(Q.D) where Q is the size of the query in characters and D is the size of the database in characters. The technique lends itself well to parallel processing. In that kind of environment, the time complexity of the process should approach O(Q), depending on how well the parallelism is applied. The space complexity of the process is O(D). Planned developments of these ideas in the future are described and discussed. (A)

Publicatie aanvragen

1 + 2 =
Los deze eenvoudige rekenoefening op en voer het resultaat in. Bijvoorbeeld: voor 1+3, voer 4 in.

Publicatie

Bibliotheeknummer
970495 ST [electronic version only]
Uitgave

Journal of Information Science, Vol. 20 (1994), No. 1, p. 16-28, 35 ref.

Onze collectie

Deze publicatie behoort tot de overige publicaties die we naast de SWOV-publicaties in onze collectie hebben.