Authors
J. K. Cringean, R. England, G. A. Manson, P. Willett; Young M. Kim, Dik Lun Lee
Title
Parallel text searching in serial files using a processor farm; Efficient search methods for signature files
Book
Proceedings of the 13th International Conference on Research and Development in Information Retrieval
Pages
429-54
Date
March 1990
Abstract
The paper discusses the implementation of a parallel text retrieval system using a microprocessor network. The system is designed to allow fast searching in document databases organised using the serial file structure, with a very rapid initial text signature search being followed by a more detailed, but more time-consuming, pattern matching search. The network is built from transputers, high performance microprocessors developed specifically for the construction of highly parallel computing systems, which are linked together in a processor farm. The paper discusses the design and implementation of processor farms, and then reports initial studies of the efficiency of searching that can be achieved using this approach to text retrieval from serial files.
Many approaches have been proposed for searching signature files efficiently. These methods apply different techniques to reduce the number of block signatures that need to be accessed and compared to the query signature. Owing to the differences in the performance measures and assumptions used in these methods, it is difficult to determine which method is the best under a common condition. In this paper, we study three basic methods proposed in the literature, namely, the indexed descriptor file\cite{Pfaltz:indexedsignature}, the two-level superimposed coding scheme\cite{SacksDavis:twosuperimpose}, and the partitioned signature file approach\cite{Lee:partition}. The contribution of this paper is two-fold. We present a uniform analytic performance model so that these methods can be compared fairly and consistently. We show that the two-level superimposed coding scheme, if stored in a transposed file\cite{Lee:signatureprocessor}, is the best in performance. We then introduce an improved method, the multi-level superimposed coding method, which is an extension of the two-level superimposed coding method. We demonstrate that the two-level method is not optimal, and obtain the optimal number of levels for the multi-level method.
Comment
A comparison of methods for fast signature search.
Category
Signature
Category: Signature
Institution: Ohio State University, Computer and Information
        Science Research Center
Comment: A comparison of methods for fast signature search.
Abstract: The paper discusses the implementation of a parallel
        text retrieval system using a microprocessor
        network. The system is designed to allow fast
        searching in document databases organised using the
        serial file structure, with a very rapid initial
        text signature search being followed by a more
        detailed, but more time-consuming, pattern matching
        search. The network is built from transputers, high
        performance microprocessors developed specifically
        for the construction of highly parallel computing
        systems, which are linked together in a processor
        farm. The paper discusses the design and
        implementation of processor farms, and then reports
        initial studies of the efficiency of searching that
        can be achieved using this approach to text
        retrieval from serial files. Many approaches have been proposed for searching
        signature files efficiently.  These methods apply
        different techniques to reduce the number of block
        signatures that need to be accessed and compared to
        the query signature.  Owing to the difference in the
        performance measures and assumptions used in these
        methods, it is difficult to determine which method
        is the best under a common condition. In this paper,
        we study three basic methods proposed in the
        literature, namely, the indexed descriptor
        file\cite{Pfaltz:indexedsignature}, the
        two-level superimposed coding
        scheme\cite{SacksDavis:twosuperimpose}, and the
        partitioned signature file
        approach\cite{Lee:partition}. The contribution of
        this paper is two-fold. We present a uniform
        analytic performance model so that these methods can
        be compared fairly and consistently. We show that
        the two-level superimposed coding scheme, if stored
        in a transposed file\cite{Lee:signatureprocessor}, is
        the best in performance.  We then introduce an
        improved method, the multi-level superimposed coding
        method, which is an extension to the two-level
        superimposed coding method.  We demonstrate that the
        two-level method is not optimal, and obtain the
        optimal number of levels for the multi-level method.
Number: OSU-CISRC-3/90-TR8
Bibtype: TechReport
Booktitle: Proceedings of the 13th International Conference on
        Research and Development in Information Retrieval
Author: J. K. Cringean
        R. England
        G. A. Manson
        P. Willett
        Young M. Kim
        Dik Lun Lee
Pages: 429-54
Month: mar
Title: Parallel text searching in serial files using a
        processor farm; Efficient search methods for signature files
Year: 1990
Keyword: database management systems, file organisation,
        information retrieval, information retrieval
        systems, parallel programming, serial files,
        processor farm, parallel text retrieval system,
        microprocessor network, fast searching, document
        databases, pattern matching search, transputers
Address: 2036 Neil Avenue Mall, Columbus, Ohio 43210