Author
Kevin Knight
Title
Integrating Knowledge Acquisition and Language Acquisition
Date
August 1991
Abstract
Very large knowledge bases (KB's) constitute an important step for artificial intelligence and will have significant effects on the field of natural language processing. This thesis addresses the problem of effectively acquiring two large bodies of formalized knowledge: knowledge about the world (a KB), and knowledge about words (a lexicon). The central observation is that these two bodies of knowledge are highly redundant. For example, the syntactic behavior of a noun (or a verb) is highly correlated with certain physical properties of the object (or event) to which it refers. It should be possible to take advantage of this type of redundancy in order to greatly reduce both the time and expertise required to build large KB's and lexicons.
This thesis describes LUKE, a software tool that allows a knowledge base builder to create an English language interface by associating words and phrases with KB entities. LUKE assumes no linguistic expertise on the part of the user, because that expertise is built directly into the tool itself. LUKE draws its power from a large set of heuristics about how words are typically used to describe the world. These heuristics exploit the redundancy between linguistic and world knowledge. When a word or phrase is associated with some KB entity, LUKE is able to accurately guess features of the word based on features of the KB entity. LUKE can also hypothesize new words and word senses based on the existence of others. All of LUKE's hypotheses are displayed to the user for verification, using a format designed to tap the user's basic linguistic intuitions.
LUKE stores its lexicon in the KB. Truth maintenance links ensure that changes in the KB are automatically propagated to the lexicon. LUKE compiles lexical entries into data structures convenient for natural language parsing and generation programs. Lexicons acquired by LUKE have been used by KBNL, a knowledge-based natural language system, for applications in information retrieval, machine translation, and KB navigation.
This work identifies several dozen heuristics that encode redundancies between linguistic representations and representations of world knowledge. It also demonstrates the usefulness of these heuristics in a working lexical acquisition system.
Category
CMUTR
Category: CMUTR
Institution: Department of Computer Science, Carnegie
        Mellon University
Number: CMU-CS-91-209
Bibtype: TechReport
Month: aug
Author: Kevin Knight
Title: Integrating Knowledge Acquisition and Language Acquisition
Year: 1991
Address: Pittsburgh, PA
Super: @CMUTR