Category: CMUTR
Institution: Department of Computer Science, Carnegie
        Mellon University
Abstract: Exploration plays a fundamental role in any active learning 
        system.
        This study evaluates the role of exploration in active 
        learning and describes several local techniques for exploration
        in finite, discrete domains, embedded in a reinforcement 
        learning framework (delayed reinforcement).
        
        This paper distinguishes between two families of schemes: 
        undirected and directed exploration.
        While the former family is closely related to random walk 
        exploration, directed exploration techniques memorize 
        exploration-specific knowledge which is used for guiding the 
        exploration search.
        In many finite deterministic domains, any learning technique 
        based on undirected exploration is inefficient in terms of 
        learning time, i.e. learning time is expected to scale 
        exponentially with the size of the state space [Whitehead 1991].
        We prove that for all these domains, reinforcement learning 
        using a directed technique can always be performed in 
        polynomial time, demonstrating the important role of exploration
        in reinforcement learning.
        (The proof is given for one specific directed exploration 
        technique named counter-based exploration.) 
        
        Subsequently, several exploration techniques found in recent 
        reinforcement learning and connectionist adaptive control 
        literature are described.
        In order to trade off efficiently between exploration and 
        exploitation - a trade-off which characterizes many real-world 
        active learning tasks - combination methods are described which
        explore and avoid costs simultaneously.
        This includes a selective attention mechanism, which allows
        smooth switching between exploration and exploitation.
        
        All techniques are evaluated and compared on a discrete 
        reinforcement learning task (robot navigation).
        The empirical evaluation is followed by an extensive discussion
        of benefits and limitations of this work.
        
        
        
Number: CMU-CS-92-102
Bibtype: TechReport
Month: jan
Author: Sebastian B. Thrun
Title: Efficient Exploration In Reinforcement Learning
Year: 1992
Address: Pittsburgh, PA
Super: @CMUTR
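
Note: the abstract names counter-based exploration and the cost/exploration
combination methods only at a high level. The Python sketch below is an
illustrative reading of those ideas, not pseudocode from the report; the
successor model, the cost estimate, the beta weight, and all function and
variable names are assumptions introduced here for illustration.

    import random
    from collections import defaultdict

    # Visit counters: the "exploration-specific knowledge" that a directed
    # technique memorizes. Unseen states default to a count of zero.
    counts = defaultdict(int)

    def undirected_action(actions):
        """Undirected (random-walk) exploration: choose uniformly at random,
        using no memorized exploration knowledge at all."""
        return random.choice(actions)

    def counter_based_action(state, actions, successor, counts):
        """Counter-based exploration: choose the action whose successor state
        has been visited least often, breaking ties at random.
        `successor(state, action)` is an assumed one-step model of a
        deterministic domain (not part of the original report)."""
        least = min(counts[successor(state, a)] for a in actions)
        candidates = [a for a in actions if counts[successor(state, a)] == least]
        return random.choice(candidates)

    def combined_action(state, actions, successor, counts, cost, beta=1.0):
        """One plausible combination rule that explores and avoids costs
        simultaneously: penalize estimated cost, reward low visit counts.
        The linear form and the `beta` weight are illustrative assumptions."""
        def score(a):
            return -cost(state, a) - beta * counts[successor(state, a)]
        best = max(score(a) for a in actions)
        return random.choice([a for a in actions if score(a) == best])

    if __name__ == "__main__":
        # Toy deterministic chain with states 0..4 and actions step left/right.
        actions = [-1, +1]
        successor = lambda s, a: min(max(s + a, 0), 4)
        state = 2
        for _ in range(10):
            action = counter_based_action(state, actions, successor, counts)
            state = successor(state, action)
            counts[state] += 1  # update the memorized exploration knowledge
        print(dict(counts))

The counter table updated after each step is what distinguishes the directed
rules from the undirected one, which retains no exploration knowledge at all.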