Saturday, July 11, 2009

Reinforcement Learning Part 2

Elements of Reinforcement Learning

Agent:

  • The general aim of machine learning is to produce intelligent programs, often called agents, through a process of learning and evolving.

Environment:

  • All external conditions that affect the agent are included in the environment.

Policy:

  • It is the set of rules that defines the agent's behavior at a given time under particular environment conditions.
  • It defines the mapping from states to actions.
  • In some cases the policy may be a simple function or a lookup table, as in the sketch below.
  • It alone is sufficient to determine behavior.
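
As a minimal sketch (the states and actions here are invented for illustration), a lookup-table policy can be as simple as a Python dictionary from states to actions:

    # A lookup-table policy; states and actions are illustrative.
    policy = {
        "low_battery":  "recharge",
        "full_battery": "explore",
    }

    def act(state):
        """Map the current state to an action via the lookup table."""
        return policy[state]

    print(act("low_battery"))   # -> recharge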

Reward function:

  • It defines the goal in a reinforcement learning problem.
  • It maps states, state-action pairs, or state-action-successor-state triples to a scalar reward (see the sketch below).
  • The policy is altered so as to achieve this goal.
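
In the same spirit, a reward function over state-action pairs can be sketched as a table; the pairs and numbers below are invented:

    # A reward function as a mapping from (state, action) pairs to
    # scalar rewards; all values here are illustrative.
    reward = {
        ("low_battery", "recharge"): +1.0,
        ("low_battery", "explore"):  -1.0,   # risks running out of power
        ("full_battery", "explore"): +0.5,
    }

    print(reward[("low_battery", "recharge")])   # -> 1.0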

The sole objective of a reinforcement learning agent is to maximize the total reward it receives in the long run (the return).
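
One common way to make "total reward in the long run" precise is the discounted return; the sketch below assumes a discount factor gamma between 0 and 1, which is a standard (though not the only) choice:

    def discounted_return(rewards, gamma=0.9):
        """Return r1 + gamma*r2 + gamma^2*r3 + ... for a reward sequence."""
        return sum((gamma ** t) * r for t, r in enumerate(rewards))

    print(discounted_return([0.0, 0.0, 1.0]))   # 0 + 0.9*0 + 0.81*1 = 0.81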

Value function:

  • A reward function indicates what is good in an immediate sense, while a value function specifies what is good in the long run.
  • The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
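
A minimal sketch of estimating a state's value, assuming we can sample several returns starting from that state (a Monte Carlo style average; the numbers are invented):

    def estimate_value(sampled_returns):
        """Monte Carlo sketch: average the returns observed when
        starting from the state in question."""
        return sum(sampled_returns) / len(sampled_returns)

    # e.g., three episodes started from this state yielded these returns:
    print(estimate_value([0.81, 1.0, 0.5]))   # -> 0.77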

Model of the environment :

  • mimics the behavior of the environment
  • used for planning
  • given the current state and an action, the model predicts the resultant next state and next reward

For example, given a state and an action, the model predicts the resultant next state and next reward. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.
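
A toy sketch of such a model, and of one-step planning with it; the states, actions, and rewards are invented:

    # A deterministic model: (state, action) -> (next_state, reward).
    model = {
        ("s0", "left"):  ("s1", -1.0),
        ("s0", "right"): ("s2", +1.0),
    }

    def plan(state, actions):
        """One-step planning: pick the action whose predicted reward is
        highest, consulting the model instead of real experience."""
        return max(actions, key=lambda a: model[(state, a)][1])

    print(plan("s0", ["left", "right"]))   # -> right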


Agent-Environment Interface

The "cause and effect" idea can be translated into the following steps for an RL agent:

  1. The agent observes an input state.
  2. An action is determined by a decision-making function (the policy).
  3. The action is performed.
  4. The agent receives a scalar reward, or reinforcement, from the environment.
  5. Information about the reward given for that state/action pair is recorded.
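
These five steps fall naturally into a loop. The sketch below uses a stub environment; the step function, states, and rewards are invented for illustration:

    import random

    def step(state, action):
        """Stub environment: returns (next_state, reward)."""
        return state, random.choice([-1.0, 1.0])

    state = "s0"
    history = []                                    # step 5: recorded feedback
    for _ in range(5):
        action = random.choice(["left", "right"])   # step 2: policy decides
        next_state, reward = step(state, action)    # steps 3-4: act, get reward
        history.append((state, action, reward))     # step 5: record it
        state = next_state                          # step 1: observe new state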

Reinforcement Learning is learning how to act in order to maximize a numerical reward signal.

RL is learning from trial-and-error interaction with the world.

E.g.: Learning to ride a bicycle is a good example of reinforcement learning. The goal given to the RL system is simply to ride the bicycle without falling over. Only experience teaches a person how to ride a bicycle. Initially the person performs a series of actions, such as tilting the handlebar 45 degrees to the right; if they fall down, they get negative feedback that tilting that way is wrong. Similarly they try the left side and again get negative feedback. By performing enough of these trial-and-error interactions with the environment, the RL system ultimately learns how to prevent the bicycle from ever falling over.
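
A toy version of this trial-and-error process, with invented actions and fall probabilities, keeping a running average of the reward seen for each action:

    import random

    # Hard tilts usually end in a fall (-1); staying balanced pays +1.
    fall_prob = {"tilt_left": 0.9, "tilt_right": 0.9, "keep_straight": 0.1}
    value = {a: 0.0 for a in fall_prob}
    count = {a: 0 for a in fall_prob}

    for _ in range(300):
        a = random.choice(list(fall_prob))           # try an action
        r = -1.0 if random.random() < fall_prob[a] else 1.0
        count[a] += 1
        value[a] += (r - value[a]) / count[a]        # incremental average

    print(max(value, key=value.get))                 # -> keep_straight, usually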

Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question: learning through experience, the way a human being does.

Following are some of the salient features of Reinforcement Learning:

  • It is a set of problems rather than a set of techniques.
  • RL provides a way of programming agents by reward and punishment (negative reward), without specifying how the task is to be achieved; it is based on trial-and-error interactions.

“RL as a tool” point of view:

  • RL is training by rewards and punishments.
  • Train the computer as we might train a dog.

The learning agent’s point of view:

  • RL is learning from trial-and-error interaction with the world.
  • E.g., how much reward will I get if I take this action?

Evaluative Feedback

The most important feature distinguishing Reinforcement Learning from other types of learning is that RL uses training information that evaluates the actions taken, rather than instructing by giving the correct actions.

Purely evaluative feedback indicates how good the action taken is, but not whether it is the best or the worst action possible. Evaluative feedback is the basis of methods for function optimization.

Purely instructive feedback, on the other hand, indicates the correct action to take, independently of the action actually taken.

E.g., supervised learning is instructive.
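
The contrast, sketched with invented values:

    chosen = "left"

    # Evaluative feedback: a score for the action actually taken.
    # It says "left scored 0.2" but not whether "right" was better.
    evaluative_feedback = 0.2

    # Instructive feedback: the correct action, independent of what
    # was chosen, as in supervised learning.
    instructive_feedback = "right"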

Associative and Non-Associative Tasks:

Associative: Associative tasks are situation-dependent: inputs are mapped to outputs. They involve both trial-and-error learning, in the form of a search for the best actions, and the association of these actions with the situations in which they are best.

Non-associative: Non-associative tasks are situation-independent; there is no need to associate different actions with different situations. The learner tries to find a single best action when the task is stationary, or tries to track the best action as it changes over time when the task is nonstationary. A small bandit sketch follows.
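
A nonassociative task is essentially a multi-armed bandit. Below is a small epsilon-greedy sketch on a stationary 3-armed bandit with invented success probabilities; making the best arm depend on an observed situation would turn this into an associative task:

    import random

    arm_prob = [0.2, 0.5, 0.8]        # illustrative reward probabilities
    value = [0.0, 0.0, 0.0]
    count = [0, 0, 0]

    for _ in range(1000):
        if random.random() < 0.1:                   # explore 10% of the time
            a = random.randrange(3)
        else:                                       # otherwise exploit
            a = value.index(max(value))
        r = 1.0 if random.random() < arm_prob[a] else 0.0
        count[a] += 1
        value[a] += (r - value[a]) / count[a]       # incremental average

    print(value.index(max(value)))                  # -> 2 (the 0.8 arm), usually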
