Saturday, July 11, 2009

Reinforcement Learning part2 ....

Elements of reinforcement learning




Agent:

  • The general aim of machine learning is to produce intelligent programs, often called agents, through a process of learning and evolving.

Environment:

  • The environment includes all external conditions that affect an agent.

Policy:

  • It is the set of rules that defines the agent’s behavior at a given time under a particular environment condition.
  • It maps states to actions.
  • In some cases the policy may be a simple function or a lookup table.
  • It is what determines the agent’s behavior.
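The two simple forms of policy mentioned above can be sketched as follows. This is only an illustration: the state names, action names and the 5-degree threshold are hypothetical, borrowed loosely from the idea of balancing a bicycle.

```python
# Hypothetical states and actions, chosen only for illustration.
POLICY_TABLE = {
    "leaning_left": "steer_left",
    "upright": "keep_straight",
    "leaning_right": "steer_right",
}


def table_policy(state):
    """Policy as a lookup table: one fixed action per state."""
    return POLICY_TABLE[state]


def function_policy(tilt_degrees):
    """Policy as a simple function of a numeric state (tilt angle)."""
    if tilt_degrees < -5:   # assumed threshold, not from any real system
        return "steer_left"
    if tilt_degrees > 5:
        return "steer_right"
    return "keep_straight"


print(table_policy("upright"))   # keep_straight
print(function_policy(12.0))     # steer_right
```

Either way, the policy is just a mapping from the state the agent sees to the action it takes.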

Reward function:

  • It defines the goal in a reinforcement learning problem.
  • It maps states, state-action pairs, or state-action-successor-state triples to a scalar reward.
  • The policy is altered to achieve this goal.

The sole objective of a reinforcement learning agent is to maximize the total reward it receives in the long run (the return).

Value function:

  • The reward function indicates what is good in an immediate sense, while the value function specifies what is good in the long run.
  • The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state.
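The difference between immediate reward and long-run value can be sketched as below. This is a minimal illustration under assumptions: the reward sequences are made up, and the discount factor `gamma` is an optional addition of this sketch (with `gamma=1.0` the return is the plain total reward described above).

```python
def discounted_return(rewards, gamma=1.0):
    """Total reward collected from a state onward along one trajectory."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def estimate_value(sampled_reward_sequences, gamma=1.0):
    """Estimate a state's value as the average return over sampled trajectories."""
    returns = [discounted_return(seq, gamma) for seq in sampled_reward_sequences]
    return sum(returns) / len(returns)


# Made-up rewards observed after leaving the same state on two occasions.
trajectories = [[-1, -1, 10], [-1, -1, 0]]
print(trajectories[0][0])            # immediate reward looks bad
print(estimate_value(trajectories))  # the long-run estimate can still be good
```

A state with a poor immediate reward can therefore still have a high value if it tends to lead to high rewards later.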

Model of the environment:

  • It mimics the behavior of the environment.
  • It is used for planning.
  • Given the current state and action, the model predicts the resulting next state and next reward.

For example, given a state and an action, the model predicts the resulting next state and next reward. Models are used for planning, by which we mean any way of deciding on a course of action by considering possible future situations before they are actually experienced.
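A minimal sketch of a model and of planning with it is given below. Everything in it is assumed for illustration: the states `s0`, `s1`, `s2`, the transition table, the state values and the discount factor `gamma` are hypothetical, and the model is deterministic for simplicity.

```python
# Hypothetical deterministic model: (state, action) -> (next_state, reward).
MODEL = {
    ("s0", "left"): ("s1", 0.0),
    ("s0", "right"): ("s2", 1.0),
}

# Hypothetical estimates of how good each next state is in the long run.
STATE_VALUES = {"s1": 5.0, "s2": 0.0}


def predict(state, action):
    """Ask the model what would happen, without acting in the real world."""
    return MODEL[(state, action)]


def plan_one_step(state, actions, gamma=0.9):
    """Pick the action whose predicted outcome looks best one step ahead."""
    def score(action):
        next_state, reward = predict(state, action)
        return reward + gamma * STATE_VALUES[next_state]
    return max(actions, key=score)


print(predict("s0", "right"))                  # ('s2', 1.0)
print(plan_one_step("s0", ["left", "right"]))  # left
```

Here "right" gives the better immediate reward, but the lookahead prefers "left" because the model predicts that it leads to a more valuable state.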


Agent-Environment Interface



The "cause and effect" idea can be translated into the following steps for an RL agent:

  1. The agent observes an input state
  2. An action is determined by a decision making function (policy)
  3. The action is performed
  4. The agent receives a scalar reward or reinforcement from the environment
  5. Information about the reward given for that state / action pair is recorded
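The five steps above can be sketched as a simple loop. This is only an illustration under assumptions: `LineWorld` is a made-up environment (positions 0 to 4, start at 2, reward +1 at position 4 and -1 at position 0), and the policy simply picks actions at random.

```python
import random
from collections import defaultdict


class LineWorld:
    """Hypothetical environment: positions 0..4, the agent starts at 2."""

    def __init__(self):
        self.state = 2

    def step(self, action):
        """Apply action (-1 or +1); return (next_state, reward, done)."""
        self.state += action
        if self.state == 4:
            return self.state, 1.0, True
        if self.state == 0:
            return self.state, -1.0, True
        return self.state, 0.0, False


def random_policy(state, rng):
    """Decision-making function: here it ignores the state and acts randomly."""
    return rng.choice([-1, 1])


def run_episode(policy, rng, max_steps=100):
    env = LineWorld()
    history = defaultdict(list)          # (state, action) -> rewards seen
    state = env.state                    # 1. observe the input state
    for _ in range(max_steps):
        action = policy(state, rng)      # 2. the policy determines an action
        next_state, reward, done = env.step(action)  # 3. act, 4. get reward
        history[(state, action)].append(reward)      # 5. record the reward
        state = next_state
        if done:
            break
    return history


history = run_episode(random_policy, random.Random(0))
for pair, rewards in sorted(history.items()):
    print(pair, rewards)
```

A learning agent would use the recorded rewards to improve its policy; this sketch only records them.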

Reinforcement Learning is learning how to act in order to maximize a numerical reward signal.

RL is learning from trial and error interaction with the world.

E.g.: Learning to ride a bicycle is a good example of reinforcement learning. The goal given to the reinforcement learning system is simply to ride the bicycle without falling over. Only experience teaches a person how to ride. Initially the person performs a series of actions, such as turning the handlebar 45 degrees to the right; if they fall, that is negative feedback telling them the action was wrong in that situation. Similarly, they try the left side and again get negative feedback. By performing enough of these trial-and-error interactions with the environment, the RL system will ultimately learn how to prevent the bicycle from falling over.

Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question, i.e. learning through experience, as human beings do.

Following are some salient features of reinforcement learning:

  • It is a set of problems rather than a set of techniques.
  • RL offers a way of programming agents by reward and punishment (negative reward), without specifying how the task is to be achieved. It is based on trial-and-error interactions.

“RL as a tool” point of view:

  • RL is training by rewards and punishments.
  • Train the computer as we might train a dog.

The learning agent’s point of view:

  • RL is learning from trial-and-error interaction with the world.
  • E.g., how much reward will I get if I do this?

Evaluative Feedback

The most important feature distinguishing reinforcement learning from other types of learning is that RL uses training information that evaluates the actions taken, rather than instructing by giving the correct actions.

Purely evaluative feedback indicates how good the action taken is, but not whether it is the best or the worst action possible. Evaluative feedback is the basis of methods for function optimization.

Purely instructive feedback, on the other hand, indicates the correct action to take, independently of the action actually taken.

E.g., supervised learning is instructive.
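The contrast between the two kinds of feedback can be sketched in a few lines. The actions and their scores below are hypothetical and exist only to make the difference visible.

```python
# Hypothetical task with three actions; the scores are made up.
ACTION_SCORES = {"A": 0.2, "B": 0.9, "C": 0.5}


def instructive_feedback(action_taken):
    """Tells the learner the correct action, whatever action it took."""
    return max(ACTION_SCORES, key=ACTION_SCORES.get)


def evaluative_feedback(action_taken):
    """Tells the learner only how good the action it took was."""
    return ACTION_SCORES[action_taken]


print(instructive_feedback("A"))  # B
print(instructive_feedback("C"))  # B
print(evaluative_feedback("C"))   # 0.5
```

With evaluative feedback the learner sees 0.5 for action C but cannot tell whether a better action exists unless it also tries the others.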

Associative and Non Associative Tasks:

Associative: associative tasks are situation dependent; inputs (situations) are mapped to outputs (actions). They involve both trial-and-error learning, in the form of a search for the best actions, and association of these actions with the situations in which they are best.

Non Associative: these tasks are situation independent. There is no need to associate different actions with different situations; the learner looks for one best action. It either tries to find a single best action when the task is stationary, or tries to track the best action as it changes over time when the task is non-stationary.
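A non-associative task can be sketched as a multi-armed bandit, where the learner searches for a single best action by trial and error. The sketch below uses an epsilon-greedy rule with sample-average estimates, which is one common choice rather than the only one; the reward means, noise level, `epsilon` and step count are all assumptions. The associative variant here simply keeps a separate learner per situation, which is a simplification.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Non-associative task: find the single best action by trial and error."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # running average reward per action
    counts = [0] * n_actions        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)   # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[action], 1.0)  # noisy evaluative feedback
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


def associative_bandit(true_means_by_situation, **kwargs):
    """Associative task (simplified): learn the best action per situation."""
    return {
        situation: epsilon_greedy_bandit(means, **kwargs)
        for situation, means in true_means_by_situation.items()
    }


estimates, counts = epsilon_greedy_bandit([0.0, 1.0, 5.0])
print(counts)  # how often each action was chosen
```

In the stationary case shown here the estimates settle on one action; for a non-stationary task, a constant step size in place of `1 / counts[action]` is a common way to keep tracking the best action as it changes.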

Reinforcement Learning

"What reinforcement we may gain from hope; if not, what resolution from despair. "


Reinforcement learning is a kind of machine learning. So before going into the depth of reinforcement learning, let us first see what machine learning is.

Machine learning: Definition

Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to learn based on data, such as from sensor data or databases. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data.

Machine Learning Algorithms:

With respect to the type of feedback given to the learner, machine learning can be divided into three groups:

  • Supervised learning: the algorithm generates a function that maps inputs to desired outputs. Both inputs and outputs are available for training. We can say that supervised learning is task driven. It mainly deals with classification.
  • Unsupervised learning: the algorithm models a set of inputs; labeled examples are not available. Here we try to form clusters of similar data and then feed them to a neural network. The neural network trains until it reaches a certain minimum error or a predefined number of epochs. The model is prepared once, at the start of learning. It is data driven. It mainly deals with clustering.
  • Reinforcement learning: this approach is very close to human learning. The algorithm learns a policy of how to act in a given environment. Every action has some impact on the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.

Difference between supervised, unsupervised and reinforcement learning

Supervised learning is task driven: we know in advance what the output is for a particular input. It mainly deals with classification.

Unsupervised learning, on the other hand, is data driven, i.e. the input is not mapped to a known output. It mainly deals with clustering.

Reinforcement learning emphasizes learning from feedback that evaluates the learner's performance without providing standards of correctness in the form of behavioral targets.

The following example explains this further:

Supervised Learning

Step: 1
Teacher: Does picture 1 show a tree or a bird?

Learner: A flower.

Teacher: No, it’s a tree.

Step: 2
Teacher: Does picture 2 show a car or a bike?

Learner: A car.

Teacher: Yes, it’s a car.

Step: 3 ....

Reinforcement Learning

Step: 1
World: You are in state 4. Choose action A or C.

Learner: Action A.

World: Your reward is 100.

Step: 2

World: You are in state 27. Choose action B or E.
Learner: Action B.

World: Your reward is 50.

Step: 3 ....

So we can say that reinforcement learning does not tell the learner whether an action is the best or not. It only tells how good or bad the action was by assigning a scalar reward.

Meaning of Reinforcement:

The occurrence of an event, in the proper relation to a response, that tends to increase the probability that the response will occur again in the same situation.

So we can define Reinforcement learning as a problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. Reinforcement learning emphasizes learning feedback that evaluates the learner's performance without providing standards of correctness in the form of behavioral targets.

A good example of reinforcement learning is learning to ride a bicycle.

Inception for Abecedarian

"''How do you know so much about everything?'' was asked of a very wise and intelligent man; and the answer was ''By never being afraid or ashamed to ask questions as to anything of which I was ignorant."

This blog is my first step toward a tribute to the world where I learn a lot. I believe in the saying

“The fact that I can plant a seed and it becomes a flower, share a bit of knowledge and it becomes another's, smile at someone and receive a smile in return.”

So I thought, why not share the information I have gathered so far, so that a beginner or abecedarian can get some help from it.

In this blog I will discuss existing techniques and algorithms, and the latest updates in computer technology and automation.

You are free to share your suggestions.