Reinforcement learning:
In reinforcement learning a teacher is available, but instead of directly providing the desired action for each perception, the teacher returns a reward or a punishment to the learner for the action it takes in response to a perception. An example is a robot in unknown terrain that receives a punishment when it hits an obstacle and a reward when it moves smoothly.
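As a rough illustration of learning from rewards and punishments, here is a minimal tabular Q-learning sketch for a toy grid world; the grid layout, the reward values (-10 for hitting a wall or obstacle, +10 for reaching the goal, -1 per ordinary step) and all parameter settings are assumptions chosen only for demonstration, not part of the original example.

import random

# Toy grid: 'S' start, 'G' goal, '#' obstacle, '.' free cell (assumed layout).
GRID = ["....G",
        ".##..",
        ".....",
        "S.#.."]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
ROWS, COLS = len(GRID), len(GRID[0])
START = (3, 0)
Q = {}  # Q[(state, action)] -> estimated value of taking that action there

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < ROWS and 0 <= nc < COLS) or GRID[nr][nc] == "#":
        return state, -10.0, False        # punishment: hit a wall or an obstacle
    if GRID[nr][nc] == "G":
        return (nr, nc), 10.0, True       # reward: reached the goal
    return (nr, nc), -1.0, False          # small cost for an ordinary move

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    for _ in range(episodes):
        state, done = START, False
        while not done:
            if random.random() < epsilon:                        # explore
                action = random.choice(list(ACTIONS))
            else:                                                # exploit
                action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = max(Q.get((nxt, a), 0.0) for a in ACTIONS)
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt

q_learning()
print(max(ACTIONS, key=lambda a: Q.get((START, a), 0.0)))  # greedy action at start

The learner is never told which move is correct; it only sees the numeric reward signal and gradually shifts its value estimates toward actions that avoid punishment.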
In order to design a learning system, the designer has to make the following choices based on the application.
Active Reinforcement learning:
Here not only is a teacher available, but the learner also has the freedom to ask the teacher for suitable perception-action example pairs that will help it improve its performance.
Consider a news recommender system that tries to learn a user's preferences and categorize news articles as interesting or uninteresting to the user. The system may present a particular article (about which it is not sure) to the user and ask whether it is interesting or not.
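A minimal sketch of this querying behaviour, assuming a hypothetical score_interest function that returns the system's estimated probability that an article is interesting (the scoring rule and all names are invented for illustration): the system asks the user about the article whose estimate is closest to 0.5, i.e. the one it is least sure about.

def score_interest(article, liked_keywords):
    """Crude estimate of P(interesting): fraction of known liked keywords present."""
    words = set(article.lower().split())
    if not liked_keywords:
        return 0.5
    return len(words & liked_keywords) / len(liked_keywords)

def pick_query(articles, liked_keywords):
    """Return the article the system is least sure about (score closest to 0.5)."""
    return min(articles, key=lambda a: abs(score_interest(a, liked_keywords) - 0.5))

articles = [
    "stock markets rally on tech earnings",
    "local team wins championship final",
    "new python release improves performance",
]
liked_keywords = {"python", "tech", "earnings"}

query = pick_query(articles, liked_keywords)
answer = input(f"Is this article interesting? {query!r} (y/n): ")  # ask the teacher
if answer.strip().lower() == "y":
    liked_keywords |= set(query.split())   # refine the model using the teacher's label

The user's answer plays the role of the teacher's label for exactly the example the learner chose to ask about.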
Passive Reinforcement learning:
Here the learner is not free to choose its actions: it follows a fixed policy and simply observes the sequence of states and rewards that the policy produces. From these observations it estimates how good the policy is, for example by learning the expected utility (value) of each state under that policy.
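A minimal sketch of this setting, assuming a toy three-state chain and a fixed policy supplied from outside (the states, rewards and parameters are invented for illustration): the learner never chooses actions, it only watches the transitions and rewards and updates its value estimates with a simple TD(0) rule.

import random

STATES = ["A", "B", "C"]                        # "C" is terminal in this toy chain
REWARD = {"A": -1.0, "B": -1.0, "C": 10.0}      # reward received on entering a state
FIXED_POLICY = "always move forward"            # the learner cannot change the policy

def follow_policy(state):
    """Environment step under the fixed policy: usually advance, sometimes stay."""
    nxt = STATES[STATES.index(state) + 1] if random.random() < 0.8 else state
    return nxt, REWARD[nxt]

def td0(episodes=200, alpha=0.1, gamma=0.9):
    V = {s: 0.0 for s in STATES}                # estimated value of each state
    for _ in range(episodes):
        state = "A"
        while state != "C":
            nxt, reward = follow_policy(state)
            # TD(0): nudge V(state) toward reward + gamma * V(next state)
            V[state] += alpha * (reward + gamma * V[nxt] - V[state])
            state = nxt
    return V

print(td0())   # prints the learned value estimates for states A, B, C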
Generalization in Reinforcement Learning
An example to illustrate generalization:
Training Examples:
D: the set of training examples. D is a set of pairs { (x, c(x)) }, where c is the target concept. The concept c can be viewed as a subset of X, the universe of discourse, i.e. the set of all possible instances.
Example of D:
((red, small, round, humid, low, smooth), poisonous)
((red, small, elongated, humid, low, smooth), poisonous)
((gray, large, elongated, humid, low, rough), not-poisonous)
((red, small, elongated, humid, high, rough), poisonous)
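The same training set can be written down as a small data structure; each instance is a 6-tuple of attribute values and the label is the value of c on that instance (a minimal sketch, with no assumptions beyond the examples listed above).

# The training set D as a list of (instance, label) pairs.
D = [
    (("red",  "small", "round",     "humid", "low",  "smooth"), "poisonous"),
    (("red",  "small", "elongated", "humid", "low",  "smooth"), "poisonous"),
    (("gray", "large", "elongated", "humid", "low",  "rough"),  "not-poisonous"),
    (("red",  "small", "elongated", "humid", "high", "rough"),  "poisonous"),
]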
Hypothesis Representation
Any hypothesis h is a function from X to Y, written h: X → Y.
We will explore the space of conjunctions.
Special symbols:
?  any value is acceptable
0  no value is acceptable
Consider the following hypotheses:
(?,?,?,?,?,?): all mushrooms are poisonous
(0,0,0,0,0,0): no mushroom is poisonous
Hypotheses Space:
The space of all hypotheses is represented by H.
Let h be a hypothesis in H, and let x be an example of a mushroom.
If h(x) = 1 then x is classified as poisonous; otherwise x is classified as not-poisonous.
Our goal is to find a hypothesis h* that is very “close” to the target concept c.
A hypothesis is said to “cover” those examples it classifies as positive.
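A minimal sketch of this conjunctive representation, using Python tuples with the special symbols '?' and '0'; the helper names covers and classify are assumptions made for illustration, and h(x) = 1 exactly when h covers the instance x.

# Conjunctive hypotheses as 6-tuples over {attribute value, '?', '0'}.

def covers(h, x):
    """h covers x iff every position of h is '?' or equals the value in x."""
    return all(hi != "0" and (hi == "?" or hi == xi) for hi, xi in zip(h, x))

def classify(h, x):
    """h(x) = 1 means 'poisonous', 0 means 'not-poisonous'."""
    return 1 if covers(h, x) else 0

all_poisonous  = ("?", "?", "?", "?", "?", "?")        # covers every instance
none_poisonous = ("0", "0", "0", "0", "0", "0")        # covers no instance
red_and_small  = ("red", "small", "?", "?", "?", "?")  # an intermediate hypothesis

x = ("red", "small", "round", "humid", "low", "smooth")
print(classify(all_poisonous, x))   # 1
print(classify(none_poisonous, x))  # 0
print(classify(red_and_small, x))   # 1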