Class SarsaController

java.lang.Object
  extended by SarsaController

public class SarsaController
extends java.lang.Object

This applet demonstrates SARSA(lambda)-learning for a particular grid world problem. It isn't designed to be general or reusable.

Copyright (C) 2003-2006 David Poole.

This program gives SARSA(lambda)-learning code. The GUI is in SarsaGUI.java. The controller code is in SarsaController.java. It uses the environment Q_Env.java.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.


Field Summary
 double alpha
           
 double discount
           
 double[][][] e
          The eligibility trace for state (xpos,ypos) and action
 double greedyProb
           
 double lambda
           
 int prevAction
           
 int prevX
           
 int prevY
           
 double[][][] qvalues
          The Q values: Q[xpos,ypos,action]
 boolean replacingTrace
           
 double reward
           
 boolean tracing
           
 
Method Summary
 void doreset(double initVal)
          Resets the Q-values. Sets all of the Q-values to initVal and the eligibility traces to 0.
 void dostep(int nextAction)
          Does one step. Carries out the action.
 void dostep(int nextAction, double newDiscount, double newAlpha, double newLambda)
          Does one step. Carries out the action, and sets the discount, alpha, and lambda values.
 void doSteps(int count, double newGreedyProb, double newDiscount, double newAlpha, double newLambda)
          Does count steps. Whether each step is greedy or random is determined by greedyProb.
 double value(int xval, int yval)
          Determines the value of a location. The value is the maximum, over all actions, of the Q-value.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

qvalues

public double[][][] qvalues
The Q values: Q[xpos,ypos,action]


e

public double[][][] e
The eligibility trace for state (xpos,ypos) and action


discount

public double discount

alpha

public double alpha

lambda

public double lambda

greedyProb

public double greedyProb

tracing

public boolean tracing

prevX

public int prevX

prevY

public int prevY

prevAction

public int prevAction

reward

public double reward

replacingTrace

public boolean replacingTrace

Method Detail

doreset

public void doreset(double initVal)
Resets the Q-values. Sets all of the Q-values to initVal and the eligibility traces to 0.

Parameters:
initVal - the initial value to set all values to
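
The reset described above can be sketched as follows. This is only an illustration: the class ResetSketch and its constructor arguments (grid width, height, and number of actions) are hypothetical, and the actual body of doreset is not shown in this documentation.

```java
import java.util.Arrays;

// Hypothetical stand-in for the two tables held by SarsaController.
// The grid dimensions and action count are assumptions for illustration.
class ResetSketch {
    final double[][][] qvalues; // Q[xpos][ypos][action]
    final double[][][] e;       // eligibility trace, same shape as qvalues

    ResetSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        e = new double[width][height][numActions];
    }

    // Sets every Q-value to initVal and every eligibility trace to 0.
    void doreset(double initVal) {
        for (int x = 0; x < qvalues.length; x++) {
            for (int y = 0; y < qvalues[x].length; y++) {
                Arrays.fill(qvalues[x][y], initVal);
                Arrays.fill(e[x][y], 0.0);
            }
        }
    }
}
```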

dostep

public void dostep(int nextAction,
                   double newDiscount,
                   double newAlpha,
                   double newLambda)
Does one step. Carries out the action, and sets the discount, alpha, and lambda values.

Parameters:
nextAction - the next action that the agent does
newDiscount - the discount to use
newAlpha - the new alpha value to use
newLambda - the new lambda value to use
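
For orientation, the textbook tabular SARSA(lambda) update that a step of this kind typically performs can be sketched as below. The body of dostep is not shown in this documentation, so this is an assumption rather than the class's actual code. The helper class SarsaLambdaSketch and its update method are hypothetical; the parameter names mirror the documented fields (prevX, prevY, prevAction, reward, replacingTrace), while x and y stand for the state reached after the previous action.

```java
// Hypothetical helper; not part of SarsaController's documented API.
class SarsaLambdaSketch {
    // Applies one textbook SARSA(lambda) update to the tables q and e.
    static void update(double[][][] q, double[][][] e,
                       int prevX, int prevY, int prevAction, double reward,
                       int x, int y, int nextAction,
                       double discount, double alpha, double lambda,
                       boolean replacingTrace) {
        // Temporal-difference error for the transition just observed.
        double delta = reward + discount * q[x][y][nextAction]
                     - q[prevX][prevY][prevAction];

        // Mark the visited state-action pair: replacing or accumulating trace.
        if (replacingTrace) {
            e[prevX][prevY][prevAction] = 1.0;
        } else {
            e[prevX][prevY][prevAction] += 1.0;
        }

        // Credit every state-action pair in proportion to its trace,
        // then decay the trace.
        for (int i = 0; i < q.length; i++) {
            for (int j = 0; j < q[i].length; j++) {
                for (int a = 0; a < q[i][j].length; a++) {
                    q[i][j][a] += alpha * delta * e[i][j][a];
                    e[i][j][a] *= discount * lambda;
                }
            }
        }
    }
}
```

With replacingTrace set, a revisited pair has its trace reset to 1 instead of growing without bound, which is the usual reason for offering both options.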

dostep

public void dostep(int nextAction)
Does one step. Carries out the action.

Parameters:
nextAction - the next action that the agent does

value

public double value(int xval,
                    int yval)
Determines the value of a location. The value is the maximum, over all actions, of the Q-value.

Parameters:
xval - the x-coordinate
yval - the y-coordinate
Returns:
the value of the (xval,yval) position
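
The maximum over actions described above can be sketched like this. The class ValueSketch is hypothetical, and the Q table is passed as an argument here only to keep the example self-contained; in SarsaController the method reads the qvalues field.

```java
// Hypothetical helper illustrating value(xval, yval).
class ValueSketch {
    // Returns the largest Q-value over all actions at position (xval, yval).
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[xval][yval]) {
            best = Math.max(best, q);
        }
        return best;
    }
}
```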

doSteps

public void doSteps(int count,
                    double newGreedyProb,
                    double newDiscount,
                    double newAlpha,
                    double newLambda)
Does count steps. Whether each step is greedy or random is determined by greedyProb.

Parameters:
count - the number of steps to do
newGreedyProb - the probability that a step is chosen greedily
newDiscount - the discount to use
newAlpha - the new alpha value to use
newLambda - the new lambda value to use
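
One plausible reading of the greedy-or-random choice is sketched below. The class GreedyChoiceSketch and the method chooseAction are hypothetical, and details such as breaking ties in favour of the lowest-numbered action are assumptions, since the documentation does not describe how doSteps selects actions internally.

```java
import java.util.Random;

// Hypothetical helper illustrating a greedy-or-random action choice.
class GreedyChoiceSketch {
    // With probability greedyProb, returns the action with the highest
    // Q-value (ties go to the lowest index); otherwise returns an action
    // chosen uniformly at random.
    static int chooseAction(double[] q, double greedyProb, Random rng) {
        if (rng.nextDouble() < greedyProb) {
            int best = 0;
            for (int a = 1; a < q.length; a++) {
                if (q[a] > q[best]) {
                    best = a;
                }
            }
            return best;
        }
        return rng.nextInt(q.length);
    }
}
```

Under this reading, a greedyProb of 1.0 always exploits the current Q-values, while lower values leave room for exploration.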