SarsaController

Class SarsaController

java.lang.Object
  SarsaController

public class SarsaController
extends java.lang.Object

This applet demonstrates SARSA(lambda)-learning for a particular grid world problem. It isn't designed to be general or reusable.

This program provides the SARSA(lambda)-learning code. The GUI is in SarsaGUI.java. The controller code is in SarsaController.java. It uses the environment in Q_Env.java.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Field Summary
`double`	`alpha`
`double`	`discount`
`double[][][]`	`e` The eligibility trace for state (xpos,ypos) and action
`double`	`greedyProb`
`double`	`lambda`
`int`	`prevAction`
`int`	`prevX`
`int`	`prevY`
`double[][][]`	`qvalues` The Q values: Q[xpos,ypos,action]
`boolean`	`replacingTrace`
`double`	`reward`
`boolean`	`tracing`

Method Summary
`void`	`doreset(double initVal)` Resets the Q-values: sets all of the Q-values to initVal and the eligibility traces to 0.
`void`	`dostep(int nextAction)` Does one step: carries out the action.
`void`	`dostep(int nextAction, double newDiscount, double newAlpha, double newLambda)` Does one step: carries out the action and sets the discount, alpha, and lambda values.
`void`	`doSteps(int count, double newGreedyProb, double newDiscount, double newAlpha, double newLambda)` Does count steps. Whether each step is greedy or random is determined by greedyProb.
`double`	`value(int xval, int yval)` Determines the value of a location: the maximum, over all actions, of the Q-value.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

qvalues

public double[][][] qvalues

The Q values: Q[xpos,ypos,action]

e

public double[][][] e

The eligibility trace for state (xpos,ypos) and action

discount

public double discount

alpha

public double alpha

lambda

public double lambda

greedyProb

public double greedyProb

tracing

public boolean tracing

prevX

public int prevX

prevY

public int prevY

prevAction

public int prevAction

reward

public double reward

replacingTrace

public boolean replacingTrace

Method Detail

doreset

public void doreset(double initVal)

Resets the Q-values: sets all of the Q-values to initVal and the eligibility traces to 0.

Parameters:: initVal - the initial value to set all values to
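
The reset described above can be sketched as follows. This is a minimal illustration, not the applet's source: the class name `ResetSketch` and the constructor arguments (grid width, height, number of actions) are hypothetical, and the `[xpos][ypos][action]` array layout is assumed from the field descriptions.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the table-resetting part of SarsaController. */
class ResetSketch {
    // Assumed layout, following the field docs: [xpos][ypos][action].
    final double[][][] qvalues;
    final double[][][] e;

    ResetSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        e = new double[width][height][numActions];
    }

    /** Sets every Q-value to initVal and every eligibility trace to 0. */
    void doreset(double initVal) {
        for (int x = 0; x < qvalues.length; x++) {
            for (int y = 0; y < qvalues[x].length; y++) {
                Arrays.fill(qvalues[x][y], initVal);
                Arrays.fill(e[x][y], 0.0);
            }
        }
    }
}
```

Clearing the traces together with the Q-values means that no credit from earlier steps carries over into learning after the reset.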

dostep

public void dostep(int nextAction,
                   double newDiscount,
                   double newAlpha,
                   double newLambda)

Does one step: carries out the action and sets the discount, alpha, and lambda values.

Parameters:: nextAction - the next action that the agent does; newDiscount - the discount to use; newAlpha - the new alpha value to use; newLambda - the new lambda value to use
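
This page does not show how a step updates the tables, so the sketch below uses the textbook tabular SARSA(lambda) update as a stand-in; the applet's own code may differ. The class `SarsaLambdaSketch`, the method `update`, its parameter list, and the default parameter values are assumptions made for illustration. Only the field names mirror the ones documented here.

```java
/** Textbook tabular SARSA(lambda) update; a sketch, not the applet's source. */
class SarsaLambdaSketch {
    final double[][][] qvalues;      // Q[xpos][ypos][action]
    final double[][][] e;            // eligibility traces, same layout
    double discount = 0.9;           // illustrative defaults (assumptions)
    double alpha = 0.5;
    double lambda = 0.8;
    boolean replacingTrace = false;  // false: accumulating traces

    SarsaLambdaSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        e = new double[width][height][numActions];
    }

    /**
     * Applies one update after moving from (prevX, prevY) via prevAction,
     * receiving reward, arriving at (x, y) and committing to nextAction.
     */
    void update(int prevX, int prevY, int prevAction, double reward,
                int x, int y, int nextAction) {
        // Temporal-difference error for the transition just taken.
        double delta = reward
                + discount * qvalues[x][y][nextAction]
                - qvalues[prevX][prevY][prevAction];

        // Mark the visited state-action pair as eligible.
        if (replacingTrace) {
            e[prevX][prevY][prevAction] = 1.0;
        } else {
            e[prevX][prevY][prevAction] += 1.0;
        }

        // Move every Q-value in proportion to its trace, then decay the trace.
        for (int i = 0; i < qvalues.length; i++) {
            for (int j = 0; j < qvalues[i].length; j++) {
                for (int a = 0; a < qvalues[i][j].length; a++) {
                    qvalues[i][j][a] += alpha * delta * e[i][j][a];
                    e[i][j][a] *= discount * lambda;
                }
            }
        }
    }
}
```

With lambda set to 0 the traces vanish after each step and the update reduces to one-step SARSA. Larger values spread each temporal-difference error further back along the recent path.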

dostep

public void dostep(int nextAction)

Does one step: carries out the action.

Parameters:: nextAction - the next action that the agent does

value

public double value(int xval,
                    int yval)

Determines the value of a location. The value is the maximum, over all actions, of the Q-value.

Parameters:: xval - the x-coordinate; yval - the y-coordinate
Returns:: the value of the (xval,yval) position
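
The maximum over actions can be sketched as below. The helper class `ValueSketch` is hypothetical and takes the Q-table as an argument so that it stands alone. In the documented class, `value` is an instance method that presumably reads the `qvalues` field.

```java
/** Hypothetical helper: state value as the best Q-value at a grid location. */
class ValueSketch {
    /** Returns max over actions of qvalues[xval][yval][action]. */
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[xval][yval]) {
            best = Math.max(best, q);
        }
        return best;
    }
}
```

For example, a location whose four action values are -1.0, 2.5, 0.3 and 2.0 has value 2.5.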

doSteps

public void doSteps(int count,
                    double newGreedyProb,
                    double newDiscount,
                    double newAlpha,
                    double newLambda)

Does count steps. Whether each step is greedy or random is determined by greedyProb.

Parameters:: count - the number of steps to do; newGreedyProb - the probability that a step is chosen greedily; newDiscount - the discount to use; newAlpha - the new alpha value to use; newLambda - the new lambda value to use
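
One way to read the greedy-or-random choice is sketched below: with probability greedyProb the action with the highest Q-value is taken, otherwise an action is drawn at random. The class `GreedyChoiceSketch`, the method `chooseAction`, the uniform random draw, and the first-index tie-breaking are all assumptions. This page does not say how the applet picks random actions or breaks ties.

```java
import java.util.Random;

/** Hypothetical action selection driven by a greedy probability. */
class GreedyChoiceSketch {
    /**
     * Picks an action for one state.
     * q holds the Q-values of that state's actions; rng supplies randomness.
     */
    static int chooseAction(double[] q, double greedyProb, Random rng) {
        if (rng.nextDouble() < greedyProb) {
            // Greedy step: first action with the highest Q-value.
            int best = 0;
            for (int a = 1; a < q.length; a++) {
                if (q[a] > q[best]) {
                    best = a;
                }
            }
            return best;
        }
        // Random step: assumed uniform over all actions.
        return rng.nextInt(q.length);
    }
}
```

A loop that calls `chooseAction` and then performs one step, repeated count times, would correspond to the behaviour described for doSteps. Setting greedyProb to 1.0 makes every step greedy, and 0.0 makes every step random.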