Q_Controller

Class Q_Controller

java.lang.Object
  Q_Controller

public class Q_Controller
extends java.lang.Object

This applet demonstrates Q-learning for a particular grid world problem. It isn't designed to be general or reusable.

The Q-learning code is split across three files: the GUI is in Q_GUI.java, the controller is in Q_Controller.java, and the environment simulation is in Q_Env.java.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Field Summary
`boolean`	`alphaFixed`
`double`	`discount`
`double[][][]`	`qvalues` The Q values: Q[xpos,ypos,action]
`boolean`	`tracing`
`int[][][]`	`visits` The number of times the agent has been at (xpos,ypos) and taken the given action

Method Summary
`void`	`doreset(double initVal)` Resets the Q-values: sets all of the Q-values to initVal and all of the visit counts to 0.
`void`	`dostep(int action)` Does one step: carries out the action.
`void`	`dostep(int action, double newdiscount, double alphaFieldValue)` Does one step: carries out the action and sets the discount and the alpha value.
`void`	`doSteps(int count, double greedyProb, double newdiscount, double alphaFieldValue)` Does count steps. Whether each step is greedy or random is determined by greedyProb.
`double`	`value(int xval, int yval)` Determines the value of a location. The value is the maximum, over all actions, of the Q-value.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

qvalues

public double[][][] qvalues

The Q values: Q[xpos,ypos,action]

visits

public int[][][] visits

The number of times the agent has been at (xpos,ypos) and taken the given action

discount

public double discount

alphaFixed

public boolean alphaFixed

tracing

public boolean tracing

Method Detail

doreset

public void doreset(double initVal)

Resets the Q-values: sets all of the Q-values to initVal and all of the visit counts to 0.

Parameters:
initVal - the initial value to set all Q-values to
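
As an illustration of the reset described above, here is a minimal sketch. The class name ResetSketch, its constructor, and the table dimensions are assumptions made for this example; only the field names qvalues and visits and the doreset signature come from this page, and the applet's actual implementation may differ.

```java
// Hypothetical stand-in for the two tables documented above.
// The constructor and its size parameters are assumptions for illustration.
class ResetSketch {
    final double[][][] qvalues; // indexed as [xpos][ypos][action]
    final int[][][] visits;     // same indexing as qvalues

    ResetSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        visits = new int[width][height][numActions];
    }

    // Sets every Q-value to initVal and every visit count to 0.
    void doreset(double initVal) {
        for (int x = 0; x < qvalues.length; x++) {
            for (int y = 0; y < qvalues[x].length; y++) {
                for (int a = 0; a < qvalues[x][y].length; a++) {
                    qvalues[x][y][a] = initVal;
                    visits[x][y][a] = 0;
                }
            }
        }
    }
}
```

With this sketch, calling doreset(0.0) on a new ResetSketch(2, 3, 4) would leave all 24 Q-values at 0.0 and all 24 visit counts at 0.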

dostep

public void dostep(int action,
                   double newdiscount,
                   double alphaFieldValue)

Does one step: carries out the action and sets the discount and the alpha value.

Parameters:
action - the action that the agent does
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
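
This page does not state the update rule that a step applies. The sketch below shows the standard tabular Q-learning update as one plausible reading, not the applet's actual code. The class QUpdateSketch, the update method, and its reward/nextX/nextY parameters are hypothetical (in the applet, the reward and next position would presumably come from Q_Env). The interpretation of alphaFixed, where false means alpha is 1 divided by the visit count, is also an assumption.

```java
// Hypothetical sketch of a tabular Q-learning update; not taken from Q_Controller.
class QUpdateSketch {
    final double[][][] qvalues; // indexed as [xpos][ypos][action]
    final int[][][] visits;
    double discount = 0.9;      // assumed default value
    boolean alphaFixed = true;  // assumed meaning: true = use the supplied alpha unchanged

    QUpdateSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        visits = new int[width][height][numActions];
    }

    // Value of a location: the maximum Q-value over all actions.
    double value(int x, int y) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[x][y]) {
            best = Math.max(best, q);
        }
        return best;
    }

    // Applies Q(s,a) += alpha * (reward + discount * V(s') - Q(s,a)).
    // reward, nextX and nextY are passed in here; they are assumptions of this sketch.
    void update(int x, int y, int action, double reward,
                int nextX, int nextY, double alphaFieldValue) {
        visits[x][y][action]++;
        double alpha = alphaFixed ? alphaFieldValue : 1.0 / visits[x][y][action];
        double target = reward + discount * value(nextX, nextY);
        qvalues[x][y][action] += alpha * (target - qvalues[x][y][action]);
    }
}
```

For example, with discount 0.9, a next-state value of 2.0, a reward of 1.0, and alpha 0.5, a Q-value of 0 moves halfway toward the target 2.8, giving 1.4.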

dostep

public void dostep(int action)

Does one step: carries out the action.

Parameters:
action - the action that the agent does

value

public double value(int xval,
                    int yval)

Determines the value of a location. The value is the maximum, over all actions, of the Q-value.

Parameters:
xval - the x-coordinate
yval - the y-coordinate
Returns:
the value of the (xval,yval) position
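
The maximum over actions can be sketched as follows. The class name ValueSketch is hypothetical, and the table is passed as a parameter so that the example stands alone; in Q_Controller the method reads the qvalues field instead.

```java
// Hypothetical helper illustrating the "maximum over all actions" definition.
class ValueSketch {
    // Returns the largest Q-value stored for position (xval, yval).
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[xval][yval]) {
            best = Math.max(best, q);
        }
        return best;
    }
}
```

If the four Q-values at a position are -1.0, 0.5, 3.0 and 2.0, the value of that position is 3.0.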

doSteps

public void doSteps(int count,
                    double greedyProb,
                    double newdiscount,
                    double alphaFieldValue)

Does count steps. Whether each step is greedy or random is determined by greedyProb.

Parameters:
count - the number of steps to do
greedyProb - the probability that a step is chosen greedily
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
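
One way to read greedyProb is as the chance of picking the best-known action on a given step, with a uniformly random action otherwise. The sketch below illustrates that reading. The class ActionChoiceSketch, the chooseAction method, and the tie-breaking rule (lowest index wins) are assumptions, not details taken from Q_Controller.

```java
// Hypothetical sketch of choosing between a greedy and a random action.
class ActionChoiceSketch {
    // With probability greedyProb, returns the index of the highest value in
    // actionValues (ties go to the lowest index); otherwise returns a
    // uniformly random action index.
    static int chooseAction(double[] actionValues, double greedyProb, java.util.Random rng) {
        if (rng.nextDouble() < greedyProb) {
            int best = 0;
            for (int a = 1; a < actionValues.length; a++) {
                if (actionValues[a] > actionValues[best]) {
                    best = a;
                }
            }
            return best;
        }
        return rng.nextInt(actionValues.length);
    }
}
```

Under this reading, doSteps would call such a selection count times and pass each chosen action to dostep. A greedyProb of 1.0 always exploits the current Q-values, while 0.0 always explores at random.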