java.lang.Object
  Q_Controller
public class Q_Controller
This applet demonstrates Q-learning for a particular grid world problem. It isn't designed to be general or reusable.
Copyright (C) 2003-2006 David Poole.
This program implements Q-learning. The GUI is in Q_GUI.java, the controller code is in Q_Controller.java, and the environment simulation is in Q_Env.java.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
Field Summary | |
---|---|
boolean | alphaFixed |
double | discount |
double[][][] | qvalues - The Q values: Q[xpos,ypos,action] |
boolean | tracing |
int[][][] | visits - The number of times the agent has been at (xpos,ypos) and done action |
Method Summary | |
---|---|
void | doreset(double initVal) - Resets the Q-values: sets all of the Q-values to initVal and all of the visit counts to 0. |
void | dostep(int action) - Does one step: carries out the action. |
void | dostep(int action, double newdiscount, double alphaFieldValue) - Does one step: carries out the action and sets the discount and the alpha value. |
void | doSteps(int count, double greedyProb, double newdiscount, double alphaFieldValue) - Does count steps; whether each step is greedy or random is determined by greedyProb. |
double | value(int xval, int yval) - Determines the value of a location: the maximum, over all actions, of the Q-value. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public double[][][] qvalues
public int[][][] visits
public double discount
public boolean alphaFixed
public boolean tracing
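
The qvalues and visits fields are documented as tables indexed by [xpos][ypos][action], and doreset is described as setting every Q-value to initVal and every visit count to 0. A minimal sketch of that layout and reset follows. The class name QTableSketch, its constructor, and the grid and action dimensions are hypothetical; this page does not show the applet's own code or sizes.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the documented fields; not the applet's code. */
final class QTableSketch {
    // Layout taken from the field summary: [xpos][ypos][action].
    final double[][][] qvalues;
    final int[][][] visits;

    // Grid size and action count are assumptions; the page does not give them.
    QTableSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        visits = new int[width][height][numActions];
    }

    /** Follows the documented contract: every Q-value to initVal, every count to 0. */
    void doreset(double initVal) {
        for (int x = 0; x < qvalues.length; x++) {
            for (int y = 0; y < qvalues[x].length; y++) {
                Arrays.fill(qvalues[x][y], initVal);
                Arrays.fill(visits[x][y], 0);
            }
        }
    }
}
```

Resetting with a high initVal (optimistic initialization) tends to encourage early exploration, since untried actions look better than tried ones.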
Method Detail |
---|
public void doreset(double initVal)
initVal - the initial value to set all values to

public void dostep(int action, double newdiscount, double alphaFieldValue)
action - the action that the agent does
newdiscount - the discount to use
alphaFieldValue - the alpha value to use

public void dostep(int action)
action - the action that the agent does

public double value(int xval, int yval)
xval - the x-coordinate
yval - the y-coordinate
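
The summary for value(int, int) describes a location's value as the maximum Q-value over all actions. The sketch below shows that rule together with the textbook Q-learning backup that would typically use it. The class name QUpdateSketch, the static method signatures, the reward and next-position arguments, and the 1/visits step size used when alphaFixed is false are all assumptions for illustration. This page does not show how the applet actually computes its update.

```java
/** Hypothetical helpers; a sketch of standard Q-learning, not the applet's code. */
final class QUpdateSketch {
    /** Value of a cell: the maximum Q-value over all actions at (x, y). */
    static double value(double[][][] q, int x, int y) {
        double best = q[x][y][0];
        for (int a = 1; a < q[x][y].length; a++) {
            best = Math.max(best, q[x][y][a]);
        }
        return best;
    }

    /**
     * One textbook Q-learning backup:
     * Q(s,a) += alpha * (reward + discount * max over a' of Q(s',a') - Q(s,a))
     */
    static void update(double[][][] q, int[][][] visits,
                       int x, int y, int action,
                       double reward, int nextX, int nextY,
                       double discount, boolean alphaFixed, double alphaFieldValue) {
        visits[x][y][action]++;
        // Assumption: when alpha is not fixed, it decays as 1 / (visit count).
        double alpha = alphaFixed ? alphaFieldValue : 1.0 / visits[x][y][action];
        double target = reward + discount * value(q, nextX, nextY);
        q[x][y][action] += alpha * (target - q[x][y][action]);
    }
}
```

With a fixed alpha the estimate keeps tracking recent rewards; with the assumed 1/visits schedule each Q-value becomes a running average of its observed targets.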

public void doSteps(int count, double greedyProb, double newdiscount, double alphaFieldValue)
count - the number of steps to do
greedyProb - the probability that a step is chosen greedily
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
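
The greedyProb parameter is documented as the probability that a step is chosen greedily. One common way to realize that is the selection rule sketched below: with probability greedyProb take the action with the highest Q-value, otherwise take a uniformly random action. The class name GreedyStepSketch, both method names, the first-wins tie-breaking, and the use of java.util.Random are assumptions; the applet's own selection code is not shown on this page.

```java
import java.util.Random;

/** Hypothetical action selection; a sketch, not the applet's code. */
final class GreedyStepSketch {
    /** Index of the highest Q-value among one cell's actions (first one wins ties). */
    static int greedyAction(double[] actionValues) {
        int best = 0;
        for (int a = 1; a < actionValues.length; a++) {
            if (actionValues[a] > actionValues[best]) {
                best = a;
            }
        }
        return best;
    }

    /** With probability greedyProb pick the greedy action, otherwise a uniformly random one. */
    static int chooseAction(double[] actionValues, double greedyProb, Random rng) {
        // nextDouble() is in [0, 1), so greedyProb = 1.0 is always greedy
        // and greedyProb = 0.0 is always random.
        if (rng.nextDouble() < greedyProb) {
            return greedyAction(actionValues);
        }
        return rng.nextInt(actionValues.length);
    }
}
```

A loop in the spirit of doSteps would call chooseAction(qvalues[x][y], greedyProb, rng) count times and pass each result to dostep.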