java.lang.Object
  Q_Controller
public class Q_Controller
This applet demonstrates Q-learning for a particular grid world problem. It isn't designed to be general or reusable.
Copyright (C) 2003-2006 David Poole.
This program gives Q-learning code. The GUI is in Q_GUI.java. The controller code is at Q_Controller.java, and the environment simulation is at Q_Env.java.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
| Field Summary | |
|---|---|
| boolean | alphaFixed |
| double | discount |
| double[][][] | qvalues - The Q values: Q[xpos,ypos,action] |
| boolean | tracing |
| int[][][] | visits - The number of times the agent has been at (xpos,ypos) and done action |
| Method Summary | |
|---|---|
| void | doreset(double initVal) - Resets the Q-values: sets all of the Q-values to initVal, and all of the visit counts to 0. |
| void | dostep(int action) - Does one step: carries out the action. |
| void | dostep(int action, double newdiscount, double alphaFieldValue) - Does one step: carries out the action, and sets the discount and the alpha value. |
| void | doSteps(int count, double greedyProb, double newdiscount, double alphaFieldValue) - Does count steps. Whether each step is greedy or random is determined by greedyProb. |
| double | value(int xval, int yval) - Determines the value of a location. The value is the maximum, over all actions, of the Q-value. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public double[][][] qvalues
public int[][][] visits
public double discount
public boolean alphaFixed
public boolean tracing
| Method Detail |
|---|
public void doreset(double initVal)
initVal - the initial value to set all values to
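As a rough illustration of the reset described above, the following sketch fills both tables in place. It assumes the tables are plain arrays indexed as [xpos][ypos][action]; the class name `ResetSketch`, its constructor, and the grid dimensions are hypothetical and are not taken from Q_Controller.java.

```java
import java.util.Arrays;

// Hypothetical stand-in for the controller's tables; not the applet's actual code.
final class ResetSketch {
    final double[][][] qvalues; // Q[xpos][ypos][action]
    final int[][][] visits;     // visit counts with the same indexing

    // Table sizes are assumptions; the documentation does not state them.
    ResetSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        visits = new int[width][height][numActions];
    }

    /** Sets every Q-value to initVal and every visit count to 0. */
    void doreset(double initVal) {
        for (int x = 0; x < qvalues.length; x++) {
            for (int y = 0; y < qvalues[x].length; y++) {
                Arrays.fill(qvalues[x][y], initVal);
                Arrays.fill(visits[x][y], 0);
            }
        }
    }
}
```

Under these assumptions, calling `doreset(0.0)` gives a neutral starting table, while a larger initVal makes untried actions look attractive early on.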
public void dostep(int action,
double newdiscount,
double alphaFieldValue)
action - the action that the agent does
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
public void dostep(int action)
action - the action that the agent does
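This page does not state the update that a step performs. The sketch below shows the standard tabular Q-learning update, Q(s,a) += alpha * (r + discount * max Q(s',a') - Q(s,a)), as one plausible reading. It assumes that `alphaFixed` chooses between a fixed alpha and a 1/visits step size; that interpretation, the class name `QUpdateSketch`, and the `update` and `maxQ` helpers with their parameters are all hypothetical.

```java
// Hypothetical sketch of a tabular Q-learning update; not the applet's actual code.
final class QUpdateSketch {
    final double[][][] qvalues; // Q[xpos][ypos][action]
    final int[][][] visits;     // times each (xpos, ypos, action) was tried
    double discount = 0.9;      // assumed default, not from the source
    boolean alphaFixed = true;  // assumed meaning: true = use the supplied alpha

    QUpdateSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        visits = new int[width][height][numActions];
    }

    /** Largest Q-value over all actions at (x, y). */
    double maxQ(int x, int y) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[x][y]) {
            best = Math.max(best, q);
        }
        return best;
    }

    /** Updates Q after doing action at (x, y), receiving reward, and arriving at (nx, ny). */
    void update(int x, int y, int action, double reward, int nx, int ny,
                double alphaFieldValue) {
        visits[x][y][action]++;
        // Assumption: a non-fixed alpha decays as 1 / (number of visits).
        double alpha = alphaFixed ? alphaFieldValue : 1.0 / visits[x][y][action];
        double target = reward + discount * maxQ(nx, ny);
        qvalues[x][y][action] += alpha * (target - qvalues[x][y][action]);
    }
}
```

In the real class the reward and next position would presumably come from the environment simulation in Q_Env.java rather than being passed in as arguments.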
public double value(int xval,
int yval)
xval - the x-coordinate
yval - the y-coordinate
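The summary describes the value of a location as the maximum Q-value over all actions. A minimal sketch of that computation follows. The class name `ValueSketch` is hypothetical, and the table is passed as an argument here only to keep the example self-contained.

```java
// Hypothetical sketch; the real method reads the controller's own qvalues field.
final class ValueSketch {
    /** Returns the maximum, over all actions, of Q[xval][yval][action]. */
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[xval][yval]) {
            best = Math.max(best, q);
        }
        return best;
    }
}
```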
public void doSteps(int count,
double greedyProb,
double newdiscount,
double alphaFieldValue)
count - the number of steps to do
greedyProb - the probability that a step is chosen greedily
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
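The choice between a greedy and a random step can be sketched as follows. This assumes that a greedy step picks the action with the highest Q-value at the current location and that a random step picks uniformly among all actions; the class name `GreedyChoiceSketch`, the `chooseAction` helper, and the tie-breaking toward the lowest index are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of per-step action selection; not the applet's actual code.
final class GreedyChoiceSketch {
    /**
     * With probability greedyProb returns the index of the largest entry in
     * actionValues (first one wins on ties); otherwise returns a uniformly
     * random action index.
     */
    static int chooseAction(double[] actionValues, double greedyProb, Random rng) {
        if (rng.nextDouble() < greedyProb) {
            int best = 0;
            for (int a = 1; a < actionValues.length; a++) {
                if (actionValues[a] > actionValues[best]) {
                    best = a;
                }
            }
            return best;
        }
        return rng.nextInt(actionValues.length);
    }
}
```

Under this reading, doSteps would repeat such a choice count times and pass each chosen action to dostep along with newdiscount and alphaFieldValue.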