java.lang.Object
  SarsaController
public class SarsaController
This applet demonstrates SARSA(lambda)-learning for a particular grid world problem. It isn't designed to be general or reusable.
Copyright (C) 2003-2006 David Poole.
This program gives SARSA(lambda)-learning code. The GUI is in SarsaGUI.java. The controller code is in SarsaController.java. It uses the environment Q_Env.java.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
| Field Summary | |
|---|---|
double |
alpha
|
double |
discount
|
double[][][] |
e
The eligibility trace for state (xpos,ypos) and action |
double |
greedyProb
|
double |
lambda
|
int |
prevAction
|
int |
prevX
|
int |
prevY
|
double[][][] |
qvalues
The Q values: Q[xpos,ypos,action] |
boolean |
replacingTrace
|
double |
reward
|
boolean |
tracing
|
| Method Summary | |
|---|---|
void |
doreset(double initVal)
Resets the Q-values: sets all of the Q-values to initVal and all of the eligibility traces to 0 |
void |
dostep(int nextAction)
Does one step: carries out the action |
void |
dostep(int nextAction,
double newDiscount,
double newAlpha,
double newLambda)
Does one step: carries out the action, and sets the discount, alpha, and lambda values |
void |
doSteps(int count,
double newGreedyProb,
double newDiscount,
double newAlpha,
double newLambda)
Does count steps; whether each step is greedy or random is determined by greedyProb |
double |
value(int xval,
int yval)
Determines the value of a location: the value is the maximum, over all actions, of the Q-value |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public double[][][] qvalues
public double[][][] e
public double discount
public double alpha
public double lambda
public double greedyProb
public boolean tracing
public int prevX
public int prevY
public int prevAction
public double reward
public boolean replacingTrace
| Method Detail |
|---|
public void doreset(double initVal)
initVal - the initial value to set all values to
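The reset behaviour described for doreset can be sketched as below. This is a minimal illustration, not the original source: the class name `ResetSketch`, its constructor, and the grid dimensions are assumptions introduced here; only the field names `qvalues` and `e` and the method signature come from this page.

```java
import java.util.Arrays;

// Hypothetical stand-in for the controller's two tables (not the original class).
class ResetSketch {
    final double[][][] qvalues; // Q[xpos][ypos][action]
    final double[][][] e;       // eligibility trace, same shape as qvalues

    // Assumed constructor: the real class's construction is not documented here.
    ResetSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        e = new double[width][height][numActions];
    }

    // Sets every Q-value to initVal and every eligibility trace to 0.
    void doreset(double initVal) {
        for (int x = 0; x < qvalues.length; x++) {
            for (int y = 0; y < qvalues[x].length; y++) {
                Arrays.fill(qvalues[x][y], initVal);
                Arrays.fill(e[x][y], 0.0);
            }
        }
    }
}
```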
public void dostep(int nextAction,
double newDiscount,
double newAlpha,
double newLambda)
nextAction - the next action that the agent does
newDiscount - the discount to use
newAlpha - the new alpha value to use
newLambda - the new lambda value to use
public void dostep(int nextAction)
nextAction - the next action that the agent does
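This page does not show the body of dostep, so the following is only a sketch of the textbook tabular SARSA(lambda) update that a step of this kind typically performs. The class `SarsaLambdaSketch`, the method `update`, its parameter list, and the default parameter values are assumptions made for illustration; the field names mirror those listed in the Field Summary. How the original obtains the reward and the new position from Q_Env is not covered.

```java
// Textbook tabular SARSA(lambda) update; a sketch, not the original SarsaController code.
class SarsaLambdaSketch {
    final double[][][] qvalues; // Q[xpos][ypos][action]
    final double[][][] e;       // eligibility traces
    double discount = 0.9;      // assumed defaults, chosen only for the example
    double alpha = 0.5;
    double lambda = 0.8;
    boolean replacingTrace = true;

    SarsaLambdaSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        e = new double[width][height][numActions];
    }

    // Hypothetical helper: applies one update after moving from (prevX, prevY)
    // via prevAction, receiving reward, and choosing nextAction at (x, y).
    void update(int prevX, int prevY, int prevAction,
                double reward, int x, int y, int nextAction) {
        // Temporal-difference error for the observed transition.
        double delta = reward + discount * qvalues[x][y][nextAction]
                - qvalues[prevX][prevY][prevAction];

        // Replacing traces reset to 1; accumulating traces add 1.
        if (replacingTrace) {
            e[prevX][prevY][prevAction] = 1.0;
        } else {
            e[prevX][prevY][prevAction] += 1.0;
        }

        // Move every Q-value in proportion to its trace, then decay the trace.
        for (int i = 0; i < qvalues.length; i++) {
            for (int j = 0; j < qvalues[i].length; j++) {
                for (int a = 0; a < qvalues[i][j].length; a++) {
                    qvalues[i][j][a] += alpha * delta * e[i][j][a];
                    e[i][j][a] *= discount * lambda;
                }
            }
        }
    }
}
```

With replacingTrace set to false the same code gives accumulating traces, which let a frequently revisited state-action pair build up a trace larger than 1.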
public double value(int xval,
int yval)
xval - the x-coordinate
yval - the y-coordinate
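The summary for value says the value of a location is the maximum Q-value over all actions. A minimal sketch of that computation follows; the class `ValueSketch` and the static form taking the table as an argument are assumptions made so the example stands alone (the documented method is an instance method reading the qvalues field).

```java
// Sketch of "value = max over actions of Q"; not the original source.
class ValueSketch {
    // Returns the largest Q-value stored for location (xval, yval).
    // Assumes at least one action exists at that location.
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = qvalues[xval][yval][0];
        for (int a = 1; a < qvalues[xval][yval].length; a++) {
            best = Math.max(best, qvalues[xval][yval][a]);
        }
        return best;
    }
}
```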
public void doSteps(int count,
double newGreedyProb,
double newDiscount,
double newAlpha,
double newLambda)
count - the number of steps to do
newGreedyProb - the probability that a step is chosen greedily
newDiscount - the discount to use
newAlpha - the new alpha value to use
newLambda - the new lambda value to use
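The greedy-or-random choice that greedyProb controls can be sketched as follows. This is an illustration under assumptions: the class `EpsilonGreedySketch`, the method `chooseAction`, the use of java.util.Random, and the tie-breaking rule (lowest index wins) are not taken from the original code.

```java
import java.util.Random;

// Sketch of choosing an action greedily with probability greedyProb; not the original source.
class EpsilonGreedySketch {
    // q holds the Q-values of the actions available at the current location.
    static int chooseAction(double[] q, double greedyProb, Random rng) {
        if (rng.nextDouble() < greedyProb) {
            // Greedy step: pick the action with the highest Q-value
            // (ties go to the lowest index, an assumption of this sketch).
            int best = 0;
            for (int a = 1; a < q.length; a++) {
                if (q[a] > q[best]) {
                    best = a;
                }
            }
            return best;
        }
        // Random step: pick any action uniformly.
        return rng.nextInt(q.length);
    }
}
```

A loop such as doSteps would call a chooser like this once per step and pass the result to dostep.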