java.lang.Object SarsaController
public class SarsaController
This applet demonstrates SARSA(lambda)-learning for a particular grid world problem. It isn't designed to be general or reusable.
Copyright (C) 2003-2006 David Poole.
This program gives SARSA(lambda)-learning code. The GUI is in SarsaGUI.java. The controller code is in SarsaController.java. It uses the environment Q_Env.java.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
Field Summary | |
---|---|
double | alpha |
double | discount |
double[][][] | e: The eligibility trace for state (xpos,ypos) and action |
double | greedyProb |
double | lambda |
int | prevAction |
int | prevX |
int | prevY |
double[][][] | qvalues: The Q values: Q[xpos,ypos,action] |
boolean | replacingTrace |
double | reward |
boolean | tracing |
Method Summary | |
---|---|
void | doreset(double initVal): Resets the Q-values. Sets all of the Q-values to initVal, and the eligibility to 0. |
void | dostep(int nextAction): Does one step. Carries out the action. |
void | dostep(int nextAction, double newDiscount, double newAlpha, double newLambda): Does one step. Carries out the action, and sets the discount, alpha, and lambda values. |
void | doSteps(int count, double newGreedyProb, double newDiscount, double newAlpha, double newLambda): Does count steps. Whether each step is greedy or random is determined by greedyProb. |
double | value(int xval, int yval): Determines the value of a location. The value is the maximum, over all actions, of the Q-value. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public double[][][] qvalues
public double[][][] e
public double discount
public double alpha
public double lambda
public double greedyProb
public boolean tracing
public int prevX
public int prevY
public int prevAction
public double reward
public boolean replacingTrace
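The fields above line up with the quantities in a standard tabular SARSA(lambda) update, so a short sketch may help show how they could fit together. This is an illustration under stated assumptions, not the applet's actual source: the class name SarsaLambdaSketch, the update method and its signature, and the default parameter values are hypothetical. Only the field names qvalues, e, discount, alpha, lambda, and replacingTrace come from this page, and the reading of replacingTrace as a switch between replacing and accumulating traces is an assumption.

```java
/** Hypothetical sketch of a tabular SARSA(lambda) update; not the applet's source. */
final class SarsaLambdaSketch {
    final double[][][] qvalues;      // Q[xpos][ypos][action]
    final double[][][] e;            // eligibility trace, same indexing as qvalues
    double discount = 0.9;           // assumed default, for illustration only
    double alpha = 0.1;              // assumed default, for illustration only
    double lambda = 0.8;             // assumed default, for illustration only
    boolean replacingTrace = true;   // assumed meaning: replacing vs. accumulating trace

    SarsaLambdaSketch(int width, int height, int numActions) {
        qvalues = new double[width][height][numActions];
        e = new double[width][height][numActions];
    }

    /**
     * Applies one update for the transition from (prevX, prevY, prevAction),
     * with the observed reward, to (x, y) where nextAction will be taken.
     */
    void update(int prevX, int prevY, int prevAction, double reward,
                int x, int y, int nextAction) {
        // Temporal-difference error for the on-policy (SARSA) target.
        double delta = reward + discount * qvalues[x][y][nextAction]
                - qvalues[prevX][prevY][prevAction];

        // Mark the state-action pair that was just left as eligible.
        if (replacingTrace) {
            e[prevX][prevY][prevAction] = 1.0;
        } else {
            e[prevX][prevY][prevAction] += 1.0;
        }

        // Credit every pair in proportion to its trace, then decay the trace.
        for (int i = 0; i < qvalues.length; i++) {
            for (int j = 0; j < qvalues[i].length; j++) {
                for (int a = 0; a < qvalues[i][j].length; a++) {
                    qvalues[i][j][a] += alpha * delta * e[i][j][a];
                    e[i][j][a] *= discount * lambda;
                }
            }
        }
    }
}
```

In this sketch a single rewarded transition changes only the Q-value of the pair that was just left, while later transitions also adjust earlier pairs through their decayed traces.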
Method Detail |
---|
public void doreset(double initVal)
initVal - the initial value to set all values to

public void dostep(int nextAction, double newDiscount, double newAlpha, double newLambda)
nextAction - the next action that the agent does
newDiscount - the discount to use
newAlpha - the new alpha value to use
newLambda - the new lambda value to use

public void dostep(int nextAction)
nextAction - the next action that the agent does

public double value(int xval, int yval)
xval - the x-coordinate
yval - the y-coordinate
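
The summary describes value as the maximum, over all actions, of the Q-value at a location. A minimal sketch of that computation follows. The class name ValueSketch and the static form that takes the Q array as a parameter are hypothetical choices made to keep the example self-contained; the documented method is an instance method that reads the qvalues field.

```java
/** Hypothetical sketch of the "maximum over actions" value of a grid location. */
final class ValueSketch {
    /** Returns the largest Q-value stored for location (xval, yval). */
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[xval][yval]) {
            best = Math.max(best, q);
        }
        return best;
    }
}
```

For example, a location whose four action values are 1.0, -2.0, 3.5, and 0.0 has value 3.5.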
public void doSteps(int count, double newGreedyProb, double newDiscount, double newAlpha, double newLambda)
count - the number of steps to do
newGreedyProb - the probability that a step is chosen greedily
newDiscount - the discount to use
newAlpha - the new alpha value to use
newLambda - the new lambda value to use
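
The description of doSteps says that greedyProb determines whether each step is greedy or random. One common way to realise that is sketched below, as an assumption about the behaviour rather than the applet's code: with probability greedyProb take the action with the highest Q-value, otherwise pick an action uniformly at random. The class name GreedyChoiceSketch, the chooseAction helper, and the rule that ties go to the lowest action index are all hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of choosing a greedy or random action using greedyProb. */
final class GreedyChoiceSketch {
    /**
     * With probability greedyProb returns the index of the largest action value
     * (ties go to the lowest index); otherwise returns a uniformly random index.
     */
    static int chooseAction(double[] actionValues, double greedyProb, Random rng) {
        if (rng.nextDouble() < greedyProb) {
            int best = 0;
            for (int a = 1; a < actionValues.length; a++) {
                if (actionValues[a] > actionValues[best]) {
                    best = a;
                }
            }
            return best;
        }
        return rng.nextInt(actionValues.length);
    }
}
```

With greedyProb set to 1.0 every choice is greedy, and with 0.0 every choice is random, which matches the two extremes the parameter description implies.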