case class Gradient[Obs, A, R, T, S[_]](config: Config[R, T], valueFn: ActionValueFn[Obs, A, Item[T]])(implicit evidence$1: Equiv[A], evidence$2: ToDouble[R], evidence$3: ToDouble[T]) extends Policy[Obs, A, R, Cat, S] with Product with Serializable
This policy needs to track its average reward internally; then, if the gradient baseline is set, it uses that running average to generate the baseline. T is the "average" type.
- Source: Gradient.scala
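To make the class comment above concrete, here is a minimal, library-independent sketch of a gradient-bandit update with a running-average-reward baseline (Sutton & Barto, §2.8), which appears to be the idea this policy implements. `GradientSketch` and every name in it are illustrative; none of this is ScalaRL API.

```scala
// Standalone sketch (NOT ScalaRL internals): gradient-bandit preferences with a
// running-average-reward baseline. Actions whose reward beats the baseline get
// their preference nudged up; the others get nudged down.
final case class GradientSketch[A](prefs: Map[A, Double], avgReward: Double, steps: Long) {

  // Softmax over the preferences yields the action probabilities.
  def probs: Map[A, Double] = {
    val exps = prefs.map { case (a, h) => a -> math.exp(h) }
    val z    = exps.values.sum
    exps.map { case (a, e) => a -> e / z }
  }

  // One learning step: update every preference against the current baseline,
  // then fold the observed reward into the running average.
  def learn(chosen: A, reward: Double, stepSize: Double): GradientSketch[A] = {
    val p = probs
    val updated = prefs.map { case (a, h) =>
      val indicator = if (a == chosen) 1.0 else 0.0
      a -> (h + stepSize * (reward - avgReward) * (indicator - p(a)))
    }
    val n = steps + 1
    GradientSketch(updated, avgReward + (reward - avgReward) / n, n)
  }
}
```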
Linear Supertypes
- Serializable
- Serializable
- Product
- Equals
- Policy
- AnyRef
- Any
Instance Constructors
- new Gradient(config: Config[R, T], valueFn: ActionValueFn[Obs, A, Item[T]])(implicit evidence$1: Equiv[A], evidence$2: ToDouble[R], evidence$3: ToDouble[T])
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def aToDouble(obs: Obs): ToDouble[A]
  Let's try out this style for a bit. This gives us a way to convert an action directly into a probability, using our actionValue Map above.
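As a rough illustration of that idea (purely hypothetical, not the body of `aToDouble`): softmax-normalize the stored per-action values so that each action maps straight to a probability.

```scala
// Hypothetical illustration, not the actual implementation: treat the stored
// per-action values as preferences and softmax them into probabilities.
def actionProbability[A](prefs: Map[A, Double]): A => Double = {
  // assumes `prefs` contains an entry for every available action
  val normalizer = prefs.values.map(math.exp).sum
  a => math.exp(prefs(a)) / normalizer
}
```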
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def choose(state: State[Obs, A, R, S]): Cat[A]
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- val config: Config[R, T]
- def contramapObservation[P](f: (P) ⇒ Obs)(implicit S: Functor[S]): Policy[P, A, R, Cat, S]
  - Definition Classes: Policy
- def contramapReward[T](f: (T) ⇒ R)(implicit S: Functor[S]): Policy[Obs, A, T, Cat, S]
  - Definition Classes: Policy
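A hypothetical usage sketch of the two `contramap` members above: adapt an existing policy to richer observation and reward types by mapping them down with plain functions. The `io.samritchie.rl` import is an assumption about the package layout; `RichObs`, `RichR`, `parseObs`, and `toReward` are made-up names.

```scala
import cats.Functor
// Assumed package layout; adjust to wherever Policy and Cat live in your build.
import io.samritchie.rl.{Cat, Policy}

// Hypothetical adapter: feed a policy richer observations and rewards by
// mapping them down to the types the policy already understands.
def adapt[Obs, A, R, S[_]: Functor, RichObs, RichR](
    base: Policy[Obs, A, R, Cat, S],
    parseObs: RichObs => Obs,
    toReward: RichR => R
): Policy[RichObs, A, RichR, Cat, S] =
  base
    .contramapObservation(parseObs) // Policy[RichObs, A, R, Cat, S]
    .contramapReward(toReward)      // Policy[RichObs, A, RichR, Cat, S]
```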
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def learn(sars: SARS[Obs, A, R, S]): This
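Because `learn` returns an updated policy (`This`) instead of mutating in place, replaying a batch of transitions is just a fold. A hypothetical sketch, assuming `This` resolves to (a subtype of) `Policy[Obs, A, R, Cat, S]` and the same package layout as above; `trainBatch` is an illustrative name.

```scala
// Hypothetical sketch: batch training as a fold over immutable policies.
// Adjust imports to your build; SARS may live in a different subpackage.
import io.samritchie.rl.{Cat, Policy, SARS}

def trainBatch[Obs, A, R, S[_]](
    start: Policy[Obs, A, R, Cat, S],
    transitions: List[SARS[Obs, A, R, S]]
): Policy[Obs, A, R, Cat, S] =
  transitions.foldLeft(start)((policy, sars) => policy.learn(sars))
```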
- def mapK[N[_]](f: FunctionK[Cat, N]): Policy[Obs, A, R, N, S]
  Just an idea to see if I can make stochastic deciders out of deterministic deciders. We'll see how this develops.
  - Definition Classes: Policy
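A hypothetical sketch of calling `mapK`: swap the decision effect from `Cat` to `cats.Id`, turning the stochastic decider into a deterministic one. The `Cat ~> Id` transformation is taken as a parameter because constructing it depends on `Cat`'s API; the import paths are assumptions.

```scala
import cats.Id
import cats.arrow.FunctionK
// Assumed package layout; adjust to wherever Policy and Cat live in your build.
import io.samritchie.rl.{Cat, Policy}

// Hypothetical sketch: replace the Cat decision effect with Id via a natural
// transformation, yielding a deterministic policy.
def deterministic[Obs, A, R, S[_]](
    stochastic: Policy[Obs, A, R, Cat, S],
    pick: FunctionK[Cat, Id]
): Policy[Obs, A, R, Id, S] =
  stochastic.mapK(pick)
```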
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- val valueFn: ActionValueFn[Obs, A, Item[T]]
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()