trait Policy[Obs, A, R, M[_], S[_]] extends AnyRef
This is how agents actually choose what comes next. This is a stochastic policy. We have to be able to match it up with a state that has the same monadic return type, but for now that type is hardcoded.
Type parameters:
- Obs - the observation offered by this state
- A - the action type
- R - the reward
- M - the monadic type offered by the policy
- S - the monad for the state
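To make the shape of these type parameters concrete, here is a minimal, self-contained sketch that does not use ScalaRL itself: a stripped-down trait with the same five parameters and a trivial deterministic instance. The choose method and the SketchPolicy, Constant, and SketchUsage names are illustrative assumptions, not part of the API documented on this page; only cats.Id is a real library type.

```scala
import cats.Id

// Illustration only: a stand-in trait with the same five type parameters.
// This is NOT the ScalaRL Policy trait; `choose` is an assumption made
// purely for this sketch.
trait SketchPolicy[Obs, A, R, M[_], S[_]] {
  // Given an observation, return an action wrapped in the policy's monad M.
  def choose(obs: Obs): M[A]
}

// Deterministic example: with M = Id, choosing returns a bare action.
// A stochastic policy would pick a distribution monad for M instead, so
// that `choose` yields a distribution over actions. S is the monad the
// state lives in; this toy policy never touches it.
final case class Constant[Obs, A, R, S[_]](action: A)
    extends SketchPolicy[Obs, A, R, Id, S] {
  def choose(obs: Obs): Id[A] = action
}

object SketchUsage {
  // Always picks "left", whatever the Int observation happens to be.
  val p = Constant[Int, String, Double, Option]("left")
  val chosen: String = p.choose(42) // Id[String] is just String
}
```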
- Self Type: Policy[Obs, A, R, M, S]
- Source: Policy.scala
Concrete Value Members
- def contramapObservation[P](f: (P) ⇒ Obs)(implicit S: Functor[S]): Policy[P, A, R, M, S]
- def contramapReward[T](f: (T) ⇒ R)(implicit S: Functor[S]): Policy[Obs, A, T, M, S]
- def learn(sars: SARS[Obs, A, R, S]): This
- def mapK[N[_]](f: FunctionK[M, N]): Policy[Obs, A, R, N, S]
Just an idea to see if I can make stochastic deciders out of deterministic deciders. We'll see how this develops.
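These combinators compose naturally. The sketch below is a rough illustration, assuming the Policy trait documented above is in scope along with cats; the adapt helper, its enclosing object, and the inputs base, f, g, and nat are all hypothetical names introduced here, while the combinator signatures themselves come from the member list above.

```scala
import cats.Functor
import cats.arrow.FunctionK

object PolicyAdapters {
  // Hypothetical helper: widen an existing policy to a new observation
  // type P, a new reward type T, and a new effect type N, using only the
  // combinators listed above.
  def adapt[P, T, Obs, A, R, M[_], N[_], S[_]: Functor](
      base: Policy[Obs, A, R, M, S]
  )(f: P => Obs, g: T => R, nat: FunctionK[M, N]): Policy[P, A, T, N, S] =
    base
      .contramapObservation(f) // Policy[P, A, R, M, S]
      .contramapReward(g)      // Policy[P, A, T, M, S]
      .mapK(nat)               // Policy[P, A, T, N, S]
}
```

learn is the odd one out: rather than reshaping the policy's types, it folds a single SARS transition (presumably a state, action, reward, next-state record) into the policy and returns This, an updated policy of the same shape.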
ScalaRL
This is the API documentation for the ScalaRL functional reinforcement learning library.
Further documentation for ScalaRL can be found at the documentation site.
Check out the ScalaRL package list for all the goods.