
trait Policy[Obs, A, R, M[_], S[_]] extends AnyRef

This is how agents actually choose what comes next. This is a stochastic policy. We need to be able to match this up with a state that has the same monadic return type, but for now it's hardcoded.

Obs - the observation offered by this state.
A - the action type.
R - the reward.
M - the monadic type offered by the policy.
S - the monad for the state.
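
For a concrete feel, here is a minimal sketch (not part of the library) that fixes M to cats.Id, collapsing the stochastic choice into a deterministic one. It assumes the state exposes its available actions as state.actions: Set[A]; that accessor is an assumption, not something documented on this page.

    import cats.Id

    // Sketch only: a trivially deterministic policy. Assumes `state.actions`
    // exists and is non-empty; it picks whichever action the set yields first.
    def firstAction[Obs, A, R, S[_]]: Policy[Obs, A, R, Id, S] =
      new Policy[Obs, A, R, Id, S] {
        def choose(state: State[Obs, A, R, S]): Id[A] = state.actions.head
      }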

Self Type
Policy[Obs, A, R, M, S]
Source
Policy.scala
Linear Supertypes
AnyRef, Any

Type Members

  1. type This = Policy[Obs, A, R, M, S]

Abstract Value Members

  1. abstract def choose(state: State[Obs, A, R, S]): M[A]
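
     As a sketch of how choose is meant to compose (assuming M has a cats Monad instance), here is a single hypothetical agent step. The step function standing in for the environment's transition is an assumption for illustration, not part of the documented API.

         import cats.Monad
         import cats.syntax.all._

         // Hypothetical helper: pick an action with the policy, then hand it to an
         // externally supplied transition function. `step` is an assumption, not
         // the library's API.
         def playOne[Obs, A, R, M[_]: Monad, S[_]](
             policy: Policy[Obs, A, R, M, S],
             state: State[Obs, A, R, S],
             step: (State[Obs, A, R, S], A) => M[(R, State[Obs, A, R, S])]
         ): M[(R, State[Obs, A, R, S])] =
           policy.choose(state).flatMap(a => step(state, a))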

Concrete Value Members

  1. def contramapObservation[P](f: (P) ⇒ Obs)(implicit S: Functor[S]): Policy[P, A, R, M, S]
  2. def contramapReward[T](f: (T) ⇒ R)(implicit S: Functor[S]): Policy[Obs, A, T, M, S]
  3. def learn(sars: SARS[Obs, A, R, S]): This
  4. def mapK[N[_]](f: FunctionK[M, N]): Policy[Obs, A, R, N, S]

    Just an idea to see if I can make stochastic deciders out of deterministic deciders. We'll see how this develops.
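
    As an illustration of that idea (a sketch only; Dist and pointMass below are hypothetical, not library types), a FunctionK from cats.Id into a distribution-like type is the kind of natural transformation mapK expects, lifting a deterministic decider into a stochastic one.

        import cats.Id
        import cats.arrow.FunctionK

        // Hypothetical point-mass "distribution"; not the library's real stochastic type.
        final case class Dist[A](weights: Map[A, Double])

        // Natural transformation Id ~> Dist: a deterministic value becomes a
        // distribution that puts all of its mass on that value.
        val pointMass: FunctionK[Id, Dist] = new FunctionK[Id, Dist] {
          def apply[A](a: Id[A]): Dist[A] = Dist(Map(a -> 1.0))
        }

        // Given a deterministic policy `p: Policy[Obs, A, R, Id, S]`,
        // `p.mapK(pointMass)` would have type Policy[Obs, A, R, Dist, S].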