Create a POMDP defined by the tuple (S,A,O,T,Z,R,γ).

# Arguments

+## Required
- `S`,`A`,`O`: State, action, and observation spaces (typically `Vector`s)
- `T::Function`: Transition probability distribution function; ``T(s,a,s')`` is the probability of transitioning to state ``s'`` from state ``s`` after taking action ``a``.
- `Z::Function`: Observation probability distribution function; ``Z(a, s', o)`` is the probability of receiving observation ``o`` when state ``s'`` is reached after action ``a``.
- `R::Function`: Reward function; ``R(s,a)`` is the reward for taking action ``a`` in state ``s``.
- `γ::Float64`: Discount factor.

-# Notes
-- The default initial state distribution is uniform across all states. Changing this is not yet supported, but it can be overridden for simulations.
-- Terminal states are not yet supported, but absorbing states with zero reward can be used.
+## Optional
+- `b₀=Uniform(S)`: Initial belief/state distribution (see `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).
+
+## Keyword
+- `terminals=Set()`: Set of terminal states. Once a terminal state is reached, no further actions can be taken and no further reward is received.
"""
-function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount)
+function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount, b0=Uniform(s); terminals=Set())
    ss = vec(collect(s))
    as = vec(collect(a))
    os = vec(collect(o))
@@ -107,7 +108,7 @@ function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount)
        Dict(ss[i]=>i for i in 1:length(ss)),
        Dict(as[i]=>i for i in 1:length(as)),
        Dict(os[i]=>i for i in 1:length(os)),
-       discount
+       discount, b0, convert(Set{eltype(ss)}, terminals)
    )

    probability_check(m)
@@ -116,22 +117,25 @@ function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount)
end

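To make the new arguments concrete, here is a minimal sketch of how the extended constructor might be called on a toy two-state problem. The `using QuickPOMDPs` line and the model itself are illustrative assumptions, not part of this diff; `Deterministic` comes from `POMDPModelTools`, which the docstring above already references.

```julia
using QuickPOMDPs                      # assumption: package providing DiscreteExplicitPOMDP
using POMDPModelTools: Deterministic   # initial-belief type referenced in the docstring

S = [:healthy, :faulty]                # hidden machine condition
A = [:inspect, :replace]
O = [:ok, :alarm]

# T(s, a, s′): inspecting leaves the state unchanged; replacing makes it healthy
T(s, a, sp) = a == :inspect ? float(s == sp) : float(sp == :healthy)

# Z(a, s′, o): an alarm is observed 85% of the time when the machine is faulty,
# and 15% of the time when it is healthy (independent of the action here)
function Z(a, sp, o)
    p_alarm = sp == :faulty ? 0.85 : 0.15
    return o == :alarm ? p_alarm : 1 - p_alarm
end

# R(s, a): replacing costs 10; running a faulty machine costs 100
R(s, a) = (a == :replace ? -10.0 : 0.0) + (s == :faulty ? -100.0 : 0.0)

# The new optional positional b₀ and keyword `terminals` from this diff:
m = DiscreteExplicitPOMDP(S, A, O, T, Z, R, 0.95,
                          Deterministic(:healthy);   # b₀
                          terminals=Set())           # no terminal states in this toy
```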
"""
-    DiscreteExplicitMDP(S,A,T,R,γ)
+    DiscreteExplicitMDP(S,A,T,R,γ,[p₀])

Create an MDP defined by the tuple (S,A,T,R,γ).

# Arguments

+## Required
- `S`,`A`: State and action spaces (typically `Vector`s)
- `T::Function`: Transition probability distribution function; ``T(s,a,s')`` is the probability of transitioning to state ``s'`` from state ``s`` after taking action ``a``.
- `R::Function`: Reward function; ``R(s,a)`` is the reward for taking action ``a`` in state ``s``.
- `γ::Float64`: Discount factor.

-# Notes
-- The default initial state distribution is uniform across all states. Changing this is not yet supported, but it can be overridden for simulations.
-- Terminal states are not yet supported, but absorbing states with zero reward can be used.
+## Optional
+- `p₀=Uniform(S)`: Initial state distribution (see `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).
+
+## Keyword
+- `terminals=Set()`: Set of terminal states. Once a terminal state is reached, no further actions can be taken and no further reward is received.
"""
-function DiscreteExplicitMDP(s, a, t, r, discount)
+function DiscreteExplicitMDP(s, a, t, r, discount, p0=Uniform(s); terminals=Set())
    ss = vec(collect(s))
    as = vec(collect(a))
@@ -141,7 +145,7 @@ function DiscreteExplicitMDP(s, a, t, r, discount)
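Likewise, a minimal sketch of the extended MDP constructor with the new `p₀` positional argument and `terminals` keyword, on an assumed five-state chain; the imports and the model are illustrative, not part of this diff.

```julia
using QuickPOMDPs                      # assumption: package providing DiscreteExplicitMDP
using POMDPModelTools: SparseCat       # initial-distribution type referenced in the docstring

S = 1:5                                # a small chain; state 5 will be terminal
A = [:left, :right]

# T(s, a, s′): deterministic steps, clamped at the ends of the chain
T(s, a, sp) = sp == clamp(s + (a == :right ? 1 : -1), 1, 5) ? 1.0 : 0.0

# R(s, a): reward only for stepping from state 4 into the terminal state 5
R(s, a) = (s == 4 && a == :right) ? 1.0 : 0.0

# The new optional positional p₀ and keyword `terminals` from this diff:
p0 = SparseCat([1, 2], [0.8, 0.2])     # start mostly in state 1
m = DiscreteExplicitMDP(S, A, T, R, 0.9, p0; terminals=Set([5]))
```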