 Create a POMDP defined by the tuple (S,A,O,T,Z,R,γ).

 # Arguments

+## Required
 - `S`,`A`,`O`: State, action, and observation spaces (typically `Vector`s)
 - `T::Function`: Transition probability distribution function; ``T(s,a,s')`` is the probability of transitioning to state ``s'`` from state ``s`` after taking action ``a``.
 - `Z::Function`: Observation probability distribution function; ``Z(a, s', o)`` is the probability of receiving observation ``o`` when state ``s'`` is reached after action ``a``.
 - `R::Function`: Reward function; ``R(s,a)`` is the reward for taking action ``a`` in state ``s``.
 - `γ::Float64`: Discount factor.

-# Notes
-- The default initial state distribution is uniform across all states. Changing this is not yet supported, but it can be overridden for simulations.
-- Terminal states are not yet supported, but absorbing states with zero reward can be used.
+## Optional
+- `b₀=Uniform(S)`: Initial belief/state distribution (See `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).
+
+## Keyword
+- `terminal=Set()`: Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
 """
-function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount)
+function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount, b0=Uniform(s); terminal=Set())
     ss = vec(collect(s))
     as = vec(collect(a))
     os = vec(collect(o))
@@ -107,7 +124,7 @@ function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount)
         Dict(ss[i]=>i for i in 1:length(ss)),
         Dict(as[i]=>i for i in 1:length(as)),
         Dict(os[i]=>i for i in 1:length(os)),
-        discount
+        discount, b0, terminal
     )

     probability_check(m)
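For readers skimming the diff, here is a minimal usage sketch of the constructor documented above, using the classic tiger problem. It assumes `DiscreteExplicitPOMDP` and the distribution types come from `QuickPOMDPs` and `POMDPModelTools`; the state, action, and observation names are purely illustrative.

```julia
using QuickPOMDPs          # assumed home of DiscreteExplicitPOMDP
using POMDPModelTools      # Uniform / Deterministic / SparseCat distributions

S = [:left, :right]        # which door hides the tiger
A = [:left, :right, :listen]
O = [:left, :right]
γ = 0.95

# Transition: listening leaves the tiger in place; opening a door resets the problem.
T(s, a, sp) = a == :listen ? (s == sp ? 1.0 : 0.0) : 0.5

# Observation: listening is 85% accurate; opening a door gives no information.
Z(a, sp, o) = a == :listen ? (o == sp ? 0.85 : 0.15) : 0.5

# Reward: -1 to listen, -100 for opening the tiger door, +10 for the treasure door.
R(s, a) = a == :listen ? -1.0 : (a == s ? -100.0 : 10.0)

m = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ)   # uniform initial belief by default

# The new optional argument pins down the initial belief explicitly:
m2 = DiscreteExplicitPOMDP(S, A, O, T, Z, R, γ, Deterministic(:left))
```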
@@ -116,22 +133,25 @@ function DiscreteExplicitPOMDP(s, a, o, t, z, r, discount)
 end

 """
-    DiscreteExplicitMDP(S,A,T,R,γ)
+    DiscreteExplicitMDP(S,A,T,R,γ,[p₀])

 Create an MDP defined by the tuple (S,A,T,R,γ).

 # Arguments

+## Required
 - `S`,`A`: State and action spaces (typically `Vector`s)
 - `T::Function`: Transition probability distribution function; ``T(s,a,s')`` is the probability of transitioning to state ``s'`` from state ``s`` after taking action ``a``.
 - `R::Function`: Reward function; ``R(s,a)`` is the reward for taking action ``a`` in state ``s``.
 - `γ::Float64`: Discount factor.

-# Notes
-- The default initial state distribution is uniform across all states. Changing this is not yet supported, but it can be overridden for simulations.
-- Terminal states are not yet supported, but absorbing states with zero reward can be used.
+## Optional
+- `p₀=Uniform(S)`: Initial state distribution (See `POMDPModelTools.Deterministic` and `POMDPModelTools.SparseCat` for other options).
+
+## Keyword
+- `terminal=Set()`: Set of terminal states. Once a terminal state is reached, no more actions can be taken or reward received.
 """
-function DiscreteExplicitMDP(s, a, t, r, discount)
+function DiscreteExplicitMDP(s, a, t, r, discount, p0=Uniform(s); terminal=Set())
     ss = vec(collect(s))
     as = vec(collect(a))

@@ -141,7 +161,7 @@ function DiscreteExplicitMDP(s, a, t, r, discount)
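A corresponding sketch for `DiscreteExplicitMDP`, exercising the new optional initial-distribution argument and `terminal` keyword. The 1-D chain, its rewards, and the goal state are made up for illustration, and `Deterministic` is assumed to come from POMDPModelTools.

```julia
using QuickPOMDPs
using POMDPModelTools      # Deterministic for p₀

S = 1:5                    # positions on a short 1-D chain
A = [-1, 1]                # step left or right
γ = 0.9

# Deterministic transitions: move by `a`, clamped to the ends of the chain.
T(s, a, sp) = sp == clamp(s + a, 1, 5) ? 1.0 : 0.0

# Reward +10 for stepping into the goal state 5, -1 per step otherwise.
R(s, a) = clamp(s + a, 1, 5) == 5 ? 10.0 : -1.0

# Start deterministically at position 1 and make the goal terminal.
m = DiscreteExplicitMDP(S, A, T, R, γ, Deterministic(1); terminal=Set([5]))
```

With `terminal=Set([5])`, no further actions are taken and no further reward accrues once state 5 is reached, matching the docstring's description of the keyword.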