Inquiry Regarding Model Initialization #232
-
@Steven-Lau-lib: these are good questions. From the point of view of the likelihood, i.e., the model as an explanation of the data, a key quantity is the distribution of the latent state at the time, t1, of the first observation. In the paper you cite, the idea of putting the initial time, t0, at 1870 was to enforce the simplifying assumption that the latent state at time t1 was a sample from a certain distribution. In particular, we assumed that the 70 years from t=t0 to t=t1 were sufficient for transients associated with the initial conditions at t=t0 to die out, making the latent state at t=t1 effectively a draw from the stationary distribution of the latent-state process. Of course, this process was itself shaped by simplifying assumptions about the demography of the pre-vaccine period.
All of these assumptions can affect the data analysis only through their effects on the shape of the latent-state distribution at t=t1. In particular, if one had some other means of sampling directly from this distribution, one could do so and the model would be identical. In effect, the shape of this distribution is just another model assumption, and alternative assumptions are certainly plausible.
Thus your question, in essence, is whether and to what extent the shape of this distribution affects the conclusions one draws from the data analysis. In general, the answer is that it must affect the results to some extent. Our reasoning was that such an effect would likely be small, if for no other reason than that the distribution at t=t1 becomes progressively less important with time. However, one can and, in some absolute sense, should investigate this question. In particular, as you suggest, with good information about demography in the pre-vaccine period, one could explore how better-informed assumptions about the dynamics in that period would change the shape of the t=t1 latent-state distribution. Whether one should in fact perform this analysis depends, of course, on its priority relative to all the other questions one wants to answer.
Alternatively, one could choose to parameterize the distribution explicitly. This might introduce additional parameters, complicating the comparison somewhat, but as a scientific hypothesis, it would be every bit as interesting as the other, and the data would, presumably, have their say as to which of the hypotheses gives the better explanation.
As for your general question regarding the initialization of the latent state in partially observed Markov process models, one has to parameterize the initial-state distribution somehow or other. In the paper you've referenced, we effectively draw the initial state from a probability distribution (obtained by performing a long simulation). As I say, we could have assumed a distribution of a different form. Indeed, as a limiting case of the latter, one can parameterize the latent-state distribution by its values; this makes the latent-state distribution deterministic conditional on the parameters.
I hope that you find these musings helpful. As I say, these are questions of general interest and I am happy to elaborate on these points if you have further questions. I'll also suggest that you contact the corresponding author, Dr. Matthieu Domenech de Cellès (ORCID profile here) to get his thoughts on these issues, as well as a more detailed perspective on that particular paper.
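The two initialization strategies discussed in this reply can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the latent process is a toy mean-reverting model rather than the pertussis model from the paper, the parameter values are arbitrary, and the function names (`init_by_burn_in`, `init_by_value`) are hypothetical and are not part of pomp. The sketch only shows the general idea: after a long enough run-in, the state at t1 barely depends on the state assumed at t0, whereas after a short run-in it depends on it strongly.

```python
import random
import statistics

# Toy latent process standing in for a compartmental model (an assumption for
# illustration): x[t+1] = mu + a * (x[t] - mu) + sigma * noise, with |a| < 1.
PARAMS = {"mu": 10.0, "a": 0.9, "sigma": 1.0}


def step(x, params, rng):
    """Advance the latent state by one time unit (here, one year)."""
    noise = rng.gauss(0.0, params["sigma"])
    return params["mu"] + params["a"] * (x - params["mu"]) + noise


def init_by_burn_in(x_t0, n_steps, params, rng):
    """Draw the state at t1 by simulating forward from a fixed state at t0."""
    x = x_t0
    for _ in range(n_steps):
        x = step(x, params, rng)
    return x


def init_by_value(params):
    """Treat the state at t1 as a parameter: deterministic given the parameters."""
    return params["x_t1"]


def mean_state_at_t1(x_t0, n_steps, n_particles=2000, seed=1):
    """Monte Carlo mean of the t1 state over independent particles."""
    rng = random.Random(seed)
    draws = [init_by_burn_in(x_t0, n_steps, PARAMS, rng) for _ in range(n_particles)]
    return statistics.fmean(draws)


# Two very different guesses for the state at t0, compared at t1.
long_gap = abs(mean_state_at_t1(0.0, 70, seed=1) - mean_state_at_t1(50.0, 70, seed=2))
short_gap = abs(mean_state_at_t1(0.0, 2, seed=1) - mean_state_at_t1(50.0, 2, seed=2))
print(f"difference in mean state at t1 after 70 steps: {long_gap:.3f}")
print(f"difference in mean state at t1 after 2 steps:  {short_gap:.3f}")

# The limiting case: the initial state is itself a (possibly estimated) parameter.
print(init_by_value({**PARAMS, "x_t1": 12.5}))
```

In this toy model the influence of the t0 state shrinks by a factor of `a` at every step, so 70 steps leave almost nothing of it. A real compartmental model may lose its transients much more slowly, which is why the sensitivity check described above can be worth doing.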
-
Thank you very much for your reply.
-
Dear Professor,
I am currently studying your previous work on modeling the waning of natural and vaccine-derived immunity to pertussis, as presented in your GitHub repository (https://github.com/kingaa/massachusetts-honeymoon) and associated publication (https://doi.org/10.1126/scitranslmed.aaj1748). I find your research highly insightful and valuable for understanding long-term immunity dynamics.
I have a question regarding the model initialization described in the supplementary materials. It states:
“To initialize these simulations, we ran particles from 1870 to the first data point (in January 1990) and calculated their log-likelihood at that point.”
As I understand it, this means the simulations in pomp are initialized by running the model from as early as 1870.
Given this, I would like to ask if this means that I need to provide accurate covariate data (e.g., birth rate, vaccination coverage) for the entire period from 1870 to 1990. This can be quite challenging, as historical data from that era may be incomplete or unreliable. For example, since widespread vaccination only began around 1950, should we provide accurate coverage for the earlier years in the simulation?
In addition, I am also looking for references or resources that provide a more systematic discussion on how to initialize compartmental models like this. In many papers, this step is only briefly mentioned, and I would greatly appreciate any suggestions you might have on where to learn more about best practices for setting initial conditions in such models.
I realize these questions relate more to model design than to the use of pomp itself, but they have been on my mind, and I would be truly grateful for any advice or references you could share.
Thank you very much for your time and attention.