-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathBootstrapping.html
708 lines (675 loc) · 760 KB
/
Bootstrapping.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="Fernando Miguez" />
<meta name="date" content="2020-04-23" />
<title>Bootstrap and Simulation for Nonlinear Models</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css" data-origin="pandoc">
a.sourceLine { display: inline-block; line-height: 1.25; }
a.sourceLine { pointer-events: none; color: inherit; text-decoration: inherit; }
a.sourceLine:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode { white-space: pre; position: relative; }
div.sourceCode { margin: 1em 0; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
code.sourceCode { white-space: pre-wrap; }
a.sourceLine { text-indent: -1em; padding-left: 1em; }
}
pre.numberSource a.sourceLine
{ position: relative; left: -4em; }
pre.numberSource a.sourceLine::before
{ content: attr(data-line-number);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; pointer-events: all; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
a.sourceLine::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
<script>
// apply pandoc div.sourceCode style to pre.sourceCode instead
(function() {
var sheets = document.styleSheets;
for (var i = 0; i < sheets.length; i++) {
if (sheets[i].ownerNode.dataset["origin"] !== "pandoc") continue;
try { var rules = sheets[i].cssRules; } catch (e) { continue; }
for (var j = 0; j < rules.length; j++) {
var rule = rules[j];
// check if there is a div.sourceCode rule
if (rule.type !== rule.STYLE_RULE || rule.selectorText !== "div.sourceCode") continue;
var style = rule.style.cssText;
// check if color or background-color is set
if (rule.style.color === '' && rule.style.backgroundColor === '') continue;
// replace div.sourceCode by a pre.sourceCode rule
sheets[i].deleteRule(j);
sheets[i].insertRule('pre.sourceCode{' + style + '}', j);
}
}
})();
</script>
<style type="text/css">body {
background-color: #fff;
margin: 1em auto;
max-width: 700px;
overflow: visible;
padding-left: 2em;
padding-right: 2em;
font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
font-size: 14px;
line-height: 1.35;
}
#header {
text-align: center;
}
#TOC {
clear: both;
margin: 0 0 10px 10px;
padding: 4px;
width: 400px;
border: 1px solid #CCCCCC;
border-radius: 5px;
background-color: #f6f6f6;
font-size: 13px;
line-height: 1.3;
}
#TOC .toctitle {
font-weight: bold;
font-size: 15px;
margin-left: 5px;
}
#TOC ul {
padding-left: 40px;
margin-left: -1.5em;
margin-top: 5px;
margin-bottom: 5px;
}
#TOC ul ul {
margin-left: -2em;
}
#TOC li {
line-height: 16px;
}
table {
margin: 1em auto;
border-width: 1px;
border-color: #DDDDDD;
border-style: outset;
border-collapse: collapse;
}
table th {
border-width: 2px;
padding: 5px;
border-style: inset;
}
table td {
border-width: 1px;
border-style: inset;
line-height: 18px;
padding: 5px 5px;
}
table, table th, table td {
border-left-style: none;
border-right-style: none;
}
table thead, table tr.even {
background-color: #f7f7f7;
}
p {
margin: 0.5em 0;
}
blockquote {
background-color: #f6f6f6;
padding: 0.25em 0.75em;
}
hr {
border-style: solid;
border: none;
border-top: 1px solid #777;
margin: 28px 0;
}
dl {
margin-left: 0;
}
dl dd {
margin-bottom: 13px;
margin-left: 13px;
}
dl dt {
font-weight: bold;
}
ul {
margin-top: 0;
}
ul li {
list-style: circle outside;
}
ul ul {
margin-bottom: 0;
}
pre, code {
background-color: #f7f7f7;
border-radius: 3px;
color: #333;
white-space: pre-wrap;
}
pre {
border-radius: 3px;
margin: 5px 0px 10px 0px;
padding: 10px;
}
pre:not([class]) {
background-color: #f7f7f7;
}
code {
font-family: Consolas, Monaco, 'Courier New', monospace;
font-size: 85%;
}
p > code, li > code {
padding: 2px 0px;
}
div.figure {
text-align: center;
}
img {
background-color: #FFFFFF;
padding: 2px;
border: 1px solid #DDDDDD;
border-radius: 3px;
border: 1px solid #CCCCCC;
margin: 0 5px;
}
h1 {
margin-top: 0;
font-size: 35px;
line-height: 40px;
}
h2 {
border-bottom: 4px solid #f7f7f7;
padding-top: 10px;
padding-bottom: 2px;
font-size: 145%;
}
h3 {
border-bottom: 2px solid #f7f7f7;
padding-top: 10px;
font-size: 120%;
}
h4 {
border-bottom: 1px solid #f7f7f7;
margin-left: 8px;
font-size: 105%;
}
h5, h6 {
border-bottom: 1px solid #ccc;
font-size: 105%;
}
a {
color: #0033dd;
text-decoration: none;
}
a:hover {
color: #6666ff; }
a:visited {
color: #800080; }
a:visited:hover {
color: #BB00BB; }
a[href^="http:"] {
text-decoration: underline; }
a[href^="https:"] {
text-decoration: underline; }
code > span.kw { color: #555; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #d14; }
code > span.fl { color: #d14; }
code > span.ch { color: #d14; }
code > span.st { color: #d14; }
code > span.co { color: #888888; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #900; font-weight: bold; } code > span.er { color: #a61717; background-color: #e3d2d2; }
</style>
</head>
<body>
<h1 class="title toc-ignore">Bootstrap and Simulation for Nonlinear Models</h1>
<h4 class="author">Fernando Miguez</h4>
<h4 class="date">2020-04-23</h4>
<p>Preliminaries: first, we need to load required libraries.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb1-1" data-line-number="1"><span class="kw">library</span>(ggplot2)</a>
<a class="sourceLine" id="cb1-2" data-line-number="2"><span class="kw">library</span>(nlraa)</a>
<a class="sourceLine" id="cb1-3" data-line-number="3"><span class="kw">library</span>(car)</a></code></pre></div>
<pre><code>## Loading required package: carData</code></pre>
<div class="sourceCode" id="cb3"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb3-1" data-line-number="1"><span class="kw">library</span>(nlme)</a></code></pre></div>
<div id="bootstrapping" class="section level1">
<h1>Bootstrapping</h1>
<p>The bootstrap is a technique in statistics which consists of resampling the observed data in order to create an empirical distribution of some statistic (Fox and Weisberg, 2012, 2017). This is often done to evaluate the assumptions of the model or to derive empirical distributions which are difficult to approximate. For nonlinear models, bootstrapping can be useful because often questions arise that a typical analysis does not answer. In many instances the distributions of parameters from a nonlinear model are in fact normal and standard approximations work well, in other instances, however, this is not the case.</p>
<p>In the following sections I will illustrate the use of the bootstrap for nonlinear model (nls), generalized nonlinear models (gnls) and nonlinear mixed models (nlme). In another section I will illustrate the ability to simulate from nonlinear (mixed) models</p>
<div id="nonlinear-models" class="section level2">
<h2>Nonlinear models</h2>
<p>The first type of models illustrated here are nonlinear models for which we make standard assumptions for error distributions and there are no random effects.</p>
<div id="simple-example" class="section level3">
<h3>Simple example</h3>
<p>As a simple example, we can use the dataset ‘barley’ in the ‘nlraa’ package. This represents the response of barley yield to different doses of fertilizer over several seasons. We will ignore the effect of ‘year’ in this section.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb4-1" data-line-number="1"><span class="kw">data</span>(barley, <span class="dt">package =</span> <span class="st">"nlraa"</span>)</a>
<a class="sourceLine" id="cb4-2" data-line-number="2"></a>
<a class="sourceLine" id="cb4-3" data-line-number="3"><span class="co">## Quick visualization</span></a>
<a class="sourceLine" id="cb4-4" data-line-number="4"><span class="kw">ggplot</span>(<span class="dt">data =</span> barley, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> yield)) <span class="op">+</span><span class="st"> </span><span class="kw">geom_point</span>()</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>We can fit a very simple model known as the linear-plateau.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb5-1" data-line-number="1"><span class="co">## Linear-plateau model</span></a>
<a class="sourceLine" id="cb5-2" data-line-number="2"><span class="co">## The function SSlinp is in the 'nlraa' package</span></a>
<a class="sourceLine" id="cb5-3" data-line-number="3">fit.lp <-<span class="st"> </span><span class="kw">nls</span>(yield <span class="op">~</span><span class="st"> </span><span class="kw">SSlinp</span>(NF, a, b, xs), <span class="dt">data =</span> barley)</a>
<a class="sourceLine" id="cb5-4" data-line-number="4"></a>
<a class="sourceLine" id="cb5-5" data-line-number="5"><span class="co">## Visualize data and fit</span></a>
<a class="sourceLine" id="cb5-6" data-line-number="6"><span class="kw">ggplot</span>(<span class="dt">data =</span> barley, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> yield)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb5-7" data-line-number="7"><span class="st"> </span><span class="kw">geom_point</span>() <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb5-8" data-line-number="8"><span class="st"> </span><span class="kw">geom_line</span>(<span class="kw">aes</span>(<span class="dt">y =</span> <span class="kw">fitted</span>(fit.lp)))</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>At this point several questions might arise:</p>
<ol style="list-style-type: decimal">
<li>What are the confidence intervals for the model parameters?</li>
<li>How do we describe the uncertainty around the fitted values of the model?</li>
<li>What is the estimate and confidence interval for the asymptote?</li>
</ol>
</div>
<div id="confidence-interval-for-model-parameters" class="section level3">
<h3>Confidence interval for model parameters</h3>
<p>The confidence intervals for the model parameters can be derived using a method called profiling. The generic function ‘confint’ can be used which invokes ‘confint.nls’ from the ‘MASS’ package.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb6-1" data-line-number="1"><span class="kw">confint</span>(fit.lp)</a></code></pre></div>
<pre><code>## Waiting for profiling to be done...</code></pre>
<pre><code>## 2.5% 97.5%
## a 104.821725 167.73660
## b 19.937411 38.00267
## xs 5.849474 12.23099</code></pre>
<p>For nonlinear models this method is preferred to simple confidence intervals that would directly use the standard error of the model parameters. In many cases, there is good agreement between the two, but this is not always the case and it can be informative to compare both methods. Those standard errors can be obtained using the ‘summary’ function, along with hypothesis testing.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb9-1" data-line-number="1"><span class="kw">summary</span>(fit.lp)</a></code></pre></div>
<pre><code>##
## Formula: yield ~ SSlinp(NF, a, b, xs)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 137.067 16.179 8.472 1.83e-12 ***
## b 27.501 5.269 5.219 1.62e-06 ***
## xs 7.682 1.211 6.342 1.68e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.79 on 73 degrees of freedom
##
## Number of iterations to convergence: 2
## Achieved convergence tolerance: 6.548e-16</code></pre>
<p>Some interesting plots can be produced which illustrate the symmetry (or lack thereof) of the distribution for the model parameters. Profile plots which show large departures from normality would indicate that the normal approximation for the confidence intervals might not be the best choice.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb11-1" data-line-number="1"><span class="co">## For the intercept</span></a>
<a class="sourceLine" id="cb11-2" data-line-number="2"><span class="kw">plot</span>(<span class="kw">profile</span>(fit.lp, <span class="st">"a"</span>))</a></code></pre></div>
<p><img src="" /><!-- --></p>
<div class="sourceCode" id="cb12"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb12-1" data-line-number="1"><span class="co">## This one is fairly symetric and the normal approximation is reasonable</span></a>
<a class="sourceLine" id="cb12-2" data-line-number="2"><span class="kw">plot</span>(<span class="kw">profile</span>(fit.lp, <span class="st">"b"</span>))</a></code></pre></div>
<p><img src="" /><!-- --></p>
<div class="sourceCode" id="cb13"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb13-1" data-line-number="1"><span class="kw">plot</span>(<span class="kw">profile</span>(fit.lp, <span class="st">"xs"</span>))</a></code></pre></div>
<p><img src="" /><!-- --></p>
<div class="sourceCode" id="cb14"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb14-1" data-line-number="1"><span class="co">## These last parameters are less symetrical</span></a></code></pre></div>
<p>Being able to see the whole profile for a parameter can be interesting in terms of understanding its behavior. For example, in this model, which has a ‘break-point’, we would not expect all the parameters to have identical shaped profiles and that is illustrated above for parameter ‘xs’.</p>
<p>In this case, bootstraping is an alternative to computing the confidence intervals and it can be used as a way to double-check the previous results, but it can also be used when profiling fails. A function which can perform bootstraping for nonlinear models is ‘Boot’ in the ‘car’ pacakge.</p>
<div class="sourceCode" id="cb15"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb15-1" data-line-number="1">fit.lp.Bt <-<span class="st"> </span><span class="kw">Boot</span>(fit.lp)</a></code></pre></div>
<pre><code>## Loading required namespace: boot</code></pre>
<pre><code>##
## Number of bootstraps was 814 out of 999 attempted</code></pre>
<p>In this case, this function uses the base ‘boot’ package under the hood and it returns an object of that class. This also allows for other options such as defining a function as a combination of model parameters, and the use of more than one core to speed up computation. In this case, since we are also interested in the asymptote, which is ‘a + b * xs’, we can obtain confidence intervals for this parameter by doing the following.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb18-1" data-line-number="1">fn <-<span class="st"> </span><span class="cf">function</span>(x) <span class="kw">coef</span>(x)[<span class="dv">1</span>] <span class="op">+</span><span class="st"> </span><span class="kw">coef</span>(x)[<span class="dv">2</span>] <span class="op">*</span><span class="st"> </span><span class="kw">coef</span>(x)[<span class="dv">3</span>]</a>
<a class="sourceLine" id="cb18-2" data-line-number="2">fit.lp.Bt.asymp <-<span class="st"> </span><span class="kw">Boot</span>(fit.lp, <span class="dt">f =</span> fn, <span class="dt">labels =</span> <span class="st">"asymptote"</span>)</a></code></pre></div>
<pre><code>##
## Number of bootstraps was 813 out of 999 attempted</code></pre>
<div class="sourceCode" id="cb20"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb20-1" data-line-number="1"><span class="kw">confint</span>(fit.lp.Bt.asymp)</a></code></pre></div>
<pre><code>## Bootstrap bca confidence intervals
##
## 2.5 % 97.5 %
## asymptote 324.0106 380.8581</code></pre>
<div class="sourceCode" id="cb22"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb22-1" data-line-number="1"><span class="kw">hist</span>(fit.lp.Bt.asymp)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>The bootstrap method takes a few seconds here, but it can be computationally much more demanding in larger problems since it re-fits the model many, many times. An alternative is to use the delta method, for which there is a function in the ‘car’ pacakge. The delta method has the advantage of being fast and the disadvantage that it makes the assumption of a normal distribution. In this case the lower bound for the confidence interval is similar to the one obtained with the bootstrap, but the upper bound for the confidence interval is somewhat higher using the bootstrap (381 vs. 370). Since the bootstrap method makes fewer assumptions it is probably the better one to report.</p>
<div class="sourceCode" id="cb23"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb23-1" data-line-number="1">fit.lp.Dlt.asymp <-<span class="st"> </span><span class="kw">deltaMethod</span>(fit.lp, <span class="st">"a + b * xs"</span>)</a>
<a class="sourceLine" id="cb23-2" data-line-number="2">fit.lp.Dlt.asymp</a></code></pre></div>
<pre><code>## Estimate SE 2.5 % 97.5 %
## a + b * xs 348.326 11.484 325.817 370.84</code></pre>
<p>In order to answer our second question above related to the uncertainty around the fitted values we could plug-in the values from the bootstrap sampling and obtain different regression lines.</p>
<div class="sourceCode" id="cb25"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb25-1" data-line-number="1"><span class="co">## The object 't' in the bootstrap run has </span></a>
<a class="sourceLine" id="cb25-2" data-line-number="2"><span class="co">## the parameter estimate values</span></a>
<a class="sourceLine" id="cb25-3" data-line-number="3"><span class="co">## First remove missing values</span></a>
<a class="sourceLine" id="cb25-4" data-line-number="4">fit.lp.Bt.prms <-<span class="st"> </span><span class="kw">na.omit</span>(fit.lp.Bt<span class="op">$</span>t)</a>
<a class="sourceLine" id="cb25-5" data-line-number="5"></a>
<a class="sourceLine" id="cb25-6" data-line-number="6">nrb <-<span class="st"> </span><span class="kw">length</span>(<span class="kw">unique</span>(barley<span class="op">$</span>NF))</a>
<a class="sourceLine" id="cb25-7" data-line-number="7">nrp <-<span class="st"> </span><span class="kw">nrow</span>(fit.lp.Bt.prms)</a>
<a class="sourceLine" id="cb25-8" data-line-number="8"></a>
<a class="sourceLine" id="cb25-9" data-line-number="9"><span class="co">## Set up an empty data.frame </span></a>
<a class="sourceLine" id="cb25-10" data-line-number="10">prd.dat <-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">i =</span> <span class="kw">as.factor</span>(<span class="kw">rep</span>(<span class="dv">1</span><span class="op">:</span>nrp, <span class="dt">each =</span> nrb)), <span class="dt">NF =</span> <span class="kw">rep</span>(<span class="kw">unique</span>(barley<span class="op">$</span>NF), nrp), <span class="dt">prd =</span> <span class="ot">NA</span>)</a>
<a class="sourceLine" id="cb25-11" data-line-number="11"></a>
<a class="sourceLine" id="cb25-12" data-line-number="12"><span class="co">## A simple loop can be used to run the model multiple times</span></a>
<a class="sourceLine" id="cb25-13" data-line-number="13"><span class="cf">for</span>(i <span class="cf">in</span> <span class="dv">1</span><span class="op">:</span>nrp){</a>
<a class="sourceLine" id="cb25-14" data-line-number="14"> a.i <-<span class="st"> </span>fit.lp.Bt.prms[i,<span class="dv">1</span>]</a>
<a class="sourceLine" id="cb25-15" data-line-number="15"> b.i <-<span class="st"> </span>fit.lp.Bt.prms[i,<span class="dv">2</span>]</a>
<a class="sourceLine" id="cb25-16" data-line-number="16"> xs.i <-<span class="st"> </span>fit.lp.Bt.prms[i,<span class="dv">3</span>]</a>
<a class="sourceLine" id="cb25-17" data-line-number="17"> </a>
<a class="sourceLine" id="cb25-18" data-line-number="18"> prd.dat[<span class="kw">c</span>(<span class="dv">1</span> <span class="op">+</span><span class="st"> </span>(nrb<span class="op">*</span>(i <span class="op">-</span><span class="st"> </span><span class="dv">1</span>)))<span class="op">:</span><span class="kw">c</span>(i <span class="op">*</span><span class="st"> </span>nrb),<span class="dv">3</span>] <-<span class="st"> </span><span class="kw">linp</span>(<span class="kw">unique</span>(barley<span class="op">$</span>NF), a.i, b.i, xs.i)</a>
<a class="sourceLine" id="cb25-19" data-line-number="19">}</a>
<a class="sourceLine" id="cb25-20" data-line-number="20"></a>
<a class="sourceLine" id="cb25-21" data-line-number="21"><span class="co">## Plot the data with the original fit and the uncertainty</span></a>
<a class="sourceLine" id="cb25-22" data-line-number="22"><span class="kw">ggplot</span>() <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb25-23" data-line-number="23"><span class="st"> </span><span class="kw">geom_line</span>(<span class="dt">data =</span> prd.dat, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> prd, <span class="dt">group =</span> i), </a>
<a class="sourceLine" id="cb25-24" data-line-number="24"> <span class="dt">color =</span> <span class="st">"gray"</span>, <span class="dt">alpha =</span> <span class="fl">0.2</span>) <span class="op">+</span></a>
<a class="sourceLine" id="cb25-25" data-line-number="25"><span class="st"> </span><span class="kw">geom_line</span>(<span class="dt">data =</span> barley, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> <span class="kw">fitted</span>(fit.lp))) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb25-26" data-line-number="26"><span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">data =</span> barley, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> yield)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb25-27" data-line-number="27"><span class="st"> </span><span class="kw">ylab</span>(<span class="st">"Yield"</span>) <span class="op">+</span><span class="st"> </span><span class="kw">xlab</span>(<span class="st">"Nitrogen rate"</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb25-28" data-line-number="28"><span class="st"> </span><span class="kw">ggtitle</span>(<span class="st">"Using results from Boot </span><span class="ch">\n</span><span class="st"> and plug-in into linp"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>The previous graph shows the black line with the original fit and the variability in the fitted values due to the resampling generated during the bootstrap process. The function ‘predict.nls’ at the moment ignores the arguments ‘se.fit’ and ‘interval’ which means that this functionality is not available in the base ‘stats’ package. (There is an alternative approach in the ‘propagate’ package - see references.)</p>
<p>So we could use the quantiles of ‘prd.dat’ object above to derive confidence intervals for the regression line. The previous example is an attempt to make it clear how the uncertainty could be displayed. Equivalently, we can use ‘Boot’ for this purpose, but with some effort manipulating the data.</p>
<div class="sourceCode" id="cb26"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb26-1" data-line-number="1">fn2 <-<span class="st"> </span><span class="cf">function</span>(x) <span class="kw">predict</span>(x, <span class="dt">newdata =</span> <span class="kw">data.frame</span>(<span class="dt">NF =</span> <span class="dv">0</span><span class="op">:</span><span class="dv">14</span>))</a>
<a class="sourceLine" id="cb26-2" data-line-number="2">fit.lp.Bt2 <-<span class="st"> </span><span class="kw">Boot</span>(fit.lp, fn2)</a></code></pre></div>
<pre><code>##
## Number of bootstraps was 820 out of 999 attempted</code></pre>
<div class="sourceCode" id="cb28"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb28-1" data-line-number="1">fttd <-<span class="st"> </span><span class="kw">na.omit</span>(fit.lp.Bt2<span class="op">$</span>t)</a>
<a class="sourceLine" id="cb28-2" data-line-number="2">prds <-<span class="st"> </span><span class="kw">c</span>(<span class="kw">t</span>(fttd))</a>
<a class="sourceLine" id="cb28-3" data-line-number="3">ndat <-<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">i =</span> <span class="kw">as.factor</span>(<span class="kw">rep</span>(<span class="dv">1</span><span class="op">:</span><span class="kw">nrow</span>(fttd), <span class="dt">each =</span> <span class="kw">ncol</span>(fttd))),</a>
<a class="sourceLine" id="cb28-4" data-line-number="4"> <span class="dt">NF =</span> <span class="kw">rep</span>(<span class="dv">0</span><span class="op">:</span><span class="dv">14</span>, <span class="kw">nrow</span>(fttd)))</a>
<a class="sourceLine" id="cb28-5" data-line-number="5">ndat<span class="op">$</span>prd <-<span class="st"> </span>prds</a>
<a class="sourceLine" id="cb28-6" data-line-number="6"></a>
<a class="sourceLine" id="cb28-7" data-line-number="7"><span class="co">## Essentially the same graph as the one above</span></a>
<a class="sourceLine" id="cb28-8" data-line-number="8"><span class="kw">ggplot</span>() <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb28-9" data-line-number="9"><span class="st"> </span><span class="kw">geom_line</span>(<span class="dt">data =</span> ndat, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> prd, <span class="dt">group =</span> i), </a>
<a class="sourceLine" id="cb28-10" data-line-number="10"> <span class="dt">color =</span> <span class="st">"gray"</span>, <span class="dt">alpha =</span> <span class="fl">0.2</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb28-11" data-line-number="11"><span class="st"> </span><span class="kw">geom_line</span>(<span class="dt">data =</span> barley, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> <span class="kw">fitted</span>(fit.lp))) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb28-12" data-line-number="12"><span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">data =</span> barley, <span class="kw">aes</span>(<span class="dt">x =</span> NF, <span class="dt">y =</span> yield)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb28-13" data-line-number="13"><span class="st"> </span><span class="kw">ylab</span>(<span class="st">"Yield"</span>) <span class="op">+</span><span class="st"> </span><span class="kw">xlab</span>(<span class="st">"Nitrogen rate"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
</div>
</div>
<div id="bootstrapping-generalized-nonlinear-models" class="section level2">
<h2>Bootstrapping generalized nonlinear models</h2>
<p>Implementing bootstrap for more complex models takes extra work. For this, I’m taking the approach of sampling from the vector of fixed parameters and also bootstrapping the standardized residuals for ‘gnls’ and ‘nlme’ objects. (This methodology can be improved in the future.) This takes advantage that we assume that the residuals are normally distributed.</p>
<p>As a first example, we can compare the bootstrapped confidence intervals with the ones obtained by ‘intervals’. Note: I’m running this only a few hundred times in the examples below for efficiency, but it is a good practice to run these a few thousand times.</p>
<div class="sourceCode" id="cb29"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb29-1" data-line-number="1"><span class="kw">set.seed</span>(<span class="dv">101</span>)</a>
<a class="sourceLine" id="cb29-2" data-line-number="2"><span class="co">## Simplify the dataset to make the set up simpler</span></a>
<a class="sourceLine" id="cb29-3" data-line-number="3">barley2 <-<span class="st"> </span><span class="kw">subset</span>(barley, year <span class="op"><</span><span class="st"> </span><span class="dv">1974</span>)</a>
<a class="sourceLine" id="cb29-4" data-line-number="4"></a>
<a class="sourceLine" id="cb29-5" data-line-number="5">fit.lp.gnls2 <-<span class="st"> </span><span class="kw">gnls</span>(yield <span class="op">~</span><span class="st"> </span><span class="kw">SSlinp</span>(NF, a, b, xs), <span class="dt">data =</span> barley2)</a>
<a class="sourceLine" id="cb29-6" data-line-number="6"></a>
<a class="sourceLine" id="cb29-7" data-line-number="7"><span class="kw">intervals</span>(fit.lp.gnls2)</a></code></pre></div>
<pre><code>## Approximate 95% confidence intervals
##
## Coefficients:
## lower est. upper
## a 138.79981 176.800000 214.800186
## b 13.75248 27.603093 41.453706
## xs 3.64251 6.174127 8.705744
## attr(,"label")
## [1] "Coefficients:"
##
## Residual standard error:
## lower est. upper
## 25.50341 35.17935 56.67543</code></pre>
<div class="sourceCode" id="cb31"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb31-1" data-line-number="1"><span class="co">## Compare this to the bootstrapping approach</span></a>
<a class="sourceLine" id="cb31-2" data-line-number="2">fit.lp.gnls2.bt <-<span class="st"> </span><span class="kw">boot_nlme</span>(fit.lp.gnls2, <span class="dt">R =</span> <span class="dv">200</span>)</a></code></pre></div>
<pre><code>## Number of times model fit did not converge 22 out of 200</code></pre>
<div class="sourceCode" id="cb33"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb33-1" data-line-number="1"><span class="kw">summary</span>(fit.lp.gnls2.bt)</a></code></pre></div>
<pre><code>##
## Number of bootstrap replications R = 178
## original bootBias bootSE bootMed
## 1 176.8000 -1.881489 23.1895 174.7183
## 2 27.6031 1.095069 7.4412 28.4382
## 3 6.1741 -0.072778 1.3834 5.9537</code></pre>
<div class="sourceCode" id="cb35"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb35-1" data-line-number="1"><span class="kw">confint</span>(fit.lp.gnls2.bt, <span class="dt">type =</span> <span class="st">"perc"</span>)</a></code></pre></div>
<pre><code>## Bootstrap percent confidence intervals
##
## 2.5 % 97.5 %
## 1 125.320034 224.479385
## 2 14.440280 42.522212
## 3 4.040724 9.591933</code></pre>
<p>The confidence intervals obtained by bootstrap are wider (as expected) than the ones obtained using intervals because they consider the uncertainty in the parameters of the nonlinear model. The next example shows a slightly more complex model in which we introduce the (fixed) effect of year for a subset of the data.</p>
<div class="sourceCode" id="cb37"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb37-1" data-line-number="1"><span class="kw">set.seed</span>(<span class="dv">101</span>)</a>
<a class="sourceLine" id="cb37-2" data-line-number="2">barley2<span class="op">$</span>year.f <-<span class="st"> </span><span class="kw">as.factor</span>(barley2<span class="op">$</span>year)</a>
<a class="sourceLine" id="cb37-3" data-line-number="3"></a>
<a class="sourceLine" id="cb37-4" data-line-number="4">cfs <-<span class="st"> </span><span class="kw">coef</span>(fit.lp.gnls2)</a>
<a class="sourceLine" id="cb37-5" data-line-number="5"></a>
<a class="sourceLine" id="cb37-6" data-line-number="6">fit.lp.gnls3 <-<span class="st"> </span><span class="kw">update</span>(fit.lp.gnls2, </a>
<a class="sourceLine" id="cb37-7" data-line-number="7"> <span class="dt">params =</span> <span class="kw">list</span>(a <span class="op">+</span><span class="st"> </span>b <span class="op">+</span><span class="st"> </span>xs <span class="op">~</span><span class="st"> </span>year.f),</a>
<a class="sourceLine" id="cb37-8" data-line-number="8"> <span class="dt">start =</span> <span class="kw">c</span>(cfs[<span class="dv">1</span>], <span class="dv">0</span>, <span class="dv">0</span>, <span class="dv">0</span>, </a>
<a class="sourceLine" id="cb37-9" data-line-number="9"> cfs[<span class="dv">2</span>], <span class="dv">0</span>, <span class="dv">0</span>, <span class="dv">0</span>,</a>
<a class="sourceLine" id="cb37-10" data-line-number="10"> cfs[<span class="dv">3</span>], <span class="dv">0</span>, <span class="dv">0</span>, <span class="dv">0</span>))</a>
<a class="sourceLine" id="cb37-11" data-line-number="11"></a>
<a class="sourceLine" id="cb37-12" data-line-number="12"><span class="co">## This bootstraps the vector of parameters</span></a>
<a class="sourceLine" id="cb37-13" data-line-number="13">fit.lp.gnls3.bt <-<span class="st"> </span><span class="kw">boot_nlme</span>(fit.lp.gnls3, <span class="dt">R =</span> <span class="dv">300</span>)</a></code></pre></div>
<pre><code>## Number of times model fit did not converge 152 out of 300</code></pre>
<div class="sourceCode" id="cb39"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb39-1" data-line-number="1"><span class="kw">confint</span>(fit.lp.gnls3.bt, <span class="dt">type =</span> <span class="st">"perc"</span>)</a></code></pre></div>
<pre><code>## Bootstrap percent confidence intervals
##
## 2.5 % 97.5 %
## 1 111.1375192 196.119143
## 2 -45.3383112 86.644389
## 3 -52.9212287 65.101962
## 4 -0.5059473 114.993794
## 5 14.3750448 41.574802
## 6 -20.9463834 15.697041
## 7 -17.7796470 16.955886
## 8 -18.1083851 16.965825
## 9 4.4735481 9.074686
## 10 -2.6628309 4.960814
## 11 -4.1799869 1.682101
## 12 -2.8113050 3.862596</code></pre>
<div class="sourceCode" id="cb41"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb41-1" data-line-number="1"><span class="kw">hist</span>(fit.lp.gnls3.bt, <span class="dv">1</span>, <span class="dt">ci =</span> <span class="st">"perc"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>This is simply to illustrate the use of bootstrapping for a ‘gnls’ object, which is something that function ‘car::Boot’ does not seem to be able to handle (the deltaMethod function also fails for this type of model, because it cannot handle the names in the vector of coefficients which uses periods).</p>
</div>
<div id="bootstrapping-nonlinear-mixed-models" class="section level2">
<h2>Bootstrapping nonlinear mixed models</h2>
<p>For illustration, I will continue to use the barley example, but this time the model is fitted to each year individually and then a nonlinear mixed model which assumes a diagonal matrix for the random effects (for simplicity). In this case we want an esimtate of the confidence interval for the asymptote which is not an explicit parameter but rather a combination ‘a + b * xs’ of the three parameters.</p>
<div class="sourceCode" id="cb42"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb42-1" data-line-number="1"><span class="kw">set.seed</span>(<span class="dv">101</span>)</a>
<a class="sourceLine" id="cb42-2" data-line-number="2">barley<span class="op">$</span>year.f <-<span class="st"> </span><span class="kw">as.factor</span>(barley<span class="op">$</span>year)</a>
<a class="sourceLine" id="cb42-3" data-line-number="3"></a>
<a class="sourceLine" id="cb42-4" data-line-number="4">barleyG <-<span class="st"> </span><span class="kw">groupedData</span>(yield <span class="op">~</span><span class="st"> </span>NF <span class="op">|</span><span class="st"> </span>year.f, <span class="dt">data =</span> barley)</a>
<a class="sourceLine" id="cb42-5" data-line-number="5"></a>
<a class="sourceLine" id="cb42-6" data-line-number="6">fitL.bar <-<span class="st"> </span><span class="kw">nlsList</span>(yield <span class="op">~</span><span class="st"> </span><span class="kw">SSlinp</span>(NF, a, b, xs), <span class="dt">data =</span> barleyG)</a>
<a class="sourceLine" id="cb42-7" data-line-number="7"></a>
<a class="sourceLine" id="cb42-8" data-line-number="8">fit.bar.nlme <-<span class="st"> </span><span class="kw">nlme</span>(fitL.bar, <span class="dt">random =</span> <span class="kw">pdDiag</span>(a <span class="op">+</span><span class="st"> </span>b <span class="op">+</span><span class="st"> </span>xs <span class="op">~</span><span class="st"> </span><span class="dv">1</span>))</a>
<a class="sourceLine" id="cb42-9" data-line-number="9"></a>
<a class="sourceLine" id="cb42-10" data-line-number="10"><span class="co">## Confidence intervals of the model fixed parameters</span></a>
<a class="sourceLine" id="cb42-11" data-line-number="11"><span class="kw">intervals</span>(fit.bar.nlme, <span class="dt">which =</span> <span class="st">"fixed"</span>)</a></code></pre></div>
<pre><code>## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## a 109.212647 134.663569 160.11449
## b 23.970656 27.939486 31.90832
## xs 6.823868 7.671139 8.51841
## attr(,"label")
## [1] "Fixed effects:"</code></pre>
<div class="sourceCode" id="cb44"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb44-1" data-line-number="1"><span class="co">## Function which computes the asymptote</span></a>
<a class="sourceLine" id="cb44-2" data-line-number="2">fna <-<span class="st"> </span><span class="cf">function</span>(x) <span class="kw">fixef</span>(x)[<span class="dv">1</span>] <span class="op">+</span><span class="st"> </span><span class="kw">fixef</span>(x)[<span class="dv">2</span>] <span class="op">*</span><span class="st"> </span><span class="kw">fixef</span>(x)[<span class="dv">3</span>]</a>
<a class="sourceLine" id="cb44-3" data-line-number="3"></a>
<a class="sourceLine" id="cb44-4" data-line-number="4"><span class="co">## Bootstrap the model for the asymptote</span></a>
<a class="sourceLine" id="cb44-5" data-line-number="5">fit.bar.nlme.bt <-<span class="st"> </span><span class="kw">boot_nlme</span>(fit.bar.nlme, <span class="dt">f =</span> fna, <span class="dt">R =</span> <span class="dv">200</span>)</a></code></pre></div>
<pre><code>## Number of times model fit did not converge 90 out of 200</code></pre>
<div class="sourceCode" id="cb46"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb46-1" data-line-number="1"><span class="kw">confint</span>(fit.bar.nlme.bt, <span class="dt">type =</span> <span class="st">"perc"</span>)</a></code></pre></div>
<pre><code>## Bootstrap percent confidence intervals
##
## 2.5 % 97.5 %
## 1 306.707 382.6985</code></pre>
<div class="sourceCode" id="cb48"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb48-1" data-line-number="1"><span class="kw">hist</span>(fit.bar.nlme.bt, <span class="dt">ci =</span> <span class="st">"perc"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>For this model the estimate for the asymptote is 349, but from the model fit we do not get a direct confidence interval. Bootstrapping makes this easy (but it takes a bit of time) and it results in 307, 383.</p>
</div>
<div id="confidence-bands-for-generalized-nonlinear-models" class="section level2">
<h2>Confidence bands for generalized nonlinear models</h2>
<p>For this example I will use the Orange dataset. The goal here is to place confidence bands around the mean response function.</p>
<div class="sourceCode" id="cb49"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb49-1" data-line-number="1"><span class="kw">data</span>(Orange)</a>
<a class="sourceLine" id="cb49-2" data-line-number="2"></a>
<a class="sourceLine" id="cb49-3" data-line-number="3"><span class="co">## This fits a model which considers the fact that </span></a>
<a class="sourceLine" id="cb49-4" data-line-number="4"><span class="co">## the variance typically increases as the fitted</span></a>
<a class="sourceLine" id="cb49-5" data-line-number="5"><span class="co">## values increase</span></a>
<a class="sourceLine" id="cb49-6" data-line-number="6">fitg <-<span class="st"> </span><span class="kw">gnls</span>(circumference <span class="op">~</span><span class="st"> </span><span class="kw">SSlogis</span>(age, Asym, xmid, scal), </a>
<a class="sourceLine" id="cb49-7" data-line-number="7"> <span class="dt">data =</span> Orange, <span class="dt">weights =</span> <span class="kw">varPower</span>())</a>
<a class="sourceLine" id="cb49-8" data-line-number="8"></a>
<a class="sourceLine" id="cb49-9" data-line-number="9"><span class="co">## Here we use bootstrapping to investigate </span></a>
<a class="sourceLine" id="cb49-10" data-line-number="10"><span class="co">## the uncertainty around the fitted values</span></a>
<a class="sourceLine" id="cb49-11" data-line-number="11">fitg.bt1 <-<span class="st"> </span><span class="kw">boot_nlme</span>(fitg, fitted, <span class="dt">psim =</span> <span class="dv">1</span>, <span class="dt">R =</span> <span class="dv">300</span>)</a></code></pre></div>
<pre><code>## Number of times model fit did not converge 3 out of 300</code></pre>
<div class="sourceCode" id="cb51"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb51-1" data-line-number="1"><span class="co">## Compute 90% quantile confidence bands</span></a>
<a class="sourceLine" id="cb51-2" data-line-number="2">lwr1.q <-<span class="st"> </span><span class="kw">apply</span>(<span class="kw">t</span>(fitg.bt1<span class="op">$</span>t), <span class="dv">1</span>, quantile, <span class="dt">probs =</span> <span class="fl">0.05</span>, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>)</a>
<a class="sourceLine" id="cb51-3" data-line-number="3">upr1.q <-<span class="st"> </span><span class="kw">apply</span>(<span class="kw">t</span>(fitg.bt1<span class="op">$</span>t), <span class="dv">1</span>, quantile, <span class="dt">probs =</span> <span class="fl">0.95</span>, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>)</a>
<a class="sourceLine" id="cb51-4" data-line-number="4"></a>
<a class="sourceLine" id="cb51-5" data-line-number="5"><span class="kw">ggplot</span>() <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb51-6" data-line-number="6"><span class="st"> </span><span class="kw">geom_point</span>(<span class="dt">data =</span> Orange, <span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> circumference)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb51-7" data-line-number="7"><span class="st"> </span><span class="kw">geom_line</span>(<span class="dt">data =</span> Orange, <span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> <span class="kw">fitted</span>(fitg))) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb51-8" data-line-number="8"><span class="st"> </span><span class="kw">geom_ribbon</span>(<span class="kw">aes</span>(<span class="dt">x =</span> Orange<span class="op">$</span>age, <span class="dt">ymin =</span> lwr1.q, <span class="dt">ymax =</span> upr1.q), </a>
<a class="sourceLine" id="cb51-9" data-line-number="9"> <span class="dt">fill =</span> <span class="st">"purple"</span>, <span class="dt">alpha =</span> <span class="fl">0.2</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb51-10" data-line-number="10"><span class="st"> </span><span class="kw">ggtitle</span>(<span class="st">"Orange fit using the logistic: </span><span class="ch">\n</span><span class="st"> 90% confidence band for the mean function"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
</div>
</div>
<div id="simulation" class="section level1">
<h1>Simulation</h1>
<p>Here I will continue with the Orange dataset and illustrate different types of simulation which are relevant depending on what the inference scope is. This simulation is, in fact, used inside of the bootstrap function above; it is a necessary step for bootstrap for these type of models.</p>
<div id="fitted-values" class="section level2">
<h2>Fitted values</h2>
<p>The first level of predictions we can generate for the Orange dataset are at the ‘population’ level. These are similar to the fitted values for the ‘gnls’ case above. Notice that we do not have the option of requesting standard errors for the predicitons for this model either.</p>
<div class="sourceCode" id="cb52"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb52-1" data-line-number="1">fmoL <-<span class="st"> </span><span class="kw">nlsList</span>(circumference <span class="op">~</span><span class="st"> </span><span class="kw">SSlogis</span>(age, Asym, xmid, scal), <span class="dt">data =</span> Orange)</a>
<a class="sourceLine" id="cb52-2" data-line-number="2"></a>
<a class="sourceLine" id="cb52-3" data-line-number="3">fmo <-<span class="st"> </span><span class="kw">nlme</span>(fmoL, <span class="dt">random =</span> <span class="kw">pdDiag</span>(Asym <span class="op">+</span><span class="st"> </span>xmid <span class="op">+</span><span class="st"> </span>scal <span class="op">~</span><span class="st"> </span><span class="dv">1</span>))</a>
<a class="sourceLine" id="cb52-4" data-line-number="4"></a>
<a class="sourceLine" id="cb52-5" data-line-number="5"><span class="co">## Just one simulation, because with psim = 0 and level = 0, we are </span></a>
<a class="sourceLine" id="cb52-6" data-line-number="6"><span class="co">## computing the predicted values at level = 0 (?predict.nlme)</span></a>
<a class="sourceLine" id="cb52-7" data-line-number="7">sim00 <-<span class="st"> </span><span class="kw">simulate_nlme</span>(fmo, <span class="dt">nsim =</span> <span class="dv">1</span>, <span class="dt">psim =</span> <span class="dv">0</span>, <span class="dt">level =</span> <span class="dv">0</span>)</a>
<a class="sourceLine" id="cb52-8" data-line-number="8"></a>
<a class="sourceLine" id="cb52-9" data-line-number="9">dat00 <-<span class="st"> </span><span class="kw">cbind</span>(Orange, <span class="dt">prd =</span> <span class="kw">as.vector</span>(sim00))</a>
<a class="sourceLine" id="cb52-10" data-line-number="10"></a>
<a class="sourceLine" id="cb52-11" data-line-number="11"><span class="kw">ggplot</span>(<span class="dt">data =</span> dat00) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb52-12" data-line-number="12"><span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> circumference)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb52-13" data-line-number="13"><span class="st"> </span><span class="kw">geom_line</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> prd)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb52-14" data-line-number="14"><span class="st"> </span><span class="kw">ggtitle</span>(<span class="st">"psim = 0, level = 0"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>One approach we can use to assess the uncertainty in the response would be to sample from the vector of parameters and the associated variance-covariance matrix. In this case, we could do many realizations, but I’ll keep it to just 100. A frequentist interpretation of the bands below would be that in repeated sampling from this same population we would expect that the true relationship will be within this band with a specified probability. One way to compute these bands (which I’m not doing below), would be to calculate the quantiles of the empirical distribution. These could be called confidence bands, which would be equivalent to the ‘interval = confidence’ in a linear model (?predict.lm).</p>
<div class="sourceCode" id="cb53"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb53-1" data-line-number="1">sdat10 <-<span class="st"> </span><span class="kw">simulate_nlme</span>(fmo, <span class="dt">nsim =</span> <span class="dv">100</span>, <span class="dt">psim =</span> <span class="dv">1</span>, <span class="dt">level =</span> <span class="dv">0</span>, <span class="dt">value =</span> <span class="st">"data.frame"</span>)</a>
<a class="sourceLine" id="cb53-2" data-line-number="2"></a>
<a class="sourceLine" id="cb53-3" data-line-number="3"><span class="kw">ggplot</span>(<span class="dt">data =</span> sdat10) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb53-4" data-line-number="4"><span class="st"> </span><span class="kw">geom_line</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> y.sim, <span class="dt">group =</span> ii), <span class="dt">color =</span> <span class="st">"gray"</span>, <span class="dt">alpha =</span> <span class="fl">0.5</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb53-5" data-line-number="5"><span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> circumference)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb53-6" data-line-number="6"><span class="st"> </span><span class="kw">ggtitle</span>(<span class="st">"psim = 1, level = 0"</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb53-7" data-line-number="7"><span class="st"> </span><span class="kw">ylab</span>(<span class="st">"circumference"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>In this case, each tree is an ‘individual’ and the previous one is a population level confidence. Next, we would like to produce individual level ‘prediction’ bands. This is, if we were able to resample from a population with a similar structure and if we wanted to have bands that include the true relationship between circumference and age for these trees, we could produce the confidence bands from the quantiles of the empirical distribution of the lines in the figure below (we would need a few thousand samples in order to do this).</p>
<div class="sourceCode" id="cb54"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb54-1" data-line-number="1">sdat11 <-<span class="st"> </span><span class="kw">simulate_nlme</span>(fmo, <span class="dt">nsim =</span> <span class="dv">100</span>, <span class="dt">psim =</span> <span class="dv">1</span>, <span class="dt">level =</span> <span class="dv">1</span>, <span class="dt">value =</span> <span class="st">"data.frame"</span>)</a>
<a class="sourceLine" id="cb54-2" data-line-number="2"></a>
<a class="sourceLine" id="cb54-3" data-line-number="3"><span class="co">## We need a tree simulation ID</span></a>
<a class="sourceLine" id="cb54-4" data-line-number="4"><span class="co">## for plotting</span></a>
<a class="sourceLine" id="cb54-5" data-line-number="5">sdat11<span class="op">$</span>Tree_ID <-<span class="st"> </span><span class="kw">with</span>(sdat11, <span class="kw">paste0</span>(Tree,<span class="st">"_"</span>,ii))</a>
<a class="sourceLine" id="cb54-6" data-line-number="6"></a>
<a class="sourceLine" id="cb54-7" data-line-number="7"><span class="kw">ggplot</span>(<span class="dt">data =</span> sdat11) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb54-8" data-line-number="8"><span class="st"> </span><span class="kw">facet_wrap</span>(<span class="op">~</span><span class="st"> </span>Tree) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb54-9" data-line-number="9"><span class="st"> </span><span class="kw">geom_line</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> y.sim, <span class="dt">color =</span> Tree, <span class="dt">group =</span> Tree_ID), </a>
<a class="sourceLine" id="cb54-10" data-line-number="10"> <span class="dt">alpha =</span> <span class="fl">0.5</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb54-11" data-line-number="11"><span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> circumference)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb54-12" data-line-number="12"><span class="st"> </span><span class="kw">ggtitle</span>(<span class="st">"psim = 1, level = 1"</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb54-13" data-line-number="13"><span class="st"> </span><span class="kw">ylab</span>(<span class="st">"circumference"</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb54-14" data-line-number="14"><span class="st"> </span><span class="kw">theme</span>(<span class="dt">legend.position =</span> <span class="st">"none"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>Finally, the next level would be to produce intervals that would likely contain a future observation instead of just the relationship, which was captured in the figure above. The ‘bands’ below are wider than in the previous figure. The reason why the difference is small, is, because in this case the residual variance is comparatively small.</p>
<div class="sourceCode" id="cb55"><pre class="sourceCode r"><code class="sourceCode r"><a class="sourceLine" id="cb55-1" data-line-number="1">sdat21 <-<span class="st"> </span><span class="kw">simulate_nlme</span>(fmo, <span class="dt">nsim =</span> <span class="dv">100</span>, <span class="dt">psim =</span> <span class="dv">2</span>, <span class="dt">level =</span> <span class="dv">1</span>, <span class="dt">value =</span> <span class="st">"data.frame"</span>)</a>
<a class="sourceLine" id="cb55-2" data-line-number="2"></a>
<a class="sourceLine" id="cb55-3" data-line-number="3"><span class="co">## Here I'm plotting points to emphasize that we are making</span></a>
<a class="sourceLine" id="cb55-4" data-line-number="4"><span class="co">## predictions at the level of a single observation</span></a>
<a class="sourceLine" id="cb55-5" data-line-number="5"><span class="kw">ggplot</span>(<span class="dt">data =</span> sdat21) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb55-6" data-line-number="6"><span class="st"> </span><span class="kw">facet_wrap</span>(<span class="op">~</span><span class="st"> </span>Tree) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb55-7" data-line-number="7"><span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> y.sim, <span class="dt">color =</span> Tree), </a>
<a class="sourceLine" id="cb55-8" data-line-number="8"> <span class="dt">alpha =</span> <span class="fl">0.5</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb55-9" data-line-number="9"><span class="st"> </span><span class="kw">geom_point</span>(<span class="kw">aes</span>(<span class="dt">x =</span> age, <span class="dt">y =</span> circumference)) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb55-10" data-line-number="10"><span class="st"> </span><span class="kw">ggtitle</span>(<span class="st">"psim = 2, level = 1"</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb55-11" data-line-number="11"><span class="st"> </span><span class="kw">ylab</span>(<span class="st">"circumference"</span>) <span class="op">+</span><span class="st"> </span></a>
<a class="sourceLine" id="cb55-12" data-line-number="12"><span class="st"> </span><span class="kw">theme</span>(<span class="dt">legend.position =</span> <span class="st">"none"</span>)</a></code></pre></div>
<p><img src="" /><!-- --></p>
<p>The previous examples are simulations, but in order to generate confidence intervals or bands we would need to perform bootstrap, but it takes much longer, so I’m not including it in this vignette.</p>
</div>
</div>
<div id="appendix" class="section level1">
<h1>Appendix</h1>
<div id="note-on-simulate-methodology" class="section level2">
<h2>Note on simulate methodology</h2>
<p>In the case of simulating data from a nonlinear model and how this relates to bootstrapping. I’ve been thinking about a number of things. In a nonlinear model, the mean function should account for a <em>substantial</em> part of the variability. If we think about models in terms of</p>
<p><span class="math display">\[
data = signal + noise
\]</span> And for mixed models: <span class="math display">\[
data = fixed + random + residual
\]</span></p>
<p>I can see how in the linear case often the random part can be of substantial interest and sometimes more important than the fixed part, but for nonlinear models, the <em>fixed</em> part should dominate, otherwise we would need a better function or the data are not amenable to nonlinear modeling. In simulate_nlme, psim = 0, gives you the fitted values, so should be equivalent to ‘predict.nlme’, if theres is ever a discrepancy, then that is a bug. For psim = 1, I sample from the vector of parameters assuming that they are normally distributed. This seems to be very reasonable in practice and, in fact, much more important is the variance-covariance matrix of the vector of parameters and in particular the correlations. The problem is that in nonlinear mixed models, the relationship among parameters is not linear and can look a bit like a ‘banana’. This can be investigated through the use of bootstrap. For psim = 2, I add the residual variability which considers the modeling of the variance (heterogeneous variances), but not the possible lack of independence of the residuals. (This seems to be unlikely to arise for the type of models that we usually fit. Adding the correlation structure of the residuals can be done, but I’ll work on it when the need arises - not a priority.) It would be possible to add an options for psim = 3, where I also consider the uncertainty in the ‘random’ part above, this is is better captures, in my opinion, the ‘level’ argument in the ‘simulate_nlme’ function. By default, ‘predict.nlme’ predicts at the deepest level in the hierarchy (i.e. Q) and changing this argument allows for simulations at the different levels. Another reason, for not implementing psim = 3 (meaning sampling for the random part) is that, often, there is meaning or interest in these specific random terms. For example, sites/locations/subjects/experimenta.units are higher or lower because, in part, some characteristics that are know and some unknown, but which are out of our control. For this reason, assigning resampling or assigning a deviation at random at this level might be undesirable. If we have 10 sites labeled “A” through “J” and we know that site “A” is higher due to some characteristics (which we do not control), it would not make sense to assign the deviation from site “A” to “J” (which might be the lowest). In our analysis it might still make sense to treat ‘sites’ as random. In this case, when we use simulate_nlme, it makes more sense to use ‘psim = 2, level = 1’ (‘site’ would be our deepest level in the hierarchy - like ‘Tree’ in the ‘Orange’ example above) to generate data which appears similar to the data which was actually collected. It is true that this does not allow us to incorporate the uncertainty from the fact that we obtained these specific random deviates at the level of site (or Tree), but these will vary anyway because we are simulating forward from the vector of fixed parameters, which should account for the majority of the variability in these models. For this reason, I know that my approach is not perfect, but, so far, it seems to be usable. I’m thinking about Morris argument (see References) and I tested whether there is bias in the estimate of the random term variances using my method and I did not detect a bias - but I’m not resampling from the BLUPs. I’m also thinking that the non-parameteric part of my method (resampling the residuals) is somewhat trivial and implementing a fully parameteric option is feasible.</p>
</div>
</div>
<div id="references" class="section level1">
<h1>References</h1>
<ul>
<li><p>J. Fox and S. Weisberg. An R Companion to Applied Regression. Sage, Thousand Oaks CA, 2nd edition, 2011. URL <a href="http://z.umn.edu/carbook" class="uri">http://z.umn.edu/carbook</a>.</p></li>
<li><p>J. Fox and S. Weisberg. Bootstrapping regression models in R. Technical report, 2017. URL: <a href="https://socialsciences.mcmaster.ca/jfox/Books/Companion-2E/appendix/Appendix-Bootstrapping.pdf" class="uri">https://socialsciences.mcmaster.ca/jfox/Books/Companion-2E/appendix/Appendix-Bootstrapping.pdf</a></p></li>
<li><p>J. Fox and S. Weisberg. Nonlinear Regression, Nonlinear Least Squares, and Nonlinear Mixed Models in R. Appendix, 2018. URL: <a href="https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Nonlinear-Regression.pdf" class="uri">https://socialsciences.mcmaster.ca/jfox/Books/Companion/appendices/Appendix-Nonlinear-Regression.pdf</a></p></li>
<li><p>Morris, J. S. (2002). The BLUPs Are Not ‘best’ When It Comes to Bootstrapping. Statistics & Probability Letters 56(4): 425–430. <a href="https://doi.org/10.1016/S0167-7152(02)00041-X" class="uri">https://doi.org/10.1016/S0167-7152(02)00041-X</a>.</p></li>
<li><p><a href="http://sia.webpopix.org/nonlinearRegression.html#confidence-intervals-and-prediction-intervals" class="uri">http://sia.webpopix.org/nonlinearRegression.html#confidence-intervals-and-prediction-intervals</a></p></li>
<li><p>propagate: <a href="https://rmazing.wordpress.com/2013/08/31/introducing-propagate/" class="uri">https://rmazing.wordpress.com/2013/08/31/introducing-propagate/</a></p></li>
<li><p>For linear mixed models: package ‘lmeresampler’: <a href="https://CRAN.R-project.org/package=lmeresampler" class="uri">https://CRAN.R-project.org/package=lmeresampler</a></p></li>
<li><p><a href="https://stats.stackexchange.com/questions/231074/confidence-intervals-on-predictions-for-a-non-linear-mixed-model-nlme" class="uri">https://stats.stackexchange.com/questions/231074/confidence-intervals-on-predictions-for-a-non-linear-mixed-model-nlme</a></p></li>
<li><p><a href="https://rpubs.com/bbolker/3423" class="uri">https://rpubs.com/bbolker/3423</a></p></li>
</ul>
</div>
<!-- code folding -->
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
var script = document.createElement("script");
script.type = "text/javascript";
script.src = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
document.getElementsByTagName("head")[0].appendChild(script);
})();
</script>
</body>
</html>