# Results

Note: the product telemetry feature was enabled, but sending of product telemetry data was disabled.

## NGINX OSS

### Test environment

NGINX Plus: false

GKE Cluster:

- Node count: 3
- k8s version: v1.27.8-gke.1067004
- vCPUs per node: 2
- RAM per node: 4022900Ki
- Max pods per node: 110
- Zone: us-central1-c
- Instance Type: e2-medium

NGF pod name -- ngf-longevity-nginx-gateway-fabric-7f596f74c5-xzkzb
### Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   183.83ms  143.49ms   2.00s    79.14%
    Req/Sec   303.07    204.23     2.22k    66.90%
  204934013 requests in 5760.00m, 71.25GB read
  Socket errors: connect 0, read 344459, write 0, timeout 5764
Requests/sec:    592.98
Transfer/sec:    216.19KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   175.23ms  122.10ms   2.00s    68.72%
    Req/Sec   301.92    203.60     1.95k    66.97%
  204120642 requests in 5760.00m, 69.83GB read
  Socket errors: connect 0, read 337203, write 0, timeout 246
Requests/sec:    590.63
Transfer/sec:    211.87KB
```
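As a quick sanity check, the Requests/sec figures that wrk reports can be reproduced from the raw totals above; a minimal sketch (illustrative arithmetic only, not part of the test tooling):

```python
# Cross-check wrk's reported Requests/sec from the run totals.
# 5760 minutes = 345600 seconds of sustained traffic.
duration_s = 5760 * 60

http_reqs = 204_934_013   # /coffee run total
https_reqs = 204_120_642  # /tea run total

print(round(http_reqs / duration_s, 2))   # 592.98, matches the HTTP report
print(round(https_reqs / duration_s, 2))  # 590.63, matches the HTTPS report
```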

### Logs

No error logs in nginx-gateway.

No error logs in nginx.

### Key Metrics

#### Containers memory

Drop in NGINX memory usage corresponds to the end of traffic generation.

#### NGF Container Memory

#### Containers CPU

Drop in NGINX CPU usage corresponds to the end of traffic generation.

#### NGINX metrics

Drop in requests corresponds to the end of traffic generation.

### Reloads

Rate of reloads - successful and errors:

Reload spikes correspond to 1-hour periods of backend re-rollouts.
However, the small spikes correspond to periodic reconciliation of Secrets, which (incorrectly)
triggers a reload -- https://github.com/nginxinc/nginx-gateway-fabric/issues/1112

No reloads finished with an error.
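For reference, the reload-rate panels are typically built from NGF's Prometheus counters; a sketch of the queries (the metric names are assumptions based on NGF's metrics naming and should be checked against your deployment):

```text
# Rate of reloads per minute -- successful and errored (assumed metric names):
rate(nginx_gateway_fabric_nginx_reloads_total[1m]) * 60
rate(nginx_gateway_fabric_nginx_reload_errors_total[1m]) * 60
```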

Reload time distribution - counts:

Reload-related metrics at the end:

All successful reloads took less than 5 seconds, with most under 1 second.
## NGINX Plus

### Test environment

NGINX Plus: true

GKE Cluster:

- Node count: 3
- k8s version: v1.27.8-gke.1067004
- vCPUs per node: 2
- RAM per node: 4022900Ki
- Max pods per node: 110
- Zone: us-central1-c
- Instance Type: e2-medium

NGF pod name -- ngf-longevity-nginx-gateway-fabric-fc7f6bcf-cnlww

### Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   173.03ms  120.83ms   2.00s    68.41%
    Req/Sec   313.29    209.75     2.11k    65.95%
  211857930 requests in 5760.00m, 74.04GB read
  Socket errors: connect 0, read 307, write 0, timeout 118
  Non-2xx or 3xx responses: 6
Requests/sec:    613.01
Transfer/sec:    224.63KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   173.25ms  120.87ms   2.00s    68.37%
    Req/Sec   312.62    209.06     1.95k    66.02%
  211427067 requests in 5760.00m, 72.76GB read
  Socket errors: connect 0, read 284, write 0, timeout 92
  Non-2xx or 3xx responses: 4
Requests/sec:    611.77
Transfer/sec:    220.77KB
```

Note: the non-2xx or 3xx responses correspond to the errors in the NGINX log; see below.

### Logs

nginx-gateway:

A lot of expected "usage reporting not enabled" errors, plus the following error during shutdown:

```text
INFO 2024-03-20T14:13:00.372305088Z [resource.labels.containerName: nginx-gateway] {"level":"info", "msg":"Wait completed, proceeding to shutdown the manager", "ts":"2024-03-20T14:13:00Z"}
ERROR 2024-03-20T14:13:00.374159128Z [resource.labels.containerName: nginx-gateway] {"error":"leader election lost", "level":"error", "msg":"error received after stop sequence was engaged", "stacktrace":"sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).engageStopProcedure.func1 sigs.k8s.io/[email protected]/pkg/manager/internal.go:490", "ts":"2024-03-20T14:13:00Z"}
```

The error occurred during shutdown. Further investigation is needed to determine whether the shutdown process should be fixed:
https://github.com/nginxinc/nginx-gateway-fabric/issues/1735

nginx:

```text
ERROR 2024-03-17T21:11:11.017601264Z [resource.labels.containerName: nginx] 2024/03/17 21:11:10 [error] 43#43: *211045372 no live upstreams while connecting to upstream, client: 10.128.0.19, server: cafe.example.com, request: "GET /tea HTTP/1.1", upstream: "http://longevity_tea_80/tea", host: "cafe.example.com"
```

Ten errors like this occurred at different times, while backend pods were being updated; it is not clear why.
Because the number of errors is small compared with the total number of handled requests (211857930 + 211427067),
no further investigation is needed unless it recurs at a larger volume.
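To put those 10 errors in perspective against the total traffic, a minimal sketch of the arithmetic (illustrative only):

```python
# Fraction of requests that hit "no live upstreams" across both Plus runs.
errors = 10
total_requests = 211_857_930 + 211_427_067  # HTTP + HTTPS totals from wrk
error_fraction = errors / total_requests
print(f"{error_fraction:.2e}")  # about 2.36e-08, roughly 1 error per 42M requests
```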

### Key Metrics

#### Containers memory

Drop in NGINX memory usage corresponds to the end of traffic generation.

#### NGF Container Memory

#### Containers CPU

Drop in NGINX CPU usage corresponds to the end of traffic generation.

#### NGINX Plus metrics

Drop in requests corresponds to the end of traffic generation.

### Reloads

Rate of reloads - successful and errors:

Note: compared to NGINX OSS, there are not as many reloads here, because NGF uses the NGINX Plus API to reconfigure
NGINX for endpoint changes.

However, the small spikes correspond to periodic reconciliation of Secrets, which (incorrectly)
triggers a reload -- https://github.com/nginxinc/nginx-gateway-fabric/issues/1112

No reloads finished with an error.

Reload time distribution - counts:

Reload-related metrics at the end:

All successful reloads took less than 1 second, with most under 0.5 seconds.

## Comparison with previous runs

Compared with the 1.1.0 results, NGF container memory usage appears 2 times higher.
That is probably due to a bug in the metric visualization for the 1.1.0 results (using mean instead of sum for aggregation).
Running 1.1.0 in a similar cluster yields only slightly lower memory usage (-2..-1MB).
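The suspected aggregation bug can be illustrated with a toy example: when a dashboard query matches more than one memory series for the pod, `mean` reports a fraction of the usage that `sum` would. A hypothetical sketch, not the actual dashboard query:

```python
# Two hypothetical memory series matched by the same dashboard query
# (e.g. duplicate scrape targets), each reporting ~60MB.
series_bytes = [60e6, 60e6]

mean_agg = sum(series_bytes) / len(series_bytes)  # mean aggregation (1.1.0 dashboards): 60MB
sum_agg = sum(series_bytes)                       # sum aggregation (corrected): 120MB

print(mean_agg / 1e6, sum_agg / 1e6)  # 60.0 120.0 -> consistent with a "2 times higher" reading
```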