# Testing

This is an example of how to do some microbenchmarking of network policy enforcement.

1. Collect the existing metrics from the agents

Example [deployment with Prometheus](./monitoring.yaml)
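
Assuming the linked manifest is self-contained, it can be applied directly:

```
kubectl apply -f ./monitoring.yaml
```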

2. Deploy some Pods running an HTTP server behind a Service (see the sketch after the notes below)

Since network policies are enforced on the first packet of a connection, we need to generate new connections:
* We cannot use HTTP keep-alives, HTTP/2, or any protocol that multiplexes requests over the same connection
* A pair of endpoints is limited by the number of ephemeral ports on the origin, since the destination IP and port are fixed

```
cat /proc/sys/net/ipv4/ip_local_port_range
32768 60999
```

With this default range, a single client can open at most 60999 - 32768 + 1 = 28232 concurrent connections to one destination IP and port.
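
As a sketch of the target workload (the image, deployment name, and replica count are assumptions; `test-service` matches the benchmark output below):

```
# Run simple HTTP servers (agnhost's netexec serves HTTP on the given port)
kubectl create deployment test-server \
  --image=registry.k8s.io/e2e-test-images/agnhost:2.39 \
  --replicas=3 -- /agnhost netexec --http-port=80

# Expose them behind the Service the poller Job will target
kubectl expose deployment test-server --name=test-service --port=80 --target-port=80
```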

3. Run a [Job that polls the Service created previously](job_poller.yaml)

Each Pod runs requests in parallel.

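From the log below, a matching ApacheBench invocation would look like this (an assumed reconstruction; the Job's actual command lives in the linked manifest):

```
# 10000 requests total, 1000 concurrent, against the Service's DNS name
ab -n 10000 -c 1000 http://test-service/
```

Sample output from one of the Job's Pods: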
```
kubectl logs abtest-t7wjd
This is ApacheBench, Version 2.3 <$Revision: 1913912 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking test-service (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:
Server Hostname:        test-service
Server Port:            80

Document Path:          /
Document Length:        60 bytes

Concurrency Level:      1000
Time taken for tests:   4.317 seconds
Complete requests:      10000
Failed requests:        1274
   (Connect: 0, Receive: 0, Length: 1274, Exceptions: 0)
Total transferred:      1768597 bytes
HTML transferred:       598597 bytes
Requests per second:    2316.61 [#/sec] (mean)
Time per request:       431.666 [ms] (mean)
Time per request:       0.432 [ms] (mean, across all concurrent requests)
Transfer rate:          400.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  188 571.9      4    4121
Processing:     0    2   5.3      0      42
Waiting:        0    1   2.8      0      32
Total:          0  190 571.8      5    4122

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      7
  75%     22
  80%     24
  90%   1023
  95%   1046
  98%   2063
  99%   3080
 100%   4122 (longest request)
```

You have to tune your system, as you will most likely hit limits on several resources, especially the conntrack table:

```
[1825525.815672] net_ratelimit: 411 callbacks suppressed
[1825525.815676] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.827617] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.834317] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.841058] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.847764] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.854458] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.861131] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.867814] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.874505] nf_conntrack: nf_conntrack: table full, dropping packet
[1825525.881186] nf_conntrack: nf_conntrack: table full, dropping packet
```

Check the current maximum number of conntrack entries allowed and tune accordingly:

```
cat /proc/sys/net/netfilter/nf_conntrack_max
262144
```
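
For example (the values are illustrative, not recommendations; size them to the connection rate you plan to generate):

```
# Raise the conntrack table ceiling
sysctl -w net.netfilter.nf_conntrack_max=1048576

# Widen the ephemeral port range to allow more concurrent connections per endpoint pair
sysctl -w net.ipv4.ip_local_port_range="1024 65000"
```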


4. Observe the metrics in Prometheus or Grafana
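
Without dashboards, the Prometheus HTTP API can also be queried directly. A sketch (the Prometheus address is an assumption; `process_cpu_seconds_total` is the standard process metric exposed by Go binaries):

```
# Agents' CPU usage rate over the last 5 minutes
curl -s 'http://prometheus.monitoring.svc:9090/api/v1/query' \
  --data-urlencode 'query=rate(process_cpu_seconds_total[5m])'
```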

## Future work

We are interested in understanding the following variables:

* Memory and CPU consumption
* Latency of packet processing
* Latency to apply a network policy after it has been created

This can be microbenchmarked easily, using one Node or a kind cluster, [adding fake nodes and pods with kwok](https://developer.ibm.com/tutorials/awb-using-kwok-to-simulate-a-large-kubernetes-openshift-cluster/), and running scenarios on just one node while varying the inputs below (see the sketch that follows).
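
A possible setup, assuming `kind` and `kwok` are installed locally (the cluster name is illustrative; the flags follow kwok's out-of-cluster docs):

```
# One real node to host the agent under test
kind create cluster --name np-bench

# Run kwok against the cluster so it manages only nodes annotated as fake;
# Pods scheduled onto those nodes are simulated instead of actually running
kwok --kubeconfig "$HOME/.kube/config" \
  --manage-all-nodes=false \
  --manage-nodes-with-annotation-selector=kwok.x-k8s.io/node=fake
```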


Inputs:

* New connections per second
* Number of Pods on the cluster (affected or not affected by network policies)
* Number of Network Policies impacting the connections