Commit fa4aca4

rmccorm4 and mc-nv authored and committed
Add Redis cache build, tests, and docs (#5916)
1 parent de3b436 commit fa4aca4

4 files changed: +183 -56 lines changed

build.py

Lines changed: 2 additions & 4 deletions
@@ -1769,8 +1769,7 @@ def enable_all():
             'tensorrt'
         ]
         all_repoagents = ['checksum']
-        # DLIS-4491: Add redis cache to build
-        all_caches = ['local']
+        all_caches = ['local', 'redis']
         all_filesystems = ['gcs', 's3', 'azure_storage']
         all_endpoints = ['http', 'grpc', 'sagemaker', 'vertex-ai']

@@ -1788,8 +1787,7 @@ def enable_all():
             'openvino', 'tensorrt'
         ]
         all_repoagents = ['checksum']
-        # DLIS-4491: Add redis cache to build
-        all_caches = ['local']
+        all_caches = ['local', 'redis']
         all_filesystems = []
         all_endpoints = ['http', 'grpc']

docs/user_guide/response_cache.md

Lines changed: 36 additions & 7 deletions
@@ -101,10 +101,12 @@ that are used to communicate with a cache implementation of the user's choice.

 A cache implementation is a shared library that implements the required
 TRITONCACHE APIs and is dynamically loaded on server startup, if enabled.
-For tags `>=23.03`,
+
+Triton's most recent
 [tritonserver release containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver)
 come with the following cache implementations out of the box:
 - [local](https://github.com/triton-inference-server/local_cache): `/opt/tritonserver/caches/local/libtritoncache_local.so`
+- [redis](https://github.com/triton-inference-server/redis_cache): `/opt/tritonserver/caches/redis/libtritoncache_redis.so`

 With these TRITONCACHE APIs, `tritonserver` exposes a new `--cache-config`
 CLI flag that gives the user flexible customization of which cache implementation
@@ -124,18 +126,44 @@ When `--cache-config local,size=SIZE` is specified with a non-zero `SIZE`,
 Triton allocates the requested size in CPU memory and **shares the
 cache across all inference requests and across all models**.

+#### Redis Cache
+
+The `redis` cache implementation exposes the ability for Triton to communicate
+with a Redis server for caching. The `redis_cache` implementation is essentially
+a Redis client that acts as an intermediary between Triton and Redis.
+
+To list a few benefits of the `redis` cache compared to the `local` cache in
+the context of Triton:
+- The Redis server can be hosted remotely as long as it is accessible by Triton,
+  so it is not tied directly to the Triton process lifetime.
+    - This means Triton can be restarted and still have access to previously cached entries.
+    - This also means that Triton doesn't have to compete with the cache for memory/resource usage.
+- Multiple Triton instances can share a cache by configuring each Triton instance
+  to communicate with the same Redis server.
+- The Redis server can be updated/restarted independently of Triton, and
+  Triton will fall back to operating as it would with no cache access during
+  any Redis server downtime, and log appropriate errors.
+
+In general, the Redis server can be configured/deployed as needed for your use
+case, and Triton's `redis` cache will simply act as a client of your Redis
+deployment. The [Redis docs](https://redis.io/docs/) should be consulted for
+questions and details about configuring the Redis server.
+
+For Triton-specific `redis` cache implementation details/configuration, see the
+[redis cache implementation](https://github.com/triton-inference-server/redis_cache).
+
 #### Custom Cache

-With the new the TRITONCACHE API interface, it is now possible for
+With the TRITONCACHE API interface, it is now possible for
 users to implement their own cache to suit any use-case specific needs.
 To see the required interface that must be implemented by a cache
 developer, see the
 [TRITONCACHE API header](https://github.com/triton-inference-server/core/blob/main/include/triton/core/tritoncache.h).
-The `local` cache implementation may be used as a reference implementation.
+The `local` or `redis` cache implementations may be used as reference.

 Upon successfully developing and building a custom cache, the resulting shared
 library (ex: `libtritoncache_<name>.so`) must be placed in the cache directory
-similar to where the `local` cache implementation lives. By default,
+similar to where the `local` and `redis` cache implementations live. By default,
 this directory is `/opt/tritonserver/caches`, but a custom directory may be
 specified with `--cache-dir` as needed.

@@ -184,9 +212,10 @@ a response.
 For cases where cache hits are common and computation is expensive,
 the cache can significantly improve overall performance.

-For cases where all or most requests are unique (cache misses), the
-cache may negatively impact the overall performance due to the overhead
-of managing the cache.
+For cases where most requests are unique (cache misses) or the compute is
+fast/cheap (the model is not compute-bound), the cache can negatively impact
+the overall performance due to the overhead of managing and communicating with
+the cache.

 ## Known Limitations

qa/L0_response_cache/test.sh

Lines changed: 140 additions & 40 deletions
@@ -29,12 +29,69 @@ RET=0

 TEST_LOG="./response_cache_test.log"
 UNIT_TEST=./response_cache_test
+export CUDA_VISIBLE_DEVICES=0
+
+# Only localhost supported in this test for now, but in future could make
+# use of a persistent remote redis server, or similarly use --replicaof arg.
+export TRITON_REDIS_HOST="localhost"
+export TRITON_REDIS_PORT="6379"

 rm -fr *.log

-# UNIT TEST
+function install_redis() {
+    ## Install redis if not already installed
+    if ! command -v redis-server >/dev/null 2>&1; then
+        apt update -y && apt install -y redis
+    fi
+}
+
+function start_redis() {
+    # Run redis server in background
+    redis-server --daemonize yes --port "${TRITON_REDIS_PORT}"
+
+    # Check redis server is running
+    REDIS_PING_RESPONSE=$(redis-cli -h ${TRITON_REDIS_HOST} -p ${TRITON_REDIS_PORT} ping)
+    if [ "${REDIS_PING_RESPONSE}" == "PONG" ]; then
+        echo "Redis successfully started in background"
+    else
+        echo -e "\n***\n*** Failed: Redis server did not start successfully\n***"
+        RET=1
+    fi
+}
+
+function stop_redis() {
+    echo "Stopping Redis server..."
+    redis-cli -h "${TRITON_REDIS_HOST}" -p "${TRITON_REDIS_PORT}" shutdown || true
+    echo "Redis server shutdown"
+}
+
+function set_redis_auth() {
+    # NOTE: Per-user auth [Access Control List (ACL)] is only supported in
+    #       Redis >= 6.0 and is more comprehensive in what can be configured.
+    #       For simplicity and wider range of Redis version support, use
+    #       server-wide password via "requirepass" for now.
+    redis-cli -h "${TRITON_REDIS_HOST}" -p "${TRITON_REDIS_PORT}" config set requirepass "${REDIS_PW}"
+    export REDISCLI_AUTH="${REDIS_PW}"
+}
+
+function unset_redis_auth() {
+    # Authenticate implicitly via REDISCLI_AUTH env var, then unset password/var
+    redis-cli -h "${TRITON_REDIS_HOST}" -p "${TRITON_REDIS_PORT}" config set requirepass ""
+    unset REDISCLI_AUTH
+}
+
+# UNIT TESTS
 set +e
-export CUDA_VISIBLE_DEVICES=0
+
+## Unit tests currently run for both Local and Redis cache implementations
+## by default. However, we could break out the unit tests for each
+## into separate runs with gtest filters if needed in the future:
+## - `${UNIT_TEST} --gtest_filter=*Local*`
+## - `${UNIT_TEST} --gtest_filter=*Redis*`
+install_redis
+# Stop any existing redis server first for good measure
+stop_redis
+start_redis
 LD_LIBRARY_PATH=/opt/tritonserver/lib:$LD_LIBRARY_PATH $UNIT_TEST >>$TEST_LOG 2>&1
 if [ $? -ne 0 ]; then
     cat $TEST_LOG
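
`start_redis` issues a single `ping` right after daemonizing the server. If the server needs a moment before it accepts connections, one possible hardening is to poll until the ping succeeds. The sketch below is only an illustration under that assumption: `retry` and `redis_is_up` are hypothetical helpers, not part of the test suite.

```shell
# Hypothetical helper: run a command up to <attempts> times, sleeping
# <delay_secs> between tries. Returns 0 on the first success, 1 otherwise.
retry() {
    local attempts="$1"
    local delay_secs="$2"
    shift 2
    local i=1
    while [ "${i}" -le "${attempts}" ]; do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        sleep "${delay_secs}"
        i=$((i + 1))
    done
    return 1
}

# Hypothetical readiness probe: succeeds only if the server answers PONG.
# Assumes redis-cli is installed; it is defined here but not invoked.
redis_is_up() {
    [ "$(redis-cli -h "$1" -p "$2" ping 2> /dev/null)" = "PONG" ]
}

# Self-contained demo that does not need a Redis server.
retry 3 0 true && echo "ready"
```

In the test script this could be used as `retry 10 1 redis_is_up "${TRITON_REDIS_HOST}" "${TRITON_REDIS_PORT}"`, where the attempt count and delay are arbitrary example values.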
@@ -48,10 +105,33 @@ function check_server_success_and_kill {
     if [ "${SERVER_PID}" == "0" ]; then
         echo -e "\n***\n*** Failed to start ${SERVER}\n***"
         cat ${SERVER_LOG}
-        exit 1
+        RET=1
+    else
+        kill ${SERVER_PID}
+        wait ${SERVER_PID}
+    fi
+}
+
+function check_server_expected_failure {
+    EXPECTED_MESSAGE="${1}"
+    if [ "${SERVER_PID}" != "0" ]; then
+        echo -e "\n***\n*** Failed: ${SERVER} started successfully when it was expected to fail\n***"
+        cat ${SERVER_LOG}
+        RET=1
+
+        kill ${SERVER_PID}
+        wait ${SERVER_PID}
+    else
+        # Check that server fails with the correct error message
+        set +e
+        grep -i "${EXPECTED_MESSAGE}" ${SERVER_LOG}
+        if [ $? -ne 0 ]; then
+            echo -e "\n***\n*** Failed: Expected [${EXPECTED_MESSAGE}] error message in output\n***"
+            cat $SERVER_LOG
+            RET=1
+        fi
+        set -e
     fi
-    kill $SERVER_PID
-    wait $SERVER_PID
 }

 MODEL_DIR="${PWD}/models"
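
`check_server_expected_failure` hands its argument straight to `grep -i`, so the expected message is interpreted as a basic regular expression rather than a literal string. That is what lets a caller pass two alternatives joined with `\|`, although `\|` in basic regular expressions is a GNU grep extension. A minimal sketch of the same check using extended regular expressions, where `|` is standard alternation, is shown below. The `log_contains` helper and the sample log line are made up for illustration.

```shell
# Hypothetical helper: succeed if any line of the log file matches the given
# extended regular expression, ignoring case and printing nothing.
log_contains() {
    local pattern="$1"
    local log_file="$2"
    grep -i -q -E -- "${pattern}" "${log_file}"
}

# Demo with a throwaway log file containing an invented error line.
demo_log="$(mktemp)"
echo "E0101 example: Name or service not known" > "${demo_log}"

MSG1="Temporary failure in name resolution"
MSG2="Name or service not known"
if log_contains "${MSG1}|${MSG2}" "${demo_log}"; then
    echo "found expected error"
fi
rm -f "${demo_log}"
```

If an expected message ever contains regex metacharacters that should match literally, `grep -F` would be the safer choice, at the cost of losing alternation.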
@@ -102,46 +182,66 @@ check_server_success_and_kill
 # Test that specifying multiple cache types is not supported and should fail
 SERVER_ARGS="--model-repository=${MODEL_DIR} --cache-config=local,size=8192 --cache-config=redis,key=value ${EXTRA_ARGS}"
 run_server
-if [ "$SERVER_PID" != "0" ]; then
-    echo -e "\n***\n*** Failed: $SERVER started successfully when it was expected to fail\n***"
-    cat $SERVER_LOG
-    RET=1
-
-    kill $SERVER_PID
-    wait $SERVER_PID
-else
-    # Check that server fails with the correct error message
-    set +e
-    grep -i "multiple cache configurations" ${SERVER_LOG}
-    if [ $? -ne 0 ]; then
-        echo -e "\n***\n*** Failed: Expected multiple cache configuration error message in output\n***"
-        cat $SERVER_LOG
-        RET=1
-    fi
-    set -e
-fi
+check_server_expected_failure "multiple cache configurations"

 # Test that specifying both config styles is incompatible and should fail
 SERVER_ARGS="--model-repository=${MODEL_DIR} --response-cache-byte-size=12345 --cache-config=local,size=67890 ${EXTRA_ARGS}"
 run_server
-if [ "$SERVER_PID" != "0" ]; then
-    echo -e "\n***\n*** Failed: $SERVER started successfully when it was expected to fail\n***"
-    cat $SERVER_LOG
-    RET=1
+check_server_expected_failure "incompatible flags"

-    kill $SERVER_PID
-    wait $SERVER_PID
-else
-    # Check that server fails with the correct error message
-    set +e
-    grep -i "incompatible flags" ${SERVER_LOG}
-    if [ $? -ne 0 ]; then
-        echo -e "\n***\n*** Failed: Expected incompatible cache config flags error message in output\n***"
-        cat $SERVER_LOG
-        RET=1
-    fi
-    set -e
-fi
+## Redis Cache CLI tests
+REDIS_ENDPOINT="--cache-config redis,host=${TRITON_REDIS_HOST} --cache-config redis,port=${TRITON_REDIS_PORT}"
+
+# Test simple redis cache config succeeds
+SERVER_ARGS="--model-repository=${MODEL_DIR} ${REDIS_ENDPOINT} ${EXTRA_ARGS}"
+run_server
+check_server_success_and_kill
+
+# Test triton fails to initialize if it can't connect to redis cache
+SERVER_ARGS="--model-repository=${MODEL_DIR} --cache-config=redis,host=localhost --cache-config=redis,port=nonexistent ${EXTRA_ARGS}"
+run_server
+check_server_expected_failure "Failed to connect to Redis: Connection refused"
+
+# Test triton fails to initialize if it can't resolve host for redis cache
+SERVER_ARGS="--model-repository=${MODEL_DIR} --cache-config=redis,host=nonexistent --cache-config=redis,port=nonexistent ${EXTRA_ARGS}"
+run_server
+# Either of these errors can be returned for bad hostname, so check for either.
+MSG1="Temporary failure in name resolution"
+MSG2="Name or service not known"
+check_server_expected_failure "${MSG1}\|${MSG2}"
+
+# Test triton fails to initialize if minimum required args (host & port) not all provided
+SERVER_ARGS="--model-repository=${MODEL_DIR} --cache-config=redis,port=${TRITON_REDIS_HOST} ${EXTRA_ARGS}"
+run_server
+check_server_expected_failure "Must at a minimum specify"
+
+## Redis Authentication tests
+
+# Automatically provide auth via REDISCLI_AUTH env var when set: https://redis.io/docs/ui/cli/
+REDIS_PW="redis123!"
+set_redis_auth
+
+# Test simple redis authentication succeeds with correct credentials
+REDIS_CACHE_AUTH="--cache-config redis,password=${REDIS_PW}"
+SERVER_ARGS="--model-repository=${MODEL_DIR} ${REDIS_ENDPOINT} ${REDIS_CACHE_AUTH} ${EXTRA_ARGS}"
+run_server
+check_server_success_and_kill
+
+# Test simple redis authentication fails with wrong credentials
+REDIS_CACHE_AUTH="--cache-config redis,password=wrong"
+SERVER_ARGS="--model-repository=${MODEL_DIR} ${REDIS_ENDPOINT} ${REDIS_CACHE_AUTH} ${EXTRA_ARGS}"
+run_server
+check_server_expected_failure "WRONGPASS"
+
+
+# Test simple redis authentication fails with no credentials
+SERVER_ARGS="--model-repository=${MODEL_DIR} ${REDIS_ENDPOINT} ${EXTRA_ARGS}"
+run_server
+check_server_expected_failure "NOAUTH Authentication required"
+
+# Clean up redis server before exiting test
+unset_redis_auth
+stop_redis

 if [ $RET -eq 0 ]; then
     echo -e "\n***\n*** Test Passed\n***"
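
The Redis CLI tests pass one `--cache-config redis,<key>=<value>` flag per setting, using the keys `host`, `port`, and `password`. As a small sketch of that convention, the hypothetical `redis_cache_args` helper below assembles the flags from its arguments. It is not part of the test suite, and the example values are placeholders.

```shell
# Hypothetical helper: print the --cache-config flags for the redis cache,
# one flag per key, in the same style as REDIS_ENDPOINT in test.sh.
# The password is optional and only emitted when provided.
redis_cache_args() {
    local host="$1"
    local port="$2"
    local password="${3:-}"
    local args="--cache-config redis,host=${host} --cache-config redis,port=${port}"
    if [ -n "${password}" ]; then
        args="${args} --cache-config redis,password=${password}"
    fi
    echo "${args}"
}

redis_cache_args localhost 6379
```

A launch command might then look like `tritonserver --model-repository=/path/to/models $(redis_cache_args localhost 6379)`, where the model repository path is a placeholder. Passing a password on the command line exposes it to other local users via the process list, so this form is best kept to test setups like the one in this commit.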

qa/common/util.sh

Lines changed: 5 additions & 5 deletions
@@ -66,7 +66,7 @@ function wait_for_server_ready() {

     local wait_secs=$wait_time_secs
     until test $wait_secs -eq 0 ; do
-        if ! kill -0 $spid; then
+        if ! kill -0 $spid > /dev/null 2>&1; then
             echo "=== Server not running."
             WAIT_RET=1
             return
@@ -147,13 +147,13 @@ function wait_for_model_stable() {
 }

 function gdb_helper () {
-  if ! command -v gdb; then
+  if ! command -v gdb > /dev/null 2>&1; then
     echo "=== WARNING: gdb not installed"
     return
   fi

   ### Server Hang ###
-  if kill -0 ${SERVER_PID}; then
+  if kill -0 ${SERVER_PID} > /dev/null 2>&1; then
     # If server process is still alive, try to get backtrace and core dump from it
     GDB_LOG="gdb_bt.${SERVER_PID}.log"
     echo -e "=== WARNING: SERVER HANG DETECTED, DUMPING GDB BACKTRACE TO [${PWD}/${GDB_LOG}] ==="
@@ -166,7 +166,7 @@ function gdb_helper () {

   ### Server Segfaulted ###
   # If there are any core dumps locally from a segfault, load them and get a backtrace
-  for corefile in $(ls core.*); do
+  for corefile in $(ls core.* 2> /dev/null); do
     GDB_LOG="${corefile}.log"
     echo -e "=== WARNING: SEGFAULT DETECTED, DUMPING GDB BACKTRACE TO [${PWD}/${GDB_LOG}] ==="
     gdb -batch ${SERVER} ${corefile} -ex "thread apply all bt" | tee "${corefile}.log" || true;
@@ -204,7 +204,7 @@ function run_server () {
         gdb_helper || true

         # Cleanup
-        kill $SERVER_PID || true
+        kill $SERVER_PID > /dev/null 2>&1 || true
         SERVER_PID=0
     fi
 }
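
One detail in the core-dump loop of `gdb_helper` is worth spelling out: inside a command substitution only stderr should be silenced. A form such as `$(ls core.* > /dev/null 2>&1)` also discards stdout, so the substitution is always empty and the loop body never runs. An alternative that avoids parsing `ls` output altogether is to iterate over the glob directly. The sketch below shows that approach with a hypothetical `list_core_files` helper, which is not part of `util.sh`.

```shell
# Hypothetical helper: print core.* files found in a directory, one per line.
# An unmatched glob stays as the literal pattern, so non-existent entries
# are skipped explicitly instead of relying on ls error output.
list_core_files() {
    local dir="${1:-.}"
    local corefile
    for corefile in "${dir}"/core.*; do
        [ -e "${corefile}" ] || continue
        echo "${corefile}"
    done
}

# Demo in a throwaway directory.
demo_dir="$(mktemp -d)"
list_core_files "${demo_dir}"    # no core files yet, prints nothing
touch "${demo_dir}/core.1234"
list_core_files "${demo_dir}"    # prints the path of the new file
rm -rf "${demo_dir}"
```

Iterating over the glob also keeps file names containing spaces intact, which word-splitting the output of `ls` would not.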
