# Digital Human for Customer Service on GKE

Deploy the Digital Human for Customer Service blueprint, built from several NVIDIA NIM microservices, on Google Kubernetes Engine (GKE).

## Table of Contents

- [Digital Human for Customer Service on GKE](#digital-human-for-customer-service-on-gke)
  - [Table of Contents](#table-of-contents)
  - [Prerequisites](#prerequisites)
  - [Setup](#setup)
  - [Test](#test)
  - [nv-embedqa-e5-v5](#nv-embedqa-e5-v5)
  - [nv-rerankqa-mistral-4b-v3](#nv-rerankqa-mistral-4b-v3)
  - [llama3-8b-instruct](#llama3-8b-instruct)
  - [parakeet-ctc-1.1b-asr](#parakeet-ctc-11b-asr)
  - [fastpitch-hifigan-tts](#fastpitch-hifigan-tts)
  - [audio2face-2d](#audio2face-2d)
  - [audio2face-3d](#audio2face-3d)
  - [Tear down](#tear-down)

## Prerequisites

- **Google Cloud SDK:** Ensure the Google Cloud SDK (`gcloud`) is installed and configured.
- **Project:** A Google Cloud project with billing enabled.
- **NGC API Key:** An API key from NVIDIA NGC. See the prerequisites for obtaining this key [here](https://github.com/NVIDIA-AI-Blueprints/digital-human/blob/main/README.md#prerequisites).
- **kubectl:** The `kubectl` command-line tool installed and configured.
- **NVIDIA GPUs:** Any one of the following GPU configurations works:
  - [NVIDIA L4 GPU (8)](https://cloud.google.com/compute/docs/gpus#l4-gpus)
  - [NVIDIA A100 80GB (1) GPU](https://cloud.google.com/compute/docs/gpus#a100-gpus)
  - [NVIDIA H100 80GB (1) GPU or higher](https://cloud.google.com/compute/docs/gpus#a3-series)
## Setup

1. **Environment setup**: Set several environment variables to make the following steps easier and more flexible. These variables store information such as cluster names, machine types, and your API key. Update the values to match your project and needs.

   ```bash
   gcloud config set project "<GCP Project ID>"

   export CLUSTER_NAME="gke-nimbp-dighuman"
   export NP_NAME="gke-nimbp-dighuman-gpunp"

   export ZONE="us-west4-a"                 # e.g., us-west4-a
   export NP_CPU_MACHTYPE="e2-standard-2"   # e.g., e2-standard-2
   export NP_GPU_MACHTYPE="g2-standard-96"  # e.g., a2-ultragpu-1g

   export ACCELERATOR_TYPE="nvidia-l4"      # e.g., nvidia-a100-80gb
   export ACCELERATOR_COUNT="8"             # Or higher, as needed
   export NODE_POOL_NODES=1                 # Or higher, as needed

   export NGC_API_KEY="<NGC API Key>"
   ```
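   Before creating the cluster, it can help to confirm every variable above is non-empty. This is an optional sanity-check sketch; the `check_env` helper name is ours, not part of the blueprint.

   ```bash
   # Optional sanity check: confirm each variable used below is non-empty.
   check_env() {
     local missing=0
     for v in CLUSTER_NAME NP_NAME ZONE NP_CPU_MACHTYPE NP_GPU_MACHTYPE \
              ACCELERATOR_TYPE ACCELERATOR_COUNT NODE_POOL_NODES NGC_API_KEY; do
       # ${!v} is bash indirect expansion: the value of the variable named by $v.
       if [ -z "${!v:-}" ]; then
         echo "Missing: $v" >&2
         missing=1
       fi
     done
     return "$missing"
   }
   check_env || echo "Set the variables above before continuing."
   ```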

2. **GKE Cluster and Node pool creation**:

   ```bash
   gcloud container clusters create "${CLUSTER_NAME}" \
     --num-nodes="1" \
     --location="${ZONE}" \
     --machine-type="${NP_CPU_MACHTYPE}" \
     --addons=GcpFilestoreCsiDriver

   gcloud container node-pools create "${NP_NAME}" \
     --cluster="${CLUSTER_NAME}" \
     --location="${ZONE}" \
     --node-locations="${ZONE}" \
     --num-nodes="${NODE_POOL_NODES}" \
     --machine-type="${NP_GPU_MACHTYPE}" \
     --accelerator="type=${ACCELERATOR_TYPE},count=${ACCELERATOR_COUNT},gpu-driver-version=LATEST" \
     --placement-type="COMPACT" \
     --disk-type="pd-ssd" \
     --disk-size="300GB"
   ```
3. **Get Cluster Credentials:**

   ```bash
   gcloud container clusters get-credentials "${CLUSTER_NAME}" --location="${ZONE}"
   ```

4. **Set kubectl Alias (Optional):**

   ```bash
   alias k=kubectl
   ```

5. **Create NGC API Key Secrets:** Create one secret for pulling images from NVIDIA NGC (`nvcr.io`) and one for pods that need the API key at startup.

   ```bash
   k create secret docker-registry secret-nvcr \
     --docker-username=\$oauthtoken \
     --docker-password="${NGC_API_KEY}" \
     --docker-server="nvcr.io"

   k create secret generic ngc-api-key \
     --from-literal=NGC_API_KEY="${NGC_API_KEY}"
   ```

6. **Deploy NIMs:**

   ```bash
   k apply -f digital-human-nimbp.yaml
   ```

   The NIM deployment can take up to 15 minutes to complete. Check that the pods are in `Running` status: `k get pods` should list the pods below.

   | NAME | READY | STATUS | RESTARTS |
   |---|---|---|---|
   |`dighum-embedqa-e5v5-aa-aa` | 1/1 | Running | 0 |
   |`dighum-rerankqa-mistral4bv3-bb-bb` | 1/1 | Running | 0 |
   |`dighum-llama3-8b-cc-cc` | 1/1 | Running | 0 |
   |`dighum-audio2face-3d-dd-dd` | 1/1 | Running | 0 |
   |`dighum-fastpitch-tts-ee-ee` | 1/1 | Running | 0 |
   |`dighum-maxine-audio2face-2d-ff-ff` | 1/1 | Running | 0 |
   |`dighum-parakeet-asr-1-1b-gg-gg` | 1/1 | Running | 0 |

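   Rather than scanning the list by eye, you can filter for pods that are not yet `Running`. The `not_running` helper below is a small sketch of ours; `kubectl wait --for=condition=Ready pod ...` is the built-in alternative.

   ```bash
   # Print the names of pods that are not yet Running, given `k get pods` output.
   not_running() {
     awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
   }
   # Usage: k get pods | not_running   (no output means everything is Running)
   ```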
7. **Access NIM endpoints:**

   ```bash
   SERVICES=$(k get svc | awk '{print $1}' | grep -v NAME | grep '^dighum')

   for service in $SERVICES; do
     # Get the pod name backing this service (strip the -lb suffix).
     POD=$(k get pods -o go-template --template '{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | grep $(echo $service | sed 's/-lb//'))

     # Get the external IP of the load balancer.
     EXTERNAL_IP=$(k get svc $service -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

     echo "----------------------------------"
     echo "Testing service: $service at ${EXTERNAL_IP}"
     curl http://${EXTERNAL_IP}/v1/health/ready
     echo " "
     echo "----------------------------------"
   done
   ```

   [Click here if you need HTTPS endpoints](https.md)

## Test

Below are `curl` commands to test each endpoint.

- ### nv-embedqa-e5-v5

  Set `EXTERNAL_IP` to the external IP shown above for `dighum-embedqa-e5v5`.

  ```bash
  export EXTERNAL_IP=<IP>

  curl -X "POST" \
    "http://${EXTERNAL_IP}/v1/embeddings" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
      "input": ["Hello world"],
      "model": "nvidia/nv-embedqa-e5-v5",
      "input_type": "query"
    }'
  ```
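  The endpoint returns JSON in the OpenAI embeddings shape (`data[0].embedding`). As a quick sanity check, you can pipe the response through a small helper of ours (standard library only) to print the vector dimension:

  ```bash
  # Print the length of the first embedding vector in an embeddings response.
  embedding_dim() {
    python3 -c 'import json, sys; print(len(json.load(sys.stdin)["data"][0]["embedding"]))'
  }
  # Usage: curl -s ... | embedding_dim
  ```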

- ### nv-rerankqa-mistral-4b-v3

  Set `EXTERNAL_IP` to the external IP shown above for `dighum-rerankqa-mistral4bv3`.

  ```bash
  export EXTERNAL_IP=<IP>

  curl -X "POST" \
    "http://${EXTERNAL_IP}/v1/ranking" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "nvidia/nv-rerankqa-mistral-4b-v3",
      "query": {"text": "which way should i go?"},
      "passages": [
        {"text": "two roads diverged in a yellow wood, and sorry i could not travel both and be one traveler, long i stood and looked down one as far as i could to where it bent in the undergrowth;"}
      ],
      "truncate": "END"
    }'
  ```
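  The ranking response lists one entry per passage with a relevance score. Assuming the documented `rankings[].{index,logit}` response shape, this small helper of ours prints the index of the best-scoring passage:

  ```bash
  # Print the index of the highest-logit passage from a ranking response.
  top_passage() {
    python3 -c 'import json, sys; r = json.load(sys.stdin)["rankings"]; print(max(r, key=lambda x: x["logit"])["index"])'
  }
  # Usage: curl -s ... | top_passage
  ```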

- ### llama3-8b-instruct

  Set `EXTERNAL_IP` to the external IP shown above for `dighum-llama3-8b`.

  ```bash
  export EXTERNAL_IP=<IP>

  curl -X "POST" \
    "http://${EXTERNAL_IP}/v1/chat/completions" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{
      "model": "meta/llama3-8b-instruct",
      "messages": [{"role":"user", "content":"Write a limerick about the wonders of GPU computing."}],
      "max_tokens": 64
    }'
  ```
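  The endpoint returns the OpenAI chat-completions schema. To read just the generated text, a helper like this (our sketch) extracts `choices[0].message.content`:

  ```bash
  # Print only the assistant's reply from a chat-completions response.
  chat_text() {
    python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
  }
  # Usage: curl -s ... | chat_text
  ```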

- ### parakeet-ctc-1.1b-asr

  - Install the Riva Python client package:

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    pip install nvidia-riva-client
    ```

  - Download the Riva sample clients:

    ```bash
    git clone https://github.com/nvidia-riva/python-clients.git
    ```

  - Run Speech-to-Text inference in streaming mode. Riva ASR supports mono, 16-bit audio in WAV, OPUS, and FLAC formats.

    ```bash
    # Port-forward the ASR pod's gRPC port in the background.
    k port-forward $(k get pod --selector="app=dighum-parakeet-asr-1-1b" --output jsonpath='{.items[0].metadata.name}') 50051:50051 &

    python3 python-clients/scripts/asr/transcribe_file.py --server 0.0.0.0:50051 --input-file ./output.wav --language-code en-US

    deactivate
    ```

  For more details on getting started with this NIM, visit the [Riva ASR NIM Docs](https://docs.nvidia.com/nim/riva/asr/latest/overview.html).
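  The transcribe command above expects a mono, 16-bit WAV at `./output.wav`. The TTS section below produces one, but if you want a standalone input first, this sketch writes one second of 16 kHz silence using only the Python standard library:

  ```bash
  python3 - <<'EOF'
  import wave

  # Write one second of 16 kHz mono 16-bit silence to output.wav.
  with wave.open("output.wav", "wb") as w:
      w.setnchannels(1)       # mono
      w.setsampwidth(2)       # 16-bit samples
      w.setframerate(16000)   # 16 kHz
      w.writeframes(b"\x00\x00" * 16000)
  EOF
  ```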

- ### fastpitch-hifigan-tts

  - Install the Riva Python client package:

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    pip install nvidia-riva-client
    ```

  - Download the Riva sample clients:

    ```bash
    git clone https://github.com/nvidia-riva/python-clients.git
    ```

  - Use `kubectl` to port-forward the TTS pod's gRPC port:

    ```bash
    k port-forward $(k get pod --selector="app=dighum-fastpitch-tts" --output jsonpath='{.items[0].metadata.name}') 50051:50051 &
    ```

  - Run Text-to-Speech inference:

    ```bash
    python3 python-clients/scripts/tts/talk.py --server 0.0.0.0:50051 --text "Hello, this is a speech synthesizer." --language-code en-US --output output.wav

    deactivate
    ```

  Running the above command creates the synthesized audio file `output.wav`.
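  To confirm the synthesis worked without opening an audio player, you can inspect the file header with the Python standard library; `wav_info` is our helper name, not part of the Riva clients.

  ```bash
  # Print duration and sample rate of a WAV file.
  wav_info() {
    python3 -c 'import sys, wave; w = wave.open(sys.argv[1]); print(f"{w.getnframes() / w.getframerate():.2f}s @ {w.getframerate()} Hz")' "$1"
  }
  # Usage: wav_info output.wav
  ```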

- ### audio2face-2d

  - Set up a virtual environment:

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    ```

  - Download the Audio2Face-2D client code:

    ```bash
    git clone https://github.com/NVIDIA-Maxine/nim-clients.git
    cd nim-clients/audio2face-2d/
    pip install -r python/requirements.txt
    ```

  - Compile the protos:

    ```bash
    cd protos/linux/python
    chmod +x compile_protos.sh
    ./compile_protos.sh
    ```

  - Run a test inference:

    ```bash
    cd python/scripts

    python audio2face-2d.py --target <server_ip:port> \
      --audio-input <input audio file path> \
      --portrait-input <input portrait image file path> \
      --output <output file path and the file name> \
      --head-rotation-animation-filepath <rotation animation filepath> \
      --head-translation-animation-filepath <translation animation filepath> \
      --ssl-mode <ssl mode value> \
      --ssl-key <ssl key file path> \
      --ssl-cert <ssl cert filepath> \
      --ssl-root-cert <ssl root cert filepath>
    ```

  Refer to the [Audio2Face-2D](https://docs.nvidia.com/nim/maxine/audio2face-2d/latest/basic-inference.html#running-inference-via-node-js-script) NIM documentation for the parameter values.
- ### audio2face-3d

  - Set up a virtual environment:

    ```bash
    python3 -m venv venv
    source venv/bin/activate
    ```

  - Download the Audio2Face-3D client code:

    ```bash
    git clone https://github.com/NVIDIA/Audio2Face-3D-Samples.git
    cd Audio2Face-3D-Samples/scripts/audio2face_3d_microservices_interaction_app

    pip3 install ../../proto/sample_wheel/nvidia_ace-1.2.0-py3-none-any.whl
    pip3 install -r requirements.txt
    ```

  - Perform a health check:

    ```bash
    python3 a2f_3d.py health_check --url 0.0.0.0:52000
    ```

  - Run a test inference:

    ```bash
    python3 a2f_3d.py run_inference ../../example_audio/Claire_neutral.wav config/config_claire.yml \
      -u 0.0.0.0:52000
    ```

  Refer to the [Audio2Face-3D](https://docs.nvidia.com/ace/audio2face-3d-microservice/latest/text/getting-started/getting-started.html#running-inference) NIM documentation for more information.
## Tear down

**NOTE:** This deletes all deployed NIMs and the GKE cluster.

```bash
k delete -f digital-human-nimbp.yaml
k delete secret secret-nvcr
k delete secret ngc-api-key
gcloud container clusters delete "${CLUSTER_NAME}" \
  --location="${ZONE}" --quiet
```