Input data/shape validation #7171
@GuanLuo @tanmayv25 @Tabrizian I believe there are some relaxed checks on the hot path for performance purposes. Do you have any comments on problem areas or risks?
Hi @HennerM, thanks for raising this. Do you mind adding the following:
Thanks for looking at this. Here is a full example that you can run with pytest:

```python
import asyncio
from pathlib import Path
from subprocess import Popen
from tempfile import TemporaryDirectory
from typing import Optional

import numpy as np
import pytest
import torch
from tritonclient.grpc.aio import InferenceServerClient, InferInput
from tritonclient.utils import np_to_triton_dtype

GRPC_PORT = 9653
FIXED_LAST_DIM = 8


@pytest.fixture
def repo_dir():
    with TemporaryDirectory() as model_repo:
        (Path(model_repo) / "pt_identity" / "1").mkdir(parents=True, exist_ok=True)
        torch.jit.save(torch.jit.script(torch.nn.Identity()), model_repo + "/pt_identity/1/model.pt")
        pbtxt = f"""
name: "pt_identity"
backend: "pytorch"
max_batch_size: 8
input [
  {{
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ {FIXED_LAST_DIM} ]
  }}
]
output [
  {{
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ {FIXED_LAST_DIM} ]
  }}
]
# ensure we batch requests together
dynamic_batching {{
  max_queue_delay_microseconds: {int(5e6)}
}}
"""
        with open(model_repo + "/pt_identity/config.pbtxt", "w") as f:
            f.write(pbtxt)
        yield model_repo


async def poll_readiness(client: InferenceServerClient, server_proc: Optional[Popen]):
    while True:
        # Fail fast if the server process has already exited.
        if server_proc is not None and (ret_code := server_proc.poll()) is not None:
            _, stderr = server_proc.communicate()
            print(stderr)
            raise Exception(f"Tritonserver died with return code {ret_code}")
        try:
            if await client.is_server_ready():
                break
        except:  # noqa: E722
            pass
        await asyncio.sleep(0.5)


@pytest.mark.asyncio
async def test_shape_overlapped(repo_dir: str):
    with Popen(["tritonserver", "--model-repository", repo_dir, "--grpc-port", str(GRPC_PORT)]) as server:
        await poll_readiness(InferenceServerClient("localhost:" + str(GRPC_PORT)), server)

        alice = InferenceServerClient("localhost:" + str(GRPC_PORT))
        bob = InferenceServerClient("localhost:" + str(GRPC_PORT))

        input_data_1 = np.arange(FIXED_LAST_DIM + 2)[None].astype(np.float32)
        print(f"{input_data_1=}")
        inputs_1 = [
            InferInput("INPUT0", input_data_1.shape, np_to_triton_dtype(input_data_1.dtype)),
        ]
        inputs_1[0].set_data_from_numpy(input_data_1)
        # Compromised input shape: the buffer holds FIXED_LAST_DIM + 2 elements,
        # but the declared shape only accounts for FIXED_LAST_DIM of them.
        inputs_1[0].set_shape((1, FIXED_LAST_DIM))

        input_data_2 = 100 + np.arange(FIXED_LAST_DIM)[None].astype(np.float32)
        print(f"{input_data_2=}")
        inputs_2 = [InferInput("INPUT0", shape=input_data_2.shape, datatype=np_to_triton_dtype(input_data_2.dtype))]
        inputs_2[0].set_data_from_numpy(input_data_2)

        t1 = asyncio.create_task(alice.infer("pt_identity", inputs_1))
        t2 = asyncio.create_task(bob.infer("pt_identity", inputs_2))
        _, bob_result = await asyncio.gather(t1, t2)
        server.terminate()

        assert np.allclose(bob_result.as_numpy("OUTPUT0"), input_data_2), "Bob's result should be the same as input"
```
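To run this, the test assumes tritonserver is on the PATH, and it needs the tritonclient[grpc] package plus the pytest-asyncio plugin (for the @pytest.mark.asyncio marker); the file name is arbitrary.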
Thanks for filing this issue. I think we have checks for other input mismatches with the model configuration (https://github.com/triton-inference-server/core/blob/35555724612df29923007f1da45d2a58f928206c/src/infer_request.cc#L1066-L1175), but it looks like we need an additional check to make sure that the total byte size of the elements matches the specified dimensions. Filed (DLIS-6634).
I am happy to contribute to this; I just wanted to check whether this is already handled and whether the team considers it a good idea.
@jbkyang-nvi has started taking a look at this bug. Looks like there are a few locations where we can update the checks to make sure the request has the correct size. |
@HennerM Is this supposed to pass with current Triton? Currently this test passes.
@yinggeh nice, thanks for fixing this!
Is your feature request related to a problem? Please describe.
Triton server doesn't do any validation of the data sent by clients, specifically validation that the given shape matches the size of the input buffer. For example, if a client sends a vector with 15 elements and specifies the shape as [1, 10], Triton blindly accepts this and passes it on to the backend; a similar issue arises if the client only sends a vector with 5 elements.
This could potentially lead to data leaking from one request into another; an example that can trigger this behaviour is given here:
With an example model config:
This has been observed with the PyTorch backend; I am not sure whether other backends have provisions in place.
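To make the size mismatch concrete, here is a small standalone illustration of the [1, 10] / 15-element example above (this snippet is only explanatory and is not part of Triton or the reproduction test):

```python
import numpy as np

# The client declares shape [1, 10] but ships 15 float32 values.
declared_shape = (1, 10)
payload = np.arange(15, dtype=np.float32)

expected_bytes = int(np.prod(declared_shape)) * payload.dtype.itemsize  # 10 * 4 = 40
actual_bytes = payload.nbytes                                            # 15 * 4 = 60

print(expected_bytes, actual_bytes)  # 40 60 -> sizes disagree, so the request should be rejected
```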
Describe the solution you'd like
In cases where the sent request's buffer size doesn't match the declared shape, we should fail fast and reject the request as early as possible, ideally before the request is enqueued into a batcher. A trivial check would be whether the total_num_elements (i.e. the product of the whole shape vector) multiplied by the datatype size in bytes adds up to the actual size of the input buffer.
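A minimal sketch of that check, written in Python for illustration (the actual implementation would live in Triton core's C++ request handling; the function name and signature here are hypothetical):

```python
import numpy as np

def validate_input_byte_size(shape, dtype_itemsize, buffer_size) -> None:
    """Reject a request whose raw input buffer doesn't match its declared shape.

    Applies to fixed-size datatypes only; TYPE_BYTES/string inputs have
    variable-length elements and would need a different check.
    """
    total_num_elements = int(np.prod(shape))        # product of the whole shape vector
    expected_size = total_num_elements * dtype_itemsize
    if expected_size != buffer_size:
        raise ValueError(
            f"input byte size mismatch: expected {expected_size} bytes for shape "
            f"{list(shape)}, got {buffer_size} bytes"
        )

# Example: shape [1, 10] with FP32 (4-byte) elements but a 60-byte buffer -> rejected.
# validate_input_byte_size((1, 10), 4, 60)
```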
Describe alternatives you've considered
I have a draft for adding a validation like this to the libtorch backend: https://github.com/triton-inference-server/pytorch_backend/compare/main...speechmatics:pytorch_backend:check-shapes?expand=1 The problem with this is that it happens very late in the Triton pipeline: we only validate once a request has been batched. I might have overlooked something, but at this point I am not sure whether a single request can be rejected; at least I couldn't find an example of that.
Additional context
Ideally, Triton core should also check each backend's output with the same check. This could be another feature request, though.