
Input data/shape validation #7171

Closed · triton-inference-server/core #350
@HennerM

Description

Is your feature request related to a problem? Please describe.
Triton server doesn't validate the input data sent by clients; in particular, it doesn't check that the declared shape matches the size of the input buffer. For example, if a client sends a vector with 15 elements but specifies the shape as [1,10] (60 bytes of FP32 where the declared shape implies only 40), Triton blindly accepts the request and passes it on to the backend. A similar issue arises if the client sends only 5 elements.

This can potentially lead to data leaking from one request into another. An example that triggers this behaviour is given here:

import asyncio

import numpy as np
# Assumes the asyncio gRPC client; the repro should work the same with the aio HTTP client.
from tritonclient.grpc import InferInput
from tritonclient.grpc.aio import InferenceServerClient
from tritonclient.utils import np_to_triton_dtype

TRITON_URL = "localhost:8001"  # assumed gRPC endpoint
FIXED_LAST_DIM = 5

alice = InferenceServerClient(TRITON_URL)
bob = InferenceServerClient(TRITON_URL)

# Alice's data has FIXED_LAST_DIM + 2 = 7 elements.
input_data_1 = np.arange(FIXED_LAST_DIM + 2)[None].astype(np.float32)
print(f"{input_data_1=}")
inputs_1 = [
    InferInput("INPUT0", input_data_1.shape, np_to_triton_dtype(input_data_1.dtype)),
]
inputs_1[0].set_data_from_numpy(input_data_1)
# Use the wrong shape on purpose: the buffer still holds 7 elements.
inputs_1[0].set_shape((1, FIXED_LAST_DIM))

input_data_2 = 100 + np.arange(FIXED_LAST_DIM)[None].astype(np.float32)
print(f"{input_data_2=}")
inputs_2 = [InferInput("INPUT0", shape=input_data_2.shape, datatype=np_to_triton_dtype(input_data_2.dtype))]
inputs_2[0].set_data_from_numpy(input_data_2)

# Run from within an async function so both requests can land in the same dynamic batch.
t1 = asyncio.create_task(alice.infer("dummy", inputs_1))
t2 = asyncio.create_task(bob.infer("dummy", inputs_2))

alice_result, bob_result = await asyncio.gather(t1, t2)
print(f"{alice_result.as_numpy('OUTPUT0')=}")
print(f"{bob_result.as_numpy('OUTPUT0')=}")
# Bob's output should contain only Bob's own data.
assert np.allclose(bob_result.as_numpy("OUTPUT0"), input_data_2)

With an example model config (the long dynamic-batching queue delay makes it likely that both requests land in the same batch):

name: "dummy"
backend: "pytorch"
max_batch_size: 8

input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 5 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 5 ]
  }
]

dynamic_batching {
    max_queue_delay_microseconds: 1000000
}
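
For this config the size mismatch is easy to quantify; a minimal sketch, with values taken from the repro above (plain NumPy, for illustration only):

import numpy as np

# Bytes implied by the declared shape (1, 5) of FP32 in the "dummy" config.
declared_bytes = 1 * 5 * np.dtype(np.float32).itemsize   # 20
# Bytes actually sent in Alice's request: 7 float32 values.
sent_bytes = 7 * np.dtype(np.float32).itemsize           # 28
print(declared_bytes, sent_bytes)  # the declared shape no longer describes the buffer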

This has been observed with the PyTorch backend; I am not sure whether other backends have provisions in place to guard against it.

Describe the solution you'd like
In cases where the sent request buffer size doesn't match the declared shape, Triton should fail fast and reject the request as early as possible, preferably before the request is enqueued into a batcher. A trivial check would be whether total_num_elements (i.e. the product of the whole shape vector) multiplied by the datatype size in bytes equals the actual size of the input buffer.
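
A minimal sketch of such a check, in Python for illustration only; the real check would live in Triton core's request handling, and the function name, dtype table, and parameters below are hypothetical:

import math

# Hypothetical byte sizes for a subset of Triton datatypes (variable-size types like BYTES excluded).
DTYPE_SIZE_BYTES = {"FP32": 4, "FP16": 2, "INT64": 8, "INT32": 4, "UINT8": 1, "BOOL": 1}

def validate_input_buffer(shape, datatype, buffer_size_bytes):
    """Return an error message if the raw input buffer doesn't match the declared shape, else None."""
    element_size = DTYPE_SIZE_BYTES.get(datatype)
    if element_size is None:
        return None  # variable-size datatypes need a different check and are skipped here
    total_num_elements = math.prod(shape)
    expected_bytes = total_num_elements * element_size
    if expected_bytes != buffer_size_bytes:
        return (f"input declares shape {list(shape)} ({expected_bytes} bytes of {datatype}) "
                f"but the request buffer holds {buffer_size_bytes} bytes")
    return None

# The example from the description: 15 FP32 elements sent, but shape declared as [1, 10].
print(validate_input_buffer([1, 10], "FP32", 15 * 4))

Running such a check before the request is enqueued into the dynamic batcher would mean a malformed request can be rejected on its own, without affecting other requests in the same batch.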

Describe alternatives you've considered
I have a draft that adds a validation like this to the libtorch backend: https://github.com/triton-inference-server/pytorch_backend/compare/main...speechmatics:pytorch_backend:check-shapes?expand=1 The problem with this approach is that it runs very late in the Triton pipeline: validation only happens once a request has been batched. I might have overlooked something, but at that point I am not sure whether a single request can still be rejected; at least I couldn't find an example of that.

Additional context
Ideally, Triton core should also validate each backend's output with the same check. That could be a separate feature request, though.
