typing: fix typing on encode #3270
Merged
Hello!
The encoding function had inconsistent typing. Specifically, it did not include types for `output_value=None`, where features are returned, and it did not reflect the fact that the values of `output_tensor` and `output_numpy` are ignored if `output_value != "sentence_embedding"`.

This PR fixes all those issues, and should hopefully lead to a more consistent experience. For what it's worth, I kept the possibility of passing in an `ndarray`, but I think this should be fixed, because the actual input type of the function is `Iterator[str]`. There's actually no reason for it to be a list.

Here are the test cases I used. Unfortunately there's no really nice way to introspect types AFAIK, but this should be sufficient to check using `reveal_type` or your IDE.

The format here is: the first line below an invocation indicates the base type. If this is a container (i.e., a list), the second line gives the type of the first item of the container. On my local machine I matched these with actual tests, and they all produced the same output:
See here for the script:
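Separately from the linked script, here is a minimal, self-contained sketch of the general pattern this description relies on: `typing.overload` combined with `Literal` so that the return type follows the value of `output_value`. This is not the library's actual signature. The toy `encode` function, the `Embedding` and `Features` aliases, the `"token_embeddings"` option, and the length-based "embeddings" are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Literal, Optional, Union, overload

# Hypothetical stand-ins for the real tensor / feature types.
Embedding = List[float]
Features = Dict[str, Any]


@overload
def encode(
    sentences: List[str],
    output_value: Literal["sentence_embedding"] = ...,
) -> List[Embedding]: ...
@overload
def encode(
    sentences: List[str],
    output_value: Literal["token_embeddings"],
) -> List[List[Embedding]]: ...
@overload
def encode(
    sentences: List[str],
    output_value: None,
) -> List[Features]: ...


def encode(
    sentences: List[str],
    output_value: Optional[str] = "sentence_embedding",
) -> Union[List[Embedding], List[List[Embedding]], List[Features]]:
    """Toy encoder: each token's 'embedding' is just its length."""
    results: List[Any] = []
    for sentence in sentences:
        tokens = sentence.split()
        token_embeddings = [[float(len(token))] for token in tokens]
        count = max(len(tokens), 1)  # avoid dividing by zero on empty input
        sentence_embedding = [sum(e[0] for e in token_embeddings) / count]
        features: Features = {
            "tokens": tokens,
            "token_embeddings": token_embeddings,
            "sentence_embedding": sentence_embedding,
        }
        # output_value=None returns the whole feature dict, mirroring the
        # case the description says was previously untyped.
        results.append(features if output_value is None else features[output_value])
    return results


sentence_level = encode(["hello world"])
token_level = encode(["hello world"], output_value="token_embeddings")
raw_features = encode(["hello world"], output_value=None)
```

With overloads like these, a type checker's `reveal_type(sentence_level)` and `reveal_type(raw_features)` should report different list element types, which is the kind of check the test cases above describe.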