TransT tracker integration #4886
Conversation
Commits in this PR (truncated list):
- …th nvidia/cuda:11.7.0-devel-ubuntu20.04
- Fix show empty tasks; v1.41.1; update changelog (Co-authored-by: Boris Sekachev <[email protected]>)
- feat: upgrade dotenv-webpack from 7.1.1 to 8.0.0 (Snyk-generated; see https://www.npmjs.com/package/dotenv-webpack)
```diff
@@ -756,7 +756,7 @@ export class ToolsControlComponent extends React.PureComponent<Props, State> {
             });
             // eslint-disable-next-line no-await-in-loop
             const response = await core.lambda.call(jobInstance.taskId, tracker, {
-                frame: frame,
+                frame,
```
to avoid the linter issue
Resolved #4768
Hi @dschoerk, thanks for this really cool integration. This is very inspiring! How are the images fed/transformed? In the meantime, how are those requests kept posting to the nuclio server until all frames (images) are done? Thank you in advance, and it would be super appreciated if there were sample code for this video integration! Thanks!
At this time CVAT only supports tracking one step at a time. A bounding box (seed) is drawn on the initial frame, and each time you press the "f" key to step to the next frame, the objects are tracked. AFAIK there is no functionality at the moment to track multiple frames at once. In the mentioned video I just keep pressing the "f" key. I hope this answers your question.
Thank you, this is very helpful; it helped me understand CVAT much better. One step further: do you think there could be functionality to automate the "next step" and "prediction" for each frame without pressing "f", until all frames are done?
To simplify things greatly: AI tools are integrated as serverless functions, i.e. they are called via a REST interface on the nuclio platform like any other web service. An image is sent from CVAT to the service, which responds with the tracked location and the state of the tracker. Within this PR I have implemented such a service. From an implementation perspective this is great because of its simplicity, but performance when tracking over multiple frames is not amazing with this approach.

Implementing what you're looking for is not trivial. The simplest approach I can imagine is to call the tracking service repeatedly until the required number of frames has been tracked. BUT this is not very performant: each frame is sent in a separate HTTP request, so tracking n frames requires n requests. A better solution would be a service that is capable of tracking multiple frames, keeps its state internally, and doesn't require the images to be sent in the request, but instead reads them from a Docker mount. All of that would reduce flexibility but increase performance.
Thank you for the reply. My AI function also predicts one frame at a time (batch size = 1). I can imagine the simplest approach is to repeatedly send HTTP requests, one after another automatically, until all MP4 frames have been sent. Thank you!
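The "one request per frame" approach discussed above can be sketched as a small client loop. This is purely illustrative and not CVAT's actual API: the payload field names (`image`, `shape`, `state`) and the injected `call` function are assumptions standing in for whatever the nuclio endpoint actually expects.

```python
import base64


def build_payload(jpeg_bytes, bbox, state):
    """Assemble the JSON-serializable body sent to the tracker service
    for a single frame (field names are illustrative)."""
    return {
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        "shape": bbox,    # box to seed/continue tracking from
        "state": state,   # opaque tracker state, round-tripped by the client
    }


def track_frames(frames, init_bbox, call):
    """Track across many frames with one service call per frame.

    `call` performs a single HTTP POST and returns the parsed JSON
    response; tracking n frames therefore costs n requests, which is
    exactly the performance drawback described above."""
    state, bbox, boxes = None, init_bbox, []
    for jpeg_bytes in frames:
        result = call(build_payload(jpeg_bytes, bbox, state))
        state, bbox = result["state"], result["shape"]
        boxes.append(bbox)
    return boxes
```

Because the tracker state travels through the client on every round trip, the service itself stays stateless, which is what makes the per-frame cost hard to avoid with this design.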
@dschoerk,
Thanks for your contribution! I tested it and it works great, but I have some small comments. Could you fix them, please?
```python
def log(msg):
    #with open("/log.log", "a") as logf:
    #    logf.write(msg+'\n')
    pass
```
I think this function can be removed.
```diff
@@ -0,0 +1,145 @@
+import json
```
Please add a license header to the beginning of the file:

```python
# Copyright (C) 2022 CVAT.ai Corporation
#
# SPDX-License-Identifier: MIT
```
```python
except Exception as e: # cavemen debugging
    logf = open("/error.log", "w")
    logf.write(str(e))
    logf.write(traceback.format_exc())

    return context.Response(headers={},
        content_type='application/json', status_code=666)
```
I think this should be removed.
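As a sketch of what the cleanup could look like: log the traceback through nuclio's context logger and return a conventional 500 instead of writing to /error.log and using status code 666. The helper name below is ours; only `context.Response` and `context.logger` come from the nuclio handler interface.

```python
import traceback


def error_response(context, exc):
    """Report a tracker failure via nuclio's context instead of an
    ad-hoc /error.log file, and use a standard HTTP status code."""
    context.logger.error("tracker failed: %s" % exc)
    context.logger.error(traceback.format_exc())
    return context.Response(body=str(exc),
                            content_type='text/plain',
                            status_code=500)
```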
```python
    #    logf.write(msg+'\n')
    pass


def encode_state(model):
```
Could you please separate these functions into a separate ModelHandler class (as is done for other serverless functions)?
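A minimal sketch of the requested shape, grouping the module-level helpers into one class as CVAT's other serverless functions do. The method names and the pickle-based state serialization here are illustrative assumptions, not the PR's actual code.

```python
import base64
import pickle


class ModelHandler:
    """Groups the module-level helpers into one class, mirroring the
    ModelHandler pattern used by CVAT's other serverless functions."""

    def __init__(self):
        self.tracker = None  # the TransT network would be loaded here

    def encode_state(self, state):
        """Serialize tracker state into a JSON-safe string so it can be
        round-tripped through the serverless request/response cycle."""
        return base64.b64encode(pickle.dumps(state)).decode("ascii")

    def decode_state(self, blob):
        """Inverse of encode_state."""
        return pickle.loads(base64.b64decode(blob))
```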
```python
image = Image.open(buf).convert('RGB')
image = np.array(image)[:, :, ::-1].copy()

#cv2.imwrite('/test.jpg', image)
```
Please remove all useless comments.
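For reference, the decoding snippet above in self-contained form (the function name is ours): PIL decodes to RGB, and the `[:, :, ::-1]` slice reverses the channel axis so the array matches OpenCV's BGR order.

```python
import io

import numpy as np
from PIL import Image


def decode_bgr(buf):
    """Decode an image byte stream into a BGR numpy array, the channel
    order OpenCV (and hence the tracker) expects."""
    image = Image.open(buf).convert('RGB')
    return np.array(image)[:, :, ::-1].copy()  # RGB -> BGR
```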
I can't push to this thread (the requested URL returned error: 403), so I opened another PR.
How would you suggest I go about using TransT with multiple frames? Do I have to take the Docker route? I have about 1800 frames (30 fps, one-minute-long videos). Also, I can't find TransT on Hugging Face to use it on my own dataset without having to use CVAT. Sure, CVAT makes annotation easier, but it takes extremely long to annotate 1800 frames separately. What would be the most feasible solution to my problem?
This PR integrates the single object tracker TransT as an AI tool into CVAT.
Also see: https://github.com/cvat-ai/cvat-opencv/issues/14