I am looking for the part of your paper where you mention segmenting the video and using the PySceneDetect algorithm to produce clips.
It doesn't seem to have been uploaded to the repository yet. When will the code be available?
Alternatively, if you could point me to a reference, I would appreciate it.
Segment and CLIP-score filtering
As the point tracking system works in a short time window, we begin by using the provided annotations, curated or otherwise, to split each video into segments, and then run PySceneDetect [10] to further break each segment into short video clips with consistent shots. However, the detected video clips may not always be relevant to their associated text annotations. Thus, we use the pretrained CLIP [101] visual and text encoders to compute the cosine similarity score between each clip-text pair, and filter out clips whose score is below 0.25.
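The filtering step described above can be sketched in a few lines. The helper below is a hypothetical illustration, not the authors' released code: it assumes you have already obtained per-clip visual embeddings (e.g. by encoding sampled frames with CLIP's visual encoder and averaging) and per-annotation text embeddings (from CLIP's text encoder), and it keeps only clip-text pairs whose cosine similarity is at least 0.25. The function names `cosine_similarity` and `filter_clips` are my own.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (N, D) embedding matrices."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return (a * b).sum(axis=-1)

def filter_clips(clip_embs: np.ndarray, text_embs: np.ndarray,
                 threshold: float = 0.25):
    """Return a boolean keep-mask and the scores for each clip-text pair.

    clip_embs: (N, D) CLIP visual embeddings, one per detected clip
               (in practice, clips would come from PySceneDetect and each
               embedding from averaging encoded sample frames).
    text_embs: (N, D) CLIP text embeddings of the paired annotations.
    """
    scores = cosine_similarity(clip_embs, text_embs)
    return scores >= threshold, scores
```

In a real pipeline the clip boundaries would come from PySceneDetect's content-based shot detection, and the embeddings from a pretrained CLIP checkpoint; the 0.25 threshold matches the value quoted from the paper.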