Commit e1680e3

Add something-something-v2 dataset support for video action recognition (#983)
* add sthsth v2 support
* fix Tong comments and add model zoo
* add model store
1 parent 0a670aa commit e1680e3

11 files changed, +756 / -29 lines

docs/model_zoo/action_recognition.rst

Lines changed: 22 additions & 0 deletions
@@ -82,6 +82,28 @@ The following table lists pre-trained models trained on Kinetics400.
 | i3d_resnet101_v1_kinetics400 [4]_ | ImageNet | 1 | 32 (64/2) | 74.8 | c5721407 | `shell script <https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/kinetics400/i3d_resnet101_v1_kinetics400.sh>`_ | `log <https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/kinetics400/i3d_resnet101_v1_kinetics400.log>`_ |
 +---------------------------------------------+------------------+--------------+----------------+-----------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+
 
+Something-Something-V2 Dataset
+------------------------------
+
+The following table lists pre-trained models trained on Something-Something-V2.
+
+.. note::
+
+   Our pre-trained models reproduce results from "Temporal Segment Networks (TSN)" [2]_ and "Inflated 3D Networks (I3D)" [3]_. Please check the referenced papers for further information.
+
+
+.. table::
+   :widths: 40 8 8 8 10 8 8 10
+
+   +-------------------------------+------------+----------+-------------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
+   | Name                          | Pretrained | Segments | Clip Length | Top-1 | Hashtag  | Train Command                                                                                                                                              | Train Log                                                                                                                                          |
+   +===============================+============+==========+=============+=======+==========+============================================================================================================================================================+====================================================================================================================================================+
+   | resnet50_v1b_sthsthv2 [2]_    | ImageNet   | 8        | 1           | 35.5  | 80ee0c6b | `shell script <https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/somethingsomethingv2/resnet50_v1b_sthsthv2_tsn.sh>`_ | `log <https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/somethingsomethingv2/resnet50_v1b_sthsthv2_tsn.log>`_ |
+   +-------------------------------+------------+----------+-------------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
+   | i3d_resnet50_v1_sthsthv2 [3]_ | ImageNet   | 1        | 16 (32/2)   | 50.6  | 01961e4c | `shell script <https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/somethingsomethingv2/i3d_resnet50_v1_sthsthv2.sh>`_  | `log <https://raw.githubusercontent.com/dmlc/web-data/master/gluoncv/logs/action_recognition/somethingsomethingv2/i3d_resnet50_v1_sthsthv2.log>`_  |
+   +-------------------------------+------------+----------+-------------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
+
+
 .. [1] Limin Wang, Yuanjun Xiong, Zhe Wang and Yu Qiao. \
    "Towards Good Practices for Very Deep Two-Stream ConvNets." \
    arXiv preprint arXiv:1507.02159, 2015.
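
Reading the Segments and Clip Length columns of the new table: the TSN row uses 8 segments of 1 frame each, and "16 (32/2)" in the I3D row is read here as 16 frames sampled every 2nd frame from a 32-frame window. The Python sketch below works through that arithmetic under stated assumptions: windows are evenly spaced and centered (a deterministic, test-time style choice), and `segment_offsets` and `sample_frames` are hypothetical helpers, not GluonCV's actual data loader.

```python
from typing import List


def segment_offsets(num_frames: int, num_segments: int, window: int) -> List[int]:
    """Start frame of each segment's window, centered in its chunk (deterministic).

    The range of valid start frames (where `window` frames still fit) is cut into
    `num_segments` equal chunks, and each window starts at its chunk's center.
    """
    valid_starts = num_frames - window + 1
    if valid_starts < num_segments:
        # Too few frames for distinct windows: start every window at frame 0.
        return [0] * num_segments
    chunk = valid_starts / num_segments
    return [int(chunk * i + chunk / 2) for i in range(num_segments)]


def sample_frames(num_frames: int, num_segments: int, frames_per_clip: int, stride: int) -> List[int]:
    """Frame indices fed to the model: `frames_per_clip` frames every `stride` frames, per segment."""
    window = frames_per_clip * stride  # e.g. 16 frames x stride 2 -> 32-frame window
    indices = []
    for start in segment_offsets(num_frames, num_segments, window):
        for k in range(frames_per_clip):
            # Clamp so a too-short video repeats its last frame instead of overflowing.
            indices.append(min(start + k * stride, num_frames - 1))
    return indices


# TSN-style row: 8 segments x 1 frame each (a 48-frame video is assumed for illustration).
print(sample_frames(48, 8, 1, 1))   # [3, 9, 15, 21, 27, 33, 39, 45]
# I3D-style row "16 (32/2)": 1 segment, 16 frames from a 32-frame window.
print(sample_frames(48, 1, 16, 2))  # [8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
```

Training-time sampling in TSN instead picks a random position inside each segment; the centered offsets here simply keep the sketch deterministic.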

gluoncv/data/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@
 from .mixup.detection import MixupDetection
 from .ucf101.classification import UCF101, UCF101Attr
 from .kinetics400.classification import Kinetics400, Kinetics400Attr
+from .somethingsomethingv2.classification import SomethingSomethingV2, SomethingSomethingV2Attr
 from .sampler import SplitSampler
 
 datasets = {
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+# pylint: disable=wildcard-import
+"""Video action recognition, something-something-v2 dataset.
+https://20bn.com/datasets/something-something
+"""
+from __future__ import absolute_import
+from .classification import *
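
The new six-line module above re-exports the public names of `classification.py` through a wildcard import (hence the `# pylint: disable=wildcard-import` line), so they become attributes of the package that module initializes, while `gluoncv/data/__init__.py` imports `SomethingSomethingV2` and `SomethingSomethingV2Attr` directly from the `classification` module. A minimal, stdlib-only sketch of the wildcard re-export, assuming `classification.py` declares an `__all__` (that file is not shown here); the package name `demo_sthsthv2`, the stub classes, and the `build_index` helper are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for classification.py: stub classes plus an explicit
# __all__. build_index is public but deliberately left out of __all__.
CLASSIFICATION_SRC = '''
__all__ = ["SomethingSomethingV2", "SomethingSomethingV2Attr"]


class SomethingSomethingV2Attr:
    """Stub, not the real attribute class."""


class SomethingSomethingV2:
    """Stub, not the real dataset class."""


def build_index():
    return "not re-exported"
'''

# Mirrors the six-line module added in this commit.
INIT_SRC = '''# pylint: disable=wildcard-import
from __future__ import absolute_import
from .classification import *
'''


def build_and_import(root: Path):
    """Write the two files under root/demo_sthsthv2 and import the package."""
    pkg_dir = root / "demo_sthsthv2"
    pkg_dir.mkdir()
    (pkg_dir / "classification.py").write_text(CLASSIFICATION_SRC)
    (pkg_dir / "__init__.py").write_text(INIT_SRC)
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()  # make sure the freshly written files are seen
    try:
        return importlib.import_module("demo_sthsthv2")
    finally:
        sys.path.remove(str(root))


with tempfile.TemporaryDirectory() as tmp:
    pkg = build_and_import(Path(tmp))
    reexported = sorted(n for n in vars(pkg) if n.startswith("SomethingSomething"))
    print(reexported)                   # ['SomethingSomethingV2', 'SomethingSomethingV2Attr']
    print(hasattr(pkg, "build_index"))  # False: not listed in __all__
```

Names not listed in `__all__` (here `build_index`) are skipped by `import *`; without an `__all__`, every name not starting with an underscore would be re-exported instead.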
