Commit 22389e7

add code
1 parent 0a6a660 commit 22389e7

93 files changed: +15617 −3
INSTALL.md (+75)
### Installation

First, create a conda environment, install PyTorch, and clone the repository:

```bash
conda create -n vmt python=3.7 -y

conda activate vmt

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 -c pytorch

git clone --recursive https://github.com/SysCV/vmt.git
```

Install detectron2 for visualization under your working directory:

```bash
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .
```

Install the dependencies and pycocotools for VIS and HQ-YTVIS:

```bash
pip install -r requirements.txt

cd cocoapi_hq/PythonAPI
# To compile and install locally
python setup.py build_ext --inplace
# To install the library to the Python site-packages
python setup.py build_ext install
```
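
To verify the pycocotools build, a minimal import check (a sketch we add here, not part of the original instructions) confirms which copy ends up on the path:

```python
# Minimal sanity check: confirm the locally built pycocotools imports and
# report where the loaded module lives.
from pycocotools import mask as coco_mask

print(coco_mask.__file__)
```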

Compile the CUDA operators:

```bash
cd ./models/ops
sh ./make.sh
# unit test (should print True for all checks)
python test.py

cd ../../models_swin/ops
sh ./make.sh
```
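
If `make.sh` or the unit test fails, first confirm that the installed PyTorch is actually a CUDA-enabled build; a minimal check:

```python
# Quick environment check before compiling the CUDA ops: prints the PyTorch
# version, the CUDA toolkit it was built against, and GPU availability.
import torch

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())
```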

### Data Preparation

Download and extract the 2019 version of the YouTube-VIS train and val images with annotations from [YouTubeVIS](https://youtube-vos.org/dataset/vis/), and download the [HQ-YTVIS annotations](https://www.vis.xyz/data/hqvis/) and the COCO 2017 dataset. We expect the following directory structure:

```
vmt
├── datasets
│   ├── coco_keepfor_ytvis19_new.json
...
ytvis
├── train
├── val
├── annotations
│   ├── instances_train_sub.json
│   ├── instances_val_sub.json
│   ├── ytvis_hq-train.json
│   ├── ytvis_hq-val.json
│   ├── ytvis_hq-test.json
coco
├── train2017
├── val2017
├── annotations
│   ├── instances_train2017.json
│   ├── instances_val2017.json
```

The modified COCO annotations `coco_keepfor_ytvis19_new.json` for joint training can be downloaded from [[google]](https://drive.google.com/file/d/18yKpc8wt7xJK26QFpR5Xa0vjM5HN6ieg/view?usp=sharing). The HQ-YTVIS annotations can be downloaded from [[google]](https://drive.google.com/drive/folders/1ZU8_qO8HnJ_-vvxIAn8-_kJ4xtOdkefh?usp=sharing).
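
A small sanity check (a sketch, assuming the `ytvis` and `coco` roots sit next to `vmt` as drawn above; adjust the prefixes if they live elsewhere) that the annotation files are in place:

```python
# Verify that the annotation files from the directory tree above exist.
from pathlib import Path

expected = [
    "vmt/datasets/coco_keepfor_ytvis19_new.json",
    "ytvis/annotations/instances_train_sub.json",
    "ytvis/annotations/instances_val_sub.json",
    "ytvis/annotations/ytvis_hq-train.json",
    "ytvis/annotations/ytvis_hq-val.json",
    "ytvis/annotations/ytvis_hq-test.json",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
]
for p in expected:
    print(p, "OK" if Path(p).exists() else "MISSING")
```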

README.md (+25 −3)

@@ -29,11 +29,26 @@ python eval_hqvis.py --save-path prediction_results.json
 ```
 
 ## VMT Code
-<!-- <img src="figures/result_demo1.gif" width="830"/> -->
+---------------
+### Install
+Please refer to [INSTALL.md](INSTALL.md) for installation instructions.
 
 https://user-images.githubusercontent.com/17427852/181796768-3e79ee74-2465-4af8-ba89-b5c837098e00.mp4
 
-Code for VMT is coming soon (before ECCV happens).
+### Usage
+Please refer to [USAGE.md](USAGE.md) for dataset preparation and detailed running (including testing, visualization, etc.) instructions.
+
+### Model zoo
+
+#### HQ-YTVIS model
+
+Train on the [HQ-YTVIS](https://www.vis.xyz/data/hqvis/) **train** set and COCO; evaluate on the [HQ-YTVIS](https://www.vis.xyz/data/hqvis/) **test** set.
+
+| Model | AP<sup>B</sup> | AP<sup>B</sup><sub>75</sub> | AR<sup>B</sup><sub>1</sub> | AP<sup>M</sup> | AR<sup>M</sup><sub>75</sub> | download |
+| ---------- | ---- | ---- | ---- | ---- | ---- | -------- |
+| VMT_r50 | 30.7 | 24.2 | 31.5 | 50.5 | 54.5 | [weight](https://drive.google.com/file/d/1e9hKCC-pAGB-wSO0_qyUNoEe-5XzRocz/view?usp=sharing) |
+| VMT_r101 | 33.0 | 29.3 | 33.3 | 51.6 | 55.8 | [weight](https://drive.google.com/file/d/1TQs_meDaomLz56xCjAZKT1BNtS3K3sla/view?usp=sharing) |
+| VMT_swin_L | 44.8 | 43.4 | 43.0 | 64.8 | 70.1 | [weight](https://drive.google.com/file/d/13cDni9olYd6-xdURQMWstsW0VLbkgIKt/view?usp=sharing) |
 
 ## Citation
 
@@ -44,7 +59,14 @@ Code for VMT is coming soon (before ECCV happens).
 booktitle = {European Conference on Computer Vision (ECCV)},
 year = {2022}
 }
+
+@inproceedings{transfiner,
+  title={Mask Transfiner for High-Quality Instance Segmentation},
+  author={Ke, Lei and Danelljan, Martin and Li, Xia and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
+  booktitle = {CVPR},
+  year = {2022}
+}
 ```
 
 ## Acknowledgement
-This repo is based on [Mask Transfiner](https://github.com/SysCV/transfiner) and [SeqFormer](https://github.com/wjf5203/SeqFormer).
+We thank [Mask Transfiner](https://github.com/SysCV/transfiner) and [SeqFormer](https://github.com/wjf5203/SeqFormer) for their open-source code.
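
The model-zoo weights are plain PyTorch `.pth` files; a minimal sketch for inspecting one after download (the top-level `'model'` key is an assumption based on common DETR-style training code, not confirmed by this repo):

```python
# Inspect a downloaded checkpoint. The 'model' key is an assumption; fall
# back to the raw object if it is already a state dict.
import torch

ckpt = torch.load("pretrained_model/checkpoint_swinl_final.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(f"{len(state_dict)} top-level entries")
```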

USAGE.md (+31)

### Pretrained Models
---------------
Download the pretrained models from the Model zoo table:

```bash
mkdir pretrained_model
# Put the downloaded pretrained models in this directory.
```

### Inference & Evaluation on HQ-YTVIS
---------------
Refer to our [scripts folder](./scripts) for more commands.

Evaluate on the HQ-YTVIS test set:

```bash
bash scripts/eval_swin_test.sh
```

or

```bash
bash scripts/eval_r101_test.sh
```

### Results Visualization
---------------

```bash
bash scripts/eval_swin_val_vis.sh
```

or

```bash
python3 -m tools.inference_swin_with_vis --masks --backbone swin_l_p4w12 --output vis_output_swin_vmt --model_path ./pretrained_model/checkpoint_swinl_final.pth --save_path exp_swin_hq_val_result.json --save-frames True
```
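
A quick look at the saved predictions (a sketch; the per-record schema is an assumption based on the standard YouTube-VIS results format, not verified against this repo):

```python
# Load the prediction file written by --save_path and summarize it.
import json

with open("exp_swin_hq_val_result.json") as f:
    results = json.load(f)
print(f"{len(results)} predicted instance tracks")
print(sorted(results[0].keys()) if results else "empty")
```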

__init__.py

Whitespace-only changes.

datasets/__init__.py (+37)

```python
import torch.utils.data

from .torchvision_datasets import CocoDetection
from datasets.ytvos import YTVOSDataset as YTVOSDataset

from .coco import build as build_coco
from .coco2seq import build as build_seq_coco
from .concat_dataset import build as build_joint
from .ytvos import build as build_ytvs


def get_coco_api_from_dataset(dataset):
    # Unwrap up to 10 levels of torch.utils.data.Subset to reach the
    # underlying CocoDetection / YTVOSDataset and return its COCO-style API.
    for _ in range(10):
        if isinstance(dataset, torch.utils.data.Subset):
            dataset = dataset.dataset
        if isinstance(dataset, CocoDetection):
            return dataset.coco
        if isinstance(dataset, YTVOSDataset):
            return dataset.ytvos


### build_type only works for YoutubeVIS ###
def build_dataset(image_set, args):
    if args.dataset_file == 'YoutubeVIS':
        return build_ytvs(image_set, args)

    if args.dataset_file == 'coco':
        return build_coco(image_set, args)
    if args.dataset_file == 'Seq_coco':
        return build_seq_coco(image_set, args)
    if args.dataset_file == 'jointcoco':
        return build_joint(image_set, args)

    raise ValueError(f'dataset {args.dataset_file} not supported')
```
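
A usage sketch for the dispatcher above, shown for the `'coco'` branch; the `Namespace` fields mirror what `datasets/coco.py:build()` reads, and the values are illustrative:

```python
# Build the COCO training set through build_dataset. Field values here are
# examples, not defaults shipped with the repo.
from argparse import Namespace

from datasets import build_dataset

args = Namespace(dataset_file='coco', coco_path='coco', dataset_type='instances',
                 masks=True, cache_mode=False)
dataset = build_dataset('train', args)
print(len(dataset))
```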

datasets/coco.py (+176)

```python
"""
COCO dataset which returns image_id for evaluation.

Mostly copy-paste from https://github.com/pytorch/vision/blob/13b35ff/references/detection/coco_utils.py
"""
from pathlib import Path

import torch
import torch.utils.data
from pycocotools import mask as coco_mask

from .torchvision_datasets import CocoDetection as TvCocoDetection
from util.misc import get_local_rank, get_local_size
import datasets.transforms as T
import random


class CocoDetection(TvCocoDetection):
    def __init__(self, img_folder, ann_file, transforms, return_masks, cache_mode=False, local_rank=0, local_size=1):
        super(CocoDetection, self).__init__(img_folder, ann_file,
                                            cache_mode=cache_mode, local_rank=local_rank, local_size=local_size)
        self._transforms = transforms
        self.prepare = ConvertCocoPolysToMask(return_masks)

    def __getitem__(self, idx):
        # Resample a random index until the transformed target contains at
        # least one instance.
        instance_check = False
        while not instance_check:
            img, target = super(CocoDetection, self).__getitem__(idx)
            image_id = self.ids[idx]
            target = {'image_id': image_id, 'annotations': target}
            img, target = self.prepare(img, target)
            if self._transforms is not None:
                img, target = self._transforms(img, target)

            if len(target['labels']) == 0:  # no instances left
                idx = random.randint(0, self.__len__() - 1)
            else:
                instance_check = True

        return img, target


def convert_coco_poly_to_mask(segmentations, height, width):
    masks = []
    for polygons in segmentations:
        rles = coco_mask.frPyObjects(polygons, height, width)
        mask = coco_mask.decode(rles)
        if len(mask.shape) < 3:
            mask = mask[..., None]
        mask = torch.as_tensor(mask, dtype=torch.uint8)
        mask = mask.any(dim=2)
        masks.append(mask)
    if masks:
        masks = torch.stack(masks, dim=0)
    else:
        masks = torch.zeros((0, height, width), dtype=torch.uint8)
    return masks


class ConvertCocoPolysToMask(object):
    def __init__(self, return_masks=False):
        self.return_masks = return_masks

    def __call__(self, image, target):
        w, h = image.size

        image_id = target["image_id"]
        image_id = torch.tensor([image_id])

        anno = target["annotations"]

        anno = [obj for obj in anno if 'iscrowd' not in obj or obj['iscrowd'] == 0]

        boxes = [obj["bbox"] for obj in anno]
        # guard against no boxes via resizing
        boxes = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
        boxes[:, 2:] += boxes[:, :2]  # xywh -> xyxy
        boxes[:, 0::2].clamp_(min=0, max=w)
        boxes[:, 1::2].clamp_(min=0, max=h)

        classes = [obj["category_id"] for obj in anno]
        classes = torch.tensor(classes, dtype=torch.int64)

        if self.return_masks:
            # HQ-YTVIS reads the refined polygon field instead of the raw
            # COCO "segmentation".
            segmentations = [obj["segmentation_refined"] for obj in anno]
            masks = convert_coco_poly_to_mask(segmentations, h, w)

        keypoints = None
        if anno and "keypoints" in anno[0]:
            keypoints = [obj["keypoints"] for obj in anno]
            keypoints = torch.as_tensor(keypoints, dtype=torch.float32)
            num_keypoints = keypoints.shape[0]
            if num_keypoints:
                keypoints = keypoints.view(num_keypoints, -1, 3)

        # keep only boxes with positive width and height
        keep = (boxes[:, 3] > boxes[:, 1]) & (boxes[:, 2] > boxes[:, 0])
        boxes = boxes[keep]
        classes = classes[keep]
        if self.return_masks:
            masks = masks[keep]
        if keypoints is not None:
            keypoints = keypoints[keep]

        target = {}
        target["boxes"] = boxes
        target["labels"] = classes
        if self.return_masks:
            target["masks"] = masks
        target["image_id"] = image_id
        if keypoints is not None:
            target["keypoints"] = keypoints

        # for conversion to coco api
        area = torch.tensor([obj["area"] for obj in anno])
        iscrowd = torch.tensor([obj["iscrowd"] if "iscrowd" in obj else 0 for obj in anno])
        target["area"] = area[keep]
        target["iscrowd"] = iscrowd[keep]

        target["orig_size"] = torch.as_tensor([int(h), int(w)])
        target["size"] = torch.as_tensor([int(h), int(w)])

        return image, target


def make_coco_transforms(image_set):

    normalize = T.Compose([
        T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    scales = [480, 512, 544, 576, 608, 640, 672, 704, 736, 768]
    # scales = [296, 328, 360, 392]

    if image_set == 'train':
        return T.Compose([
            T.RandomHorizontalFlip(),
            T.RandomSelect(
                T.RandomResize(scales, max_size=1333),
                T.Compose([
                    T.RandomResize([400, 500, 600]),
                    T.RandomSizeCrop(384, 600),
                    T.RandomResize(scales, max_size=1333),
                ])
            ),
            normalize,
        ])

    if image_set == 'val':
        return T.Compose([
            T.RandomResize([800], max_size=1333),
            normalize,
        ])

    raise ValueError(f'unknown {image_set}')


def build(image_set, args):
    root = Path(args.coco_path)
    assert root.exists(), f'provided COCO path {root} does not exist'
    mode = 'instances'
    dataset_type = args.dataset_type
    if args.dataset_file == 'coco':
        PATHS = {
            "train": (root / "train2017", root / "annotations" / f'{mode}_train2017.json'),
            "val": (root / "val2017", root / "annotations" / f'{mode}_val2017.json'),
        }

    # PATHS is only defined for the 'coco' dataset_file; build() is expected
    # to be called only in that case.
    img_folder, ann_file = PATHS[image_set]
    dataset = CocoDetection(img_folder, ann_file, transforms=make_coco_transforms(image_set), return_masks=args.masks,
                            cache_mode=args.cache_mode, local_rank=get_local_rank(), local_size=get_local_size())
    return dataset
```
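
A toy example of the polygon-to-mask path used by `convert_coco_poly_to_mask` above (a sketch; the triangle and canvas size are illustrative):

```python
# Decode one triangle polygon on a 4x4 canvas, mirroring the frPyObjects ->
# decode -> any(dim=2) sequence in convert_coco_poly_to_mask.
import torch
from pycocotools import mask as coco_mask

polygons = [[0.0, 0.0, 4.0, 0.0, 0.0, 4.0]]  # one polygon: x0,y0,x1,y1,x2,y2
rles = coco_mask.frPyObjects(polygons, 4, 4)
mask = torch.as_tensor(coco_mask.decode(rles), dtype=torch.uint8).any(dim=2)
print(mask.int())
```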
