-
It seems you do not configure …
-
I tried to use the command

```bash
bash ./tools/benchmarks/mmdetection/mim_dist_train_c4.sh configs/benchmarks/mmdetection/voc0712/faster_rcnn_r50_c4_mstrain_24k_voc0712ls.py work_dirs/selfsup/densecl_resnet50_8xb32-coslr-200e_in1k/epoch_200.pth 1
```
And the config I used is:

"
base = 'mmdet::pascal_voc/faster-rcnn_r50-caffe-c4_ms-18k_voc0712.py'
data_preprocessor = dict(
type='DetDataPreprocessor',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
bgr_to_rgb=True,
pad_size_divisor=32)
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
backbone=dict(
frozen_stages=-1,
norm_cfg=norm_cfg,
norm_eval=False,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
roi_head=dict(
shared_head=dict(
type='ResLayerExtraNorm',
norm_cfg=norm_cfg,
norm_eval=False,
style='pytorch'),
bbox_head=dict(num_classes=2)))
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='RandomChoiceResize',
scales = [(666, 240), (666, 256), (666,272), (666, 288),
(666, 304), (666, 320), (666, 336), (666, 352),
(666, 368), (666, 384), (666, 400)],
keep_ratio=True),
dict(type='RandomFlip', prob=0.5),
dict(type='PackDetInputs')
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='Resize', scale=(666, 400), keep_ratio=True),
dict(type='LoadAnnotations', with_bbox=True),
dict(
type='PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor'))
]
dataset_type = 'VOCDataset'
data_root = '/media/ls/disk1/DOTA/VOCdevkit/'
train_dataloader = dict(
batch_size=2,
num_workers=1,
sampler=dict(type='InfiniteSampler', shuffle=True),
dataset=dict(
delete=True,
type='VOCDataset',
data_root=data_root,
ann_file='VOC2007/ImageSets/Main/trainval.txt',
data_prefix=dict(sub_data_root='VOC2007/'),
filter_cfg=dict(filter_empty_gt=True, min_size=32),
pipeline=train_pipeline,
))
val_dataloader = dict(dataset=dict(pipeline=test_pipeline,data_root=data_root,))
test_dataloader = val_dataloader
train_cfg = dict(delete=True, type='EpochBasedTrainLoop', max_epochs=24, val_interval=4)
#max_iter = 824
param_scheduler = [
dict(
type='LinearLR', start_factor=0.001, by_epoch=False, begin=0,
end=1000),
dict(
type='MultiStepLR',
begin=0,
end=24,
by_epoch=True,
milestones=[16, 22],
gamma=0.1)
]
val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points')
test_evaluator = val_evaluator
default_hooks = dict(checkpoint=dict(by_epoch=True, interval=4))
log_processor = dict(by_epoch=True)
custom_imports = dict(
imports=['mmselfsup.evaluation.functional.res_layer_extra_norm'],
allow_failed_imports=False)
"
However, the training process stays stuck in epoch 1 the whole time and never advances to the next epoch, as shown below.
The log file shows lines like `mmengine - INFO - Epoch(train) [1][2400/824]`, where the iteration counter (2400) is already past 824, yet training never moves on to `Epoch(train) [2]`.
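(Just for context on the numbers: 824 is the per-epoch iteration count the runner derives from the dataloader, so the counter running past it without rolling over to epoch 2 is what looks wrong to me. A trivial check of what that figure implies, using only the batch size from the config and the GPU count from the command:)

```python
# Trivial arithmetic check; batch_size comes from train_dataloader above
# and num_gpus is the last argument of the launch command.
batch_size = 2
num_gpus = 1
iters_per_epoch = 824  # denominator shown in the log line

# Roughly how many training images that epoch length implies
# (an inferred figure, not a measured one).
print(iters_per_epoch * batch_size * num_gpus)  # -> 1648
```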
I tried to check the config file, but I couldn't find what is causing this.
May I get some advice? Thanks in advance.