fix_bug simplify.py #1113

Merged
merged 2 commits into from
Jan 14, 2023
Conversation

Vibsteamer
Collaborator

@Vibsteamer Vibsteamer commented Jan 13, 2023

expected behavior:
when "labeled": true is set in dpgen simplify, 02.fp soft-links the labeled data, and a soft-linked task dir is also created for format consistency.

the expected names are data.000 and task.000.000000,
guaranteed by data_system_fmt and fp_task_fmt respectively.

bug:
a typo used data_system_fmt for the task dir instead of fp_task_fmt,
which gives task.000 instead of task.000.000000.

this makes _check_empty_iter (which checks glob.glob("task.000.*")) in generator/run.py judge this iter empty,
so 00.train of the next iter is always skipped.
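
A minimal sketch of the naming mismatch described above. The two format strings are assumptions chosen only to reproduce the names data.000, task.000 and task.000.000000 from this description; they are not quoted from the dpgen source. The fnmatch call just illustrates that a pattern like task.000.* cannot match the shorter name.

```python
from fnmatch import fnmatch

# Assumed format strings (not quoted in this PR); chosen so that the
# resulting names match the ones given in the description.
data_system_fmt = "%03d"
fp_task_fmt = data_system_fmt + ".%06d"

sys_idx, task_idx = 0, 0

# Name of the soft-linked data dir.
data_name = "data." + data_system_fmt % sys_idx
# Task dir name when data_system_fmt is used by mistake.
buggy_task_name = "task." + data_system_fmt % sys_idx
# Task dir name when fp_task_fmt is used as intended.
fixed_task_name = "task." + fp_task_fmt % (sys_idx, task_idx)

print(data_name, buggy_task_name, fixed_task_name)

# A pattern of the form task.000.* only matches the longer name.
print(fnmatch(buggy_task_name, "task.000.*"))
print(fnmatch(fixed_task_name, "task.000.*"))
```

Any code that looks for task dirs with a pattern of this shape would therefore miss task.000 and find only task.000.000000.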

consequence:
the "simplify_labeled" process never starts correctly.

no iter0 model is produced and the randomly picked data in iter0 are never trained,
so iter1 fails with an error that it can't find the graph files from iter0 when trying to copy them, because training was skipped.

BTW
I think "simplify_labeled" is valuable in some complex or big-data scenarios, but it doesn't seem to be loved by users yet.
pity : (

Signed-off-by: Wanrun Jiang [email protected]

when `"labeled": true` is set in `dpgen simplify`, 02.fp soft-links the labeled `data`,
and a soft-linked `task` dir is also created for format consistency.

the expected names are `data.000` and `task.000.000000`,
guaranteed by `data_system_fmt` and `fp_task_fmt` respectively

a typo used `data_system_fmt` in both places and gave `data.000` and `task.000`,
which makes `_check_empty_iter` (which checks `glob.glob("task.000.*")`) in `generator/run.py` judge this iter empty,
so `00.train` of the next iter is always skipped

as a result the "simplify_labeled" process never starts correctly: no iter0 model is produced and the randomly picked data in iter0 are never trained,
so iter1 fails with an error that it can't find the graph files from iter0 when trying to copy them, because training was skipped.

I think "simplify_labeled" is valuable, particularly in some complex or big-data scenarios, but it doesn't seem to be loved by users yet

Signed-off-by: Wanrun Jiang <[email protected]>
sorry, now it's right
@njzjz
Member

njzjz commented Jan 13, 2023

def _check_empty_iter(iter_index, max_v = 0) :
    fp_path = os.path.join(make_iter_name(iter_index), fp_name)
    # check the number of collected data
    sys_data = glob.glob(os.path.join(fp_path, "data.*"))

_check_empty_iter only checks data.*, so it should not be a problem.
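
To illustrate the point, here is a small sketch that repeats only the `data.*` glob from the excerpt above against two throwaway directories. The helper names `count_data_systems` and `make_fp_dir` are hypothetical and exist only for this example. The rest of `_check_empty_iter` is not shown in this thread, so nothing beyond the glob is modelled.

```python
import glob
import os
import tempfile


def count_data_systems(fp_path):
    """Hypothetical helper: count entries matching data.* under fp_path."""
    return len(glob.glob(os.path.join(fp_path, "data.*")))


def make_fp_dir(task_name):
    """Hypothetical helper: build a temp dir holding data.000 and one task dir."""
    fp_path = tempfile.mkdtemp()
    os.mkdir(os.path.join(fp_path, "data.000"))
    os.mkdir(os.path.join(fp_path, task_name))
    return fp_path


# The data.* glob sees the same single entry whichever task dir name is used.
buggy_count = count_data_systems(make_fp_dir("task.000"))
fixed_count = count_data_systems(make_fp_dir("task.000.000000"))
print(buggy_count, fixed_count)
```

Under these assumptions a check based only on `data.*` gives the same count for either task dir name.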

@Vibsteamer
Collaborator Author

def _check_empty_iter(iter_index, max_v = 0) :
    fp_path = os.path.join(make_iter_name(iter_index), fp_name)
    # check the number of collected data
    sys_data = glob.glob(os.path.join(fp_path, "data.*"))

_check_empty_iter only checks data.*, so it should not be a problem.

OK, I will do the upgrade. Thanks.

@wanghan-iapcm wanghan-iapcm merged commit b14063e into deepmodeling:devel Jan 14, 2023