RuntimeError: Job 634fdaf9-f361-4482-96bd-a525e834a435 failed for more than 3 times #522

Closed
anrushan opened this issue Aug 26, 2021 · 10 comments

@anrushan

When I run dpgen, I get an error: RuntimeError: Job 634fdaf9-f361-4482-96bd-a525e834a435 failed for more than 3 times
The dpgen.log is:
2021-07-23 18:29:49,654 - INFO : start running
2021-07-23 18:30:02,702 - INFO : start running
2021-07-23 18:30:48,067 - INFO : start running
2021-07-23 18:31:02,104 - INFO : start running
2021-07-23 18:33:50,590 - INFO : start running
2021-07-24 10:00:13,923 - INFO : start running
2021-07-24 17:19:03,753 - INFO : start running
2021-07-24 17:20:19,307 - INFO : start running
2021-07-24 17:20:19,308 - INFO : =============================iter.000000==============================
2021-07-24 17:20:19,308 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:20:45,372 - INFO : start running
2021-07-24 17:20:45,372 - INFO : =============================iter.000000==============================
2021-07-24 17:20:45,372 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:34:05,879 - INFO : start running
2021-07-24 17:34:05,879 - INFO : =============================iter.000000==============================
2021-07-24 17:34:05,879 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:45:32,958 - INFO : start running
2021-07-24 17:45:32,959 - INFO : =============================iter.000000==============================
2021-07-24 17:45:32,959 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:46:55,439 - INFO : start running
2021-07-24 17:46:55,440 - INFO : =============================iter.000000==============================
2021-07-24 17:46:55,440 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:47:47,988 - INFO : start running
2021-07-24 17:47:47,989 - INFO : =============================iter.000000==============================
2021-07-24 17:47:47,989 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:48:44,760 - INFO : start running
2021-07-24 17:48:44,760 - INFO : =============================iter.000000==============================
2021-07-24 17:48:44,760 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:52:36,699 - INFO : start running
2021-07-24 17:52:36,700 - INFO : =============================iter.000000==============================
2021-07-24 17:52:36,700 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 17:54:57,753 - INFO : start running
2021-07-24 17:54:57,754 - INFO : =============================iter.000000==============================
2021-07-24 17:54:57,754 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 22:00:51,771 - INFO : start running
2021-07-24 22:01:35,038 - INFO : start running
2021-07-24 22:01:55,319 - INFO : start running
2021-07-24 22:01:55,321 - INFO : =============================iter.000000==============================
2021-07-24 22:01:55,321 - INFO : -------------------------iter.000000 task 00--------------------------
2021-07-24 22:01:55,340 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-24 22:01:55,360 - INFO : new submission of 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-24 22:01:55,377 - INFO : new submission of f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-24 22:01:55,401 - INFO : new submission of b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-24 22:01:55,415 - INFO : new submission of 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-24 22:02:56,008 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:02:56,220 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:02:56,424 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:02:56,629 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-24 22:03:56,909 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:03:57,141 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:03:57,312 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:03:57,603 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-24 22:04:57,875 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 terminated, submit again
2021-07-24 22:04:58,089 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 terminated, submit again
2021-07-24 22:04:58,298 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 terminated, submit again
2021-07-24 22:04:58,438 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a terminated, submit again
2021-07-26 15:42:52,266 - INFO : start running
2021-07-26 15:42:52,267 - INFO : continue from iter 000 task 00
2021-07-26 15:42:52,267 - INFO : =============================iter.000000==============================
2021-07-26 15:42:52,267 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:42:52,395 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:42:52,471 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:42:52,557 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:42:52,681 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-26 15:44:40,477 - INFO : start running
2021-07-26 15:44:40,478 - INFO : continue from iter 000 task 00
2021-07-26 15:44:40,478 - INFO : =============================iter.000000==============================
2021-07-26 15:44:40,478 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:44:40,603 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:44:40,728 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:44:40,843 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:44:40,980 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-07-26 15:47:37,001 - INFO : start running
2021-07-26 15:47:37,001 - INFO : continue from iter 000 task 00
2021-07-26 15:47:37,001 - INFO : =============================iter.000000==============================
2021-07-26 15:47:37,001 - INFO : -------------------------iter.000000 task 01--------------------------
2021-07-26 15:47:37,111 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-07-26 15:47:37,221 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-07-26 15:47:37,346 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-07-26 15:47:37,445 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:19:47,191 - INFO : start running
2021-08-26 15:21:24,977 - INFO : start running
2021-08-26 15:22:22,593 - INFO : start running
2021-08-26 15:22:49,760 - INFO : start running
2021-08-26 15:23:56,044 - INFO : start running
2021-08-26 15:24:13,861 - INFO : start running
2021-08-26 15:24:13,862 - INFO : continue from iter 000 task 00
2021-08-26 15:24:13,862 - INFO : =============================iter.000000==============================
2021-08-26 15:24:13,862 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:24:13,980 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 15:24:14,067 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 15:24:14,174 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 15:24:14,303 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:26:32,980 - INFO : start running
2021-08-26 15:26:32,980 - INFO : continue from iter 000 task 00
2021-08-26 15:26:32,980 - INFO : =============================iter.000000==============================
2021-08-26 15:26:32,980 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:27:56,565 - INFO : start running
2021-08-26 15:27:56,566 - INFO : continue from iter 000 task 00
2021-08-26 15:27:56,566 - INFO : =============================iter.000000==============================
2021-08-26 15:27:56,566 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:35:21,767 - INFO : start running
2021-08-26 15:35:21,767 - INFO : continue from iter 000 task 00
2021-08-26 15:35:21,767 - INFO : =============================iter.000000==============================
2021-08-26 15:35:21,767 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:25,332 - INFO : start running
2021-08-26 15:52:25,333 - INFO : continue from iter 000 task 00
2021-08-26 15:52:25,333 - INFO : =============================iter.000000==============================
2021-08-26 15:52:25,333 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:25,334 - INFO : cannot find key "batch" in machine file, try to use deprecated key "machine_type"
2021-08-26 15:52:58,877 - INFO : start running
2021-08-26 15:52:58,877 - INFO : continue from iter 000 task 00
2021-08-26 15:52:58,877 - INFO : =============================iter.000000==============================
2021-08-26 15:52:58,877 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:52:58,878 - INFO : cannot find key "batch" in machine file, try to use deprecated key "machine_type"
2021-08-26 15:53:53,236 - INFO : start running
2021-08-26 15:53:53,237 - INFO : continue from iter 000 task 00
2021-08-26 15:53:53,237 - INFO : =============================iter.000000==============================
2021-08-26 15:53:53,237 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:53:53,359 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 15:53:53,447 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 15:53:53,580 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 15:53:53,688 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 15:55:55,213 - INFO : start running
2021-08-26 15:55:55,214 - INFO : continue from iter 000 task 00
2021-08-26 15:55:55,214 - INFO : =============================iter.000000==============================
2021-08-26 15:55:55,214 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 15:58:15,563 - INFO : start running
2021-08-26 15:58:15,564 - INFO : continue from iter 000 task 00
2021-08-26 15:58:15,564 - INFO : =============================iter.000000==============================
2021-08-26 15:58:15,564 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:09:54,493 - INFO : start running
2021-08-26 16:10:23,189 - INFO : start running
2021-08-26 16:11:30,986 - INFO : start running
2021-08-26 16:12:13,422 - INFO : start running
2021-08-26 16:12:37,819 - INFO : start running
2021-08-26 16:12:43,941 - INFO : start running
2021-08-26 16:12:43,941 - INFO : continue from iter 000 task 00
2021-08-26 16:12:43,941 - INFO : =============================iter.000000==============================
2021-08-26 16:12:43,941 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:15:20,562 - INFO : start running
2021-08-26 16:15:20,563 - INFO : continue from iter 000 task 00
2021-08-26 16:15:20,563 - INFO : =============================iter.000000==============================
2021-08-26 16:15:20,563 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:15:31,998 - INFO : start running
2021-08-26 16:15:31,998 - INFO : continue from iter 000 task 00
2021-08-26 16:15:31,998 - INFO : =============================iter.000000==============================
2021-08-26 16:15:31,998 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:18:01,501 - INFO : start running
2021-08-26 16:18:01,502 - INFO : continue from iter 000 task 00
2021-08-26 16:18:01,502 - INFO : =============================iter.000000==============================
2021-08-26 16:18:01,502 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:18:34,938 - INFO : start running
2021-08-26 16:18:51,220 - INFO : start running
2021-08-26 16:18:51,220 - INFO : continue from iter 000 task 00
2021-08-26 16:18:51,221 - INFO : =============================iter.000000==============================
2021-08-26 16:18:51,221 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:19:37,939 - INFO : start running
2021-08-26 16:19:37,940 - INFO : continue from iter 000 task 00
2021-08-26 16:19:37,940 - INFO : =============================iter.000000==============================
2021-08-26 16:19:37,941 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:19:38,037 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:19:38,122 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:19:38,264 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:19:38,388 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:30:27,385 - INFO : start running
2021-08-26 16:30:27,386 - INFO : continue from iter 000 task 00
2021-08-26 16:30:27,386 - INFO : =============================iter.000000==============================
2021-08-26 16:30:27,386 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:30:27,479 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:30:27,603 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:30:27,739 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:30:27,872 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:33:27,441 - INFO : start running
2021-08-26 16:33:27,441 - INFO : continue from iter 000 task 00
2021-08-26 16:33:27,441 - INFO : =============================iter.000000==============================
2021-08-26 16:33:27,442 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:33:27,528 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:33:27,606 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:33:27,737 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:33:27,817 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 16:37:12,955 - INFO : start running
2021-08-26 16:37:12,955 - INFO : continue from iter 000 task 00
2021-08-26 16:37:12,955 - INFO : =============================iter.000000==============================
2021-08-26 16:37:12,955 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:40:04,885 - INFO : start running
2021-08-26 16:40:04,886 - INFO : continue from iter 000 task 00
2021-08-26 16:40:04,886 - INFO : =============================iter.000000==============================
2021-08-26 16:40:04,886 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 16:40:05,016 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 16:40:05,139 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 16:40:05,244 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 16:40:05,330 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:16:30,001 - INFO : start running
2021-08-26 17:16:30,002 - INFO : continue from iter 000 task 00
2021-08-26 17:16:30,002 - INFO : =============================iter.000000==============================
2021-08-26 17:16:30,002 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:16:30,088 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:16:30,169 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:16:30,290 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:16:30,375 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:20:23,491 - INFO : start running
2021-08-26 17:20:23,491 - INFO : continue from iter 000 task 00
2021-08-26 17:20:23,491 - INFO : =============================iter.000000==============================
2021-08-26 17:20:23,491 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:20:23,591 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:20:23,690 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:20:23,771 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:20:23,858 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:23:07,873 - INFO : start running
2021-08-26 17:23:07,873 - INFO : continue from iter 000 task 00
2021-08-26 17:23:07,873 - INFO : =============================iter.000000==============================
2021-08-26 17:23:07,874 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:23:07,992 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:23:08,125 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:23:08,266 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:23:08,406 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:33:49,911 - INFO : start running
2021-08-26 17:33:49,912 - INFO : continue from iter 000 task 00
2021-08-26 17:33:49,912 - INFO : =============================iter.000000==============================
2021-08-26 17:33:49,912 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:33:50,008 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:33:50,088 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:33:50,257 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:33:50,372 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
2021-08-26 17:34:11,435 - INFO : start running
2021-08-26 17:34:11,436 - INFO : continue from iter 000 task 00
2021-08-26 17:34:11,436 - INFO : =============================iter.000000==============================
2021-08-26 17:34:11,436 - INFO : -------------------------iter.000000 task 01--------------------------
2021-08-26 17:34:11,518 - INFO : restart from old submission 634fdaf9-f361-4482-96bd-a525e834a435 for chunk 8aefb06c426e07a0a671a1e2488b4858d694a730
2021-08-26 17:34:11,621 - INFO : restart from old submission f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 for chunk e193a01ecf8d30ad0affefd332ce934e32ffce72
2021-08-26 17:34:11,714 - INFO : restart from old submission b524808f-4e9d-4773-a60a-8031c827fc40 for chunk 6fc978af728d43c59faa400d5f6e0471ac850d4c
2021-08-26 17:34:11,872 - INFO : restart from old submission 43f9a4bc-9999-4ac4-8dcb-56352480a58a for chunk 221407c03ae5c73109cce71d27e24637824f3333
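The log shows the pattern behind the error: each of the four jobs is submitted, terminates within about a minute, and is submitted again; after the third resubmission the old dispatcher appears to give up with the RuntimeError. The sketch below imitates that pattern with a hypothetical run_with_resubmission helper. It is not DP-GEN's actual dispatcher code, only an illustration that the RuntimeError is a symptom and that the real cause has to be found in each job's own output.

```python
import itertools
from typing import Callable


def run_with_resubmission(job_id: str, submit: Callable[[], bool], max_fail: int = 3) -> None:
    """Hypothetical sketch of a resubmit-on-failure loop.

    `submit` runs the job once and returns True if it finished cleanly.
    After each failure the job is resubmitted; once it has been resubmitted
    `max_fail` times and still fails, the loop gives up.
    """
    for attempt in itertools.count(1):
        if submit():
            return
        if attempt > max_fail:
            raise RuntimeError(f"Job {job_id} failed for more than {max_fail} times")
        print(f"job {job_id} terminated, submit again")
```

With a submit callable that always fails, the loop prints three "submit again" lines and then raises, which is the same sequence recorded for each job ID above.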

The machine.json is:
{
"_comment": "training on localhost ",
"_comment" : "This is for DeePMD-kit 1.*",
"train_command" : "/home/jiangjun/miniconda3/bin/dp",
"train_machine": {
"batch": "shell",
"work_path" : "/home/jiangjun/cptwokile/1/c1/data"
},
"train_resources": {
"envs": {
}
},

"_comment":		"model_devi on localhost ",
"lmp_command":	"/usr/bin/lmp_mpi",
"model_devi_group_size": 5,
"model_devi_machine":	{
"batch":	"shell",
"_comment" : "If lazy_local is true, calculations are done directly in current folders.",
"lazy_local" : true
},	
"model_devi_resources":	{
},    

"_comment":		"fp on localhost ",
"fp_command":	"/home/jiangjun/cptwokile/cp2k-8.1/exe/local/cp2k.ssmp",
"fp_group_size":	2,
"fp_machine":	{
"batch":	"shell",
"work_path" :	"/home/jiangjun/cptwokile/1/c1/data",
"_comment" :	"that's all"
},	
"fp_resources":	{
"module_list":  ["mpi"],
"task_per_node":4,
"with_mpi":	true,
"_comment":	"that's all"
},

"_comment":		" that's all "

}
Do I need to do anything special for batch type shell? My machine is a single local server without LSF or any other job scheduler.
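For a single local server, batch "shell" in each *_machine section appears to be what the old dispatcher expects (the log line about the deprecated machine_type key shows up when a machine file lacks batch). Before rerunning, it can help to confirm that the file parses as intended and that the three command paths exist. The sketch below is a hypothetical check_machine_json helper. It assumes the old-dispatcher keys used in the file above and that each command is a bare program path, so a command with arguments (for example an mpirun prefix) would be reported as not executable. It also flags repeated keys, because Python's json module silently keeps only the last value of a repeated key; repeated _comment keys, like the ones in this file, are skipped since they are only annotations.

```python
import json
import shutil
from typing import Any, List, Tuple

# Section and command keys as they appear in the old-dispatcher machine.json above
MACHINE_KEYS = ("train_machine", "model_devi_machine", "fp_machine")
COMMAND_KEYS = ("train_command", "lmp_command", "fp_command")


def check_machine_json(text: str) -> List[str]:
    """Return human-readable problems found in an old-style machine.json string."""
    problems: List[str] = []

    def hook(pairs: List[Tuple[str, Any]]) -> dict:
        # json.loads keeps only the last value of a repeated key; report
        # repeats, except "_comment" keys, which are only annotations.
        keys = [k for k, _ in pairs]
        for key in sorted(set(keys)):
            if keys.count(key) > 1 and not key.startswith("_comment"):
                problems.append(f"duplicate key {key!r}")
        return dict(pairs)

    cfg = json.loads(text, object_pairs_hook=hook)

    for name in MACHINE_KEYS:
        machine = cfg.get(name)
        if machine is None:
            problems.append(f"missing section {name!r}")
        elif "batch" not in machine and "machine_type" not in machine:
            problems.append(f"{name!r} sets neither 'batch' nor 'machine_type'")

    for name in COMMAND_KEYS:
        command = cfg.get(name)
        if not command:
            problems.append(f"missing {name!r}")
        # For a path with a directory part, shutil.which only checks that exact file
        elif shutil.which(command) is None:
            problems.append(f"{name!r} is not an executable: {command}")
    return problems
```

Passing the text of machine.json to check_machine_json and printing each returned string gives an empty output when all three machines declare a batch type and all three commands resolve to executables.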

@zhao-w-en

Maybe the error is printed to train.log... I guess...
I'm facing a similar error and trying to debug it...
qaq

@Ericwang6
Member

It seems to happen during model training. Please check train.log in your work path /home/jiangjun/cptwokile/1/c1/data/ for the specific error information.

@anrushan
Author

Thanks for your reply. I checked train.log, and the error is:
Traceback (most recent call last):
  File "/home/jiangjun/miniconda3/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/main.py", line 73, in main
    train(args)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 87, in train
    _do_work(jdata, run_opt)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/train.py", line 140, in _do_work
    model.build (data, stop_batch)
  File "/home/jiangjun/miniconda3/lib/python3.8/site-packages/deepmd/Trainer.py", line 215, in build
    assert (self.ntypes >= data.get_ntypes()), "ntypes should match that found in data"
AssertionError: ntypes should match that found in data
I don't understand what this message means. I hope you can help me, and thanks again for your reply.

@Ericwang6
Member

Make sure that the number of atom types in your training data (you can check it in type_map.raw) is not larger than the length of the type_map parameter in param.json.
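The check above can be scripted before rerunning dpgen. The sketch below assumes the standard DeePMD data layout (a type.raw file with one integer type index per atom in each system directory) and a DP-GEN param.json whose top-level type_map lists the element names; it counts a system's types as the largest index in type.raw plus one, which is an assumption about how the data side is counted. It also checks that the descriptor's sel list has one entry per type, since DeePMD-kit expects sel to have as many entries as there are atom types. The location of sel under default_training_param, and the helper names data_ntypes and check_param, are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


def data_ntypes(system_dir: str) -> int:
    """Number of atom types used by one DeePMD-format system directory.

    Assumption: counted as the largest type index in type.raw plus one."""
    indices = [int(tok) for tok in Path(system_dir, "type.raw").read_text().split()]
    return max(indices) + 1


def check_param(param: Dict[str, Any], system_dirs: List[str]) -> List[str]:
    """Compare type_map (and sel, if present) in a DP-GEN param dict with the data."""
    problems: List[str] = []
    type_map = param["type_map"]
    for system_dir in system_dirs:
        n = data_ntypes(system_dir)
        # Mirrors the assertion in the traceback: model types >= data types
        if n > len(type_map):
            problems.append(f"{system_dir}: data uses {n} types, type_map lists {len(type_map)}")
    # Hypothetical path to sel; adjust it to where your param.json keeps the descriptor
    descriptor = param.get("default_training_param", {}).get("model", {}).get("descriptor", {})
    sel = descriptor.get("sel")
    if isinstance(sel, list) and len(sel) != len(type_map):
        problems.append(f"sel has {len(sel)} entries, type_map lists {len(type_map)}")
    return problems
```

An empty list means both the type count and the sel length line up with type_map; any message names the system directory or setting to fix.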

@anrushan
Author


Thanks for your reply! The sel did not match the number of types, so I added an entry for each type. But now I have run into a new error:
2021-08-28 21:29:19,591 - INFO : job 634fdaf9-f361-4482-96bd-a525e834a435 finished
2021-08-28 21:29:19,597 - INFO : job f3a31a1e-63f4-4591-84d0-e5e936b3cbe4 finished
2021-08-28 21:29:19,602 - INFO : job b524808f-4e9d-4773-a60a-8031c827fc40 finished
2021-08-28 21:29:19,607 - INFO : job 43f9a4bc-9999-4ac4-8dcb-56352480a58a finished
2021-08-28 21:29:19,613 - INFO : -------------------------iter.000000 task 02--------------------------
2021-08-28 21:29:19,614 - INFO : -------------------------iter.000000 task 03--------------------------
2021-08-28 21:29:19,648 - INFO : -------------------------iter.000000 task 04--------------------------
2021-08-28 21:29:19,649 - INFO : Dispatcher switches to the lazy local mode
2021-08-28 21:29:19,666 - INFO : new submission of 4c01fc60-c9b8-444e-8d5a-635d3f592fac for chunk 2a332aeea901b380d1755cc375d3fcb7e993ecae
2021-08-28 21:30:19,944 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:31:20,209 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:32:20,459 - INFO : job 4c01fc60-c9b8-444e-8d5a-635d3f592fac terminated, submit again
2021-08-28 21:34:15,982 - INFO : start running
2021-08-28 21:34:15,983 - INFO : continue from iter 000 task 03
2021-08-28 21:34:15,983 - INFO : =============================iter.000000==============================
2021-08-28 21:34:15,983 - INFO : -------------------------iter.000000 task 04--------------------------
2021-08-28 21:34:15,984 - INFO : Dispatcher switches to the lazy local mode
2021-08-28 21:34:16,075 - INFO : restart from old submission 4c01fc60-c9b8-444e-8d5a-635d3f592fac for chunk 2a332aeea901b380d1755cc375d3fcb7e993ecae

How can I solve this problem? Thanks for your reply!

@anrushan
Author

This error appears in the LAMMPS log:
ERROR: Unknown pair style deepmd (../force.cpp:262)
Last command: pair_style deepmd ../graph.000.pb ../graph.001.pb ../graph.002.pb ../graph.003.pb out_freq ${THERMO_FREQ} out_file model_devi.out

@AnguseZhang
Collaborator


Can you make sure that you installed a version of LAMMPS that is compatible with DeePMD-kit, i.e. one that provides the deepmd pair style?
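"Unknown pair style deepmd" means the LAMMPS executable that dpgen launched (lmp_command, /usr/bin/lmp_mpi in the machine.json above) does not provide the deepmd pair style. One way to check a binary before running dpgen is to look for deepmd among the pair styles it lists when run with -h. The sketch below assumes the help text has a heading line containing "Pair styles", followed by whitespace-separated style names and ending at the next line that starts with "*". The helper names are hypothetical, and styles loaded at run time through the LAMMPS plugin mechanism would not show up this way.

```python
import subprocess
from typing import Set


def pair_styles(help_text: str) -> Set[str]:
    """Collect the names listed under the 'Pair styles' heading of `lmp -h` output.

    Assumes the section ends at the next line starting with '*'."""
    styles: Set[str] = set()
    in_section = False
    for line in help_text.splitlines():
        if "Pair styles" in line:
            in_section = True
            continue
        if in_section:
            if line.startswith("*"):
                break
            styles.update(line.split())
    return styles


def supports_deepmd(lmp: str = "/usr/bin/lmp_mpi") -> bool:
    """Run the LAMMPS binary with -h and look for the deepmd pair style."""
    result = subprocess.run([lmp, "-h"], capture_output=True, text=True, check=False)
    return "deepmd" in pair_styles(result.stdout)
```

If supports_deepmd returns False for the binary in lmp_command, pointing lmp_command at a LAMMPS build that includes DeePMD-kit is the likely fix.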

@ChesterCs-thu

I am facing the same problem. I checked the type_map.raw file and the type_map in the parameter JSON file, but the problem was not solved.
In addition, I checked the train.log file in the work path, and it shows the result below:
/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 15: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
/var/spool/torque/mom_priv/jobs/134083.admin.SC: line 29: /public/home/liyue/deepmd-kit/bin/dp: Argument list too long
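The shell prints "Argument list too long" when execve() rejects a command because its arguments plus the environment exceed the kernel limit (errno E2BIG); on Linux, a single argument or environment string longer than 32 pages (128 KiB with 4 KiB pages) triggers it as well. To see which case applies inside the job, the sketch below estimates both sizes for a given argv and environment. It is an approximation: it counts string bytes only and ignores pointer overhead, and the helper name exec_size_report is hypothetical.

```python
import os
from typing import Dict, List, Optional

# Linux caps each single argv/environment string at 32 pages (MAX_ARG_STRLEN);
# 4 KiB pages are assumed here.
MAX_ARG_STRLEN = 32 * 4096


def exec_size_report(argv: List[str], env: Optional[Dict[str, str]] = None) -> Dict[str, int]:
    """Rough size of what execve() would receive for this argv and environment.

    Counts each string with its terminating NUL byte and ignores pointer overhead."""
    if env is None:
        env = dict(os.environ)
    strings = list(argv) + [f"{key}={value}" for key, value in env.items()]
    # os.fsencode reproduces the bytes the OS sees, including undecodable ones
    sizes = [len(os.fsencode(s)) + 1 for s in strings]
    return {
        "total_bytes": sum(sizes),
        "arg_max": os.sysconf("SC_ARG_MAX"),
        "longest_string": max(sizes, default=0),
        "max_arg_strlen": MAX_ARG_STRLEN,
    }
```

For example, calling exec_size_report(["dp", "train", "input.json"]) inside the job script (the input file name is only an example) compares the current environment with SC_ARG_MAX; a longest_string close to max_arg_strlen points to one oversized variable or argument.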

@HuangJiameng
Collaborator


Please provide your input files to help locate and reproduce the problem.

AnguseZhang pushed a commit that referenced this issue Dec 31, 2022
Remove the old dispatcher and all related tests and examples.
Fix #1002. Close #522.

Signed-off-by: Jinzhe Zeng
@njzjz
Member

njzjz commented Jan 1, 2023

The old dispatcher has been removed from DP-GEN. Please use the new dpdispatcher instead.
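For readers arriving with an old-style machine.json like the one at the top of this issue: newer DP-GEN releases expect a dpdispatcher-style file with "api_version": "1.0" and one list per stage (train, model_devi, fp), each entry holding command, machine and resources. The sketch below converts the shell-only old format into that shape. The field names (batch_type, context_type, local_root, remote_root, number_node, cpu_per_node, gpu_per_node, queue_name, group_size, module_list, envs) reflect my understanding of the dpdispatcher format and should be checked against the documentation of your installed versions. MPI options such as with_mpi are not translated, and the function name convert_machine is hypothetical.

```python
from typing import Any, Dict

# Old-style keys for each stage: (command, machine section, resources section)
STAGES = {
    "train": ("train_command", "train_machine", "train_resources"),
    "model_devi": ("lmp_command", "model_devi_machine", "model_devi_resources"),
    "fp": ("fp_command", "fp_machine", "fp_resources"),
}
GROUP_SIZE_KEYS = {"model_devi": "model_devi_group_size", "fp": "fp_group_size"}


def convert_machine(old: Dict[str, Any]) -> Dict[str, Any]:
    """Translate an old shell-only machine.json dict into the assumed api_version 1.0 layout."""
    new: Dict[str, Any] = {"api_version": "1.0"}
    for stage, (cmd_key, mach_key, res_key) in STAGES.items():
        mach = old.get(mach_key, {})
        res = old.get(res_key, {})
        if mach.get("batch", "shell") != "shell":
            raise ValueError(f"{mach_key}: only the shell batch type is handled here")
        if mach.get("lazy_local"):
            # Run in the current folders, like lazy_local in the old dispatcher
            machine = {"batch_type": "Shell", "context_type": "LazyLocalContext", "local_root": "./"}
        else:
            # LocalContext is assumed to copy inputs to remote_root and run there
            machine = {
                "batch_type": "Shell",
                "context_type": "LocalContext",
                "local_root": "./",
                "remote_root": mach.get("work_path", "./"),
            }
        group_key = GROUP_SIZE_KEYS.get(stage)
        resources: Dict[str, Any] = {
            "number_node": 1,
            "cpu_per_node": res.get("task_per_node", 1),
            "gpu_per_node": 0,
            "queue_name": "",
            "group_size": old.get(group_key, 1) if group_key else 1,
        }
        # Carry over settings that both formats are assumed to share
        for key in ("module_list", "envs"):
            if key in res:
                resources[key] = res[key]
        new[stage] = [{"command": old[cmd_key], "machine": machine, "resources": resources}]
    return new
```

Loading the old file with json.load, passing the result to convert_machine and writing it back with json.dump(..., indent=2) gives a starting point to edit by hand, for example to add an mpirun prefix to the fp command.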

@njzjz closed this as not planned on Jan 1, 2023
7 participants