qwen2vl-7b inference error #1027
Comments
I ran into the same problem. Rolling transformers back to 4.49.0 makes it go away, but a Qwen model trained with the newer transformers==4.53.0 then fails when its checkpoint is loaded under 4.49.0.
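A minimal sketch for checking whether the installed transformers matches the version reported as working in this thread. It uses only the standard library; the helper names `parse_release` and `check_transformers` are made up for illustration and are not part of VLMEvalKit or transformers.

```python
from importlib.metadata import PackageNotFoundError, version

# Version reported in this thread as not triggering the error.
KNOWN_GOOD = "4.49.0"


def parse_release(v: str) -> tuple:
    """Turn '4.53.0' or '4.53.0.dev0' into (4, 53, 0), ignoring suffixes."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def check_transformers(known_good: str = KNOWN_GOOD, installed: str = None) -> str:
    """Compare the installed transformers version with the known-good one."""
    if installed is None:
        try:
            installed = version("transformers")
        except PackageNotFoundError:
            return "transformers is not installed"
    if parse_release(installed) == parse_release(known_good):
        return f"transformers {installed} matches the version reported as working"
    return f"transformers {installed} differs from {known_good}"


if __name__ == "__main__":
    print(check_transformers())
```

Note that, as described above, a checkpoint trained under 4.53.0 may not load under 4.49.0, so pinning the older version is only a workaround for models that were not trained with the newer release.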
Thanks, best wishes.
2025-05-29T10:35:52.581904556Z [2025-05-29 10:35:52] ERROR - RUN - run.py: main - 493: Model Qwen2-VL-7B-Instruct x Dataset CCBench combination failed: Sharding propagation failed on op Op(op=aten.convolution.default, args_schema=Spec(R on (5280, 3, 2, 14, 14)), Spec(R on (1280, 3, 2, 14, 14)), None, [2, 14, 14], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1 @ mesh: (1,)).
2025-05-29T10:35:52.581950110Z Error: , skipping this combination.
2025-05-29T10:35:52.581960830Z Traceback (most recent call last):
2025-05-29T10:35:52.581968070Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_sharding_prop.py", line 447, in propagate_op_sharding_non_cached
2025-05-29T10:35:52.581993098Z output_sharding = sharding_prop_func(op_schema)
2025-05-29T10:35:52.582000501Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582007051Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_ops/_conv_ops.py", line 29, in convolution_rules
2025-05-29T10:35:52.582015164Z assert isinstance(bias_spec, DTensorSpec)
2025-05-29T10:35:52.582021801Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582028139Z AssertionError
2025-05-29T10:35:52.582034299Z
2025-05-29T10:35:52.582040351Z The above exception was the direct cause of the following exception:
2025-05-29T10:35:52.582047062Z
2025-05-29T10:35:52.582053311Z Traceback (most recent call last):
2025-05-29T10:35:52.582059376Z File "/mnt/largeml-train-gui-agent/hankaiyang/packages/VLMEvalKit/run.py", line 365, in main
2025-05-29T10:35:52.582066806Z model = infer_data_job(
2025-05-29T10:35:52.582073019Z ^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582078874Z File "/mnt/largeml-train-gui-agent/hankaiyang/packages/VLMEvalKit/vlmeval/inference.py", line 185, in infer_data_job
2025-05-29T10:35:52.582086449Z model = infer_data(
2025-05-29T10:35:52.582092354Z ^^^^^^^^^^^
2025-05-29T10:35:52.582098506Z File "/mnt/largeml-train-gui-agent/hankaiyang/packages/VLMEvalKit/vlmeval/inference.py", line 148, in infer_data
2025-05-29T10:35:52.582105438Z response = model.generate(message=struct, dataset=dataset_name)
2025-05-29T10:35:52.582111729Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582133364Z File "/mnt/largeml-train-gui-agent/hankaiyang/packages/VLMEvalKit/vlmeval/vlm/base.py", line 116, in generate
2025-05-29T10:35:52.582143031Z return self.generate_inner(message, dataset)
2025-05-29T10:35:52.582150957Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582158751Z File "/mnt/largeml-train-gui-agent/hankaiyang/packages/VLMEvalKit/vlmeval/vlm/qwen2_vl/model.py", line 634, in generate_inner
2025-05-29T10:35:52.582167762Z return self.generate_inner_transformers(message, dataset=dataset)
2025-05-29T10:35:52.582175659Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582184569Z File "/mnt/largeml-train-gui-agent/hankaiyang/packages/VLMEvalKit/vlmeval/vlm/qwen2_vl/model.py", line 484, in generate_inner_transformers
2025-05-29T10:35:52.582192046Z generated_ids = self.model.generate(
2025-05-29T10:35:52.582198782Z ^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582205086Z File "/usr/local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
2025-05-29T10:35:52.582211780Z return func(*args, **kwargs)
2025-05-29T10:35:52.582218329Z ^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582231992Z File "/usr/local/lib/python3.12/site-packages/transformers/generation/utils.py", line 2597, in generate
2025-05-29T10:35:52.582239082Z result = self._sample(
2025-05-29T10:35:52.582245257Z ^^^^^^^^^^^^^
2025-05-29T10:35:52.582251263Z File "/usr/local/lib/python3.12/site-packages/transformers/generation/utils.py", line 3557, in _sample
2025-05-29T10:35:52.582257900Z outputs = self(**model_inputs, return_dict=True)
2025-05-29T10:35:52.582264587Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582270832Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-05-29T10:35:52.582277795Z return self._call_impl(*args, **kwargs)
2025-05-29T10:35:52.582283777Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582289650Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-05-29T10:35:52.582296872Z return forward_call(*args, **kwargs)
2025-05-29T10:35:52.582302896Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582308875Z File "/usr/local/lib/python3.12/site-packages/transformers/utils/generic.py", line 969, in wrapper
2025-05-29T10:35:52.582315547Z output = func(self, *args, **kwargs)
2025-05-29T10:35:52.582321945Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582328282Z File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1781, in forward
2025-05-29T10:35:52.582335180Z outputs = self.model(
2025-05-29T10:35:52.582341242Z ^^^^^^^^^^^
2025-05-29T10:35:52.582347042Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-05-29T10:35:52.582354085Z return self._call_impl(*args, **kwargs)
2025-05-29T10:35:52.582360195Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582366407Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-05-29T10:35:52.582372893Z return forward_call(*args, **kwargs)
2025-05-29T10:35:52.582378665Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582384948Z File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1583, in forward
2025-05-29T10:35:52.582392650Z image_embeds = self.get_image_features(pixel_values, image_grid_thw)
2025-05-29T10:35:52.582399345Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582406118Z File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1539, in get_image_features
2025-05-29T10:35:52.582413235Z image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
2025-05-29T10:35:52.582419443Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582429678Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-05-29T10:35:52.582436235Z return self._call_impl(*args, **kwargs)
2025-05-29T10:35:52.582442633Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582448741Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-05-29T10:35:52.582455883Z return forward_call(*args, **kwargs)
2025-05-29T10:35:52.582461985Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582468505Z File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1014, in forward
2025-05-29T10:35:52.582475803Z hidden_states = self.patch_embed(hidden_states)
2025-05-29T10:35:52.582482216Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582488293Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-05-29T10:35:52.582494960Z return self._call_impl(*args, **kwargs)
2025-05-29T10:35:52.582501028Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582506966Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
2025-05-29T10:35:52.582514104Z return forward_call(*args, **kwargs)
2025-05-29T10:35:52.582520439Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582526611Z File "/usr/local/lib/python3.12/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 268, in forward
2025-05-29T10:35:52.582533344Z hidden_states = self.proj(hidden_states.to(dtype=target_dtype)).view(-1, self.embed_dim)
2025-05-29T10:35:52.582539664Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582546043Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
2025-05-29T10:35:52.582552829Z return self._call_impl(*args, **kwargs)
2025-05-29T10:35:52.582559006Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582564999Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
2025-05-29T10:35:52.582571851Z return inner()
2025-05-29T10:35:52.582577553Z ^^^^^^^
2025-05-29T10:35:52.582584633Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1805, in inner
2025-05-29T10:35:52.582591544Z result = forward_call(*args, **kwargs)
2025-05-29T10:35:52.582597534Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582603741Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 725, in forward
2025-05-29T10:35:52.582610448Z return self._conv_forward(input, self.weight, self.bias)
2025-05-29T10:35:52.582616574Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582635906Z File "/usr/local/lib/python3.12/site-packages/torch/nn/modules/conv.py", line 720, in _conv_forward
2025-05-29T10:35:52.582642834Z return F.conv3d(
2025-05-29T10:35:52.582648859Z ^^^^^^^^^
2025-05-29T10:35:52.582655226Z File "/usr/local/lib/python3.12/site-packages/torch/_compile.py", line 51, in inner
2025-05-29T10:35:52.582661859Z return disable_fn(*args, **kwargs)
2025-05-29T10:35:52.582668394Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582674451Z File "/usr/local/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 838, in _fn
2025-05-29T10:35:52.582681349Z return fn(*args, **kwargs)
2025-05-29T10:35:52.582687294Z ^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582693281Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_api.py", line 344, in torch_dispatch
2025-05-29T10:35:52.582700332Z return DTensor._op_dispatcher.dispatch(
2025-05-29T10:35:52.582706284Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582712794Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_dispatch.py", line 164, in dispatch
2025-05-29T10:35:52.582719582Z return self._custom_op_handlers[op_call](op_call, args, kwargs) # type: ignore[operator]
2025-05-29T10:35:52.582726154Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582732514Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_tp_conv.py", line 239, in convolution_handler
2025-05-29T10:35:52.582739418Z dtensor.DTensor._op_dispatcher.sharding_propagator.propagate(op_info)
2025-05-29T10:35:52.582745929Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_sharding_prop.py", line 264, in propagate
2025-05-29T10:35:52.582753041Z OutputSharding, self.propagate_op_sharding(op_info.schema)
2025-05-29T10:35:52.582759517Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582765600Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_sharding_prop.py", line 45, in call
2025-05-29T10:35:52.582772680Z return self.cache(*args, **kwargs)
2025-05-29T10:35:52.582778849Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-05-29T10:35:52.582785122Z File "/usr/local/lib/python3.12/site-packages/torch/distributed/tensor/_sharding_prop.py", line 451, in propagate_op_sharding_non_cached
2025-05-29T10:35:52.582793168Z raise RuntimeError(
2025-05-29T10:35:52.582799430Z RuntimeError: Sharding propagation failed on op Op(op=aten.convolution.default, args_schema=Spec(R on (5280, 3, 2, 14, 14)), Spec(R on (1280, 3, 2, 14, 14)), None, [2, 14, 14], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1 @ mesh: (1,)).
2025-05-29T10:35:52.582807572Z Error:
2025-05-29T10:35:53.064460896Z [rank0]:[W529 10:35:53.735432849 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
I modified part of config.py.
The command I ran is:
python3 run.py --data CCBench --model Qwen2-VL-7B-Instruct --verbose
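In the traceback above, the convolution is dispatched through `DTensor`, the op schema shows `None` in the bias position (the third argument of `aten.convolution`), and `convolution_rules` asserts that the bias spec is a `DTensorSpec`. One way to narrow this down is to check which model parameters ended up as DTensors after loading. The sketch below is an assumption-laden diagnostic, not a fix: `is_dtensor` and `dtensor_parameter_names` are hypothetical helpers, and the check matches on the class name so that the snippet does not need to import torch.

```python
def is_dtensor(obj) -> bool:
    """Return True if obj's class (or any base class) is named 'DTensor'.

    Matching by name is a deliberate shortcut for this sketch; with torch
    available, isinstance(obj, torch.distributed.tensor.DTensor) is stricter.
    """
    return any(cls.__name__ == "DTensor" for cls in type(obj).__mro__)


def dtensor_parameter_names(named_parameters) -> list:
    """Collect the names of parameters that are DTensors.

    `named_parameters` is any iterable of (name, parameter) pairs, such as
    the result of calling named_parameters() on a torch.nn.Module.
    """
    return [name for name, param in named_parameters if is_dtensor(param)]
```

With a loaded model in hand, calling `dtensor_parameter_names(model.named_parameters())` would list the affected parameters. If the vision patch-embedding weights appear in that list, the model was loaded in a distributed-tensor form, which is consistent with the failing `aten.convolution.default` call in the log.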