RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm) #89

mcpaulgeorge · 2024-08-05T09:04:35Z

The server has three A10 GPUs (24 GB each), and I did not pass --multigpu.

mcpaulgeorge · 2024-08-05T09:20:28Z

shubhra · 2024-08-13T20:16:21Z

Hitting the same issue, both with --multigpu and without it.

SSshuishui · 2024-09-03T02:33:36Z

Hi there,

In 'LMClass.py' I changed the model loading and the device selection to the following, which fixed this problem:

self.model = AutoModelForCausalLM.from_pretrained(args.model, config=config, device_map='auto', torch_dtype=torch.float16)
self._device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Then, in 'models/int_llama_layer.py', change the rotary embedding call to:

cos, sin = self.rotary_emb(value_states, position_ids=position_ids)
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)

Finally, change the cache to cache = {"i": 0, "attention_mask": None} and the Catcher class to:

class Catcher(nn.Module):
    def __init__(self, module):
        super().__init__()
        self.module = module
        self.is_llama = False

    def forward(self, inp, **kwargs):
        # Record the input of the first layer for later calibration.
        inps[cache["i"]] = inp
        cache["i"] += 1
        # cache["attention_mask"] = kwargs["attention_mask"]
        if self.is_llama:
            cache["position_ids"] = kwargs["position_ids"]
        # Abort the forward pass on purpose once the input is captured.
        raise ValueError

in quantize/omniquant.py.
Hope this helps.
My 'transformers' version is 4.44.2.
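
For readers unfamiliar with the Catcher pattern above, here is a minimal, CPU-only sketch of the idea: wrap the first layer, record what it receives, and raise to stop the rest of the forward pass. This is not the OmniQuant code. Passing `inps` and `cache` into the constructor, the toy shapes, and the use of `kwargs.get` (so a missing keyword does not raise `KeyError`) are all assumptions made for illustration.

```python
import torch
from torch import nn


class Catcher(nn.Module):
    """Wraps a layer, records its inputs, then aborts the forward pass."""

    def __init__(self, module, inps, cache):
        super().__init__()
        self.module = module
        self.inps = inps      # pre-allocated buffer, one slot per sample
        self.cache = cache    # plain dict shared with the caller

    def forward(self, inp, **kwargs):
        self.inps[self.cache["i"]] = inp
        self.cache["i"] += 1
        # .get() returns None if this keyword was not passed by the caller.
        self.cache["position_ids"] = kwargs.get("position_ids")
        raise ValueError  # deliberate: nothing after this layer needs to run


# Toy usage with hypothetical sizes: 2 samples, 4 tokens, hidden size 8.
inps = torch.zeros(2, 4, 8)
cache = {"i": 0, "position_ids": None}
catcher = Catcher(nn.Identity(), inps, cache)

for _ in range(2):
    try:
        catcher(torch.ones(4, 8), position_ids=torch.arange(4).unsqueeze(0))
    except ValueError:
        pass  # expected: the Catcher stops the pass after capturing
```

After the loop, `inps` holds the captured activations and `cache["position_ids"]` holds the last position ids seen, which is what the per-layer quantization loop consumes.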

SSshuishui · 2024-09-11T13:21:27Z

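Do not use --multigpu; instead, change the layer loop so each layer and its inputs are moved to the device assigned in the model's device map: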

hf_device_map = model.hf_device_map
print(hf_device_map)

for i in range(len(layers)):
    logger.info(f"=== Start quantize layer {i} ===")
    print(f'================={i}==================')
    hf_device = f"cuda:{hf_device_map[f'{layer_name_prefix}.{i}']}"
    layer = layers[i].to(hf_device)
    inps = inps.to(hf_device)
    position_ids = position_ids.to(hf_device)

If that line is not commented out (# cache["attention_mask"] = kwargs["attention_mask"]), the run fails with: ValueError: Attention mask should be of size (1, 1, 2048, 2048), but is torch.Size([1, 1, 2048, 2049])
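
One caveat with building the device string as f"cuda:{...}" in the loop above: as far as I know, a device map produced with device_map='auto' can contain integer GPU indices as well as strings such as "cpu" or "disk" when layers are offloaded, and in that case the f-string would yield something like "cuda:cpu". A small sketch of a more defensive lookup follows; `resolve_layer_device` and the example map are hypothetical names introduced here, not part of OmniQuant or transformers.

```python
def resolve_layer_device(hf_device_map, layer_key):
    """Return a torch-style device string for one entry of a device map.

    Assumes entries are either int GPU indices or strings such as
    "cpu", "disk", or "cuda:1".
    """
    target = hf_device_map[layer_key]
    if isinstance(target, int):
        return f"cuda:{target}"
    if target == "disk":
        # Offloaded weights are not directly usable as a tensor device.
        raise ValueError(f"{layer_key} is offloaded to disk")
    return str(target)


# Hypothetical device map for a three-layer model.
example_map = {"model.layers.0": 0, "model.layers.1": 1, "model.layers.2": "cpu"}
devices = [resolve_layer_device(example_map, f"model.layers.{i}") for i in range(3)]
print(devices)
```

In the loop, the result would replace the hand-built string, e.g. hf_device = resolve_layer_device(hf_device_map, f'{layer_name_prefix}.{i}').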

forcekkk · 2025-04-11T04:27:37Z

I want to use eight 4090s (24 GB each) to quantize llama-7b to W4A4, but I get the error below.

python main.py \
--model ./llama-7b \
--epochs 1 --output_dir ./log/llama-7b-w4a4 \
--eval_ppl --wbits 4 --abits 4 --lwc --let --multigpu \
--tasks piqa,arc_easy,arc_challenge,boolq,hellaswag,winogrande

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
