[BUG] GPT2BPETokenizer (and possibly others) missing decode and offsets methods #1633

Open
@jzzhuang

I encountered an issue when running text generation with Megatron-LM. Apologies in advance for any mistakes; I'm a new user.

After successfully preprocessing the data using:

python tools/preprocess_data.py --tokenizer-type GPT2BPETokenizer

and completing pretraining, I tried running text generation using:

tools/run_text_generation_server.py

However, I received the following errors:

AttributeError: '_GPT2BPETokenizer' object has no attribute 'decode'
AttributeError: '_GPT2BPETokenizer' object has no attribute 'offsets'

These errors seem to originate from:

It seems like _GPT2BPETokenizer may be missing decode and offsets methods required by the inference script. Any guidance on how to resolve this would be greatly appreciated.
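One possible stopgap, until the tokenizer itself provides these methods, is to wrap it in a small adapter. The sketch below is not taken from Megatron-LM: `ToyTokenizer` and `DecodeOffsetsAdapter` are hypothetical names, and it assumes that the wrapped tokenizer exposes `tokenize` and `detokenize`, that `decode(ids)` should return the detokenized string, and that `offsets(ids, text)` should return the character position where each token starts. Those expected semantics are guesses and should be checked against what the inference code actually calls.

```python
import re


class ToyTokenizer:
    """Hypothetical stand-in for a tokenizer that only has tokenize/detokenize."""

    def __init__(self):
        self._piece_to_id = {}
        self._id_to_piece = {}

    def tokenize(self, text):
        # Split so that a leading space stays attached to the following word.
        ids = []
        for piece in re.findall(r"\s?\S+|\s+", text):
            if piece not in self._piece_to_id:
                new_id = len(self._piece_to_id)
                self._piece_to_id[piece] = new_id
                self._id_to_piece[new_id] = piece
            ids.append(self._piece_to_id[piece])
        return ids

    def detokenize(self, ids):
        return "".join(self._id_to_piece[i] for i in ids)


class DecodeOffsetsAdapter:
    """Adds decode() and offsets() on top of an existing tokenizer (assumed API)."""

    def __init__(self, tokenizer):
        self._tok = tokenizer

    def __getattr__(self, name):
        # Only called for attributes not found on the adapter itself,
        # so everything else is forwarded to the wrapped tokenizer.
        return getattr(self._tok, name)

    def decode(self, ids):
        # Assumption: decode is equivalent to detokenize.
        return self._tok.detokenize(ids)

    def offsets(self, ids, text):
        # Assumption: the offset of token i is the length of the text
        # produced by the tokens before it. `text` is accepted only to
        # match the assumed call signature. This is quadratic in len(ids).
        return [len(self._tok.detokenize(ids[:i])) for i in range(len(ids))]
```

With the toy tokenizer, `DecodeOffsetsAdapter(ToyTokenizer())` round-trips `"hello world foo"` through `tokenize` and `decode`, and `offsets` returns `[0, 5, 11]` for it. Decoding prefixes rather than single tokens avoids problems with byte-level tokens that are not valid text on their own, at the cost of extra work per call.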
