Add InternLM2 model and tests #29667

Closed · wants to merge 6 commits

Conversation

@x54-729 commented Mar 15, 2024

What does this PR do?

  • Add tokenizer, fast tokenizer, configuration and model of InternLM2
  • Add tests for InternLM2 tokenizer and model
  • Complete the README and model doc of InternLM2 (the InternLM2 technical report will be released next week, and I will add the paper link to this PR)

All tests pass locally except for some tokenizer issues (#29617, #29626); these failing tests are skipped temporarily.

I'm not sure whether my code formatting and tests are proper enough to merge. If any changes are needed, please let me know and I will fix them ASAP!
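For reference, temporarily skipping such a test looks roughly like this; the class name, test name, and skip reason below are illustrative, not the actual ones used in this PR:

import unittest


class InternLM2TokenizationTest(unittest.TestCase):
    # Illustrative only: the real test names and skip reasons in the PR may differ.
    @unittest.skip("Blocked by upstream tokenizer issues #29617 / #29626")
    def test_add_dummy_prefix(self):
        self.fail("This should not run while the upstream issues are open")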

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

@vansin (Contributor) commented Mar 15, 2024

amazing!!!

@ArthurZucker (Collaborator) left a comment

Hey! Thanks for porting this model. Is the only difference from the Llama arch the non-split QKV? If so, there is the Persimmon architecture, which should be compatible out of the box with weight renaming!
What about the tokenizer? Pretty sure the first InternLM used a LlamaTokenizer converted by @Rocketknight1, no?

@x54-729 (Author) commented Mar 27, 2024

> Hey! Thanks for porting this model. Is the only difference from the Llama arch the non-split QKV? If so, there is the Persimmon architecture, which should be compatible out of the box with weight renaming! What about the tokenizer? Pretty sure the first InternLM used a LlamaTokenizer converted by @Rocketknight1, no?

@ArthurZucker

Thanks for your reply!

  1. Yes, the QKV projections of InternLM2 are merged. According to Section 2.2 (Model Structure) of our tech report (https://arxiv.org/pdf/2403.17297.pdf), this arrangement of the merged QKV can accelerate inference. The merged QKV is also convenient for tensor parallelism (a rough sketch of splitting such a fused projection follows after this list).

  2. Our tokenizer is different from Llama's since our add_dummy_prefix is False, so using the Llama tokenizer for InternLM previously caused some problems. Additionally, the InternLM2 chat model uses some special tokens, so I have to use InternLM2Converter instead of LlamaConverter.
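As a rough illustration of point 1, here is a minimal sketch of how a fused GQA-style QKV weight could be split back into separate Llama-style Q/K/V matrices. The per-KV-group [q_1 ... q_n, k, v] row layout assumed here is only an assumption for illustration, not necessarily InternLM2's exact arrangement:

import torch

def split_fused_qkv(wqkv, num_heads, num_kv_heads, head_dim):
    """Split a fused QKV weight of shape (out_features, hidden_size) into Q, K, V.

    Assumes rows are grouped per KV head as [q_1 ... q_n, k, v],
    where n = num_heads // num_kv_heads (a hypothetical GQA layout).
    """
    q_per_kv = num_heads // num_kv_heads
    hidden_size = wqkv.shape[-1]
    grouped = wqkv.view(num_kv_heads, q_per_kv + 2, head_dim, hidden_size)
    wq = grouped[:, :q_per_kv].reshape(-1, hidden_size)  # (num_heads * head_dim, hidden_size)
    wk = grouped[:, -2].reshape(-1, hidden_size)          # (num_kv_heads * head_dim, hidden_size)
    wv = grouped[:, -1].reshape(-1, hidden_size)
    return wq, wk, wv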

@ArthurZucker (Collaborator) commented Mar 27, 2024

Regarding 1: there is a Llama-like model with fused QKV in transformers, Persimmon! If it's not an exact match, I recommend starting from that arch with # Copied from!
2. If these are the only differences, GemmaTokenizer is the closest. You probably don't even need that and can just use AutoTokenizer with a PreTrainedTokenizerFast backed by the tokenizer.json. Something similar to the Gemma tokenizer should do the trick? I can probably help you with that if you want? 🤗
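For illustration, loading a fast tokenizer purely from a tokenizer.json could look like the sketch below; "path/to/internlm2-repo" is a placeholder, and this assumes the repository (or local folder) actually ships a tokenizer.json:

from transformers import PreTrainedTokenizerFast

# No model-specific tokenizer class needed: the vocab, merges, normalizer
# settings (e.g. add_dummy_prefix) and added special tokens all live in tokenizer.json.
tokenizer = PreTrainedTokenizerFast.from_pretrained("path/to/internlm2-repo")
print(tokenizer.tokenize("Hello InternLM2"))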

@x54-729 (Author) commented Mar 28, 2024

> Regarding 1: there is a Llama-like model with fused QKV in transformers, Persimmon! If it's not an exact match, I recommend starting from that arch with # Copied from! 2. If these are the only differences, GemmaTokenizer is the closest. You probably don't even need that and can just use AutoTokenizer with a PreTrainedTokenizerFast backed by the tokenizer.json. Something similar to the Gemma tokenizer should do the trick? I can probably help you with that if you want? 🤗

So, to confirm, what you recommended is:

  1. Use the # Copied from mechanism to write our model based on Persimmon if InternLM2's architecture is not exactly the same as Persimmon's (a rough sketch of this convention is included below).
  2. Upload a tokenizer.json file to our model repositories to handle add_dummy_prefix and the special chat tokens, so there is no need to submit a new tokenizer class in this PR for InternLM2.

Is that right?
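For reference, the # Copied from convention looks roughly like this; the class below and its simplified body are only illustrative, not the actual code in this PR:

import torch
import torch.nn as nn


# Copied from transformers.models.llama.modeling_llama.LlamaRMSNorm with Llama->InternLM2
class InternLM2RMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        # RMS-normalize, then rescale with the learned weight.
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states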

@ArthurZucker (Collaborator) commented Mar 28, 2024

Yes!

("falcon", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),

Falcon only relies on a tokenizer.json! If this does not work, then let's add InternLM2TokenizerFast, using # Copied from as well.

For 1, I can have another look to check which model is the closest; pretty sure it's Persimmon (merged QKV) and Phi (split QKV).
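A quick way to see that fallback in action (assuming Hub access and that the repository ships a tokenizer.json, as the Falcon repos do):

from transformers import AutoTokenizer

# With the (None, "PreTrainedTokenizerFast") entry in the tokenizer mapping above,
# AutoTokenizer resolves Falcon to the generic fast tokenizer backed by tokenizer.json.
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
print(type(tokenizer).__name__)  # PreTrainedTokenizerFast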

@x54-729 (Author) commented Mar 29, 2024

> Yes!
>
> ("falcon", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),
>
> Falcon only relies on a tokenizer.json! If this does not work, then let's add InternLM2TokenizerFast, using # Copied from as well.
>
> For 1, I can have another look to check which model is the closest; pretty sure it's Persimmon (merged QKV) and Phi (split QKV).

Thanks! I will try tokenizer.json later!

As for the Persimmon model, it does not support GQA, and its QKV arrangement seems to be different from InternLM2's.
The state dict names are also not the same. So writing InternLM2 based on Llama seems more convenient? I'm not entirely sure about this (a hypothetical key mapping is sketched below).
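To make the renaming concern concrete, a conversion script would need a key mapping roughly along these lines; the InternLM2-side key names below reflect the remote-code checkpoint format as I understand it and should be treated as assumptions:

# Hypothetical mapping from InternLM2 remote-code state dict keys to Llama-style keys.
# The fused attention.wqkv tensor would additionally have to be split into q/k/v parts.
INTERNLM2_TO_LLAMA = {
    "attention.wo.weight": "self_attn.o_proj.weight",
    "feed_forward.w1.weight": "mlp.gate_proj.weight",
    "feed_forward.w3.weight": "mlp.up_proj.weight",
    "feed_forward.w2.weight": "mlp.down_proj.weight",
    "attention_norm.weight": "input_layernorm.weight",
    "ffn_norm.weight": "post_attention_layernorm.weight",
}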

@ArthurZucker (Collaborator) commented:

Gemma or Phi should be the closest! @SunMarc will do the next round of review! Ping him if you need any help in the meantime! 🤗

@SunMarc (Member) commented Apr 17, 2024

Hi @x54-729, just checking in to see if you are still planning to finish this PR. Feel free to ask me any questions =)

huggingface deleted a comment from the github-actions bot May 13, 2024

github-actions bot commented Jun 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions bot closed this Jun 15, 2024
SunMarc reopened this Jun 17, 2024
github-actions bot closed this Jun 26, 2024
SunMarc reopened this Jun 26, 2024
github-actions bot closed this Jul 5, 2024
@JettScythe

I would like to bump/reopen this. I am trying to run a Ray Serve deployment with internlm2_5:

applications:
- args:
    llm_configs:
        - model_loading_config:
            model_id: internlm
            model_source: internlm/internlm2_5-7b-chat
            tokenizer_source: internlm/internlm2-7b-chat
          deployment_config:
            autoscaling_config:
                min_replicas: 1
                max_replicas: 2
          accelerator_type: T4
          engine_kwargs:
            trust_remote_code: True
          runtime_env:
            uv:
              - xgrammar==0.1.11
              - pynvml==12.0.0
            env_vars:
              HF_HUB_ENABLE_HF_TRANSFER: "1"
              TRANSFORMERS_VERBOSITY: "debug"
  import_path: ray.serve.llm:build_openai_app
  name: llm_app
  route_prefix: "/"

and keep running into issues that this exact PR would solve:

(ServeReplica:llm_app:LLMDeployment:internlm pid=4347, ip=10.128.0.43) Could not locate the tokenizer configuration file, will try to use the model config instead.
(ServeController pid=35631) ERROR 2025-05-01 17:20:55,549 controller 35631 -- Exception in Replica(id='f60my800', deployment='LLMDeployment:internlm', app='llm_app'), the replica will be stopped.
(ServeController pid=35631) Traceback (most recent call last):
(ServeController pid=35631)   File "/home/ubuntu/.venv/lib/python3.12/site-packages/ray/serve/_private/deployment_state.py", line 694, in check_ready
(ServeController pid=35631)     ) = ray.get(self._ready_obj_ref)
(ServeController pid=35631)         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/home/ubuntu/.venv/lib/python3.12/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
(ServeController pid=35631)     return fn(*args, **kwargs)
(ServeController pid=35631)            ^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/home/ubuntu/.venv/lib/python3.12/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
(ServeController pid=35631)     return func(*args, **kwargs)
(ServeController pid=35631)            ^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/home/ubuntu/.venv/lib/python3.12/site-packages/ray/_private/worker.py", line 2822, in get
(ServeController pid=35631)     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(ServeController pid=35631)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/home/ubuntu/.venv/lib/python3.12/site-packages/ray/_private/worker.py", line 930, in get_objects
(ServeController pid=35631)     raise value.as_instanceof_cause()
(ServeController pid=35631) ray.exceptions.RayTaskError(RuntimeError): ray::ServeReplica:llm_app:LLMDeployment:internlm.initialize_and_get_metadata() (pid=4347, ip=10.128.0.43, actor_id=2f584a0392e7e81ee1fcf7a602000000, repr=<ray.serve._private.replica.ServeReplica:llm_app:LLMDeployment:internlm object at 0x7fefe10c5700>)
(ServeController pid=35631)   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 449, in result
(ServeController pid=35631)     return self.__get_result()
(ServeController pid=35631)            ^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(ServeController pid=35631)     raise self._exception
(ServeController pid=35631)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 980, in initialize_and_get_metadata
(ServeController pid=35631)     await self._replica_impl.initialize(deployment_config)
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 709, in initialize
(ServeController pid=35631)     raise RuntimeError(traceback.format_exc()) from None
(ServeController pid=35631) RuntimeError: Traceback (most recent call last):
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 686, in initialize
(ServeController pid=35631)     self._user_callable_asgi_app = await asyncio.wrap_future(
(ServeController pid=35631)                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 1378, in initialize_callable
(ServeController pid=35631)     await self._call_func_or_gen(
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 1341, in _call_func_or_gen
(ServeController pid=35631)     result = await result
(ServeController pid=35631)              ^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/llm_server.py", line 447, in __init__
(ServeController pid=35631)     await asyncio.wait_for(self._start_engine(), timeout=ENGINE_START_TIMEOUT_S)
(ServeController pid=35631)   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
(ServeController pid=35631)     return await fut
(ServeController pid=35631)            ^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/llm_server.py", line 492, in _start_engine
(ServeController pid=35631)     await self.engine.start()
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 302, in start
(ServeController pid=35631)     self.engine = await self._start_engine()
(ServeController pid=35631)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 328, in _start_engine
(ServeController pid=35631)     return await self._start_engine_v0()
(ServeController pid=35631)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 421, in _start_engine_v0
(ServeController pid=35631)     ) = await self._prepare_engine_config(use_v1=False)
(ServeController pid=35631)         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 344, in _prepare_engine_config
(ServeController pid=35631)     node_initialization = await self.initialize_node(self.llm_config)
(ServeController pid=35631)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/llm/vllm/vllm_engine.py", line 289, in initialize_node
(ServeController pid=35631)     return await initialize_node_util(llm_config)
(ServeController pid=35631)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/utils/node_initialization_utils.py", line 109, in initialize_node
(ServeController pid=35631)     await _initialize_local_node(
(ServeController pid=35631)   File "/home/ubuntu/.local/share/uv/python/cpython-3.12.8-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py", line 59, in run
(ServeController pid=35631)     result = self.fn(*self.args, **self.kwargs)
(ServeController pid=35631)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/ray/llm/_internal/serve/deployments/utils/node_initialization_utils.py", line 155, in _initialize_local_node
(ServeController pid=35631)     _ = transformers.AutoTokenizer.from_pretrained(
(ServeController pid=35631)         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 966, in from_pretrained
(ServeController pid=35631)     config = AutoConfig.from_pretrained(
(ServeController pid=35631)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(ServeController pid=35631)   File "/var/tmp/ray/session_2025-05-01_16-08-10_282944_17149/runtime_resources/uv/2aed8c6c0f2b9dbd288493872e9519be892cf423/virtualenv/lib/python3.12/site-packages/transformers/models/auto/configuration_auto.py", line 1151, in from_pretrained
(ServeController pid=35631)     raise ValueError(
(ServeController pid=35631) ValueError: Unrecognized model in internlm/internlm2_5-7b-chat. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: albert, align, altclip, aria, aria_text, audio-spectrogram-transformer, autoformer, aya_vision, bamba, bark, bart, beit, bert, bert-generation, big_bird, bigbird_pegasus, biogpt, bit, blenderbot, blenderbot-small, blip, blip-2, bloom, bridgetower, bros, camembert, canine, chameleon, chinese_clip, chinese_clip_vision_model, clap, clip, clip_text_model, clip_vision_model, clipseg, clvp, code_llama, codegen, cohere, cohere2, colpali, conditional_detr, convbert, convnext, convnextv2, cpmant, ctrl, cvt, dab-detr, dac, data2vec-audio, data2vec-text, data2vec-vision, dbrx, deberta, deberta-v2, decision_transformer, deepseek_v3, deformable_detr, deit, depth_anything, depth_pro, deta, detr, diffllama, dinat, dinov2, dinov2_with_registers, distilbert, donut-swin, dpr, dpt, efficientformer, efficientnet, electra, emu3, encodec, encoder-decoder, ernie, ernie_m, esm, falcon, falcon_mamba, fastspeech2_conformer, flaubert, flava, fnet, focalnet, fsmt, funnel, fuyu, gemma, gemma2, gemma3, gemma3_text, git, glm, glm4, glpn, got_ocr2, gpt-sw3, gpt2, gpt_bigcode, gpt_neo, gpt_neox, gpt_neox_japanese, gptj, gptsan-japanese, granite, granitemoe, granitemoeshared, granitevision, graphormer, grounding-dino, groupvit, helium, hiera, hubert, ibert, idefics, idefics2, idefics3, idefics3_vision, ijepa, imagegpt, informer, instructblip, instructblipvideo, jamba, jetmoe, jukebox, kosmos-2, layoutlm, layoutlmv2, layoutlmv3, led, levit, lilt, llama, llama4, llama4_text, llava, llava_next, llava_next_video, llava_onevision, longformer, longt5, luke, lxmert, m2m_100, mamba, mamba2, marian, markuplm, mask2former, maskformer, maskformer-swin, mbart, mctct, mega, megatron-bert, mgp-str, mimi, mistral, mistral3, mixtral, mllama, mobilebert, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, modernbert, moonshine, moshi, mpnet, mpt, mra, mt5, musicgen, musicgen_melody, mvp, nat, nemotron, nezha, nllb-moe, nougat, nystromformer, olmo, olmo2, olmoe, omdet-turbo, oneformer, open-llama, openai-gpt, opt, owlv2, owlvit, paligemma, patchtsmixer, patchtst, pegasus, pegasus_x, perceiver, persimmon, phi, phi3, phi4_multimodal, phimoe, pix2struct, pixtral, plbart, poolformer, pop2piano, prompt_depth_anything, prophetnet, pvt, pvt_v2, qdqbert, qwen2, qwen2_5_vl, qwen2_audio, qwen2_audio_encoder, qwen2_moe, qwen2_vl, qwen3, qwen3_moe, rag, realm, recurrent_gemma, reformer, regnet, rembert, resnet, retribert, roberta, roberta-prelayernorm, roc_bert, roformer, rt_detr, rt_detr_resnet, rt_detr_v2, rwkv, sam, sam_vision_model, seamless_m4t, seamless_m4t_v2, segformer, seggpt, sew, sew-d, shieldgemma2, siglip, siglip2, siglip_vision_model, smolvlm, smolvlm_vision, speech-encoder-decoder, speech_to_text, speech_to_text_2, speecht5, splinter, squeezebert, stablelm, starcoder2, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, switch_transformers, t5, table-transformer, tapas, textnet, time_series_transformer, timesformer, timm_backbone, timm_wrapper, trajectory_transformer, transfo-xl, trocr, tvlt, tvp, udop, umt5, unispeech, unispeech-sat, univnet, upernet, van, video_llava, videomae, vilt, vipllava, vision-encoder-decoder, vision-text-dual-encoder, visual_bert, vit, vit_hybrid, vit_mae, vit_msn, vitdet, vitmatte, vitpose, vitpose_backbone, vits, vivit, wav2vec2, wav2vec2-bert, wav2vec2-conformer, wavlm, whisper, xclip, xglm, xlm, 
xlm-prophetnet, xlm-roberta, xlm-roberta-xl, xlnet, xmod, yolos, yoso, zamba, zamba2, zoedepth

@SunMarc (Member) commented May 6, 2025

This PR was never finished, so the integration you are doing relies on remote code (trust_remote_code=True).
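In practice that means the checkpoint has to be loaded with remote code enabled, roughly like this (a sketch; only enable this if you trust the repository, since it executes code shipped with the model):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Because InternLM2 was never merged into transformers, the config, model and
# tokenizer classes all come from the code inside the model repository.
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2_5-7b-chat", trust_remote_code=True)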
