
[ascend]Optimize moe #203


Merged: 9 commits into DeepLink-org:main from optimize_moe on Apr 10, 2025

Conversation

yao-fengchen (Contributor)

No description provided.


Copilot AI left a comment


Pull Request Overview

This PR updates the CI configuration to use a new branch for LMDEPLOY, aligning with the "[ascend]Optimize moe" effort.

  • Updated LMDEPLOY_COMMIT_OR_BRANCH from 'main' to 'optimize_moe' in the GitHub workflow.

yao-fengchen force-pushed the optimize_moe branch 2 times, most recently from a61c7bb to d738195, on April 2, 2025 03:04
yao-fengchen force-pushed the optimize_moe branch 3 times, most recently from d94378d to df127d6, on April 2, 2025 04:21
yao-fengchen force-pushed the optimize_moe branch 2 times, most recently from 8acb00d to a4a5297, on April 2, 2025 04:29
jinminxi104 requested a review from Copilot on April 3, 2025 03:05

Copilot AI left a comment


Pull Request Overview

This PR introduces optimizations for the moe module by updating function signatures and configurations. Key changes include:

  • Adding new parameters (head_size and head_size_v) to multiple attention-related functions (see the sketch after this list).
  • Removing the slicing of past key values in the DeepseekV2Attention_forward function.
  • Updating the CI workflow to reference the "optimize_moe" branch and a new Git repository URL.
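For readers skimming the summary, the following is a minimal sketch of what passing explicit head sizes to an attention op can look like. It is an assumption-laden illustration: the function name and every parameter other than head_size and head_size_v are invented for the example and are not the actual dlinfer/ops/llm.py API.

    import torch

    # Illustrative sketch only (hypothetical op); head_size / head_size_v come from
    # the PR summary, everything else is assumed for the example.
    def attention_ref(
        query: torch.Tensor,   # [num_tokens, num_heads, head_size]
        key: torch.Tensor,     # [num_tokens, num_heads, head_size]
        value: torch.Tensor,   # [num_tokens, num_heads, head_size_v]
        head_size: int,        # size of each query/key head, now passed explicitly
        head_size_v: int,      # size of each value head; may differ from head_size
    ) -> torch.Tensor:
        # With head_size_v explicit, the output shape is known up front instead of
        # being inferred from the (possibly padded) last dimension of `value`.
        scale = head_size ** -0.5
        scores = torch.einsum("qhd,khd->hqk", query * scale, key).softmax(dim=-1)
        return torch.einsum("hqk,khe->qhe", scores, value)  # [num_tokens, num_heads, head_size_v]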

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

File / Description
  • dlinfer/ops/llm.py: Updated function signatures and documentation for attention operations.
  • dlinfer/framework/lmdeploy_ext/dynamo/graph_mode_patch.py: Modified parameter usage in the DeepseekV2Attention_forward function.
  • .github/workflows/main.yml: Adjusted CI environment variables and updated the Git repository URL.
Comments suppressed due to low confidence (3)

dlinfer/ops/llm.py:132

  • [nitpick] Consider rephrasing the docstring to 'head_size (int): The size of each query head' for clarity.
head_size (int): The number of query head size.
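Applied to a docstring, the suggested wording would look roughly like this; the enclosing function is a placeholder for illustration, not the real dlinfer definition.

    def example_op(query, key, value, head_size: int, head_size_v: int):
        """Placeholder function used only to show the reworded docstring.

        Args:
            head_size (int): The size of each query head.
            head_size_v (int): The size of each value head.
        """
        ...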

dlinfer/framework/lmdeploy_ext/dynamo/graph_mode_patch.py:89

  • Removing the slice '[:nope_size]' might change the intended behavior. Please verify that passing the full tensor is correct.
past_key_value[0][..., :nope_size]
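A small sketch of the behavioral difference being flagged. The shapes and the tuple layout of past_key_value are assumptions made for illustration and are not taken from the actual DeepseekV2Attention_forward code.

    import torch

    # Assumed layout for illustration only: the key cache stores the "nope" and
    # "rope" parts concatenated along the last (head_dim) axis.
    nope_size, rope_size = 128, 64
    past_key_value = (
        torch.randn(1, 8, 16, nope_size + rope_size),  # cached keys
        torch.randn(1, 8, 16, nope_size),              # cached values
    )

    key_before = past_key_value[0][..., :nope_size]  # old code: only the "nope" slice
    key_after = past_key_value[0]                    # new code: the full cached tensor

    # If the downstream attention op now receives head_size explicitly and expects
    # the full head_dim (nope + rope), the change is consistent; otherwise the extra
    # rope columns would alter the attention scores.
    assert key_before.shape[-1] == nope_size
    assert key_after.shape[-1] == nope_size + rope_size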

.github/workflows/main.yml:77

  • Ensure that the repository URL update from 'InternLM' to 'DeepLink-org' aligns with all relevant CI/CD configurations.
git clone https://github.com/DeepLink-org/lmdeploy.git ${{ env.LMDEPLOY_PATH }}

This reverts commit a4a5297.
jinminxi104 merged commit bec381e into DeepLink-org:main on Apr 10, 2025
3 of 4 checks passed