[ascend]Optimize moe #203
Conversation
Pull Request Overview
This PR updates the CI configuration to use a new branch for LMDEPLOY, aligning with the "[ascend]Optimize moe" effort.
- Updated LMDEPLOY_COMMIT_OR_BRANCH from 'main' to 'optimize_moe' in the GitHub workflow.
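The change described above might look like the following in the workflow file (a sketch; the surrounding structure of `.github/workflows/main.yml` is assumed, only the variable name and values come from this PR):

```yaml
# Sketch of the CI change (assumption: surrounding workflow structure).
env:
  LMDEPLOY_COMMIT_OR_BRANCH: optimize_moe   # was: main
```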
Force-pushed from a61c7bb to d738195
Force-pushed from d94378d to df127d6
Force-pushed from 8acb00d to a4a5297
Pull Request Overview
This PR introduces optimizations for the moe module by updating function signatures and configurations. Key changes include:
- Adding new parameters (head_size and head_size_v) to multiple attention-related functions.
- Removing the slicing of past key values in the DeepseekV2Attention_forward function.
- Updating the CI workflow to reference the "optimize_moe" branch and a new Git repository URL.
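The first bullet can be sketched as follows. This is an illustrative signature only, assuming the common convention that `head_size_v` defaults to `head_size` when value heads match the query/key heads; the names and return value are hypothetical, not the actual dlinfer API:

```python
# Hypothetical sketch of threading explicit head sizes through an
# attention wrapper (assumption: names and defaults are illustrative).
from typing import Optional

def prefill_attention(num_heads: int, head_size: int,
                      head_size_v: Optional[int] = None) -> dict:
    """Return the per-head sizes the kernel would be launched with.

    head_size_v falls back to head_size in the common case where the
    value heads have the same width as the query/key heads.
    """
    if head_size_v is None:
        head_size_v = head_size
    return {"qk_head_size": head_size, "v_head_size": head_size_v}
```

With mismatched widths (as in MLA-style attention), the caller passes both sizes explicitly, e.g. `prefill_attention(8, 192, head_size_v=128)`.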
Reviewed Changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| dlinfer/ops/llm.py | Updated function signatures and documentation for attention operations. |
| dlinfer/framework/lmdeploy_ext/dynamo/graph_mode_patch.py | Modified parameter usage in the DeepseekV2Attention_forward function. |
| .github/workflows/main.yml | Adjusted CI environment variables and updated the Git repository URL. |
Comments suppressed due to low confidence (3)
dlinfer/ops/llm.py:132
- [nitpick] Consider rephrasing the docstring to 'head_size (int): The size of each query head' for clarity.
head_size (int): The number of query head size.
dlinfer/framework/lmdeploy_ext/dynamo/graph_mode_patch.py:89
- Removing the slice '[:nope_size]' might change the intended behavior. Please verify that passing the full tensor is correct.
past_key_value[0][..., :nope_size]
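The behavioral question the reviewer raises can be illustrated on a plain nested list standing in for `past_key_value[0]` (an assumption for demonstration; the real code operates on tensors):

```python
# Sketch (assumption): contrast slicing the last dimension to nope_size
# with passing the full tensor through, using a nested list as a stand-in.
nope_size = 4
key_states = [[float(i) for i in range(6)] for _ in range(2)]  # 2 tokens x 6 dims

sliced = [row[:nope_size] for row in key_states]  # old: only the "nope" part
full = key_states                                 # new: whole tensor passed on

assert all(len(row) == nope_size for row in sliced)
assert all(len(row) == 6 for row in full)
```

Whether the downstream kernel expects the truncated or the full width is exactly what the reviewer asks to verify.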
.github/workflows/main.yml:77
- Ensure that the repository URL update from 'InternLM' to 'DeepLink-org' aligns with all relevant CI/CD configurations.
git clone https://github.com/DeepLink-org/lmdeploy.git ${{ env.LMDEPLOY_PATH }}
Force-pushed from fee4ebe to 6a169d8
This reverts commit a4a5297.
No description provided.