
add inductor xpu #16

Open · wants to merge 8 commits into main
Conversation

RUIJIEZHONG66166 (Collaborator):

No description provided.

else
wget -q -e "https_proxy=http://proxy-us.intel.com:912" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
source ${HOME}/miniconda3/etc/profile.d/conda.sh 2>&1 >> /dev/null
chuanqi129 (Owner):

This line can be replaced by: source ${HOME}/miniconda3/bin/activate

RUIJIEZHONG66166 (Collaborator, Author):

ok

chuanqi129 (Owner):

This can be applied in other places as well.

fi
else
wget -q -e "https_proxy=http://proxy-us.intel.com:912" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b
chuanqi129 (Owner):

Have you run into the miniconda package issue like conda/conda#13225 recently? The package https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh (currently the same as Miniconda3-py311_23.9.0-0-Linux-x86_64.sh) seems to have an issue now. I suggest using another Python version, e.g. Miniconda3-py39_23.9.0-0-Linux-x86_64.sh.
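If the pinned-version workaround is needed, it could look like the following sketch (MINICONDA_VER is a hypothetical variable, not part of the current script):

```shell
# Sketch of the suggested workaround: pin the Miniconda installer version
# instead of "latest" (see conda/conda#13225). MINICONDA_VER is assumed.
MINICONDA_VER="${MINICONDA_VER:-py39_23.9.0-0}"
installer="Miniconda3-${MINICONDA_VER}-Linux-x86_64.sh"
url="https://repo.anaconda.com/miniconda/${installer}"
echo "would download: ${url}"
```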

RUIJIEZHONG66166 (Collaborator, Author):

We've never met this issue. We usually install Miniconda3-latest-Linux-x86_64.sh on our machines first.

chuanqi129 (Owner):

OK, please be aware of it; when we hit such an issue we can use a similar workaround.

fi

# set gpu governor
if [[ -z "${USER_PASS}" ]];then USER_PASS="gta";fi
chuanqi129 (Owner):

Don't hardcode the password in this public repo; if we really need such info, let's pass it in via a Jenkins credential.

RUIJIEZHONG66166 (Collaborator, Author):

Set it as a Jenkins parameter.
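A minimal sketch of the parameterized form, assuming the Jenkins job injects USER_PASS (the function name is illustrative, not part of the current script):

```shell
# Sketch (assumed parameter name): fail fast when the Jenkins-provided
# credential is missing, instead of falling back to a hardcoded default.
require_user_pass() {
    if [[ -z "${USER_PASS:-}" ]]; then
        echo "ERROR: USER_PASS must be provided as a Jenkins parameter/credential" >&2
        return 1
    fi
}
```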

chuanqi129 (Owner):

If the CPU mode setting is moved, do we still need this one?

TRITON_CODEGEN_INTEL_XPU_BACKEND=1 python setup.py bdist_wheel
pip install dist/*.whl
source ${HOME}/env.sh
python -c "import triton"
chuanqi129 (Owner):

Maybe we need to move to the parent dir to do the import check, to avoid a path issue: there is a triton folder under python, which can shadow the installed package.
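A small demonstration of the shadowing problem the comment describes, using the stdlib json module as a stand-in for triton (temporary paths; assumes python3 on PATH):

```shell
# Demonstration of the path issue: a local directory that looks like a
# package shadows the installed module when Python runs from that directory.
# Uses stdlib 'json' as a stand-in for 'triton'; paths are temporary.
workdir=$(mktemp -d)
mkdir "${workdir}/json"
touch "${workdir}/json/__init__.py"   # local empty 'json' package shadows the stdlib one
if (cd "${workdir}" && python3 -c "import json; json.dumps" 2>/dev/null); then
    inside="ok"
else
    inside="shadowed"                 # the local empty package has no 'dumps'
fi
if python3 -c "import json; json.dumps" 2>/dev/null; then
    outside="ok"                      # from another directory the real module is found
else
    outside="broken"
fi
echo "inside=${inside} outside=${outside}"
rm -rf "${workdir}"
```

This is why running the `python -c "import triton"` check from the build directory can pick up the source tree instead of the installed wheel.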

wget -q -e use_proxy=no ${ipex_whl}
python -m pip install --force-reinstall $(basename ${ipex_whl})
else
bash ${WORKSPACE}/inductor-tools/scripts/env_prepare.sh
chuanqi129 (Owner):

I don't see any parameters at the beginning of this env_prepare.sh script; please add them. By the way, please also add the oneAPI version as a parameter for this script and for env.sh, to add basekit version control. You can refer to the latest ci/nightly workflows in the xpu backend repo.

@@ -0,0 +1,57 @@
installed_torch_git_version=$(python -c "import torch;print(torch.version.git_version)"|| true)
chuanqi129 (Owner):

Add the parameters here.

chuanqi129 (Owner) left a comment:

Second round of comments; these aim to improve the jobs' flexibility.

'''
}//retry
}//stage
stage('Accuracy-Test') {
chuanqi129 (Owner):

Add a multi-choice parameter SCENARIO to this Groovy file and the Jenkins job; its values can be accuracy and performance. If accuracy is in SCENARIO, run the accuracy test, otherwise bypass this stage. The performance stage is handled similarly to the accuracy part. The default value can be both accuracy and performance.
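One way the gate could be sketched in shell terms (the Groovy stage would do the equivalent; the comma-separated SCENARIO format is an assumption):

```shell
# Sketch of the SCENARIO gate: run the Accuracy-Test stage only when
# "accuracy" is among the selected scenarios. Names are assumptions.
SCENARIO="${SCENARIO:-accuracy,performance}"
if [[ ",${SCENARIO}," == *",accuracy,"* ]]; then
    accuracy_stage="run"
else
    accuracy_stage="skip"
fi
echo "Accuracy-Test: ${accuracy_stage}"
```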

pip install styleFrame scipy pandas
pushd ${WORKSPACE}/pytorch
rm -rf inductor_log
bash inductor_xpu_test.sh huggingface amp_bf16 inference accuracy xpu 0 & \
chuanqi129 (Owner):

Also keep SUITE, DT and MODE as parameters of this Groovy file and the Jenkins job. All of them should be multi-choice parameters; combine their values and run the tests one by one. The default values can match the current hardcoded cmd. For example, if SUITE={huggingface,timm_models}, DT={float32,amp_bf16} and MODE={inference,training}, there will be 8 combinations to test.
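The combination expansion the comment describes can be sketched as follows (values mirror the reviewer's example, not the final job parameters):

```shell
# Sketch: expand the multi-choice SUITE/DT/MODE parameters into the full
# list of test combinations. Space-separated values are an assumption.
SUITE="huggingface timm_models"
DT="float32 amp_bf16"
MODE="inference training"
combos=()
for suite in ${SUITE}; do
    for dt in ${DT}; do
        for mode in ${MODE}; do
            combos+=("${suite}-${dt}-${mode}")
        done
    done
done
echo "${#combos[@]} combinations"   # 2 x 2 x 2 = 8
```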

conda activate ${conda_env}
source ${HOME}/env.sh ${oneapi_ver}

cd ${WORKSPACE}/pytorch/inductor_log/huggingface
chuanqi129 (Owner):

Extract this log-summary part into a standalone script, and keep it flexible for different suite/dt/mode/scenario combinations (low priority). A simple way is for the log-summary script to receive those parameters and maintain the pass criteria for the different tests (using a configuration file or a dict-like structure), e.g. {"huggingface-amp_bf16-inference-accuracy": 44, "huggingface-amp_bf16-training-accuracy": 42}.
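A sketch of the dict-like criteria structure in bash (requires bash 4+ associative arrays; the threshold numbers come from the comment's example):

```shell
# Sketch: pass-count criteria for the standalone log-summary script,
# keyed by "suite-dt-mode-scenario". Numbers are the comment's example.
declare -A expected_pass=(
    ["huggingface-amp_bf16-inference-accuracy"]=44
    ["huggingface-amp_bf16-training-accuracy"]=42
)
key="huggingface-amp_bf16-inference-accuracy"
echo "criterion for ${key}: ${expected_pass[${key}]}"
```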

}//retry
}//stage
stage('Performance-Test') {
println('================================================================')
chuanqi129 (Owner):

Similar request as for the accuracy test stage.

bash inductor_xpu_test.sh huggingface amp_bf16 inference accuracy xpu 0 & \
bash inductor_xpu_test.sh huggingface amp_bf16 training accuracy xpu 1 & \
bash inductor_xpu_test.sh huggingface amp_fp16 inference accuracy xpu 2 & \
bash inductor_xpu_test.sh huggingface amp_fp16 training accuracy xpu 3 & wait
chuanqi129 (Owner):

The right cmd is:

          bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 0 static 4 0 & \
          bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 1 static 4 1 & \
          bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 2 static 4 2 & \
          bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 3 static 4 3 & wait

It means one combination will be split into 4 sub-tests that run on 4 cards at the same time. Please modify the cmd in the performance part as well. Your previous cmd ran 2 combinations at the same time, with each combination run twice on 2 cards, rather than split into sub-tests.
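Generating the four sharded sub-test commands for one combination could be sketched as below (SUITE/DT/MODE values are placeholders for the job parameters):

```shell
# Sketch: one SUITE/DT/MODE combination sharded as "static 4 <shard>"
# across cards 0-3, matching the cmd in the comment above.
SUITE=huggingface; DT=amp_bf16; MODE=inference
cmds=()
for card in 0 1 2 3; do
    cmds+=("bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu ${card} static 4 ${card}")
done
printf '%s\n' "${cmds[@]}"
```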

chuanqi129 (Owner):

Hi @RUIJIEZHONG66166, any update on those comments?
