base: main
add inductor xpu #16
Conversation
else
    wget -q -e "https_proxy=http://proxy-us.intel.com:912" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b
    source ${HOME}/miniconda3/etc/profile.d/conda.sh 2>&1 >> /dev/null
This line can be replaced by: source ${HOME}/miniconda3/bin/activate
ok
You can apply it in the other places as well.
fi
else
    wget -q -e "https_proxy=http://proxy-us.intel.com:912" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b
Have you hit the miniconda package issue like conda/conda#13225 recently? It seems the package https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh (currently the same as Miniconda3-py311_23.9.0-0-Linux-x86_64.sh) has an issue now. I suggest using another Python version, e.g. Miniconda3-py39_23.9.0-0-Linux-x86_64.sh.
We have never met this issue. We usually install Miniconda3-latest-Linux-x86_64.sh on our machines first.
OK, please be aware of it; when we meet such an issue we can use a similar workaround.
fi

# set gpu governor
if [[ -z "${USER_PASS}" ]];then USER_PASS="gta";fi
Don't hardcode a password in this public repo. If we really need such info, let's pass it through a Jenkins credential.
Set it as a Jenkins parameter.
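As a sketch of the reviewer's suggestion (the USER_PASS variable name is taken from the script; the Jenkins credentials binding itself is assumed, not shown), the shell side can fail fast when the secret was not injected, instead of falling back to a default committed in the repo:

```shell
check_user_pass() {
    # Expect USER_PASS to be injected by a Jenkins credentials binding or
    # job parameter; abort with a clear message instead of defaulting to a
    # hardcoded secret.
    : "${USER_PASS:?USER_PASS must be provided via a Jenkins credential or parameter}"
}

# Demo: a missing variable is rejected, an injected one is accepted.
if (unset USER_PASS; check_user_pass) 2>/dev/null; then
    echo "unexpected: missing USER_PASS accepted"
else
    echo "missing USER_PASS rejected"
fi
USER_PASS="dummy-from-jenkins" check_user_pass && echo "injected USER_PASS accepted"
```

The `:?` expansion makes the script exit with a diagnostic the moment the credential is absent, so a misconfigured job fails loudly rather than running with a known password.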
If the CPU mode setting is moved, do we still need this one?
TRITON_CODEGEN_INTEL_XPU_BACKEND=1 python setup.py bdist_wheel
pip install dist/*.whl
source ${HOME}/env.sh
python -c "import triton"
You may need to move to the parent directory before doing the import check, to avoid a path issue: there is a triton folder under the current directory, and Python would import that source folder instead of the installed wheel.
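To illustrate the path issue (a sketch using a made-up package name, not the actual triton layout): Python puts the current directory first on the import path for `python -c`, so a bare source folder shadows the installed package, and running the check from any other directory avoids this.

```shell
# Hypothetical demo of source-tree shadowing: a package directory in the CWD
# is picked up by "import", whether or not the real extension is built.
demo=$(mktemp -d)
mkdir -p "${demo}/shadow_pkg"
echo "origin = 'local source tree'" > "${demo}/shadow_pkg/__init__.py"

cd "${demo}"
python3 -c "import shadow_pkg; print(shadow_pkg.origin)"    # the local folder wins

cd /    # from any other directory there is no shadowing
python3 -c "import shadow_pkg" 2>/dev/null || echo "shadow_pkg not importable here"
```

This is why the comment suggests doing `python -c "import triton"` from the parent directory: it guarantees the installed wheel, not the checkout, is what gets exercised.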
wget -q -e use_proxy=no ${ipex_whl}
python -m pip install --force-reinstall $(basename ${ipex_whl})
else
    bash ${WORKSPACE}/inductor-tools/scripts/env_prepare.sh
I don't see any parameters at the beginning of this env_prepare.sh script; please add them. By the way, please also add the oneAPI version as a parameter for this script and env.sh, to add basekit version control. You can refer to the latest ci/nightly workflows in the xpu backend repo.
installed_torch_git_version=$(python -c "import torch;print(torch.version.git_version)" || true)
Add the parameters here.
Second round of comments; these aim to improve the job's flexibility.
'''
}//retry
}//stage
stage('Accuracy-Test') {
Add a multi-choice parameter SCENARIO to this groovy and the Jenkins job; its values can be accuracy and performance. If accuracy is in SCENARIO, run the accuracy test, otherwise bypass this stage. The performance stage works the same way as the accuracy part. The default can be both accuracy & performance selected.
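A shell sketch of the gating the comment asks for (the real job would express this in the groovy pipeline; SCENARIO and its values are the reviewer's proposed parameter, not existing code):

```shell
# SCENARIO is a comma-separated multi-choice job parameter; the default
# selects both test kinds, matching the suggested default.
SCENARIO="${SCENARIO:-accuracy,performance}"

if [[ ",${SCENARIO}," == *",accuracy,"* ]]; then
    echo "running Accuracy-Test stage"
else
    echo "bypassing Accuracy-Test stage"
fi

if [[ ",${SCENARIO}," == *",performance,"* ]]; then
    echo "running Performance-Test stage"
else
    echo "bypassing Performance-Test stage"
fi
```

Wrapping the value in commas before matching avoids accidental substring hits if more scenario names are added later.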
pip install styleFrame scipy pandas
pushd ${WORKSPACE}/pytorch
rm -rf inductor_log
bash inductor_xpu_test.sh huggingface amp_bf16 inference accuracy xpu 0 & \
Also keep SUITE, DT and MODE as parameters of this groovy and the Jenkins job. All of them should be multi-choice parameters; combine the parameter values and run the tests one by one. The default values can be set to match the current hardcoded cmd. For example, if SUITE={huggingface,timm_models}, DT={float32,amp_bf16} and MODE={inference,training}, there will be 8 combinations to test.
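A sketch of how those combinations could be expanded in shell (parameter names and default values come from the comment above; the emitted command line is illustrative only):

```shell
# Multi-choice values arrive comma-separated from Jenkins; these defaults
# mirror the example in the comment, giving 2 x 2 x 2 = 8 combinations.
SUITE="${SUITE:-huggingface,timm_models}"
DT="${DT:-float32,amp_bf16}"
MODE="${MODE:-inference,training}"

for suite in ${SUITE//,/ }; do
    for dt in ${DT//,/ }; do
        for mode in ${MODE//,/ }; do
            # One combination at a time; each combination would then fan out
            # across the cards as described in the review.
            echo "bash inductor_xpu_test.sh ${suite} ${dt} ${mode} accuracy xpu 0"
        done
    done
done
```

Running combinations sequentially keeps the per-combination card split (see the 4-card cmd later in this thread) independent of how many parameter values are selected.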
conda activate ${conda_env}
source ${HOME}/env.sh ${oneapi_ver}

cd ${WORKSPACE}/pytorch/inductor_log/huggingface
Extract this log-summary part into a standalone script, and keep it flexible for different suite/dt/mode/scenario combinations (low priority). A simple way is for the log-summary script to receive those parameters and maintain the criteria for the different tests (using a configuration file or a dict-like structure), e.g. {"huggingface-amp_bf16-inference-accuracy": 44, "huggingface-amp_bf16-training-accuracy": 42}.
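A minimal sketch of that criteria lookup in bash (associative arrays need bash 4+; the two thresholds are the ones given in the comment, while the parsed pass count is a hypothetical placeholder):

```shell
# Expected pass counts keyed by suite-dt-mode-scenario, as the comment suggests.
declare -A CRITERIA=(
    ["huggingface-amp_bf16-inference-accuracy"]=44
    ["huggingface-amp_bf16-training-accuracy"]=42
)

key="huggingface-amp_bf16-inference-accuracy"
passed=44    # hypothetical: the summary script would parse this from inductor_log
threshold="${CRITERIA[${key}]:-0}"

if (( passed >= threshold )); then
    echo "PASS ${key}: ${passed}/${threshold}"
else
    echo "FAIL ${key}: ${passed}/${threshold}"
fi
```

Keeping the thresholds in one table (or an external config file) means adding a new suite/dt/mode combination only requires a new entry, not a new code path.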
}//retry
}//stage
stage('Performance-Test') {
println('================================================================')
Similar request as for the accuracy test stage.
bash inductor_xpu_test.sh huggingface amp_bf16 inference accuracy xpu 0 & \
bash inductor_xpu_test.sh huggingface amp_bf16 training accuracy xpu 1 & \
bash inductor_xpu_test.sh huggingface amp_fp16 inference accuracy xpu 2 & \
bash inductor_xpu_test.sh huggingface amp_fp16 training accuracy xpu 3 & wait
The right cmd is:
bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 0 static 4 0 & \
bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 1 static 4 1 & \
bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 2 static 4 2 & \
bash inductor_xpu_test.sh ${SUITE} ${DT} ${MODE} accuracy xpu 3 static 4 3 & wait
It means one combination will be split into 4 sub-tests that run on 4 cards at the same time. Please modify the cmd in the performance part as well. Your previous cmd ran 2 combinations at the same time, with each combination run twice on 2 cards, instead of splitting it into sub-tests.
Hi @RUIJIEZHONG66166, any update on these comments?
No description provided.