ci: Add generic windows_job #887

Merged
merged 15 commits into from
Oct 18, 2022
46 changes: 46 additions & 0 deletions .github/actions/setup-windows/action.yml
@@ -0,0 +1,46 @@
name: Setup Windows

description: Set up for windows jobs

inputs:
  cuda-version:
    description: which cuda version to install, 'cpu' for none
    required: true

runs:
  using: composite
  steps:
    - name: Display EC2 information
      shell: bash
      run: |
        set -euo pipefail
        function get_ec2_metadata() {
          # Pulled from instance metadata endpoint for EC2
          # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
          category=$1
          curl -fsSL "http://169.254.169.254/latest/meta-data/${category}"
        }
        echo "ami-id: $(get_ec2_metadata ami-id)"
        echo "instance-id: $(get_ec2_metadata instance-id)"
        echo "instance-type: $(get_ec2_metadata instance-type)"
        echo "system info $(uname -a)"

    # Needed for binary builds, see: https://github.com/pytorch/pytorch/issues/73339#issuecomment-1058981560
    - name: Enable long paths on Windows
      shell: powershell
      run: |
        Set-ItemProperty -Path "HKLM:\\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1

    # Since it's just a defensive command, the workflow should continue even if the command fails
    - name: Disables Windows Defender scheduled and real-time scanning for files in pytorch directory.
      shell: powershell
      run: |
        Add-MpPreference -ExclusionPath $(Get-Location).tostring() -ErrorAction Ignore

    - name: Setup useful environment variables
      shell: bash
      working-directory: ${{ inputs.repository }}
      run: |
        RUNNER_ARTIFACT_DIR="$(cygpath ${RUNNER_TEMP})/artifacts"
        mkdir -p "${RUNNER_ARTIFACT_DIR}"
        echo "RUNNER_ARTIFACT_DIR=$(cygpath ${RUNNER_TEMP})/artifacts" >> "${GITHUB_ENV}"
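
The last step of this action computes the artifact directory twice, once for `mkdir` and once for the export. A minimal bash sketch of the same export that computes the path once and reuses it follows. The `to_posix_path` helper and the fallback values for `RUNNER_TEMP` and `GITHUB_ENV` are assumptions added so the snippet also runs outside a Windows runner; they are not part of this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: outside GitHub Actions these variables are unset, so fall back
# to throwaway locations to keep the sketch runnable anywhere.
RUNNER_TEMP="${RUNNER_TEMP:-$(mktemp -d)}"
GITHUB_ENV="${GITHUB_ENV:-${RUNNER_TEMP}/github_env}"

# Hypothetical helper: convert a Windows path with cygpath when it exists
# (Git Bash / Cygwin), otherwise return the path unchanged.
to_posix_path() {
  if command -v cygpath >/dev/null 2>&1; then
    cygpath "$1"
  else
    printf '%s\n' "$1"
  fi
}

# Compute the directory once, create it, then export it to later steps.
RUNNER_ARTIFACT_DIR="$(to_posix_path "${RUNNER_TEMP}")/artifacts"
mkdir -p "${RUNNER_ARTIFACT_DIR}"
echo "RUNNER_ARTIFACT_DIR=${RUNNER_ARTIFACT_DIR}" >> "${GITHUB_ENV}"
```

Quoting `${RUNNER_TEMP}` when passing it to the helper also guards against spaces in the path, which the unquoted `cygpath ${RUNNER_TEMP}` call does not.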
50 changes: 50 additions & 0 deletions .github/actions/teardown-windows/action.yml
@@ -0,0 +1,50 @@
name: Teardown Windows

description: Tear down after Windows jobs

inputs:
  extra-delete-dir:
    description: If set, cleaning up the workspace will delete this too
    required: false
    default: ""

runs:
  using: composite
  steps:
    - name: Wait until all sessions have drained
      shell: powershell
      if: always()
      run: |
        function Get-SSH-Users {
          # Gets ssh sessions for all users not named SYSTEM
          Get-CimInstance -ClassName Win32_Process -Filter "Name = 'sshd.exe'" |
            Get-CimAssociatedInstance -Association Win32_SessionProcess |
            Get-CimAssociatedInstance -Association Win32_LoggedOnUser |
            Where-Object {$_.Name -ne 'SYSTEM'} |
            Measure-Object
        }

        $usersLoggedOn = Get-SSH-Users

        Write-Output "Holding runner until all ssh sessions have logged out"
        while ($usersLoggedOn.Count -gt 0) {
          $usersLoggedOn = Get-SSH-Users
          Write-Output "."
          Start-Sleep -s 5
        }

    - name: Kill active ssh sessions if still around (Useful if workflow was cancelled)
      shell: powershell
      if: always()
      run: |
        function Get-SSH-Sessions {
          Get-Process sshd -IncludeUserName |
            Where-Object UserName -notLike "*SYSTEM*" |
            Select-Object Id
        }

        $runningSessions = Get-SSH-Sessions

        foreach ($session in $runningSessions) {
          Stop-Process -id $session.Id
        }
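
The first teardown step polls every 5 seconds with no upper bound of its own, so it holds the runner until every SSH session closes or the job is cancelled or times out; the second step then kills whatever is left. A bash sketch of the same drain-then-give-up pattern with an explicit attempt limit follows. The action itself is written in PowerShell; `wait_until_drained`, its arguments, and the `count_sessions` stub are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: poll a command that prints the number of open sessions
# until it prints 0, or give up after max_attempts polls.
# Usage: wait_until_drained <count_command> <max_attempts> <interval_seconds>
wait_until_drained() {
  local count_cmd=$1 max_attempts=$2 interval=$3
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if [[ "$("${count_cmd}")" -eq 0 ]]; then
      return 0   # everything has logged out
    fi
    sleep "${interval}"
  done
  return 1       # still busy: the caller decides whether to kill sessions
}

# Stub standing in for the real session query: pretends one session logs
# out on every poll. State lives in a file because $(...) runs in a subshell.
state_file="$(mktemp)"
echo 3 > "${state_file}"
count_sessions() {
  local n
  n="$(cat "${state_file}")"
  echo "${n}"
  if (( n > 0 )); then echo $(( n - 1 )) > "${state_file}"; fi
}

if wait_until_drained count_sessions 10 0; then
  drained=yes
else
  drained=no
fi
echo "drained: ${drained}"
```

A non-zero return from the helper maps naturally onto the second step of the action, which force-stops the remaining `sshd` processes.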
54 changes: 54 additions & 0 deletions .github/workflows/test_windows_job.yml
@@ -0,0 +1,54 @@
name: Test build/test windows workflow

on:
  pull_request:
    paths:
      - .github/workflows/windows_job.yml
      - .github/workflows/test_windows_job.yml
  workflow_dispatch:

jobs:
  test-cpu:
    uses: ./.github/workflows/windows_job.yml
    with:
      runner: windows.4xlarge
      test-infra-repository: ${{ github.repository }}
      test-infra-ref: ${{ github.ref }}
      script: |
        conda create -y -n test python=3.8
        conda activate test
        python -m pip install --extra-index-url https://download.pytorch.org/whl/nightly/cpu --pre torch
        # Can import pytorch
        python -c 'import torch'
Contributor


Same thing here; please include conda env remove.

  test-gpu:
    uses: ./.github/workflows/windows_job.yml
    with:
      runner: windows.8xlarge.nvidia.gpu
      test-infra-repository: ${{ github.repository }}
      test-infra-ref: ${{ github.ref }}
      timeout: 60
      script: |
        conda create -y -n test python=3.8
Contributor


Should we also delete the conda environment at the end of the test?

conda env remove -n test

Member Author


Since these are ephemeral I'm not really too worried about it

        conda activate test
        python -m pip install --extra-index-url https://download.pytorch.org/whl/nightly/cu116 --pre torch
        # Can import pytorch, cuda is available
        python -c 'import torch;assert(torch.cuda.is_available())'
  test-upload-artifact:
    uses: ./.github/workflows/windows_job.yml
    with:
      runner: windows.4xlarge
      test-infra-repository: ${{ github.repository }}
      test-infra-ref: ${{ github.ref }}
      upload-artifact: my-cool-artifact
      script: |
        echo "hello" > "${RUNNER_ARTIFACT_DIR}/cool_beans"
  test-download-artifact:
    needs: test-upload-artifact
    uses: ./.github/workflows/windows_job.yml
    with:
      runner: windows.4xlarge
      test-infra-repository: ${{ github.repository }}
      test-infra-ref: ${{ github.ref }}
      download-artifact: my-cool-artifact
      script: |
        grep "hello" "${RUNNER_ARTIFACT_DIR}/cool_beans"
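
The review thread in this file asks whether the test scripts should remove the conda environment they create; the author's answer is that the runners are ephemeral, so the PR leaves it out. If cleanup ever matters (for example on long-lived runners), one option is a small wrapper that always removes the environment. This is a sketch, not what the PR does: `run_in_throwaway_env` and its arguments are hypothetical, and the conda command is passed in as a parameter so the function can be exercised with a stub.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create a named conda environment, run a command, and
# remove the environment afterwards even if the command fails.
# Usage: run_in_throwaway_env <conda_command> <env_name> <command> [args...]
run_in_throwaway_env() {
  local conda_cmd=$1 env_name=$2
  shift 2
  "${conda_cmd}" create -y -n "${env_name}" python=3.8
  local status=0
  # `|| status=$?` records a failure without tripping `set -e`.
  "$@" || status=$?
  "${conda_cmd}" env remove -y -n "${env_name}"
  return "${status}"
}
```

A call such as `run_in_throwaway_env conda test conda run -n test python -c 'import torch'` would run the import check inside the environment and still propagate its exit status after cleanup.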
122 changes: 122 additions & 0 deletions .github/workflows/windows_job.yml
@@ -0,0 +1,122 @@
name: Run a Windows job

on:
  workflow_call:
    inputs:
      script:
        description: 'Script to utilize'
        default: "python setup.py bdist_wheel"
        type: string
      timeout:
        description: 'Timeout for the job (in minutes)'
        default: 30
        type: number
      runner:
        description: 'Runner type to utilize'
        default: "windows.4xlarge"
        type: string
      upload-artifact:
        description: 'Name to give artifacts uploaded from ${RUNNER_ARTIFACT_DIR}'
        default: ''
        type: string
      download-artifact:
        description: 'Name of the artifact to download into ${RUNNER_ARTIFACT_DIR}'
        default: ''
        type: string
      repository:
        description: 'Repository to checkout, defaults to github.repository'
        default: ""
        type: string
      ref:
        description: 'Reference to checkout, defaults to github.ref'
        default: ""
        type: string
      test-infra-repository:
        description: "Test infra repository to use"
        default: "pytorch/test-infra"
        type: string
      test-infra-ref:
        description: "Test infra reference to use"
        default: ""
        type: string

jobs:
  job:
    env:
      REPOSITORY: ${{ inputs.repository || github.repository }}
      SCRIPT: ${{ inputs.script }}
    runs-on: ${{ inputs.runner }}
    timeout-minutes: ${{ inputs.timeout }}
    steps:
      - name: Checkout repository (${{ inputs.test-infra-repository }}@${{ inputs.test-infra-ref }})
        uses: actions/checkout@v3
        with:
          # Support the use case where we need to checkout someone's fork
          repository: ${{ inputs.test-infra-repository }}
          ref: ${{ inputs.test-infra-ref }}
          path: test-infra

      - name: Setup Windows
        uses: ./test-infra/.github/actions/setup-windows

      - name: Setup SSH
        uses: ./test-infra/.github/actions/setup-ssh

      - name: Checkout repository (${{ inputs.repository || github.repository }}@${{ inputs.ref }})
        uses: actions/checkout@v3
        with:
          # Support the use case where we need to checkout someone's fork
          repository: ${{ inputs.repository || github.repository }}
          ref: ${{ inputs.ref || github.ref }}
          path: ${{ inputs.repository || github.repository }}

      - name: Download artifacts (if any)
        uses: actions/download-artifact@v3
        if: ${{ inputs.download-artifact != '' }}
        with:
          name: ${{ inputs.download-artifact }}
          path: ${{ runner.temp }}/artifacts/

      - name: Run script
        shell: bash -l {0}
        working-directory: ${{ inputs.repository }}
        run: |
          {
            echo "#!/usr/bin/env bash";
            echo "set -eou pipefail";
            # Without this specific version of pywin32, the default conda installation does not work
            # See https://github.com/conda/conda/issues/11503
            echo "/c/Jenkins/Miniconda3/python.exe -m pip install --upgrade pywin32==304"
            # Source conda so it's available to the script environment
            echo "source /c/Jenkins/Miniconda3/etc/profile.d/conda.sh";
            echo "${SCRIPT}";
          } > "${RUNNER_TEMP}/exec_script"
          bash "${RUNNER_TEMP}/exec_script"

      - name: Check if there are potential artifacts and move them to the correct artifact location
        shell: bash -l {0}
        working-directory: ${{ inputs.repository }}
        id: check-artifacts
        if: ${{ inputs.upload-artifact != '' }}
        env:
          UPLOAD_ARTIFACT_NAME: ${{ inputs.upload-artifact }}
        run: |
          # If the default execution path is followed then we should get a wheel in the dist/ folder
          # attempt to just grab whatever is in there and scoop it all up
          if find "dist/" -name "*.whl" >/dev/null 2>/dev/null; then
            mv -v dist/*.whl "${RUNNER_ARTIFACT_DIR}/"
          fi
          # Make the upload step fail if files were expected for upload but none were found
          echo '::set-output name=if-no-files-found::error'

      - name: Upload artifacts to GitHub (if any)
        uses: actions/upload-artifact@v3
        if: ${{ inputs.upload-artifact != '' }}
        with:
          name: ${{ inputs.upload-artifact }}
          path: ${{ runner.temp }}/artifacts/
          if-no-files-found: ${{ steps.check-artifacts.outputs.if-no-files-found }}

      - name: Teardown Windows
        if: ${{ always() }}
        uses: ./test-infra/.github/actions/teardown-windows
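
The "Run script" step writes the caller's `SCRIPT` into a wrapper that begins with `set -eou pipefail`, so the first failing command aborts the wrapper and fails the job. A reduced sketch of that wrapper, without the pywin32 and conda setup lines, shows the effect. The sample `SCRIPT` value and the fallback for `RUNNER_TEMP` are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: fall back to a temp dir when not running under GitHub Actions.
RUNNER_TEMP="${RUNNER_TEMP:-$(mktemp -d)}"

# Sample caller script: the second command fails on purpose.
SCRIPT='echo "step 1"
false
echo "step 2"'

# Build the wrapper the same way the workflow does: strict mode first,
# then the caller's script verbatim.
{
  echo "#!/usr/bin/env bash"
  echo "set -eou pipefail"
  echo "${SCRIPT}"
} > "${RUNNER_TEMP}/exec_script"

# Capture output and exit status without aborting this demo shell.
status=0
output="$(bash "${RUNNER_TEMP}/exec_script")" || status=$?
echo "exit status: ${status}"
echo "${output}"
```

Because the wrapper stops at `false`, "step 2" is never printed and the non-zero status is what the workflow step would report.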