[29] Logging in json format #68


Merged (7 commits, Mar 17, 2025)
Conversation

@tjhunter (Collaborator) commented Mar 12, 2025

Closes #29

First part: prototyping the new format.


# TODO: performance: we re-open the file on every call. This is safer for
# multiprocessing, but we could likely do better, e.g. by relying on the logging module.
with open(os.path.join(self.path_run, "metrics.json"), "ab") as f:
@tjhunter (Collaborator, Author):

I suggest that we start with this simple version; we can always improve performance if it turns out to be a bottleneck.
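To make the discussion concrete, here is a minimal sketch of the append-per-call JSON-lines logging being proposed. The attribute name `path_run` and the file name `metrics.json` come from the snippet in the diff; the class name `MetricsLogger` and everything else are assumptions for illustration, not project code.

```python
import json
import os
import tempfile


class MetricsLogger:
    """Append one JSON object per line, re-opening the file on each call."""

    def __init__(self, path_run: str):
        self.path_run = path_run
        os.makedirs(path_run, exist_ok=True)

    def log(self, metrics: dict) -> None:
        # Encode the whole record first so a single write() call emits one line.
        # Opening in "ab" mode and writing one short line at a time is what makes
        # the re-open-per-call approach tolerable with multiple worker processes.
        line = (json.dumps(metrics) + "\n").encode("utf-8")
        with open(os.path.join(self.path_run, "metrics.json"), "ab") as f:
            f.write(line)


if __name__ == "__main__":
    run_dir = tempfile.mkdtemp()
    logger = MetricsLogger(run_dir)
    logger.log({"step": 1, "loss": 0.5})
    logger.log({"step": 2, "loss": 0.4})
    with open(os.path.join(run_dir, "metrics.json")) as f:
        rows = [json.loads(line) for line in f]
    print(len(rows))
```

The trade-off matches the TODO in the diff: re-opening per call costs a syscall pair per record, but avoids shared file handles across processes; a `logging.FileHandler` would amortize the open at the price of per-process handler setup.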

@tjhunter tjhunter marked this pull request as ready for review March 13, 2025 16:18
pyproject.toml Outdated

[tool.uv.sources]
flash-attn = { url = "https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl" }
@tjhunter (Collaborator, Author):

I had to make this change to use uv on the hpc2020 cluster. I am not sure whether this will be a breaking change for people. @clessig, do we assume that different HPCs can use different versions of CUDA? That sounds like a nightmare.

Collaborator:

We do not assume it, we know it ;) One can write a script that detects the available CUDA version (and the Python version, if that is a variable) and then assembles the string that defines the wheel to be downloaded. @tjhunter: to what extent could one integrate this into pyproject.toml?
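A rough sketch of the detection idea described above: assemble the flash-attn wheel URL from the CUDA, torch, and Python versions. The URL pattern follows the `[tool.uv.sources]` entry quoted in this PR; the function name and parameters are hypothetical, not something this script actually exists as in the repo (see the follow-up below about tracking it in an issue).

```python
import sys

RELEASE_BASE = "https://github.com/Dao-AILab/flash-attention/releases/download"


def flash_attn_wheel_url(version: str, cuda: str, torch: str, cxx11abi: bool) -> str:
    """Build the flash-attn wheel URL matching the release naming scheme,
    e.g. flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
    """
    # Python tag of the running interpreter, e.g. "cp312" for Python 3.12.
    py = f"cp{sys.version_info.major}{sys.version_info.minor}"
    abi = "TRUE" if cxx11abi else "FALSE"
    wheel = (
        f"flash_attn-{version}+cu{cuda}torch{torch}"
        f"cxx11abi{abi}-{py}-{py}-linux_x86_64.whl"
    )
    return f"{RELEASE_BASE}/v{version}/{wheel}"


print(flash_attn_wheel_url("2.7.4.post1", "12", "2.6", cxx11abi=False))
```

Since pyproject.toml is static, such a script would have to run before `uv sync` and rewrite (or template) the `[tool.uv.sources]` entry; uv itself does not execute detection logic from pyproject.toml.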

Collaborator:

And could we open an issue to track this? :)

@tjhunter (Collaborator, Author):

I have the script in a branch of the private repo, but it is not committed yet:
#57

@tjhunter tjhunter merged commit 1dece82 into develop Mar 17, 2025
3 checks passed
@tjhunter (Collaborator, Author):

As discussed, this will be followed up by #90.

Labels: none yet
Projects: Status: Done
Successfully merging this pull request may close: Refactor TrainLogger
3 participants