candle-onnx: Implement layer normalization operator #2919


Open · wants to merge 16 commits into base: main
Conversation

@BrunoSienkiewicz (Contributor) commented Apr 24, 2025

Added Layer Normalization operator with tests.
Related issue: #2849

@A2va (Contributor) commented Apr 24, 2025

Why not use LayerNorm from candle-nn?

Or is that a different thing? (I'm not that familiar with ML things.)

@BrunoSienkiewicz (Contributor, Author) replied:

Thank you for your comment. Honestly, I didn't see the implementation in candle-nn; maybe it can be used here. However, I do see a difference in the ONNX version of LayerNorm: it takes an additional axis parameter. I will mark this PR as a draft until I clear this up.
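
For context, here is a minimal sketch of how the ONNX axis attribute can be handled: dimensions before axis are flattened into rows and the remaining dimensions are normalized as columns. The function name layer_norm_with_axis and its exact signature are illustrative assumptions, not the code in this PR; negative axis values are allowed by the ONNX spec, which is why the sketch resolves them against the rank first.

```rust
use candle::{Result, Tensor};

/// Illustrative sketch: ONNX-style LayerNormalization with an `axis` attribute.
/// Dimensions before `axis` are flattened into rows; the rest are normalized.
fn layer_norm_with_axis(
    xs: &Tensor,
    scale: &Tensor,
    bias: &Tensor,
    axis: i64,
    eps: f64,
) -> Result<Tensor> {
    let dims = xs.dims();
    // ONNX allows a negative axis; resolve it against the tensor rank.
    let axis = if axis < 0 { (dims.len() as i64 + axis) as usize } else { axis as usize };
    let rows: usize = dims[..axis].iter().product();
    let cols: usize = dims[axis..].iter().product();
    let x_mat = xs.reshape((rows, cols))?;
    // Per-row mean and variance, then normalize, scale and shift.
    let mean = x_mat.mean_keepdim(1)?;
    let centered = x_mat.broadcast_sub(&mean)?;
    let var = centered.sqr()?.mean_keepdim(1)?;
    let normed = centered.broadcast_div(&(var + eps)?.sqrt()?)?;
    let y_mat = normed
        .broadcast_mul(&scale.reshape((1, cols))?)?
        .broadcast_add(&bias.reshape((1, cols))?)?;
    // Restore the original shape of the input.
    y_mat.reshape(dims.to_vec())
}
```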

@BrunoSienkiewicz marked this pull request as draft on April 24, 2025, 20:41
@BrunoSienkiewicz (Contributor, Author) commented:
I have changed the implementation to use the built-in candle-nn layer normalization. All tests are passing, so I think everything should be alright with this approach.

@BrunoSienkiewicz marked this pull request as ready for review on April 26, 2025, 19:07

Review comment from a Collaborator on:

    let x_mat = xs.reshape((row_number, col_number))?;
    let y_mat = candle_nn::ops::layer_norm_slow(
Why use layer-norm slow here rather than the optimized variants?
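
For reference, a sketch of what routing the already-flattened 2D matrix through the higher-level candle_nn::LayerNorm module could look like. Whether this is the "optimized variant" the reviewer has in mind, and the exact constructor arguments assumed here, should be checked against candle-nn; the helper name is hypothetical.

```rust
use candle::{Result, Tensor};
use candle_nn::Module;

// Sketch: normalize the rows of an already-flattened 2D matrix through the
// candle_nn::LayerNorm module (which normalizes over the last dimension).
// `scale` and `bias` correspond to the ONNX Scale/B inputs, flattened to `cols`.
fn layer_norm_via_module(x_mat: &Tensor, scale: &Tensor, bias: &Tensor, eps: f64) -> Result<Tensor> {
    let ln = candle_nn::LayerNorm::new(scale.clone(), bias.clone(), eps);
    ln.forward(x_mat)
}
```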

Review comment from a Collaborator on:

        .to_dtype(DType::F32)?;

    let expected = Tensor::new(expected, &Device::Cpu)?.to_dtype(DType::F32)?;
    match expected.dims().len() {
Why split these cases and not compare the tensors directly?

@BrunoSienkiewicz (Contributor, Author) replied:
At the start I thought it would be a good idea to test different tensor dimensionalities, but in the end they are cast to 2D, so it would not matter. I changed the test case to compare the tensors directly.
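
For illustration, a direct comparison in the test could look like the sketch below; the helper name assert_tensors_eq_2d is hypothetical, and an epsilon-based comparison may be preferable to exact float equality depending on how the expected values are produced.

```rust
use candle::{DType, Result, Tensor};

// Hypothetical test helper: compare two 2D tensors value-for-value.
// Exact equality works when both sides come from identical arithmetic;
// otherwise an epsilon-based comparison is the safer choice.
fn assert_tensors_eq_2d(produced: &Tensor, expected: &Tensor) -> Result<()> {
    let produced = produced.to_dtype(DType::F32)?.to_vec2::<f32>()?;
    let expected = expected.to_dtype(DType::F32)?.to_vec2::<f32>()?;
    assert_eq!(produced, expected);
    Ok(())
}
```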
