candle-onnx: Implement layer normalization operator #2919
Conversation
Why not use LayerNorm from candle-nn? Or is that a different thing? (I'm not that familiar with ML things.)
Thank you for your comment. Honestly, I didn't see the implementation in candle-nn; maybe it can be used in this case. However, I see a difference in the ONNX version of LayerNorm, that is, the additional …
I have changed the implementation to use the built-in candle-nn layer normalization. All tests are passing, so I think everything should be alright with this approach.
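For reference, ONNX's LayerNormalization (opset 17) normalizes over the trailing axes and then applies a per-element scale and optional bias, using the biased variance and an epsilon that defaults to 1e-5. A minimal sketch of the per-row math in plain Rust (no candle types; the function name and `Vec`-based signature here are illustrative, not the PR's actual code):

```rust
// Layer normalization over one row: y = (x - mean) / sqrt(var + eps) * gamma + beta.
// Plain-slice sketch of the math; the real operator works on candle Tensors.
fn layer_norm_row(x: &[f64], gamma: &[f64], beta: &[f64], eps: f64) -> Vec<f64> {
    let n = x.len() as f64;
    let mean = x.iter().sum::<f64>() / n;
    // ONNX uses the biased variance (divide by N, not N - 1).
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter()
        .zip(gamma.iter().zip(beta.iter()))
        .map(|(v, (g, b))| (v - mean) * inv_std * g + b)
        .collect()
}

fn main() {
    // With gamma = 1 and beta = 0 the output is zero-mean, unit-variance (up to eps).
    let y = layer_norm_row(&[1.0, 2.0, 3.0, 4.0], &[1.0; 4], &[0.0; 4], 1e-5);
    println!("{y:?}");
}
```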
candle-onnx/src/eval.rs (Outdated)
let x_mat = xs.reshape((row_number, col_number))?;
let y_mat = candle_nn::ops::layer_norm_slow(
Why use layer-norm slow here rather than the optimized variants?
candle-onnx/tests/ops.rs (Outdated)
.to_dtype(DType::F32)?;
let expected = Tensor::new(expected, &Device::Cpu)?.to_dtype(DType::F32)?;
match expected.dims().len() {
Why split these cases and not compare the tensors directly?
At the start I thought it would be a good idea to test different dimensionalities of tensors, but in the end they are cast to 2D, so it would not matter. I changed the test case to compare the tensors directly.
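Comparing the flattened element values with a small tolerance works for any input rank, which is why branching on the number of dimensions was unnecessary. A plain-Rust sketch of such a comparison (the real tests would extract values from candle tensors, e.g. via a `to_vec*` call; the helper name here is illustrative):

```rust
// Compare two tensors' flattened contents element-wise with an absolute
// tolerance; shape-independent, so no need to match on dimensionality.
fn approx_eq(actual: &[f32], expected: &[f32], tol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= tol)
}

fn main() {
    assert!(approx_eq(&[0.5, -0.5], &[0.500_001, -0.499_999], 1e-5));
    assert!(!approx_eq(&[0.5], &[0.6], 1e-5));
    println!("ok");
}
```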
Added Layer Normalization operator with tests.
Related issue: #2849