Fix last_hidden_state slicing position in ModernBertForSequenceClassification #3010

whitebox2 · 2025-06-28T10:12:03Z

Problem

The shape of the tensor passed to the classifier differs from the Transformers implementation.
Transformers: correctly uses the [CLS] token → shape (batch_size, hidden_size)
Current implementation: slices across the feature dimension → shape (batch_size, seq_len)

Suggestion

I would like to propose the following change.
before

let last_hidden_state = output.i((.., .., 0))?;

after

let last_hidden_state = output.i((.., 0, ..))?;

This selects the first token (i.e., [CLS]) of every sample, resulting in a shape of (batch_size, hidden_size).
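To make the shape difference concrete, here is a minimal sketch in plain Rust. It does not use the real tensor API: `Tensor3`, `slice_feature0`, `slice_token0`, and `example_tensor` are hypothetical names introduced only for illustration, and the model output is assumed to be laid out as (batch_size, seq_len, hidden_size) in row-major order.

```rust
/// Toy row-major tensor of shape (batch_size, seq_len, hidden_size).
/// Hypothetical stand-in for the model output, not the library's tensor type.
struct Tensor3 {
    data: Vec<f32>,
    batch_size: usize,
    seq_len: usize,
    hidden_size: usize,
}

impl Tensor3 {
    /// Element at (batch b, token s, feature h).
    fn at(&self, b: usize, s: usize, h: usize) -> f32 {
        self.data[(b * self.seq_len + s) * self.hidden_size + h]
    }

    /// Mirrors the "before" indexing `(.., .., 0)`:
    /// keeps feature 0 of every token, giving shape (batch_size, seq_len).
    fn slice_feature0(&self) -> Vec<Vec<f32>> {
        (0..self.batch_size)
            .map(|b| (0..self.seq_len).map(|s| self.at(b, s, 0)).collect())
            .collect()
    }

    /// Mirrors the "after" indexing `(.., 0, ..)`:
    /// keeps every feature of token 0, giving shape (batch_size, hidden_size).
    fn slice_token0(&self) -> Vec<Vec<f32>> {
        (0..self.batch_size)
            .map(|b| (0..self.hidden_size).map(|h| self.at(b, 0, h)).collect())
            .collect()
    }
}

/// Builds a small example with batch_size = 2, seq_len = 3, hidden_size = 4,
/// filled with 0.0, 1.0, 2.0, ... so each element is easy to trace.
fn example_tensor() -> Tensor3 {
    let (batch_size, seq_len, hidden_size) = (2, 3, 4);
    let data = (0..batch_size * seq_len * hidden_size)
        .map(|v| v as f32)
        .collect();
    Tensor3 { data, batch_size, seq_len, hidden_size }
}
```

With this example tensor, `slice_feature0` returns rows of length 3 (one value per token), while `slice_token0` returns rows of length 4 (the full hidden vector of the first token), which is the shape a classifier head over hidden_size features would expect.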

issue

#3000

Update modernbert.rs

2dc1b2b
