I tried to use the TokenClassificationExplainer with my fine-tuned BERT model. It returns a dictionary keyed by the tokenized inputs.
When I processed the returned dict manually, a token was missing. It turned out that the token occurred more than once in the input, but appeared only once in the dict: a dictionary cannot have duplicate keys, so each later occurrence silently overwrites the earlier one, and the earlier ones never show up in the final return value. For anyone using this class, I recommend changing the return value so that all tokenized inputs are preserved.
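Here is a minimal sketch of the underlying problem (the tokens and scores below are made up for illustration): when results are keyed by token text, repeated tokens collapse into a single dict entry.

# Hypothetical attribution results for a sentence in which "paris" occurs twice,
# each occurrence with a different attribution score.
entries = [("paris", 0.41), ("loves", 0.12), ("paris", -0.08)]

word_attr = {}
for token, score in entries:
    word_attr[token] = score  # the second "paris" overwrites the first

print(word_attr)       # {'paris': -0.08, 'loves': 0.12} -- one occurrence lost
print(len(entries))    # 3
print(len(word_attr))  # 2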
Here is the original implementation:
@property
def word_attributions(self) -> Dict:
    "Returns the word attributions for model and the text provided. Raises error if attributions not calculated."
    if self.attributions is not None:
        word_attr = dict()
        tokens = [token.replace("Ġ", "") for token in self.decode(self.input_ids)]
        labels = self.predicted_class_names
        for index, attr in self.attributions.items():
            try:
                predicted_class = self.id2label[torch.argmax(self.pred_probs[index]).item()]
            except KeyError:
                predicted_class = torch.argmax(self.pred_probs[index]).item()
            # Keying by token text: a repeated token overwrites its earlier entry here.
            word_attr[tokens[index]] = {
                "label": predicted_class,
                "attribution_scores": attr.word_attributions,
            }
        return word_attr
    else:
        raise ValueError("Attributions have not yet been calculated. Please call the explainer on text first.")
Below are my modifications to the word_attributions property.
@property
def word_attributions(self) -> List:
    "Returns the word attributions for model and the text provided. Raises error if attributions not calculated."
    if self.attributions is not None:
        word_attr = []
        tokens = [token.replace("Ġ", "") for token in self.decode(self.input_ids)]
        labels = self.predicted_class_names
        for index in self._selected_indexes:
            try:
                predicted_class = self.id2label[torch.argmax(self.pred_probs[index]).item()]
            except KeyError:
                predicted_class = torch.argmax(self.pred_probs[index]).item()
            # One list entry per token keeps duplicates; the index and token text
            # are stored explicitly instead of being used as a dict key.
            word_attr.append({
                "index": index,
                "token": tokens[index],
                "label": predicted_class,
                "attribution_scores": self.attributions[index].word_attributions,
            })
        return word_attr
    else:
        raise ValueError("Attributions have not yet been calculated. Please call the explainer on text first.")
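For reference, a usage sketch of the modified property. The checkpoint name and input sentence are placeholders, and the transformers_interpret calls follow its usual pattern, but please verify against the version you have installed.

from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers_interpret import TokenClassificationExplainer

# Any fine-tuned token-classification checkpoint works; this one is just an example.
model_name = "dslim/bert-base-NER"
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

explainer = TokenClassificationExplainer(model, tokenizer)
explainer("Paris is lovely, and Paris is crowded.")  # "Paris" appears twice

# With the list-based return value, both occurrences of "Paris" survive.
for entry in explainer.word_attributions:
    print(entry["index"], entry["token"], entry["label"])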
Notes:
I prefer iterating with for index in self._selected_indexes: for consistency with the other methods in the class.
I have checked that the labels from self.predicted_class_names are consistent with the labels inferred in the try...except block. I think it is better to use the pre-inferred labels for consistency; however, please re-check and verify when you try it.
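A quick sanity-check sketch for that last point. It assumes self.predicted_class_names is a per-token sequence indexable by the same indexes as self._selected_indexes, which you should confirm against the actual class before relying on it.

import torch

# Hypothetical consistency check; run after calling the explainer on text.
for index in explainer._selected_indexes:
    inferred = explainer.id2label[torch.argmax(explainer.pred_probs[index]).item()]
    assert explainer.predicted_class_names[index] == inferred, (
        f"Mismatch at token {index}: {explainer.predicted_class_names[index]} != {inferred}"
    )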