Commit 6d6e0f4

[trainer] new kto mismatch pair creation strategy (#7509)
1 parent 2d421c5 commit 6d6e0f4

1 file changed: +2 -2 lines changed

src/llamafactory/data/processor/feedback.py

@@ -83,8 +83,8 @@ def _encode_data_example(
         return input_ids, labels, kl_input_ids, kl_labels, kto_tag
 
     def preprocess_dataset(self, examples: dict[str, list[Any]]) -> dict[str, list[Any]]:
-        # create unrelated input-output pairs for estimating the KL term by flipping the matched pairs
-        kl_response = examples["_response"][::-1]
+        # Creates mismatched pairs of prompts and completions for the KL dataset by adding a +1 offset to the order of completions.
+        kl_response = [examples["_response"][-1]] + examples["_response"][:-1]
         model_inputs = defaultdict(list)
         for i in range(len(examples["_prompt"])):
             if len(examples["_prompt"][i]) % 2 != 1 or len(examples["_response"][i]) < 2:
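
One observable difference between the two strategies, sketched below on a toy batch: reversing the response list leaves the middle prompt paired with its own response whenever the batch has an odd length, while rotating by one position does not. The helper names (`reverse_pairs`, `rotate_by_one`, `self_matched_indices`) and the sample strings are hypothetical and are not part of the repository; only the two slicing expressions are taken from the diff. Whether this was the motivation for the change is an assumption, since the commit message does not say.

```python
def reverse_pairs(responses: list) -> list:
    """Old strategy from the diff: flip the order of the responses."""
    return responses[::-1]


def rotate_by_one(responses: list) -> list:
    """New strategy from the diff: move the last response to the front."""
    return [responses[-1]] + responses[:-1]


def self_matched_indices(responses: list, kl_responses: list) -> list[int]:
    """Indices where the KL response is still the prompt's own response.

    Assumes every response in the batch is distinct, so equality means
    the pair was not actually mismatched.
    """
    return [i for i, (r, k) in enumerate(zip(responses, kl_responses)) if r == k]


# Toy batch of three distinct responses (odd length).
batch = ["r0", "r1", "r2"]

# Reversal keeps the middle element in place, so index 1 stays matched.
print(self_matched_indices(batch, reverse_pairs(batch)))

# Rotation shifts every element, so no index stays matched.
print(self_matched_indices(batch, rotate_by_one(batch)))
```

Two caveats apply to the rotation as written in the diff: a batch with a single example still maps that response to itself, and an empty batch would raise an `IndexError` on `[-1]`.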
