Open
Description
Currently the SantaCoder and StarCoder FIM tasks have fixed FIM tokens; however, other models may use different FIM tokens. For example, to use Qwen2.5-Coder models, one currently has to define a new subclass such as:
class SantaCoderQwen25CoderFIM(SantaCoderFIM):
    DATASET_PATH = "bigcode/santacoder-fim-task"

    def __init__(self):
        fim_prefix = "<|fim_prefix|>"
        fim_middle = "<|fim_middle|>"
        fim_suffix = "<|fim_suffix|>"
        stop_words = ["<|endoftext|>", "<|filename|>"]
        super().__init__(
            stop_words=stop_words,
            requires_execution=False,
            fim_prefix=fim_prefix,
            fim_middle=fim_middle,
            fim_suffix=fim_suffix,
        )
Allowing the FIM parameters (e.g. `--fim_tokens` and `--stop_words`) to be passed in, similarly to `--instruction_tokens` for HumanEval, would allow this task to be a single class and support future FIM models.