Add initial support for Ada #162

rowan-walshe · 2024-11-20T17:48:46Z

Hi. @Fabien-Chouteau and I have worked on a patch to add support for translating prompts into Ada.

It's able to translate ~97% of HumanEval and ~90% of MBPP problems (I haven't included the generated prompts in the PR, but let me know if I should). The only types that we haven't yet tried to translate are Any and Union.

We've also included a small change that should add basic support for translating problems that use Sets (which I believe is only mbpp_473_tuple_intersection.py at this time).

As a sense check, we've performed an initial run against one model (Qwen2.5-Coder-7B-Instruct):

Pass@k

Dataset	Pass@k	Estimate	NumProblems	MinCompletions	MaxCompletions
humaneval-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	1	0.27662420382165603	157	200	200
humaneval-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	10	0.5136883316370555	157	200	200
humaneval-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	100	0.6907559477298758	157	200	200
humaneval-ada-remove-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	1	0.2647402597402597	154	200	200
humaneval-ada-remove-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	10	0.5031330123876929	154	200	200
humaneval-ada-remove-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	100	0.6783336009998593	154	200	200
humaneval-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	1	0.28019230769230763	156	200	200
humaneval-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	10	0.5130791721197416	156	200	200
humaneval-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	100	0.6870015325140781	156	200	200
humaneval-ada-transform-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	1	0.2791666666666667	156	200	200
humaneval-ada-transform-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	10	0.5155015133300426	156	200	200
humaneval-ada-transform-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	100	0.7046128551679812	156	200	200
mbpp-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	1	0.35695290858725764	361	200	200
mbpp-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	10	0.5976034257539806	361	200	200
mbpp-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	100	0.7402170888358616	361	200	200
mbpp-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	1	0.35779778393351797	361	200	200
mbpp-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	10	0.5974924109981751	361	200	200
mbpp-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded	100	0.7409462833345465	361	200	200
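For readers unfamiliar with the Estimate column: pass@k figures like these are commonly computed with the unbiased estimator 1 - C(n-c, k)/C(n, k), averaged over problems, where n is the number of completions sampled per problem and c is the number that pass. The sketch below assumes that estimator. The function names `pass_at_k` and `mean_pass_at_k` and the sample counts are illustrative and are not taken from pass_k.py.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average the per-problem estimates into one dataset-level figure."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical per-problem pass counts out of n = 200 completions each.
counts = [0, 50, 200]
for k in (1, 10, 100):
    print(k, mean_pass_at_k(counts, n=200, k=k))
```

With 200 completions per problem (the MinCompletions and MaxCompletions columns), n is large enough for k = 100 to be estimated without hitting the `n - c < k` shortcut on most problems.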

Please let me know if you have any feedback, as we'd love to see support for Ada included in the project. Thanks :)

arjunguha · 2024-11-22T17:32:44Z

Thanks for this! Just so we're on the same page:

You're adding support for sets, which we had omitted since they didn't appear in HumanEval. But, I see that there are two MBPP problems that require them:

arjun@arjun-laptop datasets % pwd
/Users/arjun/repos/nuprl/MultiPL-E/datasets
arjun@arjun-laptop datasets % grep -F "Set[" */*.py
mbpp-typed/mbpp_473_tuple_intersection.py:def tuple_intersection(test_list1: List[Tuple[int, int]], test_list2: List[Tuple[int, int]]) -> Set[Tuple[int, int]]:
mbpp-typed/mbpp_582_my_dict.py:def my_dict(dict1: Set[int]) -> bool:

All the other translators should be updated to support sets if they want to support these two problems.

rowan-walshe · 2024-11-22T23:31:35Z

Yes, other translators would need to be updated if they want to support these two problems. While I have defined a gen_set method for translators that extend the LanguageTranslator class, I didn't complete any of the implementations. If you want, I can have a go at this, though I have little to no experience with many of the languages supported by MultiPL-E.

Just to confirm, though: the Set-related changes do not negatively impact the translation ratios for any of the existing translators. The only difference is the exception that is raised for those two problems.

Previously the exceptions raised looked like:

...
  File ".../projects/ai/MultiPL-E/dataset_builder/generic_translator.py", line 44, in translate_expr
    raise Exception(f"Unhandled expression: {py_expr}")
Exception: Unhandled expression: <ast.Set object at 0x1054e5750>

Now the exceptions will look like:

...
  File ".../MultiPL-E/dataset_builder/generic_translator.py", line 31, in translate_expr
    return translator.gen_set([translate_expr(translator, e) for e in elts])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MultiPL-E/dataset_builder/humaneval_to_py.py", line 103, in gen_set
    raise NotImplementedError("This translator does not currently support translating sets")
NotImplementedError: This translator does not currently support translating sets

or

...
  File ".../MultiPL-E/dataset_builder/generic_translator.py", line 31, in translate_expr
    return translator.gen_set([translate_expr(translator, e) for e in elts])
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'Translator' object has no attribute 'gen_set'. Did you mean: 'gen_dict'?

Note also that while MBPP 473 is a well-formed problem, MBPP 582 needs a couple of minor changes. Its current signature is:

def my_dict(dict1: Set[int]) -> bool:
    """
    Write a function to check if a dictionary is empty
    """
    ...

So the function name and docstring suggest that it takes a dictionary, but it's currently typed to take a set. Two of the test cases then pass a set, while the third passes a dictionary. I'm not sure which combination of type hint, test case, and docstring changes should be made, but I am fairly confident that the problem should be updated.
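One plausible source of this kind of mismatch, which the thread does not confirm for MBPP 582, is that an empty pair of braces in Python is a dict literal rather than a set. The sketch below uses the standard `ast` module to show how the two literal forms parse; the helper name `literal_kind` is illustrative.

```python
import ast


def literal_kind(source: str) -> str:
    """Return the AST node type name for a single Python expression."""
    return type(ast.parse(source, mode="eval").body).__name__


print(literal_kind("{10}"))   # Set
print(literal_kind("{}"))     # Dict
print(literal_kind("set()"))  # Call
```

A translator that walks the AST therefore sees a set argument and an empty-braces argument as different node types, even when the function is annotated with `Set[int]`.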

Add pass@1 metric to pass_k.py
Update pass_k.py to load the results file from .gz or .json
Added basic support for Sets, enabling the translation of mbpp_473_tuple_intersection.py

Co-authored-by: Rowan Walshe <[email protected]>
Co-authored-by: Fabien Chouteau <[email protected]>
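Loading a results file from either .gz or .json can be done with the standard library alone. This is a sketch under assumptions rather than the code in pass_k.py: the helper name `load_results` is hypothetical, and it assumes the file holds a single JSON object.

```python
import gzip
import json
from pathlib import Path


def load_results(path: Path) -> dict:
    """Load a results file stored as plain JSON or gzip-compressed JSON."""
    if path.suffix == ".gz":
        # "rt" opens the compressed stream in text mode for json.load.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return json.load(f)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```

Dispatching on the file suffix keeps the caller unchanged regardless of whether the results were compressed.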

arjunguha · 2024-11-23T14:35:10Z

Thanks! Yeah, some of these problems are a mess. We have generally erred on the side of letting problems be faulty, if the fault was in the original Python problem, but fixing problems in the translators.

See EvalPlus (HumanEvalPlus) for a project that actually fixes the faults in the original Python problems.

Anyway, I'll have this merged within the week.

The container for execution is getting very large. I may create a new container for Ada (and do so for other PLs going forward).

arjunguha · 2024-12-30T21:26:20Z

I tuned this out over the break, but I'll get to it this week. Thanks for your patience.

arjunguha · 2025-01-06T14:59:37Z

I've merged this in and updated the dataset README on the Hub:

https://huggingface.co/datasets/nuprl/MultiPL-E

I built a separate evaluation container for Ada, which is also pushed to the GitHub Container Registry:

3d11fa3

I do need to document that this container is here. I'll get to that next, and I do want to update the rather unwieldy directions for supporting a new language.

arjunguha · 2025-01-06T15:01:38Z

Also, here is an Ada result on MultiPL-HumanEval with Llama 3.1 8b:

11.3% at temperature 0.2 (just 20 completions, so not totally stable, but very close to the true value)

rowan-walshe mentioned this pull request Nov 20, 2024

A few small fixes for the untranslated datasets #163

Merged

rowan-walshe force-pushed the topic/Add-initial-support-for-ada branch from 48a408e to 8a94612 Compare November 22, 2024 23:32

arjunguha approved these changes Jan 6, 2025

arjunguha merged commit fbee4e2 into nuprl:main Jan 6, 2025
