Add initial support for Ada #162

Merged

rowan-walshe
Contributor

Hi. @Fabien-Chouteau and I have worked on a patch to add support for translating prompts into Ada.

It's able to translate ~97% of HumanEval and ~90% of MBPP problems (I haven't included the generated prompts in the PR, but let me know if I should). The only types that we haven't yet tried to translate are Any and Union.

We've also included a small change that should add basic support for translating problems that use Sets (which I believe is only mbpp_473_tuple_intersection.py at this time).

As a sense check, we've performed an initial run against one model (Qwen2.5-Coder-7B-Instruct):

Pass@k
| Dataset | Pass@k | Estimate | NumProblems | MinCompletions | MaxCompletions |
| --- | --- | --- | --- | --- | --- |
| humaneval-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 1 | 0.27662420382165603 | 157 | 200 | 200 |
| humaneval-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 10 | 0.5136883316370555 | 157 | 200 | 200 |
| humaneval-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 100 | 0.6907559477298758 | 157 | 200 | 200 |
| humaneval-ada-remove-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 1 | 0.2647402597402597 | 154 | 200 | 200 |
| humaneval-ada-remove-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 10 | 0.5031330123876929 | 154 | 200 | 200 |
| humaneval-ada-remove-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 100 | 0.6783336009998593 | 154 | 200 | 200 |
| humaneval-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 1 | 0.28019230769230763 | 156 | 200 | 200 |
| humaneval-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 10 | 0.5130791721197416 | 156 | 200 | 200 |
| humaneval-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 100 | 0.6870015325140781 | 156 | 200 | 200 |
| humaneval-ada-transform-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 1 | 0.2791666666666667 | 156 | 200 | 200 |
| humaneval-ada-transform-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 10 | 0.5155015133300426 | 156 | 200 | 200 |
| humaneval-ada-transform-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 100 | 0.7046128551679812 | 156 | 200 | 200 |
| mbpp-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 1 | 0.35695290858725764 | 361 | 200 | 200 |
| mbpp-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 10 | 0.5976034257539806 | 361 | 200 | 200 |
| mbpp-ada-keep-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 100 | 0.7402170888358616 | 361 | 200 | 200 |
| mbpp-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 1 | 0.35779778393351797 | 361 | 200 | 200 |
| mbpp-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 10 | 0.5974924109981751 | 361 | 200 | 200 |
| mbpp-ada-reworded-Qwen_Qwen2.5_Coder_7B_Instruct-0.8-reworded | 100 | 0.7409462833345465 | 361 | 200 | 200 |
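
For anyone checking these numbers, here is a minimal sketch of the standard unbiased pass@k estimator. I'm assuming this is what pass_k.py computes per problem before averaging; the function names `pass_at_k` and `mean_pass_at_k` are illustrative and not taken from the script.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for a single problem.

    n: completions sampled, c: completions that passed, k: sample budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if k > n:
        raise ValueError("k must not exceed the number of completions")
    if n - c < k:
        # Fewer than k failing completions: every size-k sample contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimates; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the formula reduces to c / n, so the pass@1 rows are the mean fraction of passing completions per problem.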

Please let me know if you have any feedback, as we'd love to see support for Ada included in the project. Thanks :)

@arjunguha
Member

Thanks for this! Just so we're on the same page:

You're adding support for sets, which we had omitted since they didn't appear in HumanEval. But, I see that there are two MBPP problems that require them:

arjun@arjun-laptop datasets % pwd
/Users/arjun/repos/nuprl/MultiPL-E/datasets
arjun@arjun-laptop datasets % grep -F "Set[" */*.py
mbpp-typed/mbpp_473_tuple_intersection.py:def tuple_intersection(test_list1: List[Tuple[int, int]], test_list2: List[Tuple[int, int]]) -> Set[Tuple[int, int]]:
mbpp-typed/mbpp_582_my_dict.py:def my_dict(dict1: Set[int]) -> bool:

All the other translators should be updated to support sets if they want to support these two problems.
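
For translator authors who want to audit this, the same check can be done structurally rather than with grep. This is only a sketch: `functions_using_set` is a made-up helper name, and it inspects ordinary positional parameters and return annotations only.

```python
import ast


def functions_using_set(source: str) -> list[str]:
    """Return names of functions whose signature mentions Set[...]."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        annotations = [a.annotation for a in node.args.args if a.annotation]
        if node.returns is not None:
            annotations.append(node.returns)
        # Look for a subscripted name `Set[...]` anywhere inside an annotation.
        for annotation in annotations:
            if any(
                isinstance(n, ast.Subscript)
                and isinstance(n.value, ast.Name)
                and n.value.id == "Set"
                for n in ast.walk(annotation)
            ):
                found.append(node.name)
                break
    return found
```

Calling it on the text of each file under mbpp-typed should flag the same two problems as the grep above, assuming the annotations are written as `Set[...]`.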

@rowan-walshe
Contributor Author

Yes, other translators would need to be updated if they want to support these two problems. While I have defined a gen_set method for translators that extend the LanguageTranslator class, I didn't complete any of the implementations. If you want, I can have a go at doing this, though I have little to no experience with many of the languages supported by MultiPL-E.

Just to confirm though, the Set related changes do not negatively impact the translation ratios for any of the existing translators. The only difference is the exception that is raised for those two problems.

Previously the exceptions raised looked like:

...
  File ".../projects/ai/MultiPL-E/dataset_builder/generic_translator.py", line 44, in translate_expr
    raise Exception(f"Unhandled expression: {py_expr}")
Exception: Unhandled expression: <ast.Set object at 0x1054e5750>

Now the exceptions will look like:

...
  File ".../MultiPL-E/dataset_builder/generic_translator.py", line 31, in translate_expr
    return translator.gen_set([translate_expr(translator, e) for e in elts])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MultiPL-E/dataset_builder/humaneval_to_py.py", line 103, in gen_set
    raise NotImplementedError("This translator does not currently support translating sets")
NotImplementedError: This translator does not currently support translating sets

or

...
  File ".../MultiPL-E/dataset_builder/generic_translator.py", line 31, in translate_expr
    return translator.gen_set([translate_expr(translator, e) for e in elts])
           ^^^^^^^^^^^^^^^^^^
AttributeError: 'Translator' object has no attribute 'gen_set'. Did you mean: 'gen_dict'?

Note also that while MBPP 473 is a well-formed problem, MBPP 582 needs a couple of minor changes. Its current signature is:

def my_dict(dict1: Set[int]) -> bool:
    """
    Write a function to check if a dictionary is empty
    """
    ...

So the function name and docstring suggest that it takes a dictionary, but it's currently typed to take a set. Two of the test cases pass a set, while the third passes a dictionary. I'm not sure which combination of type hint, test case, and docstring changes should be made, but I'm fairly confident the problem should be updated.
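
The mismatch matters more for a statically typed target such as Ada than it does in Python, where the obvious solution accepts either container. A small sketch follows; the inputs are made up for illustration and are not the actual MBPP 582 test cases.

```python
from typing import Set


def my_dict(dict1: Set[int]) -> bool:
    # An emptiness check works for any container, so Python never notices
    # that the annotation says Set while a caller passes a dict.
    return not dict1


print(type({10}).__name__)  # a braced literal with elements is a set
print(type({}).__name__)    # empty braces are a dict literal, not a set
print(my_dict({10}), my_dict({}))
```

In a translated prompt, the argument type is fixed by the signature, so a test that passes `{}` cannot be expressed without deciding whether it means an empty set or an empty dictionary.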

Add pass@1 metric to pass_k.py

Update pass_k.py to load the results file from .gz or .json

Added basic support for Sets, enabling the translation of
mbpp_473_tuple_intersection.py

Co-authored-by: Rowan Walshe <[email protected]>
Co-authored-by: Fabien Chouteau <[email protected]>
@rowan-walshe rowan-walshe force-pushed the topic/Add-initial-support-for-ada branch from 48a408e to 8a94612 Compare November 22, 2024 23:32
@arjunguha
Member

Thanks! Yeah, some of these problems are a mess. We have generally erred on the side of leaving a problem faulty when the fault is in the original Python problem, while fixing faults that come from the translators.

See EvalPlus (HumanEvalPlus) for a project that actually fixes the faults in the original Python problems.

Anyway, I'll have this merged within the week.

The container for execution is getting very large. I may create a new container for Ada (and do so for other PLs going forward).

@arjunguha
Member

I tuned this out over the break, but I'll get to it this week. Thanks for your patience.

@arjunguha arjunguha merged commit fbee4e2 into nuprl:main Jan 6, 2025
@arjunguha
Member

I've merged this in and updated the dataset README on the Hub:

https://huggingface.co/datasets/nuprl/MultiPL-E

I built a separate evaluation container for Ada, which is also pushed to the GitHub Container Registry:

3d11fa3

I still need to document where this container lives. I'll get to that next, and I also want to update the rather unwieldy directions for supporting a new language.

@arjunguha
Member

Also, here is an Ada result on MultiPL-HumanEval with Llama 3.1 8B:

11.3% at temperature 0.2 (just 20 completions, so not totally stable, but very close to the true value)
