BioCoder integration #2076
Description: This pull request introduces an evaluation on the BioCoder dataset using agents on OpenDevin. The BioCoder benchmark aims to assess the performance of LLMs on entire repositories related to bioinformatics. It is a challenging full-repository-level benchmark that requires both an understanding of long-range, inter-class dependencies and domain knowledge. Evaluating on this dataset gives further insight into how LLMs perform on closed-domain (bioinformatics) tasks, and into their ability to retrieve relevant information and code from a large context window or from the entire repository.

Key changes:
Added
Updated

Related papers: BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models (https://arxiv.org/pdf/2308.16458)
I am a little confused: what is BiocoderSSHBox for? It does not seem to be called in run_infer.py.
evaluation/biocoder/README.md
## Configure OpenDevin and your LLM

## Run Inference
Could you add more details on how to run your benchmark, and also add run_infer.sh?
def get_box_for_instance(
    cls,
    instance,
    n_tries=5,
It seems that n_tries is not being used. If you need retries, you can use the retry decorator from the tenacity library, for example:
@retry(stop=stop_after_attempt(5), wait=wait_fixed(5))
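If adding tenacity as a dependency is not wanted, the same effect can be had by actually honoring n_tries. The sketch below is a dependency-free helper; `call_with_retries` and its `wait_s` parameter are hypothetical names, not part of the PR or of OpenDevin. It makes at most `n_tries` attempts with a fixed pause between them and re-raises the last error, which is roughly what the tenacity decorator above does when `reraise=True` is also passed (without it, tenacity raises a `RetryError` instead).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], n_tries: int = 5, wait_s: float = 5.0) -> T:
    """Call fn() up to n_tries times in total, sleeping wait_s seconds between attempts.

    Hypothetical helper: on final failure the last exception is re-raised unchanged.
    """
    if n_tries < 1:
        raise ValueError("n_tries must be at least 1")
    last_exc: Optional[Exception] = None
    for attempt in range(1, n_tries + 1):
        try:
            return fn()
        except Exception as exc:  # broad on purpose: any start-up failure triggers a retry
            last_exc = exc
            if attempt < n_tries:
                # Fixed wait, like tenacity's wait_fixed; no sleep after the last attempt.
                time.sleep(wait_s)
    assert last_exc is not None  # the loop ran at least once and every attempt failed
    raise last_exc
```

Inside get_box_for_instance, the existing n_tries argument could be forwarded to a helper like this (wrapping whichever call actually starts the box), so the parameter is no longer dead. Which call to wrap depends on the PR's code.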
@tangxiangru and @lilbillybiscuit, thanks a lot for contributing this! Could you ping us again when the PR is finished and ready for review (including documentation on how to run it)? I'll convert it to a draft PR in the meantime.
Hi @tangxiangru, @lilbillybiscuit, and @li-boxuan: now that we're done with the main evals for the paper, it'd be great to get this merged if you have a moment!
@tangxiangru @lilbillybiscuit I tried to help address some of the comments and add a shell script and README. But when I tried to run your code locally, I ran into some problems around instance and BiocoderData. Each time I fixed one, follow-up problems appeared, so I suspect you have not pushed the final version of the evaluation code here yet.
Please ping me for review when the code is finished. Thanks.
Hi @yufansong and @tangxiangru, sorry for the delay; the final version has now been pushed. In the meantime, the script can be run with the same arguments as SWE-bench, and a more polished README will be pushed shortly. To answer the earlier question, BiocoderSSHBox is a modified version of SWEBenchSSHBox that runs our Docker image and handles all the dependencies (such as fetching the repository archive, testing caches, etc.).
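The relationship described above, a benchmark-specific box that reuses the SWE-bench box and changes only the image and the setup steps, can be pictured as plain subclassing. This is a hypothetical illustration only: `BenchmarkBoxSketch`, `BiocoderBoxSketch`, the image names, and the `setup_steps`/`startup_plan` methods are invented for this example and are not OpenDevin's actual SSHBox API. The sketch describes what start-up would do rather than launching any container.

```python
class BenchmarkBoxSketch:
    """Hypothetical stand-in for a SWE-bench-style sandbox (not OpenDevin's real class)."""

    # Hypothetical image name; the real sandbox chooses its own Docker image.
    image = "swe-bench-style-image"

    def setup_steps(self) -> list:
        """Benchmark-specific preparation; the base box needs none in this sketch."""
        return []

    def startup_plan(self) -> list:
        # Describe, rather than execute, what starting the box involves:
        # launch a container from `image`, then run each setup step inside it.
        return [f"start container from {self.image}"] + self.setup_steps()


class BiocoderBoxSketch(BenchmarkBoxSketch):
    """Reuses the base start-up logic; overrides only the image and the setup steps."""

    image = "biocoder-image"  # hypothetical

    def setup_steps(self) -> list:
        # The dependencies named in the comment above, as plain labels.
        return ["fetch repository archive", "prepare testing caches"]


if __name__ == "__main__":
    print(BiocoderBoxSketch().startup_plan())
```

Running it prints `['start container from biocoder-image', 'fetch repository archive', 'prepare testing caches']`. The point of the design is that container and SSH lifecycle code lives in one base class, so only the BioCoder-specific pieces differ.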
Integrates BioCoder from https://arxiv.org/abs/2308.16458