feat: tool calling benchmark unified across types and prompts variety #620
Merged
Changes from 27 commits
32 commits
bd7ccc2 feat: add new levels for prompts and system prompts (jmatejcz)
71e6b71 feat: adjust basic tasks to new levels (jmatejcz)
b5f115f feat: manipulation tasks adjusted to new levels (jmatejcz)
7bb77c8 feat: adjust navigation tasks to new levels (jmatejcz)
295f18e refactor: grouped args to task into pydantic model (jmatejcz)
e8f1631 feat: adjust custom interfaces tasks to new levels (jmatejcz)
81983e4 feat: adjust spatial tasks to new levels (jmatejcz)
2690ed6 feat: merged topics mocks in basic tasks (jmatejcz)
4198d6d feat: adjusted examples and result saving (jmatejcz)
08be22a feat: adjusted visualisation to new levels (jmatejcz)
99b5c77 feat: separate file for mocks, merged mocks from different types (jmatejcz)
74b8ec2 feat: defined more basic tasks (jmatejcz)
e3ae1e5 refactor: split predefined tasks into files (jmatejcz)
ee81ec0 feat: extra tool calls as list (jmatejcz)
eb90175 docs: adjusted docs to new changes (jmatejcz)
70cd604 style: format changes (jmatejcz)
0c8cb9d chore: reduce the computation in example benchmarking (jmatejcz)
0b1406f feat: task prompts more like guidance (jmatejcz)
ff851f2 feat: added Task's base prompt for result processing (jmatejcz)
b4c9e9e feat: saving base prompt to results (jmatejcz)
14d4f9d fix: labels in task plots (jmatejcz)
d532672 fix: passing prompt levels from user (jmatejcz)
7d01b2b style: adjust docs tutorial (jmatejcz)
761ba5c chore: version bump (jmatejcz)
d0a2c5f docs: typos in docs (jmatejcz)
67d2268 refactor: removed duplicate check (jmatejcz)
308c3ce style: change config name (jmatejcz)
c03f1cf refactor: removed moderate level of prompt detail (jmatejcz)
8dae02c docs: added docstrings (jmatejcz)
80e5ad2 docs: added examples and more descriptions to docs (jmatejcz)
e9ea171 docs: linked the main ToolCallingAgentBenchmarkConfig docstring in o… (jmatejcz)
3e64387 docs: updated docs and linked (jmatejcz)
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "rai-bench"
-version = "0.1.0"
+version = "0.2.0"
 description = "Package for running and creating benchmarks."
 authors = ["Jakub Matejczyk <[email protected]>", "Magdalena Kotynia <[email protected]>"]
 readme = "README.md"