
Commit b569ba7

xingyaoww and lwaekfjlk authored
docs: Add visualizer instruction for SWE-Bench (#2529)
* Update README.md for visualizer instruction
* Polish the visualization guidance (#2531)
* fix conda create error
* fix and polish the readme for visualization
* Update README.md

Co-authored-by: Haofei Yu <[email protected]>
1 parent 0a0f78f · commit b569ba7


evaluation/swe_bench/README.md

Lines changed: 27 additions & 0 deletions
@@ -154,6 +154,33 @@ The final results will be saved to `evaluation/evaluation_outputs/outputs/swe_be
- `report.json`: a JSON file that contains keys like `"resolved"` pointing to instance IDs that are resolved by the agent.
- `summary.json`: a JSON file that contains more fine-grained information for each test instance.
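
For a quick check of which instances were resolved, you can read `report.json` directly. A minimal sketch, assuming `report.json` sits in your run's output directory and `"resolved"` maps to a list of instance IDs (adjust the path to your setup):

```bash
# Print the resolved instance IDs from report.json
# (path is a placeholder; point it at your run's output directory)
python -c "import json; print('\n'.join(json.load(open('report.json'))['resolved']))"
```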

## Visualize Results

First, clone `https://huggingface.co/spaces/OpenDevin/evaluation` and add your own OpenDevin run results to the `outputs` directory of the cloned repo.

```bash
git clone https://huggingface.co/spaces/OpenDevin/evaluation
```
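
Next, copy your own run results into the cloned repo. A minimal sketch, assuming your results live under the default `evaluation/evaluation_outputs/outputs` path mentioned earlier in this README (both paths are placeholders; adjust to your setup):

```bash
# Copy OpenDevin run outputs into the cloned evaluation repo's `outputs` folder
# (source and destination paths are placeholders)
cp -r evaluation/evaluation_outputs/outputs/swe_bench evaluation/outputs/
```
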
**(optional) set up a streamlit environment with conda**:
```bash
conda create -n streamlit python=3.10
conda activate streamlit
pip install streamlit altair st_pages
```
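
To confirm the environment is ready, a quick sanity check:

```bash
# Verify that streamlit is installed and on PATH
streamlit --version
```
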
**run the visualizer**:

Then, in a Python environment with the `streamlit` library installed, run the following:

```bash
# Make sure you are inside the cloned `evaluation` repo
conda activate streamlit  # if you followed the optional conda env setup above
streamlit run 0_📊_OpenDevin_Benchmark.py --server.port 8501 --server.address 0.0.0.0
```

You can then access the SWE-Bench trajectory visualizer at `localhost:8501`.
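
If you run the visualizer on a remote machine and the port is not directly reachable, one option is SSH port forwarding; a minimal sketch (`user@remote-host` is a placeholder):

```bash
# Forward the remote visualizer port to your local machine,
# then open http://localhost:8501 in your local browser
ssh -L 8501:localhost:8501 user@remote-host
```
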
## View Result Summary
