Overview

This repository exists for the purposes of reproducibility. In addition to all of the source code, there are a few portions we believe would provide the most value.

GUI Element Deteciton Model

The best GUI element detection model we trained can be found at bounding_boxes/outputs/comcrawl_200k_lr_0.0025_unfrozen/model_final.pth

Experimental Settings

All experimental settings can be found in the experiments folder

After Setup, experiments can be run with (replace with your experiment):

python3 main.py --config_file experiments/final_omniact_gemini_gt_bbox_gt_order_all.yaml

More examples can be found in run_local_scripts.sh

Prompts

All prompts can be found at: prompt_generators/example_prompts/

Setup

conda create -n llm-agent python=3.10 -y
conda activate llm-agent

Add to PYTHONPATH

export PYTHONPATH='~/gui-agent/':$PYTHONPATH
source ~/.bash_profile

Linux

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia -y

Mac

conda install pytorch torchvision torchaudio cpuonly -c pytorch

Requirements

pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/segment-anything.git transformers einops 
# For webarena
pip install --upgrade google-cloud-aiplatform

Install Decatron

git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2

VisualWebArena

Follow the instructions at https://github.com/web-arena-x/visualwebarena. This is necessary for the agent to run due to imports.

cd visualwebarena
pip install -r requirements.txt
playwright install
pip install -e .

Change the web_arena_prepare.sh file to map to your VWA install directory. Run it. This is only necessary for VWA experiments.

Note that due to differing OpenAI versions, you may have to go into webarena and manually fix the outdated APIs. The error classes will look something like openai.****.RateLimitingError. Change it to openai.RateLimitngError

Gcloud

If you're using Gemini, set up your GCP.

gcloud auth login
gcloud config set project YOUR-PROJECT
gcloud auth application-default login

Run this

import nltk nltk.download('punkt')

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
actions		actions
agents		agents
bounding_boxes		bounding_boxes
environments		environments
evaluation		evaluation
experiments		experiments
pipeline		pipeline
prompt_generators		prompt_generators
response_parsers		response_parsers
scripts		scripts
tests		tests
utils		utils
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SourceCodePro-Semibold.ttf		SourceCodePro-Semibold.ttf
definitions.py		definitions.py
environment.yml		environment.yml
main.py		main.py
run_local_scripts.sh		run_local_scripts.sh
web_arena_prepare.sh		web_arena_prepare.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Overview

GUI Element Deteciton Model

Experimental Settings

Prompts

Setup

Add to PYTHONPATH

Linux

Mac

Requirements

Install Decatron

VisualWebArena

Gcloud

Run this

About

Releases

Packages

Languages

License

waynchi/gui-agent

Folders and files

Latest commit

History

Repository files navigation

Overview

GUI Element Deteciton Model

Experimental Settings

Prompts

Setup

Add to PYTHONPATH

Linux

Mac

Requirements

Install Decatron

VisualWebArena

Gcloud

Run this

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages