
UFO²: The Desktop AgentOS

Turn natural‑language requests into automatic, reliable, multi‑application workflows on Windows, going beyond a purely UI‑focused agent.



✨ Key Capabilities

  • Deep OS Integration – Combines Windows UIA, Win32 and WinCOM for first‑class control detection and native commands.
  • Picture‑in‑Picture Desktop (coming soon) – Automation runs in a sandboxed virtual desktop so you can keep using your main screen.
  • Hybrid GUI + API Actions – Chooses native APIs when available and falls back to clicks/keystrokes when not: fast and robust.
  • Speculative Multi‑Action – Bundles several predicted steps into one LLM call, validated live, for up to 51 % fewer queries.
  • Continuous Knowledge Substrate – Mixes docs, Bing search, user demos and execution traces via RAG for agents that learn over time.
  • UIA + Visual Control Detection – Detects standard and custom controls with a hybrid UIA + vision pipeline.
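The speculative multi‑action idea can be sketched in a few lines of Python. The names below (`speculative_run`, `predict`, `expects`) are hypothetical and only illustrate the control flow, not UFO's actual implementation: one LLM call predicts a batch of steps, each step is validated against the live UI state before it runs, and the loop falls back to a fresh query as soon as a prediction diverges.

```python
def speculative_run(predict_batch, ui_state, execute, goal):
    """Run until done, batching predicted actions to save LLM round trips."""
    queries = 0
    while not ui_state["done"]:
        batch = predict_batch(goal, ui_state)  # one LLM call, several steps
        queries += 1
        for action in batch:
            # Validate the speculated action against the live UI state.
            if action["expects"] != ui_state["focus"]:
                break  # speculation diverged: fall back to a fresh query
            execute(action, ui_state)
            if ui_state["done"]:
                break
    return queries

# Stand-in "LLM" that predicts the next two steps from the current state.
def predict(goal, state):
    f = state["focus"]
    return [{"expects": f, "next": f + 1}, {"expects": f + 1, "next": f + 2}]

def execute(action, state):
    state["focus"] = action["next"]
    state["done"] = state["focus"] >= 4

state = {"focus": 0, "done": False}
q = speculative_run(predict, state, execute, "demo task")
# Four steps executed with only two LLM calls.
```

Batching only pays off while predictions keep matching the live state; the early `break` is what keeps the approach safe when they stop matching.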

See the documentation for full details.


📢 News

  • 📅 2025-04-19: Version v2.0.0 Released! We’re excited to announce the release of UFO²! UFO² is a major upgrade to the original UFO, featuring enhanced capabilities. It introduces the AgentOS concept, enabling seamless integration of multiple agents for complex tasks. Please check our new technical report for more details.
  • 📅 ...
  • 📅 2024-02-14: Our technical report for UFO is online!
  • 📅 2024-02-10: The first version of UFO is released on GitHub🎈. Happy Chinese New year🐉!

🏗️ Architecture overview

UFO² architecture

UFO² operates as a Desktop AgentOS, encompassing a multi-agent framework that includes:

  1. HostAgent – Parses the natural‑language goal, launches the necessary applications, spins up / coordinates AppAgents, and steers a global finite‑state machine (FSM).
  2. AppAgents – One per application; each runs a ReAct loop with multimodal perception, hybrid control detection, retrieval‑augmented knowledge, and the Puppeteer executor that chooses between GUI actions and native APIs.
  3. Knowledge Substrate – Blends offline documentation, online search, demonstrations, and execution traces into a vector store that is retrieved on‑the‑fly at inference.
  4. Speculative Executor – Slashes LLM latency by predicting batches of likely actions and validating them against live UIA state in a single shot.
  5. Picture‑in‑Picture Desktop (coming soon) – Runs the agent in an isolated virtual desktop so your main workspace and input devices remain untouched.
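The HostAgent / AppAgent split above can be sketched as a small orchestration pattern. The class names, state labels, and `run` signatures below are illustrative assumptions, not UFO's actual API: the host decomposes a goal into per‑application subtasks, runs one AppAgent per application, and tracks a tiny finite‑state machine.

```python
class AppAgent:
    """One agent per application; real AppAgents run a ReAct loop
    with multimodal perception and GUI/API actions."""
    def __init__(self, app, subtask):
        self.app, self.subtask = app, subtask

    def run(self):
        # Placeholder for perceive -> reason -> act; just report the result.
        return f"{self.app}: {self.subtask} done"

class HostAgent:
    """Parses the goal into a plan and steers a global state machine."""
    def __init__(self):
        self.state = "IDLE"

    def run(self, plan):
        self.state = "RUNNING"
        results = [AppAgent(app, task).run() for app, task in plan]
        self.state = "FINISHED"
        return results

host = HostAgent()
results = host.run([("Excel", "sum column A"), ("Outlook", "email the report")])
```

The key property mirrored here is that the host owns cross‑application control flow while each AppAgent only knows its own application.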

For a deep dive see our technical report or the docs site.


🌐 Media Coverage

UFO sightings have garnered attention from various media outlets, including:



🚀 Three‑minute Quickstart

🛠️ Step 1: Installation

UFO requires Python >= 3.10 running on Windows 10 or later. Install it by running the following commands:

# [optional to create conda environment]
# conda create -n ufo python=3.10
# conda activate ufo

# clone the repository
git clone https://github.com/microsoft/UFO.git
cd UFO
# install the requirements
pip install -r requirements.txt
# To use Qwen as your LLM, uncomment the related libs in requirements.txt.

⚙️ Step 2: Configure the LLMs

Before running UFO, you need to provide your LLM configuration for both HostAgent and AppAgent. Create your own config file ufo/config/config.yaml by copying ufo/config/config.yaml.template and editing the HOST_AGENT and APP_AGENT sections as follows:

copy ufo\config\config.yaml.template ufo\config\config.yaml
notepad ufo\config\config.yaml   # paste your key & endpoint
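After editing, it helps to sanity‑check that the placeholders were actually replaced before launching. The snippet below is a hypothetical, stdlib‑only sketch (a real check would parse the YAML file, e.g. with PyYAML); the dict stands in for a parsed agent section, and the placeholder strings mirror the template values shown below.

```python
# Hypothetical sanity check for an agent config section.
# A real check would load config.yaml with a YAML parser; here the
# parsed section is represented as a plain dict for illustration.

REQUIRED = {"API_TYPE", "API_BASE", "API_KEY", "API_MODEL"}

def missing_keys(section):
    """Return required keys that are absent or still template placeholders."""
    placeholders = {"", "YOUR_KEY", "YOUR_ENDPOINT", "sk-"}
    return sorted(
        k for k in REQUIRED
        if k not in section or section.get(k) in placeholders
    )

host_agent = {
    "API_TYPE": "openai",
    "API_BASE": "https://api.openai.com/v1/chat/completions",
    "API_KEY": "sk-",  # placeholder: not yet filled in
    "API_MODEL": "gpt-4-vision-preview",
}
problems = missing_keys(host_agent)  # flags the unfilled API key
```

A check like this catches the most common first‑run failure (an unedited template) before any LLM call is made.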

OpenAI

VISUAL_MODE: True  # Whether to use visual mode
API_TYPE: "openai"  # The API type: "openai" for the OpenAI API
API_BASE: "https://api.openai.com/v1/chat/completions"  # The OpenAI API endpoint
API_KEY: "sk-"  # The OpenAI API key, begins with sk-
API_VERSION: "2024-02-15-preview"  # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview"  # The OpenAI model name

Azure OpenAI (AOAI)

VISUAL_MODE: True  # Whether to use visual mode
API_TYPE: "aoai"  # The API type: "aoai" for Azure OpenAI
API_BASE: "YOUR_ENDPOINT"  # The AOAI endpoint. Format: https://{your-resource-name}.openai.azure.com
API_KEY: "YOUR_KEY"  # The AOAI API key
API_VERSION: "2024-02-15-preview"  # "2024-02-15-preview" by default
API_MODEL: "gpt-4-vision-preview"  # The OpenAI model name
API_DEPLOYMENT_ID: "YOUR_AOAI_DEPLOYMENT"  # The deployment id for the AOAI API

Need Qwen, Gemini, non‑visual GPT‑4, or even the OpenAI CUA Operator as an AppAgent? See the model guide.

📔 Step 3: Additional Settings for RAG (optional)

If you want to enhance UFO's ability with external knowledge, you can optionally configure it with an external database for retrieval augmented generation (RAG) in the ufo/config/config.yaml file.

We provide several RAG options to enhance UFO's capabilities; consult their respective documentation for how to configure these settings.

🎉 Step 4: Start UFO

⌨️ You can execute the following on the Windows command line (CLI):

# assume you are in the cloned UFO folder
python -m ufo --task <your_task_name>

This will start the UFO process and you can interact with it through the command line interface. If everything goes well, you will see the following message:

Welcome to use UFO🛸, A UI-focused Agent for Windows OS Interaction. 
 _   _  _____   ___
| | | ||  ___| / _ \
| | | || |_   | | | |
| |_| ||  _|  | |_| |
 \___/ |_|     \___/
Please enter your request to be completed🛸:

Alternatively, you can invoke UFO directly with a specific task and request:

python -m ufo --task <your_task_name> -r "<your_request>"

🎥 Step 5: Execution Logs

You can find the screenshots taken and request & response logs in the following folder:

./ufo/logs/<your_task_name>/

You may use them to debug, replay, or analyze the agent output.
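A small script can make replay and analysis easier. The layout assumed below (a `response.log` with one JSON record per line, plus numbered screenshots) is a hypothetical illustration; check your own logs folder for the actual file names.

```python
import json
import tempfile
from pathlib import Path

def summarize_logs(task_dir):
    """Collect screenshot names and parsed log records from a task folder.

    Assumes one JSON object per line in response.log; adjust to match
    the actual files your run produced.
    """
    task_dir = Path(task_dir)
    screenshots = sorted(p.name for p in task_dir.glob("*.png"))
    steps = []
    log = task_dir / "response.log"
    if log.exists():
        steps = [json.loads(line)
                 for line in log.read_text().splitlines() if line.strip()]
    return screenshots, steps

# Demo on a temporary folder standing in for ./ufo/logs/<your_task_name>/
demo = Path(tempfile.mkdtemp())
(demo / "action_step1.png").write_bytes(b"")
(demo / "response.log").write_text('{"step": 1, "action": "click"}\n')
shots, steps = summarize_logs(demo)
```

Pairing each parsed record with its screenshot by step number is a simple way to replay a run frame by frame.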

❓Get help

  • Please first check our documentation here.
  • ❔GitHub Issues (preferred)
  • For other communications, please contact ufo-agent@microsoft.com.

📊 Evaluation

UFO² is rigorously benchmarked on two publicly‑available live‑task suites:

  • Windows Agent Arena (WAA) – 154 real Windows tasks across 15 applications (Office, Edge, File Explorer, VS Code, …). Docs: https://microsoft.github.io/UFO/benchmark/windows_agent_arena/
  • OSWorld (Windows) – 49 cross‑application tasks that mix Office 365, browser and system utilities. Docs: https://microsoft.github.io/UFO/benchmark/osworld

These benchmarks are integrated with UFO² in separate repositories; follow the documents above for more details.


📚 Citation

If you build on this work, please cite the AgentOS framework:

UFO² – The Desktop AgentOS (2025)
https://arxiv.org/abs/2504.14603

@article{zhang2025ufo2,
  title   = {{UFO2: The Desktop AgentOS}},
  author  = {Zhang, Chaoyun and Huang, He and Ni, Chiming and Mu, Jian and Qin, Si and He, Shilin and Wang, Lu and Yang, Fangkai and Zhao, Pu and Du, Chao and Li, Liqun and Kang, Yu and Jiang, Zhao and Zheng, Suzhen and Wang, Rujia and Qian, Jiaxu and Ma, Minghua and Lou, Jian-Guang and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei},
  journal = {arXiv preprint arXiv:2504.14603},
  year    = {2025}
}

UFO – A UI‑Focused Agent for Windows OS Interaction (2024)
https://arxiv.org/abs/2402.07939

@article{zhang2024ufo,
  title   = {{UFO: A UI-Focused Agent for Windows OS Interaction}},
  author  = {Zhang, Chaoyun and Li, Liqun and He, Shilin and Zhang, Xu and Qiao, Bo and Qin, Si and Ma, Minghua and Kang, Yu and Lin, Qingwei and Rajmohan, Saravan and Zhang, Dongmei and Zhang, Qi},
  journal = {arXiv preprint arXiv:2402.07939},
  year    = {2024}
}

📝 Roadmap

The UFO² team is actively working on the following features and improvements:

  • Picture‑in‑Picture Mode – Completed and will be available in the next release
  • AgentOS‑as‑a‑Service – Completed and will be available in the next release
  • Auto‑Debugging Toolkit – Completed and will be available in the next release
  • Integration with MCP and Agent2Agent Communication – Planned; under implementation

🎨 Related Projects


⚠️ Disclaimer

By choosing to run the provided code, you acknowledge and agree to the terms and conditions regarding functionality and data handling practices described in DISCLAIMER.md.

™️ Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.


⚖️ License

This repository is released under the MIT License (SPDX‑Identifier: MIT).
See DISCLAIMER.md for privacy & safety notices.


© Microsoft 2025 • UFO² is an open‑source project, not an official Windows feature.