AI Toolkit for VS Code simplifies generative AI app development by bringing together advanced AI tools and models from the Azure AI Foundry catalog and other sources like Hugging Face. You can browse the AI model catalogs powered by GitHub Models and Azure AI Foundry, download models locally or use them remotely, fine-tune them, test them, and use them in your applications.
AI Toolkit Preview runs locally. Depending on the model you select, local inference or fine-tuning may require a GPU, such as an NVIDIA CUDA GPU. You can also run GitHub Models directly with AI Toolkit.
AI Toolkit runs on Windows, Linux, and macOS.
For fine-tuning on both Windows and Linux, an NVIDIA GPU is required. Additionally, Windows needs Windows Subsystem for Linux with an Ubuntu distribution 18.04 or newer. Learn how to install Windows Subsystem for Linux and how to change the default distribution.
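If you still need to set up WSL, the commands below are a minimal sketch; the Ubuntu distribution name is just one common choice (run wsl --list --online to see what's available on your machine):
# Run in an elevated PowerShell: install WSL with the Ubuntu distribution
wsl --install -d Ubuntu
# Make Ubuntu the default distribution
wsl --set-default Ubuntu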
AI Toolkit is distributed as a Visual Studio Code extension, so you first need to install VS Code and then download AI Toolkit from the Visual Studio Marketplace, where it can be installed like any other VS Code extension.
If you are new to installing VS Code extensions, follow these steps:
- In VS Code, select Extensions from the Activity Bar.
- In the Extensions Search bar, type "AI Toolkit."
- Select "AI Toolkit for Visual Studio Code."
- Click Install.
Now you’re ready to use the extension!
You’ll be prompted to sign in to GitHub. Click "Allow" to continue. This will redirect you to the GitHub sign-in page.
Sign in and follow the steps provided. Once completed successfully, you’ll be redirected back to VS Code.
After the extension is installed, the AI Toolkit icon will appear in your Activity Bar.
Let’s explore the available actions!
The main sidebar of AI Toolkit is divided into the following sections:
- Models
- Resources
- Playground
- Fine-tuning
- Evaluation
Playground, Fine-tuning, and Evaluation can be found in the Resources section. To begin, select Model Catalog.
When you launch AI Toolkit from the VS Code sidebar, you’ll see the following options:
- Choose a supported model from Model Catalog and download it locally.
- Test model inference in the Model Playground.
- Fine-tune the model locally or remotely in Model Fine-tuning.
- Deploy fine-tuned models to the cloud via the AI Toolkit command palette.
- Evaluate models.
Note: GPU vs CPU
Model cards will display the model size, platform, and accelerator type (CPU, GPU). For optimal performance on Windows devices with at least one GPU, select model versions that are specifically designed for Windows.
This ensures the model is optimized for the DirectML accelerator.
Model names follow the format: {model_name}-{accelerator}-{quantization}-{format}. For example, a catalog name such as Phi-3-mini-4k-directml-int4-awq-block-128-onnx indicates a DirectML-accelerated, INT4-quantized ONNX build of Phi-3 mini.
To check if your Windows device has a GPU, open Task Manager and go to the Performance tab. If you have GPU(s), they will be listed under names like "GPU 0" or "GPU 1."
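Alternatively, you can check from the command line. The snippet below is a generic sketch, not part of AI Toolkit; it assumes the NVIDIA driver (which ships the nvidia-smi tool) is installed:
import shutil
import subprocess

if shutil.which("nvidia-smi"):
    # Lists each detected NVIDIA GPU, one per line
    print(subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout)
else:
    print("nvidia-smi not found; no NVIDIA driver/GPU detected.")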
When fine-tuning, after setting all parameters, click Generate Project.
To load a model, click Load in Playground on its model card in the catalog. This will:
- Start the model download.
- Install all necessary prerequisites and dependencies.
- Create a VS Code workspace.
AI Toolkit provides a local REST API web server on port 5272 that uses the OpenAI chat completions format.
This allows you to test your application locally without depending on a cloud AI model service. For example, the following JSON file illustrates how to configure the request body:
{
    "model": "Phi-4",
    "messages": [
        {
            "role": "user",
            "content": "what is the golden ratio?"
        }
    ],
    "temperature": 0.7,
    "top_p": 1,
    "top_k": 10,
    "max_tokens": 100,
    "stream": true
}
You can test the REST API using tools like Postman or the cURL (client URL) utility:
curl -vX POST http://127.0.0.1:5272/v1/chat/completions -H 'Content-Type: application/json' -d @body.json
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:5272/v1/",
    api_key="x"  # required by the client, but not used by the local server
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "what is the golden ratio?",
        }
    ],
    model="Phi-4",
)

print(chat_completion.choices[0].message.content)
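Because the local server follows the OpenAI chat completions format, streaming should work the same way as against the hosted API. The sketch below assumes the server honors stream=True, as the earlier JSON example suggests:
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "what is the golden ratio?"}],
    model="Phi-4",
    stream=True,  # receive the reply incrementally
)

for chunk in stream:
    # Each chunk carries a small delta of the reply; content can be None on the final chunk
    print(chunk.choices[0].delta.content or "", end="")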
Add the Azure OpenAI client library for .NET to your project using NuGet:
dotnet add {project_name} package Azure.AI.OpenAI --version 1.0.0-beta.17
Add a C# file named OverridePolicy.cs to your project and paste the following code:
// OverridePolicy.cs
using Azure.Core;
using Azure.Core.Pipeline;

// Rewrites the target URI of every outgoing request so that calls
// intended for Azure OpenAI go to the local AI Toolkit server instead.
internal partial class OverrideRequestUriPolicy(Uri overrideUri)
    : HttpPipelineSynchronousPolicy
{
    private readonly Uri _overrideUri = overrideUri;

    public override void OnSendingRequest(HttpMessage message)
    {
        message.Request.Uri.Reset(_overrideUri);
    }
}
Next, paste the following code into your Program.cs file:
// Program.cs
using Azure.AI.OpenAI;

// Point the client at the local AI Toolkit server instead of Azure.
Uri localhostUri = new("http://localhost:5272/v1/chat/completions");

OpenAIClientOptions clientOptions = new();
clientOptions.AddPolicy(
    new OverrideRequestUriPolicy(localhostUri),
    Azure.Core.HttpPipelinePosition.BeforeTransport);
OpenAIClient client = new(openAIApiKey: "unused", clientOptions);

ChatCompletionsOptions options = new()
{
    DeploymentName = "Phi-4",
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant. Be brief and succinct."),
        new ChatRequestUserMessage("What is the golden ratio?"),
    }
};

// Stream the completion and print tokens as they arrive.
StreamingResponse<StreamingChatCompletionsUpdate> streamingChatResponse
    = await client.GetChatCompletionsStreamingAsync(options);
await foreach (StreamingChatCompletionsUpdate chatChunk in streamingChatResponse)
{
    Console.Write(chatChunk.ContentUpdate);
}
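Assuming the two files above are part of a standard .NET console project, you can run the sample with the .NET CLI (the {project_name} placeholder matches the NuGet command above):
dotnet run --project {project_name}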
- Begin with model discovery and the playground.
- Perform model fine-tuning and inference using local computing resources.
- Carry out remote fine-tuning and inference using Azure resources.
Visit our Q&A page for answers to common issues and solutions.
It seems like "mo" might refer to a specific language or dialect. Could you clarify what "mo" stands for? For example, is it Māori, Mon (spoken in Myanmar and Thailand), or another language?