This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 3414f18

Merge pull request #1818 from janhq/chore/v1.0.5
Sync dev to main 1.0.5

2 parents: 4fe1603 + 19939de

194 files changed: 16,140 additions and 3,745 deletions


.github/workflows/cortex-cpp-quality-gate.yml

Lines changed: 3 additions & 3 deletions

```diff
@@ -34,7 +34,7 @@ jobs:
           ccache-dir: ""
         - os: "mac"
           name: "arm64"
-          runs-on: "macos-silicon"
+          runs-on: "macos-selfhosted-12-arm64"
           cmake-flags: "-DCORTEX_CPP_VERSION=${{github.event.pull_request.head.sha}} -DCMAKE_BUILD_TEST=ON -DMAC_ARM64=ON -DCMAKE_TOOLCHAIN_FILE=vcpkg/scripts/buildsystems/vcpkg.cmake"
           build-deps-cmake-flags: ""
           ccache-dir: ""
@@ -124,7 +124,7 @@ jobs:
          cat ~/.cortexrc
 
       - name: Run e2e tests
-        if: runner.os != 'Windows' && github.event.pull_request.draft == false
+        if: github.event_name != 'schedule' && runner.os != 'Windows' && github.event.pull_request.draft == false
         run: |
           cd engine
           cp build/cortex build/cortex-nightly
@@ -138,7 +138,7 @@ jobs:
          GITHUB_TOKEN: ${{ secrets.PAT_SERVICE_ACCOUNT }}
 
       - name: Run e2e tests
-        if: runner.os == 'Windows' && github.event.pull_request.draft == false
+        if: github.event_name != 'schedule' && runner.os == 'Windows' && github.event.pull_request.draft == false
         run: |
           cd engine
           cp build/cortex.exe build/cortex-nightly.exe
```

.github/workflows/template-build-macos.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -82,7 +82,7 @@ jobs:
       matrix:
         include:
           - arch: 'arm64'
-            runs-on: 'macos-silicon'
+            runs-on: 'macos-selfhosted-12-arm64'
             extra-cmake-flags: "-DMAC_ARM64=ON"
 
           - arch: 'amd64'
```

docker/entrypoint.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -7,10 +7,10 @@ echo "enableCors: true" >> /root/.cortexrc
 
 # Install the engine
 cortex engines install llama-cpp -s /opt/cortex.llamacpp
-cortex engines list
 
 # Start the cortex server
 cortex start
+cortex engines list
 
 # Keep the container running by tailing the log files
 tail -f /root/cortexcpp/logs/cortex.log &
```

docs/docs/cli/models/index.mdx

Lines changed: 9 additions & 5 deletions

```diff
@@ -120,8 +120,11 @@ For example, it returns the following:
 
 | Option | Description | Required | Default value | Example |
 |---------------------------|----------------------------------------------------|----------|---------------|----------------------|
-| `-h`, `--help` | Display help for command. | No | - | `-h` |
-<!-- | `-f`, `--format <format>` | Specify output format for the models list. | No | `json` | `-f json` | -->
+| `-h`, `--help` | Display help for command. | No | - | `-h` |
+| `-e`, `--engine` | Display engines. | No | - | `--engine` |
+| `-v`, `--version` | Display version for model. | No | - | `--version` |
+| `--cpu_mode` | Display CPU mode. | No | - | `--cpu_mode` |
+| `--gpu_mode` | Display GPU mode. | No | - | `--gpu_mode` |
 
 ## `cortex models start`
 :::info
@@ -156,9 +159,10 @@ This command uses a `model_id` from the model that you have downloaded or availa
 
 | Option | Description | Required | Default value | Example |
 |---------------------------|---------------------------------------------------------------------------|----------|----------------------------------------------|------------------------|
-| `model_id` | The identifier of the model you want to start. | Yes | `Prompt to select from the available models` | `mistral` |
-| `--gpus` | List of GPUs to use. | No | - | `[0,1]` |
-| `-h`, `--help` | Display help information for the command. | No | - | `-h` |
+| `model_id` | The identifier of the model you want to start. | Yes | `Prompt to select from the available models` | `mistral` |
+| `--gpus` | List of GPUs to use. | No | - | `[0,1]` |
+| `--ctx_len` | Maximum context length for inference. | No | `min(8192, max_model_context_length)` | `1024` |
+| `-h`, `--help` | Display help information for the command. | No | - | `-h` |
 
 ## `cortex models stop`
 :::info
```
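Going by the updated table, a call such as `cortex models list --engine --cpu_mode` should additionally display engine and CPU-mode details alongside the model list; the exact output format is not shown in this commit.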

docs/docs/cli/models/start.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -33,6 +33,7 @@ cortex models start [model_id]:[engine] [options]
 |---------------------------|----------------------------------------------------------|----------|----------------------------------------------|-------------------|
 | `model_id` | The identifier of the model you want to start. | No | `Prompt to select from the available models` | `mistral` |
 | `--gpus` | List of GPUs to use. | No | - | `[0,1]` |
+| `--ctx_len` | Maximum context length for inference. | No | `min(8192, max_model_context_length)` | `1024` |
 | `-h`, `--help` | Display help information for the command. | No | - | `-h` |
 
```

docs/docs/cli/run.mdx

Lines changed: 3 additions & 2 deletions

```diff
@@ -36,7 +36,8 @@ You can use the `--verbose` flag to display more detailed output of the internal
 
 | Option | Description | Required | Default value | Example |
 |-----------------------------|-----------------------------------------------------------------------------|----------|----------------------------------------------|------------------------|
-| `model_id` | The identifier of the model you want to chat with. | Yes | - | `mistral` |
-| `--gpus` | List of GPUs to use. | No | - | `[0,1]` |
+| `model_id` | The identifier of the model you want to chat with. | Yes | - | `mistral` |
+| `--gpus` | List of GPUs to use. | No | - | `[0,1]` |
+| `--ctx_len` | Maximum context length for inference. | No | `min(8192, max_model_context_length)` | `1024` |
 | `-h`, `--help` | Display help information for the command. | No | - | `-h` |
 <!-- | `-t`, `--thread <thread_id>` | Specify the Thread ID. Defaults to creating a new thread if none specified. | No | - | `-t jan_1717650808` | | `-c` | -->
```
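Per the tables above, `cortex run mistral --ctx_len 1024` (or `cortex models start mistral --ctx_len 1024`) would cap the inference context at 1024 tokens, with the default otherwise falling back to `min(8192, max_model_context_length)`; `mistral` here is just the documented example id.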
Lines changed: 178 additions & 57 deletions

````diff
@@ -1,89 +1,210 @@
 ---
-title: Building Engine Extensions
+title: Adding a Third-Party Engine to Cortex
 description: Cortex supports Engine Extensions to integrate both local inference engines, and Remote APIs.
 ---
 
-:::info
-🚧 Cortex is currently under development, and this page is a stub for future development.
-:::
-
-<!--
-import Tabs from "@theme/Tabs";
-import TabItem from "@theme/TabItem";
-
 :::warning
 🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
 :::
 
+# Guide to Adding a Third-Party Engine to Cortex
+
+## Introduction
+
+This guide outlines the steps to integrate a custom engine with Cortex. We hope this helps developers understand the integration process.
+
+## Implementation Steps
+
+### 1. Implement the Engine Interface
+
+First, create an engine that implements the `EngineI.h` interface. Here's the interface definition:
+
+```cpp
+class EngineI {
+ public:
+  struct RegisterLibraryOption {
+    std::vector<std::filesystem::path> paths;
+  };
+
+  struct EngineLoadOption {
+    // engine
+    std::filesystem::path engine_path;
+    std::filesystem::path cuda_path;
+    bool custom_engine_path;
+
+    // logging
+    std::filesystem::path log_path;
+    int max_log_lines;
+    trantor::Logger::LogLevel log_level;
+  };
+
+  struct EngineUnloadOption {
+    bool unload_dll;
+  };
+
+  virtual ~EngineI() {}
 
-This document provides a step-by-step guide to adding a new engine to the Cortex codebase, similar to the `OpenAIEngineExtension`.
+  virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;
 
+  virtual void Load(EngineLoadOption opts) = 0;
 
-## Integrate a New Remote Engine
+  virtual void Unload(EngineUnloadOption opts) = 0;
 
-### Step 1: Create the New Engine Extension
+  // Cortex.llamacpp interface methods
+  virtual void HandleChatCompletion(
+      std::shared_ptr<Json::Value> json_body,
+      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
 
-1. Navigate to the `cortex-js/src/extensions` directory.
-2. Create a new file named `<new-engine>.engine.ts` (replace `<new-engine>` with the name of your engine).
-3. Implement your new engine extension class using the following template:
+  virtual void HandleEmbedding(
+      std::shared_ptr<Json::Value> json_body,
+      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
 
-```typescript
-class <NewEngine>EngineExtension extends OAIEngineExtension {
-  apiUrl = 'https://api.<new-engine>.com/v1/chat/completions';
-  name = '<new-engine>';
-  productName = '<New Engine> Inference Engine';
-  description = 'This extension enables <New Engine> chat completion API calls';
-  version = '0.0.1';
-  apiKey?: string;
-}
+  virtual void LoadModel(
+      std::shared_ptr<Json::Value> json_body,
+      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
+
+  virtual void UnloadModel(
+      std::shared_ptr<Json::Value> json_body,
+      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
+
+  virtual void GetModelStatus(
+      std::shared_ptr<Json::Value> json_body,
+      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
+
+  // Compatibility and model management
+  virtual bool IsSupported(const std::string& f) = 0;
+
+  virtual void GetModels(
+      std::shared_ptr<Json::Value> jsonBody,
+      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
+
+  // Logging configuration
+  virtual bool SetFileLogger(int max_log_lines,
+                             const std::string& log_path) = 0;
+  virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;
+};
 ```
 
-:::info
-Be sure to replace all placeholders with the appropriate values for your engine.
-:::
+#### Lifecycle Management
+
+##### RegisterLibraryPath
+
+```cpp
+virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;
+```
+
+This method is called during engine initialization to set up dynamic library search paths. For example, on Linux we still have to use `LD_LIBRARY_PATH` to add CUDA dependencies to the search path.
+
+**Parameters:**
+
+- `opts.paths`: Vector of filesystem paths that the engine should register
 
-### Step 2: Register the New Engine
+**Implementation Requirements:**
 
-1. Open the `extensions.module.ts` located at `cortex-js/src/extensions/`.
+- Register provided paths for dynamic library loading
+- Handle invalid paths gracefully
+- Thread-safe implementation
+- No exceptions should escape the method
 
-2. Register your new engine in the provider array using the following code:
+##### Load
 
-```typescript
-[
-  new OpenAIEngineExtension(httpService, configUsecases, eventEmitter),
-  //... other remote engines
-  new <NewEngine>EngineExtension(httpService, configUsecases, eventEmitter),
-]
+```cpp
+virtual void Load(EngineLoadOption opts) = 0;
 ```
 
-## Explanation of Key Properties and Methods
-| **Value** | **Description** |
-|------------------------------------|--------------------------------------------------------------------------------------------------|
-| `apiUrl` | This is the URL endpoint for the new engine's API. It is used to make chat completion requests. |
-| `name` | This is a unique identifier for the engine. It is used internally to reference the engine. |
-| `productName` | This is a human-readable name for the engine. It is used for display purposes. |
-| `description` | This provides a brief description of what the engine does. It is used for documentation and display purposes. |
-| `version` | This indicates the version of the engine extension. It is used for version control and display purposes. |
-| `eventEmmitter.on('config.updated')` | This is an event listener that listens for configuration updates. When the configuration for the engine is updated, this listener updates the `apiKey` and the engine's status. |
-| `onLoad` | This method is called when the engine extension is loaded. It retrieves the engine's configuration (such as the `apiKey`) and sets the engine's status based on whether the `apiKey` is available. |
+Initializes the engine with the provided configuration options.
 
-## Advanced: Transforming Payloads and Responses
+**Parameters:**
 
-Some engines require custom transformations for the payload sent to the API and the response received from the API. This is achieved using the `transformPayload` and `transformResponse` methods. These methods allow you to modify the data structure to match the specific requirements of the engine.
+- `engine_path`: Base path for engine files
+- `cuda_path`: Path to CUDA installation
+- `custom_engine_path`: Flag for using a custom engine location
+- `log_path`: Location for log files
+- `max_log_lines`: Maximum number of lines per log file
+- `log_level`: Logging verbosity level
 
-### `transformPayload`
+**Implementation Requirements:**
+
+- Validate all paths before use
+- Initialize engine components
+- Set up logging configuration
+- Handle missing dependencies gracefully
+- Clean initialization state in case of failures
+
+##### Unload
+
+```cpp
+virtual void Unload(EngineUnloadOption opts) = 0;
+```
+
+Performs cleanup and shutdown of the engine.
+
+**Parameters:**
+
+- `unload_dll`: Boolean flag indicating whether to unload dynamic libraries
+
+**Implementation Requirements:**
+
+- Clean up all allocated resources
+- Close file handles and connections
+- Release memory
+- Ensure proper shutdown of running models
+- Handle cleanup in a thread-safe manner
+
+### 2. Create a Dynamic Library
+
+We recommend using the [dylib library](https://github.com/martin-olivier/dylib) to build your dynamic library. This library provides helpful tools for creating cross-platform dynamic libraries.
+
+### 3. Package Dependencies
+
+Please ensure all dependencies are included with your dynamic library. This allows us to create a single, self-contained package for distribution.
+
+### 4. Publication and Integration
+
+#### 4.1 Publishing Your Engine (Optional)
+
+If you wish to make your engine publicly available, you can publish it through GitHub. For reference, examine the [cortex.llamacpp releases](https://github.com/janhq/cortex.llamacpp/releases) structure:
+
+- Each release tag should represent your version
+- Include all variants within the same release
+- Cortex will automatically select the most suitable variant or allow users to specify their preferred variant
+
+#### 4.2 Integration with Cortex
+
+Once your engine is ready, we encourage you to:
+
+1. Notify the Cortex team about your engine for potential inclusion in our default supported engines list
+2. Allow us to help test and validate your implementation
+
+### 5. Local Testing Guide
+
+To test your engine locally:
+
+1. Create a directory structure following this hierarchy:
+
+```bash
+engines/
+└── cortex.llamacpp/
+    └── mac-arm64/
+        └── v0.1.40/
+            ├── libengine.dylib
+            └── version.txt
+```
 
-The `transformPayload` method is used to transform the data before sending it to the engine's API. This method takes the original payload and modifies it as needed.
+2. Configure your engine:
 
-**Example: Anthropic Engine**
+   - Edit the `~/.cortexrc` file to register your engine name
+   - Add your model with the appropriate engine field in `model.yaml`
 
-In the Anthropic Engine, the `transformPayload` method extracts the system message and other messages, and includes additional parameters like `model`, `stream`, and `max_tokens`.
+3. Testing:
+   - Start the engine
+   - Load your model
+   - Verify functionality
 
-### `transformResponse`
+## Future Development
 
-The `transformResponse` method is used to transform the data received from the engine's API. This method processes the response and converts it into a format that the application can use.
+We're currently working on expanding support for additional release sources to make distribution more flexible.
 
-**Example: Anthropic Engine**
+## Contributing
 
-In the Anthropic Engine, the `transformResponse` method handles both stream and non-stream responses. It processes the response data and converts it into a standardized format.
--->
+We welcome suggestions and contributions to improve this integration process. Please feel free to submit issues or pull requests through our repository.
````
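To make the new `EngineI` interface concrete, here is a minimal sketch of an engine class that satisfies it, assuming the declaration in the diff above is available as `EngineI.h` with jsoncpp and trantor on the include path. Everything specific in it is illustrative: the class name `EchoEngine`, the JSON field names, and the exported `create_engine` factory symbol are assumptions for the example, not names defined by this commit.

```cpp
// Minimal sketch of an EngineI implementation. EchoEngine, its JSON field
// names, and the create_engine symbol are hypothetical examples.
#include <functional>
#include <memory>
#include <string>
#include <utility>

#include <json/value.h>
#include <trantor/utils/Logger.h>

#include "EngineI.h"

class EchoEngine : public EngineI {
 public:
  void RegisterLibraryPath(RegisterLibraryOption opts) override {
    // A real engine would add opts.paths to its dynamic-loader search path
    // (e.g. so CUDA libraries resolve on Linux). Must not throw.
  }

  void Load(EngineLoadOption opts) override {
    // Validate opts.engine_path / opts.cuda_path here, then set up logging.
    SetFileLogger(opts.max_log_lines, opts.log_path.string());
    SetLogLevel(opts.log_level);
  }

  void Unload(EngineUnloadOption opts) override {
    // Release models, file handles, and memory; honor opts.unload_dll.
  }

  void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    Json::Value status, response;
    status["status_code"] = 200;    // status field name is an assumption
    response["echo"] = *json_body;  // stub behavior: echo the request back
    callback(std::move(status), std::move(response));
  }

  void HandleEmbedding(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    callback(Json::Value(), Json::Value());  // stub: empty result
  }

  void LoadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    callback(Json::Value(), Json::Value());  // stub: nothing to load
  }

  void UnloadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    callback(Json::Value(), Json::Value());  // stub
  }

  void GetModelStatus(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    callback(Json::Value(), Json::Value());  // stub
  }

  bool IsSupported(const std::string& f) override {
    // Assumption: f names an interface method, e.g. "HandleChatCompletion".
    return f == "HandleChatCompletion";
  }

  void GetModels(
      std::shared_ptr<Json::Value> jsonBody,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) override {
    callback(Json::Value(), Json::Value(Json::arrayValue));  // stub: no models
  }

  bool SetFileLogger(int max_log_lines, const std::string& log_path) override {
    return true;  // a real engine would open and rotate the log file here
  }

  void SetLogLevel(trantor::Logger::LogLevel logLevel) override {}
};

// Exported factory so the host can construct the engine after loading the
// dynamic library. The symbol name "create_engine" is an assumption.
extern "C" EngineI* create_engine() {
  return new EchoEngine();
}
```

On the host side, loading such a library with the recommended dylib project might look like the following sketch, using the directory layout from the local-testing guide above; the `create_engine` symbol matches the factory in the stub and is likewise hypothetical.

```cpp
#include <dylib.hpp>  // martin-olivier/dylib, as recommended above

#include "EngineI.h"

int main() {
  // dylib resolves the OS-specific file name, e.g. "engine" becomes
  // libengine.dylib on macOS or libengine.so on Linux.
  dylib lib("engines/cortex.llamacpp/mac-arm64/v0.1.40", "engine");
  auto create = lib.get_function<EngineI*()>("create_engine");
  EngineI* engine = create();
  // ... exercise the engine here ...
  delete engine;
}
```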
