---
title: Adding a Third-Party Engine to Cortex
description: Cortex supports Engine Extensions to integrate both local inference engines and remote APIs.
---

:::warning
🚧 Cortex.cpp is currently under development. Our documentation outlines the intended behavior of Cortex, which may not yet be fully implemented in the codebase.
:::

# Guide to Adding a Third-Party Engine to Cortex

## Introduction

This guide outlines the steps to integrate a custom engine with Cortex. We hope this helps developers understand the integration process.

## Implementation Steps

### 1. Implement the Engine Interface

First, create an engine that implements the `EngineI.h` interface. Here is the interface definition:

```cpp
class EngineI {
 public:
  struct RegisterLibraryOption {
    std::vector<std::filesystem::path> paths;
  };

  struct EngineLoadOption {
    // engine
    std::filesystem::path engine_path;
    std::filesystem::path cuda_path;
    bool custom_engine_path;

    // logging
    std::filesystem::path log_path;
    int max_log_lines;
    trantor::Logger::LogLevel log_level;
  };

  struct EngineUnloadOption {
    bool unload_dll;
  };

  virtual ~EngineI() {}

  virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;

  virtual void Load(EngineLoadOption opts) = 0;

  virtual void Unload(EngineUnloadOption opts) = 0;

  // Cortex.llamacpp interface methods
  virtual void HandleChatCompletion(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void HandleEmbedding(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void LoadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void UnloadModel(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  virtual void GetModelStatus(
      std::shared_ptr<Json::Value> json_body,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  // Compatibility and model management
  virtual bool IsSupported(const std::string& f) = 0;

  virtual void GetModels(
      std::shared_ptr<Json::Value> jsonBody,
      std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;

  // Logging configuration
  virtual bool SetFileLogger(int max_log_lines,
                             const std::string& log_path) = 0;
  virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;
};
```

#### Lifecycle Management

##### RegisterLibraryPath

```cpp
virtual void RegisterLibraryPath(RegisterLibraryOption opts) = 0;
```

This method is called during engine initialization to set up dynamic-library search paths. On Linux, for example, CUDA dependencies still have to be added to the search path via `LD_LIBRARY_PATH`.

**Parameters:**

- `opts.paths`: Vector of filesystem paths that the engine should register

**Implementation Requirements:**

- Register the provided paths for dynamic library loading
- Handle invalid paths gracefully
- Keep the implementation thread-safe
- Let no exceptions escape the method

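These requirements can be sketched roughly as follows. `LibraryPathRegistry` is a hypothetical helper for illustration only, not part of the Cortex codebase: it validates each path with the non-throwing `std::filesystem::exists` overload and guards its registry with a mutex.

```cpp
#include <cstddef>
#include <filesystem>
#include <mutex>
#include <system_error>
#include <vector>

// Hypothetical helper: filters out paths that do not exist before they are
// registered, and guards the shared registry with a mutex so concurrent
// callers are safe. Names are illustrative, not from the Cortex codebase.
class LibraryPathRegistry {
 public:
  // Returns the number of paths actually registered.
  std::size_t Register(const std::vector<std::filesystem::path>& paths) {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t added = 0;
    for (const auto& p : paths) {
      std::error_code ec;  // non-throwing overload: no exceptions escape
      if (std::filesystem::exists(p, ec) && !ec) {
        registered_.push_back(p);
        ++added;
      }
    }
    return added;
  }

  std::vector<std::filesystem::path> registered() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return registered_;
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::filesystem::path> registered_;
};
```

A real implementation would additionally make the registered paths visible to the dynamic loader (for example by adjusting the process environment before loading the library).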
##### Load

```cpp
virtual void Load(EngineLoadOption opts) = 0;
```

Initializes the engine with the provided configuration options.

**Parameters:**

- `engine_path`: Base path for engine files
- `cuda_path`: Path to the CUDA installation
- `custom_engine_path`: Flag for using a custom engine location
- `log_path`: Location for log files
- `max_log_lines`: Maximum number of lines per log file
- `log_level`: Logging verbosity level

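As a rough pre-flight check, the path-related options can be validated before `Load` uses them. The struct below is a trimmed stand-in for `EngineLoadOption` (the `trantor` log-level field is omitted so the sketch stays dependency-free), and `ValidateLoadOption` is a hypothetical helper, not a Cortex API:

```cpp
#include <filesystem>
#include <system_error>

// Trimmed stand-in for EngineLoadOption: field names follow the interface
// definition above, but the trantor::Logger::LogLevel field is omitted to
// keep this sketch self-contained.
struct EngineLoadOption {
  std::filesystem::path engine_path;
  std::filesystem::path cuda_path;
  bool custom_engine_path = false;
  std::filesystem::path log_path;
  int max_log_lines = 100000;
};

// Hypothetical pre-flight check: validate paths before Load() touches them.
bool ValidateLoadOption(const EngineLoadOption& opts) {
  std::error_code ec;
  if (!std::filesystem::exists(opts.engine_path, ec) || ec) return false;
  // cuda_path may legitimately be empty on CPU-only systems.
  if (!opts.cuda_path.empty() &&
      (!std::filesystem::exists(opts.cuda_path, ec) || ec)) return false;
  return opts.max_log_lines > 0;
}
```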
**Implementation Requirements:**

- Validate all paths before use
- Initialize engine components
- Set up the logging configuration
- Handle missing dependencies gracefully
- Clean up any partially initialized state on failure

##### Unload

```cpp
virtual void Unload(EngineUnloadOption opts) = 0;
```

Performs cleanup and shutdown of the engine.

**Parameters:**

- `unload_dll`: Boolean flag indicating whether to unload dynamic libraries

**Implementation Requirements:**

- Clean up all allocated resources
- Close file handles and connections
- Release memory
- Ensure running models are shut down properly
- Handle cleanup in a thread-safe manner

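A minimal sketch of the "shut down running models, thread-safely" requirement; `ModelTracker` is a hypothetical bookkeeping class, not part of the Cortex codebase:

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical cleanup sketch: Unload() would drain this tracker before
// releasing engine resources; the mutex makes concurrent calls safe.
class ModelTracker {
 public:
  void MarkLoaded(const std::string& id) {
    std::lock_guard<std::mutex> lock(mutex_);
    loaded_.push_back(id);
  }

  // Returns the number of models that were shut down.
  std::size_t UnloadAll() {
    std::lock_guard<std::mutex> lock(mutex_);
    std::size_t n = loaded_.size();
    loaded_.clear();  // real code would also join workers and close handles
    return n;
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> loaded_;
};
```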
### 2. Create a Dynamic Library

We recommend using the [dylib library](https://github.com/martin-olivier/dylib) to build your dynamic library. It provides helpful tools for creating cross-platform dynamic libraries.

### 3. Package Dependencies

Please include all dependencies with your dynamic library so it can be distributed as a single, self-contained package.

### 4. Publication and Integration

#### 4.1 Publishing Your Engine (Optional)

If you wish to make your engine publicly available, you can publish it through GitHub. For reference, examine the structure of the [cortex.llamacpp releases](https://github.com/janhq/cortex.llamacpp/releases):

- Each release tag should represent your engine version
- Include all build variants within the same release
- Cortex will automatically select the most suitable variant, or let users specify their preferred variant

#### 4.2 Integration with Cortex

Once your engine is ready, we encourage you to:

1. Notify the Cortex team about your engine for potential inclusion in our list of default supported engines
2. Allow us to help test and validate your implementation

### 5. Local Testing Guide

To test your engine locally:

1. Create a directory structure following this hierarchy:

```bash
engines/
└── cortex.llamacpp/
    └── mac-arm64/
        └── v0.1.40/
            ├── libengine.dylib
            └── version.txt
```
2. Configure your engine:

   - Edit the `~/.cortexrc` file to register your engine name
   - Add your model, with the appropriate `engine` field, in `model.yaml`

3. Test the engine:

   - Start the engine
   - Load your model
   - Verify functionality
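The directory layout from step 1 can also be created programmatically. A sketch using `std::filesystem`, where the engine, platform, and version names simply mirror the example tree above:

```cpp
#include <filesystem>
#include <fstream>

namespace fs = std::filesystem;

// Build the expected engine directory layout for local testing and write a
// version.txt marker. The names mirror the example hierarchy above.
fs::path MakeEngineLayout(const fs::path& root) {
  fs::path dir =
      root / "engines" / "cortex.llamacpp" / "mac-arm64" / "v0.1.40";
  fs::create_directories(dir);
  std::ofstream(dir / "version.txt") << "v0.1.40\n";
  return dir;
}
```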
## Future Development

We're currently working on expanding support for additional release sources to make distribution more flexible.

## Contributing

We welcome suggestions and contributions to improve this integration process. Please feel free to submit issues or pull requests through our repository.