This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 58c071c

Authored by vansangpfiev, sangjanai, and OHaiiBuzzle
chore: sync main to dev (#1978)
* feat: AMD hardware API (#1797)
* feat: add amd gpu windows
* chore: remove unused code
* feat: get amd gpus
* fix: clean
* chore: cleanup
* fix: set activate
* fix: build windows
* feat: linux
* fix: add patches
* fix: map cuda gpus
* fix: build
* chore: docs
* fix: build
* chore: clean up
* fix: build
* fix: build
* chore: pack vulkan windows
* chore: vulkan linux

---------

Co-authored-by: vansangpfiev <[email protected]>

* fix: add cpu usage (#1868)

Co-authored-by: vansangpfiev <[email protected]>

* fix: PATCH method for Thread and Messages management (#1923)

Co-authored-by: vansangpfiev <[email protected]>

* fix: ignore compute_cap if not present (#1866)
* fix: ignore compute_cap if not present
* fix: correct gpu info
* fix: remove check for toolkit version

---------

Co-authored-by: vansangpfiev <[email protected]>

* fix: models.cc: symlinked model deletion shouldn't remove original file (#1918)

Co-authored-by: vansangpfiev <[email protected]>

* fix: correct gpu info list (#1944)
* fix: correct gpu info list
* chore: cleanup

---------

Co-authored-by: vansangpfiev <[email protected]>

* fix: gpu: filter out llvmpipe
* fix: add vendor in gpu info (#1952)

Co-authored-by: vansangpfiev <[email protected]>

* fix: correct get server name method (#1953)

Co-authored-by: vansangpfiev <[email protected]>

* fix: map nvidia and vulkan uuid (#1954)

Co-authored-by: vansangpfiev <[email protected]>

* fix: permission issue for default drogon uploads folder (#1870)

Co-authored-by: vansangpfiev <[email protected]>

* chore: change timeout
* fix: make get hardware info function thread-safe (#1956)

Co-authored-by: vansangpfiev <[email protected]>

* fix: cache data for gpu information (#1959)
* fix: wrap vulkan gpu function
* fix: init
* fix: cpu usage
* fix: build windows
* fix: buld macos

---------

Co-authored-by: vansangpfiev <[email protected]>

* fix: handle path with space (#1963)
* fix: unload engine before updating (#1970)

Co-authored-by: sangjanai <[email protected]>

* fix: auto-reload model for remote engine (#1971)

Co-authored-by: sangjanai <[email protected]>

* fix: use updated configuration for remote model when reload (#1972)

Co-authored-by: sangjanai <[email protected]>

* fix: correct engine interface order (#1974)

Co-authored-by: sangjanai <[email protected]>

* fix: improve error handling for remote engine (#1975)

Co-authored-by: sangjanai <[email protected]>

* fix: temporarily remove model setting recommendation (#1977)

Co-authored-by: sangjanai <[email protected]>

---------

Co-authored-by: vansangpfiev <[email protected]>
Co-authored-by: OHaiiBuzzle <[email protected]>
1 parent bb6d60b commit 58c071c

22 files changed: 352 additions, 220 deletions

docs/docs/architecture/cortex-db.mdx

Lines changed: 3 additions & 5 deletions

@@ -15,15 +15,14 @@ import TabItem from "@theme/TabItem";
 This document outlines Cortex database architecture which is designed to store and manage models, engines,
 files and more.
 
-## Tables Structure
-
+## Table Structure
 ### schema Table
-
 The `schema` table is designed to hold schema version for cortex database. Below is the structure of the table:
 
 | Column Name | Data Type | Description |
 |--------------------|-----------|---------------------------------------------------------|
-| version | INTEGER | A unique schema version for database. |
+| schema_version | INTEGER | A unique schema version for database. |
+
 
 ### models Table
 The `models` table is designed to hold metadata about various AI models. Below is the structure of the table:

@@ -53,7 +52,6 @@ The `hardware` table is designed to hold metadata about hardware information. Be
 | activated | INTEGER | A boolean value (0 or 1) indicating whether the hardware is activated or not. |
 | priority | INTEGER | An integer value representing the priority associated with the hardware. |
 
-
 ### engines Table
 The `engines` table is designed to hold metadata about the different engines available for useage with Cortex.
 Below is the structure of the table:

engine/CMakeLists.txt

Lines changed: 0 additions & 1 deletion

@@ -73,7 +73,6 @@ if(CMAKE_BUILD_INJA_TEST)
 add_subdirectory(examples/inja)
 endif()
 
-
 find_package(jsoncpp CONFIG REQUIRED)
 find_package(Drogon CONFIG REQUIRED)
 find_package(yaml-cpp CONFIG REQUIRED)

engine/cli/commands/server_start_cmd.cc

Lines changed: 6 additions & 6 deletions

@@ -66,16 +66,16 @@ bool ServerStartCmd::Exec(const std::string& host, int port,
 si.cb = sizeof(si);
 ZeroMemory(&pi, sizeof(pi));
 std::wstring params = L"--start-server";
-params += L" --config_file_path " +
-file_manager_utils::GetConfigurationPath().wstring();
-params += L" --data_folder_path " +
-file_manager_utils::GetCortexDataPath().wstring();
+params += L" --config_file_path \"" +
+file_manager_utils::GetConfigurationPath().wstring() + L"\"";
+params += L" --data_folder_path \"" +
+file_manager_utils::GetCortexDataPath().wstring() + L"\"";
 params += L" --loglevel " + cortex::wc::Utf8ToWstring(log_level_);
 std::wstring exe_w = cortex::wc::Utf8ToWstring(exe);
 std::wstring current_path_w =
 file_manager_utils::GetExecutableFolderContainerPath().wstring();
-std::wstring wcmds = current_path_w + L"/" + exe_w + L" " + params;
-CTL_DBG("wcmds: " << wcmds);
+std::wstring wcmds = current_path_w + L"\\" + exe_w + L" " + params;
+CTL_INF("wcmds: " << wcmds);
 std::vector<wchar_t> mutable_cmds(wcmds.begin(), wcmds.end());
 mutable_cmds.push_back(L'\0');
 // Create child process
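
This change lines up with "fix: handle path with space (#1963)" in the commit message: the whole command line is handed to the Windows child process as a single wide string, so an unquoted configuration or data path containing a space is split into separate arguments, and the executable path now uses the native backslash separator. Below is a minimal sketch of the quoting effect only, using a hypothetical path that is not taken from the repository.

```cpp
// Illustrative only: shows why the added quotes matter for paths with spaces.
#include <iostream>
#include <string>

int main() {
  // Hypothetical install location containing a space.
  std::wstring config_path = L"C:\\Users\\Jan Doe\\.cortexrc";

  // Unquoted: a CreateProcessW child would see "--config_file_path",
  // "C:\\Users\\Jan" and "Doe\\.cortexrc" as separate arguments.
  std::wstring unquoted = L"--config_file_path " + config_path;

  // Quoted: the path survives as a single argument.
  std::wstring quoted = L"--config_file_path \"" + config_path + L"\"";

  std::wcout << unquoted << L"\n" << quoted << L"\n";
  return 0;
}
```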

engine/common/hardware_common.h

Lines changed: 5 additions & 1 deletion

@@ -79,6 +79,7 @@ struct GPU {
 int64_t total_vram;
 std::string uuid;
 bool is_activated = true;
+std::string vendor;
 };
 
 inline Json::Value ToJson(const std::vector<GPU>& gpus) {

@@ -100,7 +101,10 @@ inline Json::Value ToJson(const std::vector<GPU>& gpus) {
 gpu["total_vram"] = gpus[i].total_vram;
 gpu["uuid"] = gpus[i].uuid;
 gpu["activated"] = gpus[i].is_activated;
-res.append(gpu);
+gpu["vendor"] = gpus[i].vendor;
+if (gpus[i].total_vram > 0) {
+res.append(gpu);
+}
 }
 return res;
 }
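
The new vendor field matches "fix: add vendor in gpu info (#1952)", and the total_vram > 0 guard reports a GPU entry only when the device exposes some dedicated VRAM, which is presumably how software rasterizers ("fix: gpu: filter out llvmpipe") are kept out of the hardware list. A self-contained sketch of the same filtering idea follows, with illustrative types rather than the project's exact header.

```cpp
// Sketch only: mirrors the shape of the diff above, not the real header.
#include <json/json.h>
#include <cstdint>
#include <string>
#include <vector>

struct GPU {
  std::string id;
  std::string name;
  int64_t total_vram = 0;  // dedicated VRAM as reported by the backend
  std::string uuid;
  bool is_activated = true;
  std::string vendor;      // newly reported vendor string
};

inline Json::Value ToJson(const std::vector<GPU>& gpus) {
  Json::Value res(Json::arrayValue);
  for (const auto& g : gpus) {
    Json::Value gpu;
    gpu["id"] = g.id;
    gpu["name"] = g.name;
    gpu["total_vram"] = static_cast<Json::Int64>(g.total_vram);
    gpu["uuid"] = g.uuid;
    gpu["activated"] = g.is_activated;
    gpu["vendor"] = g.vendor;
    if (g.total_vram > 0) {  // skip devices that report no dedicated VRAM
      res.append(gpu);
    }
  }
  return res;
}
```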

engine/controllers/engines.cc

Lines changed: 5 additions & 0 deletions

@@ -375,17 +375,21 @@ void Engines::UpdateEngine(
 metadata = (*exist_engine).metadata;
 }
 
+(void)engine_service_->UnloadEngine(engine);
+
 auto upd_res =
 engine_service_->UpsertEngine(engine, type, api_key, url, version,
 "all-platforms", status, metadata);
 if (upd_res.has_error()) {
 Json::Value res;
 res["message"] = upd_res.error();
+CTL_WRN("Error: " << upd_res.error());
 auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
 resp->setStatusCode(k400BadRequest);
 callback(resp);
 } else {
 Json::Value res;
+CTL_INF("Remote Engine update successfully!");
 res["message"] = "Remote Engine update successfully!";
 auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
 resp->setStatusCode(k200OK);

@@ -394,6 +398,7 @@
 } else {
 Json::Value res;
 res["message"] = "Request body is empty!";
+CTL_WRN("Error: Request body is empty!");
 auto resp = cortex_utils::CreateCortexHttpJsonResponse(res);
 resp->setStatusCode(k400BadRequest);
 callback(resp);

engine/controllers/models.cc

Lines changed: 5 additions & 4 deletions

@@ -218,10 +218,11 @@ void Models::ListModel(
 obj["id"] = model_entry.model;
 obj["model"] = model_entry.model;
 obj["status"] = "downloaded";
-auto es = model_service_->GetEstimation(model_entry.model);
-if (es.has_value() && !!es.value()) {
-obj["recommendation"] = hardware::ToJson(*(es.value()));
-}
+// TODO(sang) Temporarily remove this estimation
+// auto es = model_service_->GetEstimation(model_entry.model);
+// if (es.has_value() && !!es.value()) {
+//   obj["recommendation"] = hardware::ToJson(*(es.value()));
+// }
 data.append(std::move(obj));
 yaml_handler.Reset();
 } else if (model_config.engine == kPythonEngine) {

engine/cortex-common/EngineI.h

Lines changed: 3 additions & 3 deletions

@@ -59,14 +59,14 @@ class EngineI {
 const std::string& log_path) = 0;
 virtual void SetLogLevel(trantor::Logger::LogLevel logLevel) = 0;
 
+// Stop inflight chat completion in stream mode
+virtual void StopInferencing(const std::string& model_id) = 0;
+
 virtual Json::Value GetRemoteModels() = 0;
 virtual void HandleRouteRequest(
 std::shared_ptr<Json::Value> json_body,
 std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
 virtual void HandleInference(
 std::shared_ptr<Json::Value> json_body,
 std::function<void(Json::Value&&, Json::Value&&)>&& callback) = 0;
-
-// Stop inflight chat completion in stream mode
-virtual void StopInferencing(const std::string& model_id) = 0;
 };
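
A plausible reading of "fix: correct engine interface order (#1974)": engine implementations are built separately from the server and used through this abstract class, so the declaration order of its virtual methods is effectively part of the binary interface (each method's vtable slot is assigned by declaration order), and moving StopInferencing above GetRemoteModels keeps the header in the order the engine binaries expect. The following is a minimal, hypothetical sketch of that hazard with placeholder names, not Cortex code.

```cpp
// Declaration order of virtual methods fixes their vtable slots, so a host
// and a separately built engine library must use identical declarations.
#include <iostream>

struct EngineIface {
  virtual ~EngineIface() = default;
  virtual const char* GetRemoteModels() = 0;  // vtable slot k
  virtual const char* StopInferencing() = 0;  // vtable slot k + 1
};

// An engine built against the same header implements the same slot order.
struct DummyEngine final : public EngineIface {
  const char* GetRemoteModels() override { return "GetRemoteModels"; }
  const char* StopInferencing() override { return "StopInferencing"; }
};

int main() {
  DummyEngine impl;
  EngineIface* e = &impl;  // the host only ever holds the abstract pointer
  std::cout << e->GetRemoteModels() << "\n";
  std::cout << e->StopInferencing() << "\n";
  // Had the engine binary been compiled against a header with these two
  // declarations swapped, the calls above would dispatch to the wrong
  // vtable slot at runtime, with no compiler or linker error.
  return 0;
}
```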

engine/extensions/remote-engine/remote_engine.cc

Lines changed: 9 additions & 2 deletions

@@ -29,8 +29,13 @@ size_t StreamWriteCallback(char* ptr, size_t size, size_t nmemb,
 CTL_DBG(chunk);
 Json::Value check_error;
 Json::Reader reader;
-if (reader.parse(chunk, check_error)) {
+context->chunks += chunk;
+if (reader.parse(context->chunks, check_error) ||
+(reader.parse(chunk, check_error) &&
+chunk.find("error") != std::string::npos)) {
+CTL_WRN(context->chunks);
 CTL_WRN(chunk);
+CTL_INF("Request: " << context->last_request);
 Json::Value status;
 status["is_done"] = true;
 status["has_error"] = true;

@@ -143,7 +148,9 @@ CurlResponse RemoteEngine::MakeStreamingChatCompletionRequest(
 "",
 config.model,
 renderer_,
-stream_template};
+stream_template,
+true,
+body};
 
 curl_easy_setopt(curl, CURLOPT_URL, full_url.c_str());
 curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);

engine/extensions/remote-engine/remote_engine.h

Lines changed: 2 additions & 0 deletions

@@ -25,6 +25,8 @@ struct StreamContext {
 extensions::TemplateRenderer& renderer;
 std::string stream_template;
 bool need_stop = true;
+std::string last_request;
+std::string chunks;
 };
 struct CurlResponse {
 std::string body;
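
The new chunks and last_request fields back the buffering added to StreamWriteCallback in remote_engine.cc above: an error body from the remote API can arrive split across several libcurl write callbacks, so the callback now appends each chunk to context->chunks and re-parses the accumulated buffer (a single chunk that parses and contains "error" is still treated as an error). Below is a stand-alone sketch of that idea, with illustrative names and jsoncpp as in the diff.

```cpp
// Sketch only: accumulate stream chunks and re-parse the combined buffer so
// an error payload split across callbacks is still detected.
#include <json/json.h>
#include <iostream>
#include <string>
#include <vector>

struct StreamContext {
  std::string chunks;        // body accumulated so far
  std::string last_request;  // kept around for logging on error
};

// Returns true once the accumulated stream (or the current chunk mentioning
// "error") parses as a complete JSON value.
bool DetectErrorPayload(StreamContext& ctx, const std::string& chunk) {
  ctx.chunks += chunk;
  Json::Value parsed;
  Json::Reader reader;
  return reader.parse(ctx.chunks, parsed) ||
         (reader.parse(chunk, parsed) &&
          chunk.find("error") != std::string::npos);
}

int main() {
  StreamContext ctx;
  // An error object split across two callback invocations.
  std::vector<std::string> arriving = {R"({"error": {"message": )",
                                       R"("invalid api key"}})"};
  for (const auto& c : arriving) {
    std::cout << (DetectErrorPayload(ctx, c) ? "error detected" : "buffering")
              << "\n";
  }
  return 0;
}
```

With the split payload above, the first callback only buffers; the second one completes the JSON object and reports the error.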

engine/services/engine_service.cc

Lines changed: 1 addition & 1 deletion

@@ -870,10 +870,10 @@ cpp::result<void, std::string> EngineService::UnloadEngine(
 auto unload_opts = EngineI::EngineUnloadOption{};
 e->Unload(unload_opts);
 delete e;
-engines_.erase(ne);
 } else {
 delete std::get<RemoteEngineI*>(engines_[ne].engine);
 }
+engines_.erase(ne);
 
 CTL_DBG("Engine unloaded: " + ne);
 return {};
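
Hoisting engines_.erase(ne) out of the first branch removes the registry entry for remote engines as well, not only for local ones; previously a remote engine object was deleted but its map entry survived. A small stand-alone sketch of the corrected flow, with placeholder types rather than the real EngineService:

```cpp
// Illustrative only: the entry is now erased for both branches, so a remote
// engine does not linger in the registry after it is unloaded.
#include <iostream>
#include <map>
#include <string>

enum class EngineKind { kLocal, kRemote };

int main() {
  std::map<std::string, EngineKind> engines{{"llama-cpp", EngineKind::kLocal},
                                            {"openai", EngineKind::kRemote}};
  auto unload = [&](const std::string& ne) {
    auto it = engines.find(ne);
    if (it == engines.end()) return;
    if (it->second == EngineKind::kLocal) {
      // ... unload the dynamic library, delete the EngineI* ...
    } else {
      // ... delete the RemoteEngineI* ...
    }
    engines.erase(it);  // previously only the local branch erased the entry
    std::cout << "Engine unloaded: " << ne << "\n";
  };

  unload("openai");
  unload("llama-cpp");
  std::cout << "entries left: " << engines.size() << "\n";
  return 0;
}
```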
