[coro_rdma][coro_ucx][feat] add rdmapp ucxpp libraries, examples and tests #886

Open
wants to merge 1 commit into
base: support_rdma

howardlau1999

Why

Preliminary support of RDMA transport

What is changing

  • Add rdmapp, a coroutine adaptation layer for ibverbs
  • Add a minimal example integrating RDMA communication with asio

Example

See src/coro_rdma/examples/example.cpp

@CLAassistant

CLAassistant commented Jan 20, 2025

CLA assistant check
All committers have signed the CLA.

@howardlau1999 howardlau1999 force-pushed the rdma_dev branch 2 times, most recently from b50b98b to 97f49a4 Compare January 21, 2025 13:42
@howardlau1999 howardlau1999 force-pushed the rdma_dev branch 2 times, most recently from 7e7e4d4 to ef8e7c9 Compare February 16, 2025 13:42
@helintongh
Contributor

Could you explain why UCX is not used? Thanks.

@howardlau1999
Author

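> Could you explain why UCX is not used? Thanks.

I plan to support UCX as well. The RDMA interface differs substantially from the socket interface, so the current plan is to get the RDMA workflow working end to end first; UCX support can then be added on top of the RDMA abstractions.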

Author

@howardlau1999 howardlau1999 left a comment

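> Could you explain why UCX is not used? Thanks.

A UCX implementation has now been added. It does not seem possible to test it in a Soft-RoCE environment, but the tests pass on real physical RoCE hardware.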

@qicosmos
Collaborator

qicosmos commented Mar 18, 2025

Demo pseudocode showing RPC with RDMA:

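// Connection parameters that each side sends to its peer.
// Defined before rdma_service_t, which uses it by value.
struct pingpong_dest {
  int lid;
  int qpn;
  int psn;
  char gid[33];
};

// g_psn is a global PSN counter, assumed to be defined elsewhere.
struct rdma_service_t {
  pingpong_context ctx;
  // RPC handler: receives the client's QP parameters and returns the server's.
  pingpong_dest get_pingpong_test(pingpong_dest from_peer) {
    ELOG_INFO << "remote gid: " << from_peer.gid
              << ", remote qp_num: " << from_peer.qpn;

    pingpong_dest dest{};
    dest.lid = ctx.lid;
    dest.qpn = ctx.qp->qp_num;
    dest.psn = g_psn++;
    memcpy(dest.gid, ctx.str_gid, 33);

    // Connect the local QP to the peer described by from_peer.
    modify_qp_to_rts(ctx.qp, from_peer);
    return dest;
  }
};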

int main() {
  coro_rpc_server server(/*thread=*/std::thread::hardware_concurrency(),
                         /*port=*/8801);

  auto ctx = create_rdma_ctx();
  rdma_service_t rdma_srv{ctx};
  server.register_handler<&rdma_service_t::get_pingpong_test>(&rdma_srv);
  server.start();
}

// Client side
Lazy<void> show_rpc_call() {
  auto ctx = create_rdma_ctx();

  coro_rpc_client client;

  [[maybe_unused]] auto ec = co_await client.connect("127.0.0.1", "8801");
  assert(!ec);

  pingpong_dest dest{};
  dest.lid = ctx.lid;
  dest.qpn = ctx.qp->qp_num;
  dest.psn = ++g_psn;
  memcpy(dest.gid, ctx.str_gid, 33);

  auto ret1 = co_await client.call<&rdma_service_t::get_pingpong_test>(dest);
  auto peer = ret1.value();
  modify_qp_to_rts(ctx.qp, peer);
}
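
The `gid[33]` field in the pseudocode above is sized for 32 characters plus a terminating NUL, which suggests (this is an assumption, not something the thread states) that the 16-byte GID is exchanged as a hex string. A minimal sketch of such a conversion follows; `raw_gid`, `gid_to_hex`, `hex_value` and `hex_to_gid` are hypothetical helpers written for illustration and are not part of this PR or of ibverbs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// A GID is 16 raw bytes; this alias is a hypothetical name for the sketch.
using raw_gid = std::array<std::uint8_t, 16>;

// Encode 16 raw bytes as 32 lowercase hex characters.
std::string gid_to_hex(const raw_gid& gid) {
  static constexpr char digits[] = "0123456789abcdef";
  std::string out;
  out.reserve(32);
  for (std::uint8_t byte : gid) {
    out.push_back(digits[byte >> 4]);
    out.push_back(digits[byte & 0x0f]);
  }
  return out;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Decode a 32-character hex string; returns nullopt on malformed input.
std::optional<raw_gid> hex_to_gid(const std::string& hex) {
  if (hex.size() != 32) return std::nullopt;
  raw_gid gid{};
  for (std::size_t i = 0; i < gid.size(); ++i) {
    int hi = hex_value(hex[2 * i]);
    int lo = hex_value(hex[2 * i + 1]);
    if (hi < 0 || lo < 0) return std::nullopt;
    gid[i] = static_cast<std::uint8_t>((hi << 4) | lo);
  }
  return gid;
}
```

With helpers like these, the string returned by `gid_to_hex` (plus its NUL terminator) fits exactly in a `char[33]` buffer, and the receiving side could validate the peer's GID with `hex_to_gid` before configuring its queue pair.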

@howardlau1999 howardlau1999 changed the title [coro_rdma][feat] add rdmapp and coro_rdma example [coro_{rdma,ucx}][feat] add {rdma,ucx}pp, examples and tests Mar 24, 2025
@howardlau1999 howardlau1999 force-pushed the rdma_dev branch 3 times, most recently from a552c5c to a4c82ce Compare March 24, 2025 14:21
@howardlau1999 howardlau1999 changed the title [coro_{rdma,ucx}][feat] add {rdma,ucx}pp, examples and tests [coro_rdma][coro_ucx][feat] add rdmapp ucxpp libraries, examples and tests Mar 25, 2025
4 participants