tfw_logger: gRPC extension for machine learning #2421

Open
krizhanovsky opened this issue May 7, 2025 · 0 comments

Motivation

At the moment tfw_logger is only used to write access logs and security logs in #2399, but we also need to feed the data to the Tempesta Escudo classification daemons. The daemons use neural networks and are likely to run on a separate cluster, or at least must be able to. On the other hand, simple classification can be done on the local host. Thus, we need both a gRPC interface and a local one, such as the zero-copy interface from #77. Use flatbuffers to minimize serialization overhead on the proxy nodes.

However, this issue adds plenty of HTTP headers to send, and copying all of them could hurt Tempesta FW performance. Ideally, tfw_logger would use the zero-copy mappings from #77.

Scope

Tempesta Managed from Escudo already uses gRPC/flatbuffers on the client (CLI) and server (manager) sides, so let's reuse that code. To do so, we need to move the gRPC code to tempesta/utils and fetch tempesta as a git submodule for Tempesta Escudo (@consuelo2210 FYI).

We need to add a new logging facility, grpc, in addition to the current mmap one:

{
    "log": "/var/log/tempesta_access.log",
    "access_log": {
        "host": "localhost",
        "table": "access_log",
        "mmap_log_buffer_size": 4096,
        "extra-headers": [ "sec-fetch-site" ]
    },
    "dos": {
        "host": "localhost",
        "table": "security_dos",
        "mmap_log_buffer_size": 4096,
        "extra-headers": [ "sec-fetch-site" ]
    },
    "suspicious": {
        "host": "localhost",
        "table": "suspicious",
        "mmap_log_buffer_size": 4096,
        "extra-headers": [ "sec-fetch-site" ]
    },
    "grpc": {
        "host": "localhost:4433",
        "extra-headers": [ "sec-fetch-site", "sec-fetch-mode" ]
    }
}
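
To illustrate how a consumer might read the per-facility header lists from such a configuration, here is a minimal Python sketch. It is not tfw_logger code: the helper name `extra_headers_by_facility` is hypothetical, only the key names (`extra-headers`, `grpc`, `access_log`) come from the proposal above, and the sketch assumes the file is strict JSON.

```python
import json

# A trimmed copy of the proposed configuration (strict JSON, no trailing commas).
CONFIG_TEXT = """
{
    "log": "/var/log/tempesta_access.log",
    "access_log": {
        "host": "localhost",
        "table": "access_log",
        "mmap_log_buffer_size": 4096,
        "extra-headers": [ "sec-fetch-site" ]
    },
    "grpc": {
        "host": "localhost:4433",
        "extra-headers": [ "Sec-Fetch-Site", "sec-fetch-mode" ]
    }
}
"""


def extra_headers_by_facility(config: dict) -> dict:
    """Map each facility name to its lowercased extra-headers list.

    Hypothetical helper: top-level entries that are not objects
    (such as "log") or that have no "extra-headers" key are skipped.
    """
    result = {}
    for name, section in config.items():
        if isinstance(section, dict) and "extra-headers" in section:
            # HTTP header names are case-insensitive, so normalize them.
            result[name] = [h.lower() for h in section["extra-headers"]]
    return result


config = json.loads(CONFIG_TEXT)
headers = extra_headers_by_facility(config)
```

With this layout, each facility (mmap-based or grpc) can request a different set of headers, which is what allows the grpc section to log `sec-fetch-mode` in addition to `sec-fetch-site`.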

We also need to be able to add the following headers to the list of extra-headers to be logged in mmap and grpc modes (let's leave dmesg as is, so as not to overload the already overloaded kernel logging):

  • Accept-Language
  • Accept-Encoding
  • Accept
  • Content-Language
  • Cookie
  • Upgrade-Insecure-Requests
  • Sec-Fetch (Sec-Fetch-Mode, Sec-Fetch-Site, Sec-Fetch-Dest)
  • Cache-Control
  • Connection
  • Keep-Alive

This list is expected to grow and to be specified in the configuration, so we need to make the headers special. Also, since this is a pretty large volume of data, we need to apply some simple compression. We can start with simple flags, e.g. for Connection just use: 0 - no header, 1 - keep-alive, 2 - close, 3 - anything else. The same flag technique can be applied to the sec-fetch headers. Ideally, compression should be done on the tfw_logger side, but if we have the data already compressed (e.g. with Huffman), we can probably just pass it as is to user space.
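
The flag technique can be sketched in Python as follows. This is only an illustration, not the tfw_logger implementation: the Connection values are the ones listed above, while the function names and the numbering chosen for Sec-Fetch-Site are assumptions (only the value names `cross-site`, `same-origin`, `same-site` and `none` are standard).

```python
from typing import Optional

NO_HEADER = 0  # the header is absent from the request

# Connection flags as proposed above: 1 - keep-alive, 2 - close.
CONNECTION_FLAGS = {"keep-alive": 1, "close": 2}
CONNECTION_OTHER = 3  # anything else

# Hypothetical numbering for Sec-Fetch-Site; only the value names are standard.
SEC_FETCH_SITE_FLAGS = {
    "cross-site": 1,
    "same-origin": 2,
    "same-site": 3,
    "none": 4,
}
SEC_FETCH_SITE_OTHER = 5  # anything else


def encode_flag(value: Optional[str], table: dict, other: int) -> int:
    """Replace a header value with a small integer flag."""
    if value is None:
        return NO_HEADER
    # Values are matched case-insensitively, ignoring surrounding whitespace.
    return table.get(value.strip().lower(), other)


def encode_connection(value: Optional[str]) -> int:
    return encode_flag(value, CONNECTION_FLAGS, CONNECTION_OTHER)


def encode_sec_fetch_site(value: Optional[str]) -> int:
    return encode_flag(value, SEC_FETCH_SITE_FLAGS, SEC_FETCH_SITE_OTHER)
```

A multi-token value such as `keep-alive, Upgrade` falls into the "anything else" bucket in this sketch, so the receiver loses the exact value; whether that is acceptable depends on what the classifiers need.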

Testing

Just borrow parts of the current ML logic and reuse them in the tests for gRPC/flatbuffers receiving.

Documentation

Add this feature to https://tempesta-tech.com/knowledge-base/Handling-clients/ in case any open source users also benefit from getting logs via gRPC.
