
Out of memory #73


Closed
reynir opened this issue Oct 18, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@reynir
Contributor

reynir commented Oct 18, 2024

I triggered an out-of-memory exception while trying to deploy a ~17 MB unikernel with no arguments. The mollymawk unikernel itself runs with "only" 128 MB.

console 2024-10-18T09:54:08-00:00: 2024-10-18T09:54:08-00:00: [ERROR] [application] error exception Out of memory while processing request ((method "POST") (target "/unikernel/create") (version "HTTP/1.1") (headers 
console 2024-10-18T09:54:08-00:00:                                                  (("Priority" "u=0")
console 2024-10-18T09:54:08-00:00:                                                  ("Sec-Fetch-Site" "same-origin")
console 2024-10-18T09:54:08-00:00:                                                  ("Sec-Fetch-Mode" "cors")
console 2024-10-18T09:54:08-00:00:                                                  ("Sec-Fetch-Dest" "empty")
console 2024-10-18T09:54:08-00:00:                                                  ("Cookie" "molly_session=REDACTED")
console 2024-10-18T09:54:08-00:00:                                                  ("Connection" "keep-alive")
console 2024-10-18T09:54:08-00:00:                                                  ("Sec-GPC" "1")("DNT" "1")
console 2024-10-18T09:54:08-00:00:                                                  ("Origin" "https://mollymawk.robur.coop")
console 2024-10-18T09:54:08-00:00:                                                  ("Content-Length" "17617146")
console 2024-10-18T09:54:08-00:00:                                                  ("Content-Type" "multipart/form-data; boundary=---------------------------120320325341882077743984541787")
console 2024-10-18T09:54:08-00:00:                                                  ("Referer" "https://mollymawk.robur.coop/unikernel/deploy")
console 2024-10-18T09:54:08-00:00:                                                  ("Accept-Encoding" "gzip, deflate, br, zstd")
console 2024-10-18T09:54:08-00:00:                                                  ("Accept-Language" "en-US,en;q=0.5")
console 2024-10-18T09:54:08-00:00:                                                  ("Accept" "*/*")
console 2024-10-18T09:54:08-00:00:                                                  ("User-Agent" "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0")
console 2024-10-18T09:54:08-00:00:                                                  ("Host" "mollymawk.robur.coop"))))
@reynir
Contributor Author

reynir commented Oct 18, 2024

Trying again with a stripped unikernel image (9.1 MB), it ran out of memory in a different place:

console 2024-10-18T10:13:06-00:00: Fatal error: exception Out of memory
console 2024-10-18T10:13:06-00:00: Raised by primitive operation at Stdlib__Buffer.resize in file "buffer.ml", line 87, characters 19-40
console 2024-10-18T10:13:06-00:00: Called from Stdlib__Buffer.add_string in file "buffer.ml", line 178, characters 34-46
console 2024-10-18T10:13:06-00:00: Called from Multipart_form.RAW.parser.(fun).choose in file "duniverse/multipart_form/lib/multipart_form.ml", line 139, characters 10-51
console 2024-10-18T10:13:06-00:00: Called from Angstrom__Parser.Monad.(>>=).(fun).succ' in file "duniverse/angstrom/lib/parser.ml", line 58, characters 38-43
console 2024-10-18T10:13:06-00:00: Called from Angstrom__Parser.to_exported_state.(fun) in file "duniverse/angstrom/lib/parser.ml", line 32, characters 29-57
console 2024-10-18T10:13:06-00:00: Called from Multipart_form.parse.(fun) in file "duniverse/multipart_form/lib/multipart_form.ml", line 540, characters 14-78
console 2024-10-18T10:13:06-00:00: Called from Multipart_form.of_stream_tbl.go in file "duniverse/multipart_form/lib/multipart_form.ml", line 574, characters 10-20
console 2024-10-18T10:13:06-00:00: Called from Multipart_form.of_stream_to_list in file "duniverse/multipart_form/lib/multipart_form.ml", line 581, characters 8-41
console 2024-10-18T10:13:06-00:00: Called from Multipart_form.of_string_to_list in file "duniverse/multipart_form/lib/multipart_form.ml" (inlined), line 605, characters 2-55
console 2024-10-18T10:13:06-00:00: Called from Dune__exe__Unikernel.Main.unikernel_create.(fun) in file "unikernel.ml", line 739, characters 14-54
console 2024-10-18T10:13:06-00:00: Called from Lwt.Sequential_composition.bind.create_result_promise_and_callback_if_deferred.callback in file "duniverse/lwt/src/core/lwt.ml", line 1844, characters 16-19
console 2024-10-18T10:13:06-00:00: Solo5: solo5_exit(2) called

@hannesm
Contributor

hannesm commented Oct 18, 2024

Thanks for your report.

My first observation: something catches the Out of memory exception. For a web server, where discarding one (large) request can free up a good chunk of memory, this may be fine, although I'm not entirely convinced.

My second observation: when we receive the unikernel image, do we hold it in memory as part of the request and then, during multipart decoding, a second time? Can we dig into the details so that we don't need it in memory multiple times, and eventually even stream the binary from the request to albatross? (I guess this will be pretty hard, since we'd need a streaming X.509 certificate signing request / signing API.) Instead of taking the request in full, then multipart decoding it in full, then creating the certificate signing request and certificate, why not stream the request and the decoding (or decode in place without needing more memory)?
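A minimal OCaml sketch of the difference, assuming the request body is available as a `string Seq.t` of chunks and that `sink` is a hypothetical callback forwarding data onward (for example to a streaming decoder); neither is mollymawk's or multipart_form's actual API. Streaming keeps only the current chunk alive, while buffering accumulates the whole body in a `Buffer.t`, the same `Buffer.add_string`/`Buffer.resize` pattern that shows up in the backtrace above.

```ocaml
(* Streaming: hand every chunk to [sink] as soon as it arrives and only
   keep a running byte count. Live memory is bounded by the chunk size. *)
let stream_body ~(sink : string -> unit) (body : string Seq.t) : int =
  Seq.fold_left
    (fun total chunk ->
      sink chunk;
      total + String.length chunk)
    0 body

(* Buffering, for contrast: the whole body is accumulated in memory.
   When the buffer grows it allocates a larger copy, so old and new
   storage briefly coexist; Buffer.contents then copies everything once more. *)
let buffer_body (body : string Seq.t) : string =
  let b = Buffer.create 4096 in
  Seq.iter (Buffer.add_string b) body;
  Buffer.contents b
```

For example, `stream_body ~sink:print_string (List.to_seq ["abc"; "de"])` prints `abcde` and returns 5 without ever holding both chunks in one string.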

@PizieDust PizieDust added the bug label Oct 19, 2024
@hannesm
Contributor

hannesm commented Oct 21, 2024

Further discussion led to:

  • multipart_lwt supports streaming \o/
  • now, we need to process the JSON content of the form in a streaming manner (especially the unikernel image content); it is not entirely clear how to achieve this (note: we would also like to support a REST API, so chunking on the client side in JavaScript is not a good solution)
  • once we have the JSON in a streaming manner, we'll need the out path (communication to albatross) to be streaming-ready as well
  • the communication currently uses TLS with an X.509 client certificate that contains the unikernel image
    --> we need TLS in a way that lets us stream an X.509 certificate
    --> we also need X509.Signing_request.create/sign to work in a streaming way

This is likely best achieved with Seq, which requires changes to asn1-combinators. Other packages (such as albatross) would benefit as well, but this will take some time to push through.
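A rough illustration of why a Seq-based encoder can help on the certificate path: DER values are length-prefixed, so as long as the total size is known up front, the header can be emitted before any content has been read and the content can follow chunk by chunk. The sketch below does this for a single OCTET STRING; the names are hypothetical and this is not the asn1-combinators API. Nested structures work the same way, but every enclosing length must also be computed in advance.

```ocaml
(* DER definite-form length octets: short form below 128, otherwise
   0x80 lor (number of length bytes) followed by big-endian bytes. *)
let der_length (n : int) : string =
  if n < 0x80 then String.make 1 (Char.chr n)
  else begin
    let rec be_bytes acc n =
      if n = 0 then acc else be_bytes (Char.chr (n land 0xff) :: acc) (n lsr 8)
    in
    let bytes = be_bytes [] n in
    String.of_seq (List.to_seq (Char.chr (0x80 lor List.length bytes) :: bytes))
  end

(* Lazily encode an OCTET STRING (tag 0x04) of [len] bytes whose content
   arrives as chunks. The caller must ensure the chunks add up to [len];
   this sketch does not check it. *)
let octet_string_seq ~(len : int) (content : string Seq.t) : string Seq.t =
  fun () -> Seq.Cons ("\x04" ^ der_length len, content)
```

For a 17617146-byte value (the request body size in the first log), `der_length` yields the five bytes `\x84\x01\x0c\xd0\xfa`, so the header is tiny and known before the first chunk of content is read.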

@hannesm
Contributor

hannesm commented Oct 21, 2024

As an alternative for the out path, we could modify the albatross interface (e.g. allow sending the unikernel binary over the TLS channel instead of embedding it in the client certificate).

@hannesm
Contributor

hannesm commented Apr 11, 2025

/cc @kit-ty-kate here

So, from what I can tell, there are many things that go wrong here ;)

One is that we should have a streaming interface.

The other is ocaml-tls going a bit nuts when you say "please send this file". A TLS write calls send_application_data, which calls send_records, which in turn calls encrypt_records: the funny function that splits the given data into chunks of at most "1 lsl 14" bytes (the maximum record plaintext size from the TLS RFC, https://datatracker.ietf.org/doc/html/rfc8446#section-5.1), in order to encrypt each individually:

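```ocaml
(* Split [str] at [off]: returns the first [off] bytes and a fresh copy
   of the remainder. Both results are newly allocated strings. *)
let split_str ?(start = 0) str off =
  String.sub str start off,
  String.sub str (start + off) (String.length str - off - start)

let encrypt_records encryptor version records =
  (* Cut every record payload into pieces of at most 2^14 bytes; each
     step copies the head AND the whole remaining tail. *)
  let rec split = function
    | [] -> []
    | (t1, a) :: xs when String.length a >= 1 lsl 14 ->
      let fst, snd = split_str a (1 lsl 14) in
      (t1, fst) :: split ((t1, snd) :: xs)
    | x::xs -> x :: split xs

  (* Encrypt the pieces in order, threading the crypto state
     (and thus the sequence number) through. *)
  and crypt st = function
    | []            -> (st, [])
    | (ty, buf)::rs ->
        let (st, ty, enc) = encrypt version st ty buf in
        let (st, encs) = crypt st rs in
        (st, (ty, enc) :: encs)
  in
  crypt encryptor (split records)
```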
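For comparison, a minimal sketch (not the code that went into ocaml-tls) of a split that advances an offset instead of materializing the remainder. By my reading of the excerpt, `split_str` returns a fresh copy of the whole tail at every step, so if the whole image is handed to a single write, an n-byte input costs roughly n²/2^15 bytes of tail copies alone (about 2.5 GB of short-lived copies for a 9.1 MB image, by this estimate). Walking an offset copies each byte once. `split_fragments` and `max_fragment` are hypothetical names.

```ocaml
(* Largest TLS record plaintext: 2^14 bytes (RFC 8446, section 5.1). *)
let max_fragment = 1 lsl 14

(* Split [s] into fragments of at most [max_fragment] bytes by advancing
   an offset: each byte is copied exactly once and no tail copies are made.
   An empty input yields a single empty fragment, like the original split. *)
let split_fragments (s : string) : string list =
  let len = String.length s in
  let rec go off acc =
    if off >= len then List.rev acc
    else
      let n = min max_fragment (len - off) in
      go (off + n) (String.sub s off n :: acc)
  in
  if len = 0 then [ "" ] else go 0 []
```

Unlike the excerpt, this does not emit a trailing empty fragment when the length is an exact multiple of 2^14.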

Now, TLS 1.3 AEAD encryption appends the content type to the plaintext (another allocation), and encrypt_aead allocates once more to store the output data:

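```ocaml
let encrypt (version : tls_version) (st : crypto_state) ty buf =
  match st with
  | None -> (st, ty, buf)  (* no cipher state yet: record stays plaintext *)
  | Some ctx ->
     match version with
     | `TLS_1_3 ->
        (match ctx.cipher_st with
         | AEAD c ->
            (* append the real content type byte: this allocates a copy
               of [buf] plus one byte *)
            let buf =
              let t = String.make 1 (Char.unsafe_chr (Packet.content_type_to_int ty)) in
              buf ^ t
            in
            let nonce = Crypto.aead_nonce c.nonce ctx.sequence in
            (* additional data is the record header, whose length field
               includes the authentication tag *)
            let adata = Crypto.adata_1_3 (String.length buf + Crypto.tag_len c.cipher) in
            (* encryption allocates yet another string for the output *)
            let buf = Crypto.encrypt_aead ~cipher:c.cipher ~adata ~key:c.cipher_secret ~nonce buf in
            (Some { ctx with sequence = Int64.succ ctx.sequence }, Packet.APPLICATION_DATA, buf)
         | _ -> assert false)
     (* ... other protocol versions omitted in this excerpt ... *)
```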

So, we first split the data into 2^14-byte chunks, allocating quite a lot of memory, only to allocate these chunks once more for the additional content type byte, and then once more for the encrypted data.
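A minimal sketch of how the first two allocations could be merged, assuming the record is built from an offset/length slice of the caller's string: one `Bytes` buffer of `len + 1` bytes holds both the fragment and the content-type byte. `inner_plaintext` is a hypothetical helper, not an ocaml-tls function, and the AEAD encryption still allocates its own output.

```ocaml
(* Build one record plaintext (fragment followed by the content-type
   byte) directly from a slice of the caller's data, using a single
   allocation instead of String.sub followed by (^). *)
let inner_plaintext (data : string) ~(off : int) ~(len : int)
    (content_type : int) : string =
  let b = Bytes.create (len + 1) in
  Bytes.blit_string data off b 0 len;
  Bytes.set b len (Char.chr content_type);
  (* [b] is not used after this point, so the unsafe conversion is fine *)
  Bytes.unsafe_to_string b
```

With 23 as the application_data content type, `inner_plaintext "hello, world" ~off:7 ~len:5 23` returns `"world\x17"`.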

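Consider a write of 32769 bytes (2 * 2^14 + 1): we split it into 3 chunks, allocating one 16384-byte and one 16385-byte string, then one 16384-byte and one 1-byte string; then 16385, 16385 and 2 bytes for the appended content type; and once more 16385, 16385 and 2 bytes (plus the authentication tags) for the encrypted data. That is about 3.5 times the input in fresh allocations, so together with the input itself memory blows up by around a factor of 5. Plus, likely, some further buffers due to the MirageOS network stack still using Cstruct...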
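The arithmetic can be reproduced with a small OCaml function that follows the allocation pattern of the two excerpts: at every split step the head plus a fresh copy of the tail, then one copy per piece for the content-type byte and one for the encrypted output. Authentication tags and any Cstruct buffers are ignored, so under these assumptions this is a lower bound; `allocated` is a hypothetical helper for illustration only.

```ocaml
(* Bytes allocated for one write of [n] bytes with maximum fragment
   size [k], following the excerpts: the split copies head and tail at
   every step (using >=, as in the original), then each piece is copied
   once for the content-type byte and once for the ciphertext. *)
let allocated ?(k = 1 lsl 14) (n : int) : int =
  let rec split rest acc_alloc pieces =
    if rest >= k then
      (* head of k bytes plus a fresh copy of the remaining tail *)
      split (rest - k) (acc_alloc + k + (rest - k)) (k :: pieces)
    else (acc_alloc, rest :: pieces)
  in
  let split_alloc, pieces = split n 0 [] in
  (* each piece grows by one byte for the content type *)
  let per_piece = List.fold_left (fun acc p -> acc + p + 1) 0 pieces in
  split_alloc + 2 * per_piece
```

`allocated 32769` evaluates to 114698, about 3.5 times the input. Because each split step re-copies the remaining tail, the split term grows quadratically with the write size (each tail copy becomes garbage one step later), so for multi-megabyte writes the total allocation volume is dominated by tail copies.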

TL;DR: we need to fix the allocations and re-evaluate the usage of split_str, as well as of ^ (and String.concat). But I won't get to this today.

@hannesm
Contributor

hannesm commented Apr 14, 2025

Since we improved ocaml-tls (release 2.0.1), I'll close this issue: the currently running mollymawk can deploy e.g. unipi :)

@hannesm hannesm closed this as completed Apr 14, 2025
Development

No branches or pull requests

3 participants