Regular expression export/import #9976
base: master
Split off build_compile_error() from build_compile_result().
"make opt debug" will build one target at a time, but each target's sub-makefile may build in parallel. This is to avoid corrupted files when the same file is generated from two Makefile invocations.
@josevalim What do you think about this?
CT Test Results: 4 files, 228 suites, 1h 53m 38s ⏱️. Results for commit 0597625. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run tests locally.
// Erlang/OTP Github Action Bot
I believe this is fantastic and simplifies many of the issues we had to tackle in Elixir. Thank you. It would be fantastic if this could be used from Erlang too. Perhaps a compiler pass could rewrite re:compile into re:import? Also, do you see this making it into 28.1, or would it be 29 only?
The plan is to get this export/import functionality into 28.1, and then potentially do the loader optimization later, maybe as early as 28.2.
@sverker Making it part of 28.1 would help Elixir codebases migrate to the latest OTP, so thank you. I have one additional question: do you think it is reasonable for
I have one additional thought: what if the export is part of the existing tagged tuple? For example, you can add a new field to
Problem
Before OTP 28.0 it was possible to abuse the compiled format of regular expressions, as returned by re:compile, as if it were a serialized format to be imported into other Erlang node instances. This abuse happened to work as long as the underlying hardware architecture and PCRE version were not too incompatible. But it was unsafe, as passing an incompatible compiled regular expression to re:run could result in any kind of unpleasant behavior.
In OTP 28.0 the compiled format has changed so that it no longer exposes the internals of PCRE, and instead returns a safe (magic) reference to the internal regex structures. A compiled regex is now safe but can only be used in the node instance that compiled it.
Solution
This PR introduces a supported, safe way to export compiled regular expressions. The exported format is self-contained and can be stored off-node or sent to other nodes. If the importing node is compatible (architecture and PCRE version), the compiled regex can be used directly with minimal overhead. If not, the regular expression is recompiled from the original string and options, which are included in the exported format as a fallback.
Usage
Then, potentially on another node, do
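A minimal round-trip sketch of the export/import flow. It assumes OTP 28.1 or later, that re:compile/2 with the export option returns {ok, Exported}, and (as the description implies) that re:import/1 returns a compiled regex usable with re:run/2. Shipping the term to another node is simulated with term_to_binary/1 and binary_to_term/1; the module re_export_demo is hypothetical.

```erlang
-module(re_export_demo).
-export([export_pattern/1, import_and_run/2, round_trip/2]).

%% Exporting side: compile with the export option to get a self-contained
%% term instead of a node-local compiled regex.
%% Assumption: OTP 28.1+, where re:compile/2 accepts the export option.
export_pattern(Regex) ->
    {ok, Exported} = re:compile(Regex, [export]),
    Exported.

%% Importing side: turn the exported term back into a compiled regex
%% (assumed to be returned directly by re:import/1) and run it.
import_and_run(Exported, Subject) ->
    MP = re:import(Exported),
    re:run(Subject, MP).

%% Simulate sending the exported term to another node (or storing it)
%% by encoding it in the external term format and decoding it again.
round_trip(Regex, Subject) ->
    Wire = term_to_binary(export_pattern(Regex)),
    import_and_run(binary_to_term(Wire), Subject).
```

If the assumptions hold, re_export_demo:round_trip(<<"[0-9]+">>, <<"abc123">>) returns {match,[{3,3}]}, the same result as running a locally compiled regex.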
Exported format
The exported format is opaque but currently looks like this:
{re_exported_pattern, HeaderBin, OrigBin, OrigOpts, EncodedBin}
- EncodedBin: binary containing the compiled regex as encoded by pcre2_serialize_encode()
- HeaderBin: binary with some meta information, including a CRC checksum over EncodedBin
- OrigBin: original regular expression as a binary string
- OrigOpts: options passed to re:compile/2
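Because the exported term is self-contained, storing it off-node needs nothing beyond the ordinary external term format. A sketch under the same assumptions (OTP 28.1+, re:compile/2 with export returning {ok, Exported}, re:import/1 returning the compiled regex); the module re_export_store and the file path used with it are hypothetical. It deliberately never looks inside the tuple, since the format is opaque and may change.

```erlang
-module(re_export_store).
-export([save/2, load/1]).

%% Compile Regex with the export option and write the exported term to Path.
save(Path, Regex) ->
    {ok, Exported} = re:compile(Regex, [export]),
    ok = file:write_file(Path, term_to_binary(Exported)).

%% Read the exported term back, possibly on another node or OTP build,
%% and import it. Per the description, if the stored PCRE encoding is not
%% compatible, the import recompiles from the original pattern and options
%% carried inside the term.
load(Path) ->
    {ok, Bin} = file:read_file(Path),
    re:import(binary_to_term(Bin)).
```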
Future optimization
Users that earlier generated Erlang code with compiled regular expressions as literals would now instead compile with option export and generate re:import(Literal) instead of just the literal. If done like that, the BEAM loader could be optimized to detect such calls to re:import with literals as arguments, evaluate the calls at load time, and replace them with the returned compiled regular expression as a literal term.
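The code-generation pattern above could look roughly like this sketch: the generator emits re:import(Literal), with the exported term as the literal, rather than a compiled regex literal. It assumes the OTP 28.1+ semantics described in this PR; re_codegen_demo and the generated match/1 function are hypothetical names. Nothing here depends on the proposed loader optimization, so until it exists each call to match/1 evaluates re:import/1 at run time.

```erlang
-module(re_codegen_demo).
-export([gen_source/2, load_source/1]).

%% Generate source for a module exporting match/1. Instead of embedding a
%% compiled regex literal (node-local since OTP 28.0), the generated code
%% wraps the exported literal in a call to re:import/1.
gen_source(Mod, Regex) ->
    {ok, Exported} = re:compile(Regex, [export]),
    lists:flatten(
      io_lib:format("-module(~w).~n"
                    "-export([match/1]).~n"
                    "match(Subject) -> re:run(Subject, re:import(~w)).~n",
                    [Mod, Exported])).

%% Scan, parse, compile and load the generated source in memory.
load_source(Src) ->
    {ok, Tokens, _} = erl_scan:string(Src),
    Forms = [begin {ok, Form} = erl_parse:parse_form(Ts), Form end
             || Ts <- split_forms(Tokens, [])],
    {ok, Mod, Bin} = compile:forms(Forms),
    {module, Mod} = code:load_binary(Mod, "generated", Bin),
    Mod.

%% Split a token list into one token list per form (each ends with a dot).
split_forms([], []) -> [];
split_forms([{dot, _} = Dot | Rest], Acc) ->
    [lists:reverse([Dot | Acc]) | split_forms(Rest, [])];
split_forms([T | Rest], Acc) ->
    split_forms(Rest, [T | Acc]).
```

If the per-call import cost matters before any loader optimization lands, a generator could instead import once at startup and cache the result, for example in persistent_term.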