Commit 3697635

Improve ergonomics of icu4x-datagen (#6476)
1 parent 5eed624 commit 3697635

3 files changed, +48 -8 lines changed

CONTRIBUTING.md

Lines changed: 41 additions & 2 deletions
@@ -44,6 +44,43 @@ To build all code paths, improve build times in VSCode, and prevent locking the

 Note: the path in `ICU4X_DATA_DIR` is relative to `provider/data/*/src/lib.rs` and it causes VSCode to build ICU4X with only the `und` locale. This reduces build times but also makes some tests fail; to run them normally, run `cargo test --all-features` on the command line.

+### Building and Rebuilding Repo Data
+
+In the ICU4X repository, there are a few types of locale data:
+
+1. Test data: used for internal ICU4X development purposes only
+    - Downloaded data sources: `provider/source/tests/data`
+        - Regen: `cargo make download-repo-sources`
+    - Generated JSON data: `provider/source/data/debug`
+        - Regen: `cargo make testdata`
+2. Hard-coded source data: source of truth is this repo; used by icu4x-datagen
+    - Segmenter TOML files: `provider/source/data/segmenter`
+3. Runtime default compiled data: the `icu_*_data` crates
+    - Crate roots: `provider/data`
+    - Regen: `cargo make bakeddata`
+    - Regen a specific component: `cargo make bakeddata <component>`
+
+During development, it is often convenient to generate only a single data marker as JSON. To do this (fully offline), you can run, for example:
+
+```bash
+$ cargo run -p icu4x-datagen \
+    --no-default-features --features provider,fs_exporter \
+    -- --format fs --pretty -o _debug/data \
+    --cldr-root provider/source/tests/data/cldr \
+    --icuexport-root provider/source/tests/data/icuexport \
+    --segmenter-lstm-root provider/source/tests/data/lstm \
+    --tzdb-root provider/source/tests/data/tzdb \
+    --deduplication none \
+    --locales ru th \
+    --markers DatetimePatternsDateGregorianV1 DatetimePatternsDateBuddhistV1
+```
+
+Tips:
+
+- Set your desired locales and data markers on the bottom two lines.
+- To overwrite the directory, add: `-W`
+- To print verbose logs, add: `-v`
+
 ## Contributing a Pull Request

 The first step is to fork the repository to your namespace and create a branch off of the `main` branch to work with.
@@ -79,10 +116,12 @@ There are various files that auto-generated across the ICU4X repository. Here a
 need to run in order to recreate them. These files may be run in more comprehensive tests such as those included in `cargo make ci-job-test` or `cargo make ci-all`.

 - `cargo make testdata` - regenerates all test data in the `provider/source/debug` directory.
-- `cargo make bakeddata` - regenerates baked data in the `provider/data` directory.
-    - `cargo make bakeddata foo` can be used to generate data in `provider/data/foo` only.
+    - Tip: See [Building and Rebuilding Repo Data](#building-and-rebuilding-repo-data) for additional shortcuts.
+- `cargo make bakeddata` - regenerates baked data in the `provider/data` directory.
+    - `cargo make bakeddata foo` can be used to generate data in `provider/data/foo` only.
 - `cargo make generate-readmes` - generates README files according to Rust docs. Output files must be committed in git for check to pass.
 - `cargo make diplomat-gen` - recreates the Diplomat generated files in the `ffi/capi` directory.
+- `cargo make codegen` - recreates certain Askama generated files in the `ffi/capi/src` directory based on templates in `tools/make/codegen/templates`.

 ### Testing

provider/icu4x-datagen/Cargo.toml

Lines changed: 3 additions & 2 deletions
@@ -19,7 +19,7 @@ version.workspace = true
 [dependencies]
 icu_provider = { workspace = true }
 icu = { workspace = true, features = ["datagen"] }
-icu_provider_export = { workspace = true, features = ["rayon"] }
+icu_provider_export = { workspace = true }
 icu_provider_source = { workspace = true, optional = true }
 icu_provider_registry = { workspace = true }

@@ -31,7 +31,7 @@ log = { workspace = true }
 simple_logger = { workspace = true }

 [features]
-default = ["use_wasm", "networking", "fs_exporter", "blob_exporter", "baked_exporter", "provider"]
+default = ["use_wasm", "networking", "fs_exporter", "blob_exporter", "baked_exporter", "provider", "rayon"]
 provider = ["dep:icu_provider_source"]
 baked_exporter = ["icu_provider_export/baked_exporter"]
 blob_exporter = ["icu_provider_export/blob_exporter"]
@@ -46,6 +46,7 @@ use_wasm = ["icu_provider_source?/use_wasm"]
 use_icu4c = ["icu_provider_source?/use_icu4c"]
 networking = ["icu_provider_source?/networking"]
 experimental = ["icu_provider_source?/experimental", "icu/experimental"]
+rayon = ["icu_provider_export/rayon"]

 [package.metadata.cargo-all-features]
 # We don't need working CPT builders for check
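
With this change, `rayon` becomes an opt-out default feature of `icu4x-datagen` instead of being forced on through the `icu_provider_export` dependency. A build with `--no-default-features --features provider,fs_exporter`, as in the CONTRIBUTING.md example, therefore no longer enables it. The sketch below shows the general shape of code gated on such a feature. It is an illustration only and does not reproduce the internals of `icu_provider_export`; `export_one` and `export_all` are hypothetical names. Compiled without the feature (for example with plain `rustc`), only the sequential variant is built.

```rust
/// Hypothetical per-item work, standing in for exporting one data marker.
fn export_one(id: u32) -> u32 {
    id * id
}

/// Parallel variant: compiled only when the `rayon` Cargo feature is enabled
/// and the `rayon` crate is available as a dependency.
#[cfg(feature = "rayon")]
fn export_all(ids: &[u32]) -> Vec<u32> {
    use rayon::prelude::*;
    ids.par_iter().map(|&id| export_one(id)).collect()
}

/// Sequential fallback: compiled when the `rayon` feature is disabled,
/// e.g. when building with `--no-default-features`.
#[cfg(not(feature = "rayon"))]
fn export_all(ids: &[u32]) -> Vec<u32> {
    ids.iter().map(|&id| export_one(id)).collect()
}
```

Both variants return results in input order, so callers do not need to know which one was compiled in.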

provider/icu4x-datagen/src/main.rs

Lines changed: 4 additions & 4 deletions
@@ -380,7 +380,7 @@ fn main() -> eyre::Result<()> {
             #[cfg(not(feature = "networking"))]
             (None, _) => {
                 eyre::bail!(
-                    "Downloading data from tags requires the `networking` Cargo feature"
+                    "Please set --cldr-root or enable the `networking` Cargo feature"
                 )
             }
         };
@@ -396,7 +396,7 @@ fn main() -> eyre::Result<()> {
             #[cfg(not(feature = "networking"))]
             (None, _) => {
                 eyre::bail!(
-                    "Downloading data from tags requires the `networking` Cargo feature"
+                    "Please set --icuexport-root or enable the `networking` Cargo feature"
                 )
             }
         };
@@ -412,7 +412,7 @@ fn main() -> eyre::Result<()> {
             #[cfg(not(feature = "networking"))]
             (None, _) => {
                 eyre::bail!(
-                    "Downloading data from tags requires the `networking` Cargo feature"
+                    "Please set --segmenter-lstm-root or enable the `networking` Cargo feature"
                 )
             }
         };
@@ -428,7 +428,7 @@ fn main() -> eyre::Result<()> {
             #[cfg(not(feature = "networking"))]
             (None, _) => {
                 eyre::bail!(
-                    "Downloading data from tags requires the `networking` Cargo feature"
+                    "Please set --tzdb-root or enable the `networking` Cargo feature"
                 )
             }
         };
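
The four hunks above apply the same fix: when no local root is given and the `networking` feature is off, the error now names the flag the user can set. A minimal standalone sketch of that pattern follows. Only the `(None, _)` arm and the message text come from the diff; `Source`, `resolve_source`, the `tag` parameter, and the `NETWORKING_ENABLED` constant (a stand-in for the real `#[cfg(feature = "networking")]` gating) are assumptions made for illustration, and `Result<_, String>` replaces `eyre` to keep the example dependency-free.

```rust
/// Stand-in for the `networking` Cargo feature; hard-coded to `false` so the
/// sketch compiles outside a Cargo project (assumption, not from the commit).
const NETWORKING_ENABLED: bool = false;

/// Hypothetical description of where a data source comes from.
#[derive(Debug, PartialEq)]
enum Source {
    LocalRoot(String),
    DownloadTag(String),
}

/// Picks a source for one input; `flag` is the CLI option that sets `root`.
fn resolve_source(flag: &str, root: Option<&str>, tag: &str) -> Result<Source, String> {
    match (root, tag) {
        // A local root always wins and needs no network access.
        (Some(path), _) => Ok(Source::LocalRoot(path.to_string())),
        // Without a root, downloading by tag is only possible with networking.
        (None, tag) if NETWORKING_ENABLED => Ok(Source::DownloadTag(tag.to_string())),
        // Otherwise, tell the user which flag would fix the problem.
        (None, _) => Err(format!(
            "Please set {} or enable the `networking` Cargo feature",
            flag
        )),
    }
}
```

For example, `resolve_source("--tzdb-root", None, "latest")` returns an error that mentions `--tzdb-root`, while passing `Some("provider/source/tests/data/tzdb")` as the root yields a `Source::LocalRoot`.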
