Skip to content

Make stats resilient to incomplete streams #192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 5, 2025
Merged

Conversation

jobarr-amzn
Copy link
Contributor

  • Also adds a --count option for simpler output

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

- Also adds a `--count` option for simpler output
Copy link
Contributor

@zslayton zslayton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, you can choose whether to address them or just open a TODO.

match current_value.read().unwrap() {
Struct(s) => {
for field in s {
stack.push((field.unwrap().value(), depth + 1));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be good to return an anyhow::Result instead of .unwrap()ping all of the errors in this method.

VersionMarker(_) | EncodingDirective(_) => continue,
SymbolTable(_) => symtab_count += 1,
system_value @ Value(raw_value) => {
let size = system_value.raw_stream_item().unwrap().span().bytes().len();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Ion 1.1, you can have system values that are not backed by a raw stream item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do I do about that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's an ephemeral/computed value, then it doesn't have a size. You can choose whether that means size=0 bytes or if you don't even count it. The unwrap() will cause a panic, though.

}
// SystemStreamItem is non_exhaustive
unsupported => panic!("Unsupported system stream item: {unsupported:?}"),
},
}
}
// Reduce the number of shared symbols.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took me a second—it's subtracting the number of system symbols in Ion 1.0. This will produce inaccurate results in Ion 1.1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even look at that. reader.detected_encoding().version().system_symbol_table().len() would be more accurate, but that doesn't really tell me if the system symbols were visible/relevant in a 1.1 stream... right?

What would you advise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 1.1, the symbol table size is already what you want. The system symbols that are automatically pre-populated in the user symbol table are optional, so you can consider them the same as the user selecting them themselves.

While we're here, this:

let symbols_count = reader.symbol_table().symbols().iter().len() - 10;

could be:

let symbols_count = reader.symbol_table().len() - 10;

No need to count them one by one.

Comment on lines +113 to +116
unparseable_count += 1;
if matches!(e, IonError::Incomplete(..)) {
break;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this logic encounters a malformed (vs incomplete) item, I think it will go into an infinite loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After offline conversation, next_item if it errors will always error. next_item() => Err(_) means the stream is irretrievable by the SystemReader from this point on.

I'll fix this.

@jobarr-amzn jobarr-amzn merged commit 8332d8c into main Mar 5, 2025
3 of 4 checks passed
@jobarr-amzn jobarr-amzn deleted the incomplete-stats branch March 5, 2025 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants