Skip to content

perf fix Time #142

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from
Open

perf fix Time #142

wants to merge 2 commits into from

Conversation

mcakircali
Copy link
Contributor

try to avoid std::regex

@mcakircali mcakircali requested review from Ozaq and danovaro October 11, 2024 12:23
@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 94.44444% with 3 lines in your changes missing coverage. Please review.

Project coverage is 64.07%. Comparing base (418d5b7) to head (65ccb23).
Report is 4 commits behind head on develop.

Files with missing lines Patch % Lines
src/eckit/types/Time.cc 94.44% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #142      +/-   ##
===========================================
+ Coverage    64.06%   64.07%   +0.01%     
===========================================
  Files         1079     1079              
  Lines        55679    55688       +9     
  Branches      4118     4124       +6     
===========================================
+ Hits         35668    35680      +12     
+ Misses       20011    20008       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Ozaq Ozaq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are at this class, I would also encourage you to add some class level doc and describe the accepted format variants and to be expected exceptions

Comment on lines +148 to +150
throw BadTime("Unkown format for time: " + s);
} else {
throw SeriousBug("Unhandled time format!");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a need for two different exceptions? I understand the intent of those exception types as follows:

  • BadTime is best described as a usage error, as in it was called as Time("ABCD")
  • SeriousBug represents an implementation error such as Time("20:11:24") not getting parsed properly.

Is that second throw giving any additional benefits?


// DIGITS: "^-?[0-9]+$"
// FLOAT: "^-?[0-9]*\\.[0-9]+$"
enum class TimeFormat { UNKOWN, OTHER, DIGITS, DECIMAL };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UNKNOWN is treated as erroneous input, it go stronger in the name and call it INVALID

ss = sec;
} else if (format == TimeFormat::OTHER) {
std::smatch m;
if (std::regex_match(s, m, hhmmss_)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still using a regex.

ss += 60 * (mm + 60 * (hh + 24 * dd));
if (s[0] == '-') {
ss = -ss;
} else if (std::regex_match(s, m, ddhhmmss_)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still using a regex.

Comment on lines +40 to +49
for (auto i = start; i < time.length(); i++) {
if (time[i] == '.') {
if (hasDecimal || i == time.length() - 1) { return TimeFormat::UNKOWN; }
hasDecimal = true;
} else if (isdigit(time[i]) == 0) {
return TimeFormat::OTHER;
} else {
hasDigit = true;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done with a single pass over the input string and then return the typeid of the found variant (pretty similar to what you are doing already) plus up to 5 tokens, 1 token for the sign and 4 for positive integers (day, hour, minute, second).

Subsequent code can then validate on this result, e.g. was the extended flag passed.

I am thinking along the lines of:

struct tokenized_time {
    FormatType type;
    std::string_view sign;
    std::string_view integers[4];
};

@simondsmart simondsmart added the approved-for-ci Approved for CI run label Feb 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved-for-ci Approved for CI run
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants