Regexp count #51501

Open
wants to merge 16 commits into
base: master

@dwdwqfwe dwdwqfwe commented Jun 4, 2025

What problem does this PR solve?

Issue Number: close #51350

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Author

@dwdwqfwe dwdwqfwe left a comment

yes

dwdwqfwe

This comment was marked as resolved.

dwdwqfwe

This comment was marked as outdated.

dwdwqfwe

This comment was marked as resolved.

Author

@dwdwqfwe dwdwqfwe left a comment

Run buildall

Author

@dwdwqfwe dwdwqfwe left a comment

run buildall


DataTypePtr get_return_type_impl(const DataTypes& arguments) const override {
    auto int64_type = std::make_shared<DataTypeInt64>();
    return make_nullable(std::static_pointer_cast<const IDataType>(int64_type));
Contributor

This could simply be: return make_nullable(std::make_shared<DataTypeInt64>());

if (scope == FunctionContext::THREAD_LOCAL) {
    auto ptr = context->get_function_state(scope);
    if (ptr) {
        delete reinterpret_cast<re2::RE2*>(ptr);
Contributor

It seems the state you set is a std::shared_ptr<re2::RE2>. Is this delete still needed?
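The ownership concern raised here can be illustrated with a minimal sketch. It assumes the compiled pattern is held by a `std::shared_ptr`, as the reviewer suggests. `CompiledPattern`, `FunctionState`, `open_state`, and `close_state` are hypothetical stand-ins, not Doris or RE2 types. When a smart pointer owns the object, teardown only has to drop the reference, and a manual `delete` on the same object would free it twice.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for re2::RE2 that tracks how many instances are alive.
struct CompiledPattern {
    static int live;
    std::string source;
    explicit CompiledPattern(std::string s) : source(std::move(s)) { ++live; }
    ~CompiledPattern() { --live; }
};
int CompiledPattern::live = 0;

// Hypothetical stand-in for the per-thread function state.
struct FunctionState {
    std::shared_ptr<CompiledPattern> pattern;
};

// Compile once and hand ownership to the state.
void open_state(FunctionState& state, const std::string& pattern) {
    state.pattern = std::make_shared<CompiledPattern>(pattern);
}

// Release the state. No explicit delete: the shared_ptr destroys the object
// when the last reference goes away, and resetting twice is harmless.
void close_state(FunctionState& state) {
    state.pattern.reset();
}

// Usage: the object is destroyed exactly once, even if close runs twice.
void ownership_example() {
    FunctionState state;
    open_state(state, "a+");
    assert(CompiledPattern::live == 1);
    close_state(state);
    close_state(state);
    assert(CompiledPattern::live == 0);
}
```

If the state were instead stored as a raw pointer, the explicit delete in the quoted fragment would be the only release and would have to stay.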

auto result_null_map = ColumnUInt8::create(input_rows_count, 0);
auto result_data_column = ColumnInt64::create();
auto& result_data = result_data_column->get_data();
result_data.resize(input_rows_count);
Contributor

auto result_data_column = ColumnInt64::create(input_rows_count);


argument_columns[1] = col_const[1] ? static_cast<const ColumnConst&>(
                                             *block.get_by_position(arguments[1]).column)
                                             .convert_to_full_column()
Contributor

@zhangstar333 zhangstar333 Jun 10, 2025

The const-column information does not seem to be used afterwards, and argument_columns[0] and argument_columns[1] both need to be converted to full columns anyway.
Maybe you could do it directly in a loop:
for (each argument index i) {
    argument_columns[i] = block.get_by_position(arguments[i]).column->convert_to_full_column_if_const();
}

const auto* str = check_and_get_column<ColumnString>(argument_columns[0].get());

for (size_t i = 0; i < input_rows_count; ++i) {
if (null_map[i]) {
Contributor

@zhangstar333 zhangstar333 Jun 10, 2025

What is this check for? Your null_map seems to be result_null_map->get_data(), and it is initialized to 0, so the condition appears to never be true.

                   const size_t index_now) {
    re2::RE2* re = reinterpret_cast<re2::RE2*>(
            context->get_function_state(FunctionContext::THREAD_LOCAL));
    std::unique_ptr<re2::RE2> scoped_re;
Contributor

Is scoped_re unused?
Also, consider adding a check that re is not nullptr.
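One way to address both points is sketched below, under stated assumptions. It uses `std::regex` as a stand-in for `re2::RE2`, and `resolve_regex` is a hypothetical helper that is not part of the PR. The idea is to prefer the cached pattern from function state, fall back to compiling one for the current call into a scoped owner, and return `nullptr` when neither is available so the caller can handle it explicitly.

```cpp
#include <cassert>
#include <memory>
#include <regex>
#include <string>

// Hypothetical helper: returns a usable regex or nullptr.
// `cached` plays the role of the pointer read from function state (may be null).
// `scoped` owns a regex compiled for this call when no cached one exists.
const std::regex* resolve_regex(const std::regex* cached, const std::string& pattern,
                                std::unique_ptr<std::regex>& scoped) {
    if (cached != nullptr) {
        return cached;  // constant pattern, compiled once up front
    }
    try {
        scoped = std::make_unique<std::regex>(pattern);
    } catch (const std::regex_error&) {
        scoped.reset();
        return nullptr;  // invalid pattern: the caller decides how to report it
    }
    return scoped.get();
}

// Usage: the cached regex wins; otherwise the scoped owner is filled in.
void resolve_regex_example() {
    std::regex cached("b+");
    std::unique_ptr<std::regex> unused;
    assert(resolve_regex(&cached, "ignored", unused) == &cached);
    assert(unused == nullptr);

    std::unique_ptr<std::regex> scoped;
    const std::regex* re = resolve_regex(nullptr, "a+", scoped);
    assert(re != nullptr && re == scoped.get());
}
```

With this shape the scoped owner has a clear purpose, and the null check sits in one place instead of at every use of the pattern.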


int64_t count = 0;
size_t pos = 0;
re2::StringPiece str_sp(str_data.data, str_data.size);
Contributor

Is str_sp unused?

    pos++;
} else {
    count++;
    pos += match.data() - current.data() + match.size();
Contributor

Could you add a comment documenting this line?
pos += match.data() - current.data() + match.size();
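The arithmetic in that line can be read as "skip the text before the match, then skip the match itself". The sketch below shows the same advance rule for counting non-overlapping matches. It is an illustration only: it uses `std::regex` as a stand-in for `re2::RE2`, `count_matches` is a hypothetical name, and it assumes empty matches are skipped without being counted, which the quoted fragment suggests but does not fully show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <regex>
#include <string>

// Hypothetical helper (not the PR's function): counts non-overlapping,
// non-empty matches of `pattern` in `text`.
int64_t count_matches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    int64_t count = 0;
    std::size_t pos = 0;  // start of the not-yet-searched tail of `text`
    while (pos <= text.size()) {
        std::smatch match;
        // Tell the engine that a character precedes the search start, so that
        // anchors such as ^ are not re-satisfied at every restart.
        const auto flags = pos > 0 ? std::regex_constants::match_prev_avail
                                   : std::regex_constants::match_default;
        if (!std::regex_search(text.begin() + pos, text.end(), match, re, flags)) {
            break;
        }
        // Offset of the match relative to `pos`; this corresponds to
        // match.data() - current.data() in the reviewed line.
        const std::size_t offset = static_cast<std::size_t>(match.position(0));
        // Length of the match; this corresponds to match.size().
        const std::size_t length = static_cast<std::size_t>(match.length(0));
        if (length == 0) {
            pos += offset + 1;  // empty match: step one character to avoid looping
            continue;
        }
        ++count;
        pos += offset + length;  // resume right after the end of this match
    }
    return count;
}

// Usage: three digit runs, and two non-overlapping "aa" in "aaaa".
void count_matches_example() {
    assert(count_matches("a1b22c333", "[0-9]+") == 3);
    assert(count_matches("aaaa", "aa") == 2);
}
```

Advancing by offset plus length is what makes the matches non-overlapping. Advancing by one character after an empty match is what guarantees the loop terminates.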

@zhangstar333
Contributor

run buildall

Author

@dwdwqfwe dwdwqfwe left a comment

Please give me some time; I will fix these problems soon.

Development

Successfully merging this pull request may close these issues.

[Feature] Impl the function regexp_count like trino
3 participants