Skip to content

Use generic operator #583

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
May 26, 2022
Merged

Conversation

kaiyingshan
Copy link
Collaborator

@kaiyingshan kaiyingshan commented May 16, 2022

fixes #586 among other things

@kaiyingshan kaiyingshan requested a review from nirandaperera May 16, 2022 02:58
@kaiyingshan kaiyingshan marked this pull request as draft May 16, 2022 03:21
@nirandaperera
Copy link
Collaborator

@kaiyingshan this failure is in GCylon. Could you please check that as well?

Comment on lines 1477 to 1491
std::vector<int64_t> num_row_v = {num_row}, rows_per_partition;

std::shared_ptr<cylon::Column> num_row_col;
RETURN_CYLON_STATUS_IF_FAILED(Column::FromVector(num_row_v, num_row_col));

std::vector<std::shared_ptr<cylon::Column>> output;

RETURN_CYLON_STATUS_IF_FAILED(
a->GetContext()->GetCommunicator()->Allgather(num_row_col, &output));

for (auto col : output) {
auto temp = std::static_pointer_cast<arrow::Int64Scalar>(
col->data()->GetScalar(0).ValueOrDie());
rows_per_partition.push_back(temp->value);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should've used the Scalar all-gather here.

Suggested change
std::vector<int64_t> num_row_v = {num_row}, rows_per_partition;
std::shared_ptr<cylon::Column> num_row_col;
RETURN_CYLON_STATUS_IF_FAILED(Column::FromVector(num_row_v, num_row_col));
std::vector<std::shared_ptr<cylon::Column>> output;
RETURN_CYLON_STATUS_IF_FAILED(
a->GetContext()->GetCommunicator()->Allgather(num_row_col, &output));
for (auto col : output) {
auto temp = std::static_pointer_cast<arrow::Int64Scalar>(
col->data()->GetScalar(0).ValueOrDie());
rows_per_partition.push_back(temp->value);
}
const auto& ctx = a->GetContext();
auto num_row_v = Scalar::Make(arrow::MakeScalar(num_row));
std::shared_ptr<Column> num_row_col;
RETURN_CYLON_STATUS_IF_FAILED(ctx->GetCommunicator()->Allgather(num_row_v, &num_row_col));
int64_t* vals = std::static_pointer_cast<arrow::Int64Array>(num_row_col->data())->raw_values();
// TODO: it would have been nice, if you could use int64_t* vals, and int64_t num_row_col->length() instead of a vector
// creating a vector here, copies data again
std::vector<int64_t> rows_per_partition(vals, vals + num_row_col->length());

Comment on lines 1523 to 1524
auto sub_result_scalar =
std::make_shared<Scalar>(std::move(arrow::MakeScalar(int8_t(subResult))));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you are passing a shared_ptr to a constructor (Rvalue), moving is unnecessary. you would only have to move Lvalues.

Suggested change
auto sub_result_scalar =
std::make_shared<Scalar>(std::move(arrow::MakeScalar(int8_t(subResult))));
auto sub_result_scalar = std::make_shared<Scalar>(arrow::MakeScalar(int8_t(subResult)));

Comment on lines 1552 to 1567
std::vector<int64_t> num_row_v = {num_row}, sizes;

std::shared_ptr<cylon::Column> num_row_col;
RETURN_CYLON_STATUS_IF_FAILED(Column::FromVector(num_row_v, num_row_col));

std::vector<std::shared_ptr<cylon::Column>> sizes_cols;

RETURN_CYLON_STATUS_IF_FAILED(
table->GetContext()->GetCommunicator()->Allgather(num_row_col,
&sizes_cols));

for (auto col : sizes_cols) {
auto temp = std::static_pointer_cast<arrow::Int64Scalar>(
col->data()->GetScalar(0).ValueOrDie());
sizes.push_back(temp->value);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as before!


auto result = DivideRowsEvenly(sizes);
auto size =
std::make_shared<Scalar>(std::move(arrow::MakeScalar(table->Rows())));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::make_shared<Scalar>(std::move(arrow::MakeScalar(table->Rows())));
std::make_shared<Scalar>(arrow::MakeScalar(table->Rows()));

Comment on lines 1553 to 1566
std::shared_ptr<cylon::Column> sizes_cols;
RETURN_CYLON_STATUS_IF_FAILED(Column::FromVector(sizes, sizes_cols));

auto num_row_scalar = std::make_shared<Scalar>(arrow::MakeScalar(num_row));

RETURN_CYLON_STATUS_IF_FAILED(
table->GetContext()->GetCommunicator()->Allgather(num_row_scalar,
&sizes_cols));

for (int64_t i = 0; i < sizes_cols->length(); i++) {
auto temp = std::static_pointer_cast<arrow::Int64Scalar>(
sizes_cols->data()->GetScalar(i).ValueOrDie());
sizes.push_back(temp->value);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -65,141 +65,7 @@ endfunction(cylon_run_test)
#Add tests as follows ...
# param 1 -- name of the test, param 2 -- number of processes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMPORTANT: Please re-enable these tests!!!

@nirandaperera
Copy link
Collaborator

@kaiyingshan is this ready? Can you run a code reformatter on the changed files?

@kaiyingshan kaiyingshan marked this pull request as ready for review May 25, 2022 22:33
Copy link
Collaborator

@nirandaperera nirandaperera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small change request

@@ -101,7 +101,7 @@ TEST_CASE("Partition testing", "[join]") {
}

SECTION("dist sort test") {
status = cylon::DistributedSort(table, 0, output, true, {(uint32_t) WORLD_SZ, (uint64_t) table->Rows()});
status = cylon::DistributedSort(table, 0, output, true, {(uint32_t) WORLD_SZ, (uint64_t) table->Rows(), SortOptions::REGULAR_SAMPLE});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I feel like this is more than 100 chars! 🙂

@nirandaperera nirandaperera merged commit e4dd38b into cylondata:main May 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_io.py test is not running
2 participants