-
Notifications
You must be signed in to change notification settings - Fork 175
RCORE-1386 Track client reset reason in table that detects client reset cycles #7649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build michael.wilkersonbarker_1128Details
💛 - Coveralls |
…ra-client-reset-fields
src/realm/sync/client_base.hpp
Outdated
@@ -62,11 +62,12 @@ class SyncSocketProvider; | |||
struct ClientReset { | |||
realm::ClientResyncMode mode; | |||
DBRef fresh_copy; | |||
bool recovery_is_allowed = true; | |||
bool recovery_is_allowed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just capture the original SessionErrorInfo that caused the client reset here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since SessionErrorInfo
can't be default created and there are only a few parts needed, I don't think we need to. I did remove the recovery_is_allowed
since it is no longer needed.
auto mode = config(&SyncConfig::client_resync_mode); | ||
if (mode == ClientResyncMode::Recover) { | ||
handle_fresh_realm_downloaded( | ||
nullptr, | ||
{ErrorCodes::RuntimeError, | ||
"A client reset is required but the server does not permit recovery for this client"}, | ||
server_requests_action); | ||
error_info); | ||
return; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just capture error_info in a member variable after this line rather than passing it as an argument to handle_fresh_realm_downloaded below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now being saved to a member variable in download_fresh_realm()
and no longer being passed to handle_fresh_realm_downloaded()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my earlier comment was still in "Pending"...
I tried that, but saving it in download_fresh_realm()
is too soon and caused issues if the SyncSession
was paused/resumed while a client reset is in progress - the error info was being consumed when the main Sync Session was manually resumed instead of when it was resumed after the fresh realm was downloaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not accurate anymore. It's still being passed to handle_fresh_realm_downloaded
and that's where m_client_reset_error is updated.
TableRef table = get_or_create_client_reset_table(wt); | ||
REALM_ASSERT(table); | ||
// Even if the table is being updated to V2, an existing entry will still throw an exception | ||
size_t table_size = table->size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 583/584 can just be if(table->size() > 0)
. it doesn't look like we use the table_size variable anywhere else.
else if (type == 1) { | ||
pending.type = ClientResyncMode::Recover; | ||
// Version 2 columns | ||
if (pending.version >= 2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to use the sync metadata schema machinery here to create all these column/look up these colkeys for us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated the pending reset function to use the sync metadata schema and made it a "store" object that is owned by SyncSession
.
…ra-client-reset-fields
…ra-client-reset-fields
…ra-client-reset-fields
…ra-client-reset-fields
…ra-client-reset-fields
…ra-client-reset-fields
…ra-client-reset-fields
src/realm/sync/client_base.hpp
Outdated
@@ -62,11 +62,11 @@ class SyncSocketProvider; | |||
struct ClientReset { | |||
realm::ClientResyncMode mode; | |||
DBRef fresh_copy; | |||
bool recovery_is_allowed = true; | |||
sync::ProtocolErrorInfo::Action action = sync::ProtocolErrorInfo::Action::ClientReset; | |||
std::optional<Status> error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't there always an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is - I had made it optional since Status
doesn't have a default constructor and ClientReset
is default constructed and then populated throughout the code.
I could add an appropriate constructor and remove the optional
if you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we just store the reason? I think that's all we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be helpful to have the error code as well and it shouldn't be that expensive to store...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. it does feel a bit wrong to have the error optional though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the ClientReset
structure so error
is no longer an optional and updated the usages to use the initializers.
bool load_schema(const TransactionRef& rd_tr); | ||
void create_schema(const TransactionRef& rd_tr); | ||
|
||
std::optional<PendingReset> read_legacy_pending_reset(const TransactionRef& rd_tr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having this, isn't it easier to migrate the v1 data to v2 and use it as if the metadata was always at v2? We do something similar with the SubscriptionStore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose a more "lazy" approach to writing to the pending reset store and not write to the realm file when the session is started, but wait until the pending reset info is written or cleared.
In the original code, there is an instance where the original pending reset code was provided a frozen transaction, so I defaulted the reads to use frozen transactions as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It's just that we could do the same with less code I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I streamlined the legacy pending reset code a bit - it had extra logic for future schema updates, but took that out.
REALM_ASSERT(reset_store); | ||
DB& db_remote = *reset_config.fresh_copy; | ||
auto actual_mode = | ||
reset_precheck_guard(reset_store, reset_config.mode, reset_config.action, reset_config.error, logger); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should reset the guard part of the same write transaction (as before), so everything rolls back if it fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, right - I see what you mean. I'll update it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns there there is one commit transaction when track_reset()
is called, so even if it rolls back, the pending reset info is still stored. I updated the PendingResetStore
to be a set of static functions (again) with the schema metadata info used under the hood. This allows better control over the transactions and when they are committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, I just followed the code path and that's the case. I think there is still a difference because the write lock was not released before.
src/realm/sync/client.cpp
Outdated
m_sess->logger.info("Tracking pending client reset of type \"%1\" from %2", pending_reset->type, | ||
pending_reset->time); | ||
std::optional<PendingReset> pending_reset; | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it doesn't seem to need its own scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we return on line 1991, I don't think we need to wrap pending_reset in a std::optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I moved the PendingResetStore back to being a set of static functions, I ended up keeping the extra scope for the read transaction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not reusing the old code?
auto pending_reset = _impl::client_reset::has_pending_reset(m_db->start_frozen());
if (!pending_reset)
return;
…e fresh realm downloaded
…ra-client-reset-fields
…ra-client-reset-fields
// Use the original sync config, not the updated one from the migration store | ||
session_config.client_reset_config = | ||
make_client_reset_config(m_config, m_original_sync_config, std::move(m_client_reset_fresh_copy), | ||
allowed_to_recover, m_previous_schema_version.has_value()); | ||
std::move(*m_client_reset_error), m_previous_schema_version.has_value()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you make a copy here anyway, so you could just pass an lvalue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could also use auto client_reset_error = std::exchange(m_client_reset_error, std::nullopt);
so you don't need to reset it later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rvalue is being passed into make_client_reset_config()
and then moved into the Config::ClientReset
structure returned by that function. Added the std::exchange()
call to remove the reset()
…ra-client-reset-fields
@@ -513,8 +512,9 @@ void SyncSession::download_fresh_realm(sync::ProtocolErrorInfo::Action server_re | |||
auto self = shared_from_this(); | |||
using SubscriptionState = sync::SubscriptionSet::State; | |||
fresh_sub.get_state_change_notification(SubscriptionState::Complete) | |||
.then([=](SubscriptionState) -> util::Future<sync::SubscriptionSet> { | |||
if (server_requests_action != sync::ProtocolErrorInfo::Action::MigrateToFLX) { | |||
.then([self, error_info, fresh_sync_session, fresh_sub_store, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've noticed that throughout this PR you're changing lambdas that implicitly capture by value to explicitly list their captures. is this fixing a bug or just making the code style more to your liking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically in download_fresh_realm()
, I was thinking listing the specific value used would reduce the overhead of capture by values so the entire function scope is not copied. I can revert back to the original capture list if the compiler does this optimization for us already..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a default capture of [=] { lambda body }
captures any referenced variables by copying them, but doesn't copy unreferenced variables. The old syntax and new syntax should generate the exact same instructions. See this godbolt if you want to double-check https://godbolt.org/z/q7KP41e8K.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That isn't how [=]
works. It doesn't capture the function scope; it captures by value the specific things referenced inside the lambda. This isn't a compiler optimization but rather is a pretty important part of the semantics of lambda capturing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, OK - thanks for the clarification!
} | ||
}); | ||
} | ||
fresh_sync_session->revive_if_needed(); | ||
} | ||
|
||
void SyncSession::handle_fresh_realm_downloaded(DBRef db, Status status, | ||
sync::ProtocolErrorInfo::Action server_requests_action, | ||
std::optional<sync::SessionErrorInfo> client_reset_error, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like we always either pass a Status::OK() and a client_reset_error or a not-okay status. Should we make this a StatusWith<sync::SessionErrorInfo>
?
src/realm/sync/client.cpp
Outdated
m_sess->logger.info("Tracking pending client reset of type \"%1\" from %2", pending_reset->type, | ||
pending_reset->time); | ||
std::optional<PendingReset> pending_reset; | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we return on line 1991, I don't think we need to wrap pending_reset in a std::optional.
|
||
class PendingResetStore { | ||
public: | ||
~PendingResetStore() = default; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to explicitly define the destructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why I defined that? I don't think it's needed.
test/object-store/sync/app.cpp
Outdated
} | ||
|
||
// Closing the realm should leave one ref for the SyncSession and one for the local dbref. | ||
REQUIRE_THAT( | ||
[&] { | ||
return dbref.use_count() < 4; | ||
logger->trace("DBRef PAUSED use count: %1", dbref.use_count()); | ||
return dbref.use_count() < 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did this work before if we weren't counting the MigrationStore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are actually 3 refs open, including the MigrationStore
- it was increased by 1 when I had originally created the PendingResetStore object...
Realm - DBRef ACTIVE use count: 5
Realm - DBRef PAUSING called use count: 5
Realm - DBRef PAUSED use count: 3
Realm - DBRef PAUSED use count: 3
Realm - DBRef TEARDOWN use count: 1
Realm - DBRef TEARDOWN use count: 1
I updated the count and the comments to indicate the DB references
…ra-client-reset-fields
src/realm/sync/client.cpp
Outdated
m_sess->logger.info("Tracking pending client reset of type \"%1\" from %2", pending_reset->type, | ||
pending_reset->time); | ||
std::optional<PendingReset> pending_reset; | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not reusing the old code?
auto pending_reset = _impl::client_reset::has_pending_reset(m_db->start_frozen());
if (!pending_reset)
return;
src/realm/sync/client.cpp
Outdated
if (status == ErrorCodes::OperationAborted) { | ||
return; | ||
} | ||
auto& logger = self->m_sess->logger; | ||
if (!status.is_ok()) { | ||
logger.error("Error while tracking client reset acknowledgement: %1", status); | ||
logger.error(util::LogCategory::reset, "Error while tracking client reset acknowledgement: %1", status); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
class PendingResetStore { | ||
public: | ||
// Store the pending reset tracking information - it is an error if the tracking info already | ||
// esists in the store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exists
static int64_t from_reset_action(PendingReset::Action action); | ||
static PendingReset::Action to_reset_action(int64_t action); | ||
static ClientResyncMode to_resync_mode(int64_t mode); | ||
static int64_t from_resync_mode(ClientResyncMode mode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these methods should be private.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left them public so I can create tests for them without needing to use a class friend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. shouldn't they be tested through the methods invoking them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've spread out the values to try to validate them during the other tests, but the error cases are harder to test.
ColKey mode_col = table->get_column_key(s_v1_reset_mode_col_name); | ||
Obj reset_entry = *table->begin(); | ||
|
||
if (version_col && static_cast<int>(reset_entry.get<int64_t>(version_col)) == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to cast it. you can probably retrieve an int directly or compare the value with 1LL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, int
s aren't supported, but the comparison to 1LL
works great
"int realm::Obj::get<int>(realm::ColKey) const", referenced from:
realm::sync::PendingResetStore::read_legacy_pending_reset(std::__1::shared_ptr<realm::Transaction> const&) in librealm-sync-dbg.a[12](pending_reset_store.cpp.o)
ld: symbol(s) not found for architecture arm64
src/realm/sync/subscriptions.cpp
Outdated
else { | ||
tr->promote_to_write(); | ||
// Create the metadata schema | ||
create_sync_metadata_schema(tr, &internal_tables); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why swapping the order? Is it important? also, why is a new SyncMetadataSchemaVersions needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had originally swapped them to ensure the schema was written before the schema version, but, since they are in the same commit, it really doesn't matter. I also made some updates around this area so the initial creation of the schema versions table is in its own commit.
// Writing is disabled | ||
return false; // Either table is not initialized or version does not exist | ||
} | ||
else { // writable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for else
bool operator==(const sync::PendingReset& lhs, const sync::PendingReset& rhs); | ||
bool operator==(const sync::PendingReset& lhs, const PendingReset::Action& action); | ||
|
||
class PendingResetStore { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it a bit odd that all methods are static in this class, and you create temporary instances used in some of these methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The class instances are just for holding the table/column key variables after the schema metadata info is loaded. I had gone back and forth with this and think the static functions provides the caller control over the transactions and when the changes are committed (e.g. remove old and add new pending reset info in a single transaction, like was being done before). Although it did require some updates to the SchemaVersion classes and how they handle transactions.
Plus, this class is only used in a very small window and it seemed excessive to keep an object (and dbref) around during every SessionWrapper lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. I wasn't suggesting using a store object, but maybe the same old static functions if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I could pull the static functions outside the class and retain the old namespace for them... Not sure if it's necessary, since this way everything is contained in a single unit/class and it incorporates the internal object for holding the schema info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw, I don't think its worth refactoring this again just to use the old function names.
@@ -2,6 +2,7 @@ | |||
|
|||
### Enhancements | |||
* <New feature description> (PR [#????](https://github.com/realm/realm-core/pull/????)) | |||
* Report the originating error that caused a client reset to occur. ([#6154](https://github.com/realm/realm-core/issues/6154)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we actually surfacing the error? could not find where it's done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do print the error with the rest of the PendingReset structure info:
https://github.com/realm/realm-core/blob/mwb/add-extra-client-reset-fields/src/realm/sync/noinst/pending_reset_store.cpp#L33-L49
Which will be provided via debug out here:
https://github.com/realm/realm-core/blob/mwb/add-extra-client-reset-fields/src/realm/sync/client.cpp#L1991
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example:
Realm.Sync.Client.Reset - Connection[1] Session[5]: Tracking pending client reset of type: 'Recover' at: 2024-05-31 00:38:03.146563000 for error: SyncClientResetRequired: Bad client file identifier (IDENT)
. . .
Realm.Sync.Client.Reset - Connection[1] Session[5]: Server has acknowledged pending client reset of type: 'Recover' at: 2024-05-31 00:38:03.146563000 for error: SyncClientResetRequired: Bad client file identifier (IDENT)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what I meant is surfacing it through a SyncError in handle_error so the users know what caused the initial client reset.
…ra-client-reset-fields
What, How & Why?
This PR with the client reset error and action storage was broken out from PR #7542
These changes add the originating error Status and server requests action values from the
SessionErrorInfo
into the pending client reset tracking object that is used to provide client reset cycle detection and identify when a client reset is in progress so it can be closed out once the realm has been sync'ed with the server.Fixes #6154
☑️ ToDos
[ ] C-API, if public C++ API changed[ ]bindgen/spec.yml
, if public C++ API changed