Skip to content

Commit 2c0f490

Browse files
committed
[#23785] YSQL: Fix incremental cache refresh log flooding
Summary: I found two logs that appear in high frequency when incremental cache refresh is enabled. (1) The new table pg_yb_invalidation_messages is added during the finalization phase of a upgrade process. Before that I can see the following logs: ``` W0403 16:09:36.040861 142435 master_heartbeat_service.cc:365] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:09:36.832208 142470 master_heartbeat_service.cc:365] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:09:36.932433 142444 master_heartbeat_service.cc:365] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:09:37.071781 142453 master_heartbeat_service.cc:365] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:09:37.862165 142476 master_heartbeat_service.cc:365] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:09:37.961740 142484 master_heartbeat_service.cc:365] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 ``` This happens for a 3-node dev machine local cluster, the same log is printed 3 times a second, once per heartbeat response. That's too much. (2) Run unit test ./yb_build.sh --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.WaitForSharedCatalogVersionToCatchup Look at the test output for logs like the following: ``` [ts-1] 2025-04-03 16:11:58.032 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.132 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.232 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.332 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.433 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.533 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.633 UTC [144680] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:11:58.733 UTC [144680] LOG: waiting for shared catalog version to reach 199 ``` The log is printed every 0.1 second, that's too much. I made a fix to address these two. For the first log, I changed to use YB_LOG_EVERY_N_SECS(WARNING, 60) so that the NotFound log is printed once every minute. For the second log, I changed to always print the first log, but then print once every 0.1 * 20 = 2 seconds. In addition, I print the end result as well and with a warning if the target version is not reached. Test Plan: 1. build using commit 80c4195 that did not have the table pg_yb_invalidation_messages. Using the build result to (1) create a 3 node cluster: ./bin/yb-ctl create --rf 3 (2) stop the cluster ./bin/yb-ctl stop (3) use the build with this diff to restart the cluster ./bin/yb-ctl start (4) go to leader master node and look at yb-master.INFO ``` W0403 16:24:52.071898 147882 master_heartbeat_service.cc:370] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:25:52.798542 147908 master_heartbeat_service.cc:370] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:26:53.525211 147895 master_heartbeat_service.cc:370] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 W0403 16:27:54.263994 147837 master_heartbeat_service.cc:370] Could not get YSQL invalidation messages for heartbeat response: Not found (yb/tablet/tablet_metadata.cc:819): Table <unknown_table_name> (00000001000030008001000000001f90) not found in Raft group 00000000000000000000000000000000 ``` We can see that the log is printed once every minute. 2. Run unit test ./yb_build.sh --cxx-test pg_catalog_version-test --gtest_filter PgCatalogVersionTest.WaitForSharedCatalogVersionToCatchup Look at the output ``` [ts-1] 2025-04-03 16:30:43.392 UTC [150357] LOG: waiting for shared catalog version to reach 199 [ts-1] 2025-04-03 16:30:44.394 UTC [150357] LOG: shared catalog version has reached 199 ``` We see the first log is printed, and the end result is also printed. Reviewers: kfranz, sanketh, mihnea Reviewed By: kfranz Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D42997
1 parent 9c51684 commit 2c0f490

File tree

2 files changed

+28
-7
lines changed

2 files changed

+28
-7
lines changed

src/postgres/src/backend/utils/misc/pg_yb_utils.c

+19-5
Original file line numberDiff line numberDiff line change
@@ -1118,15 +1118,29 @@ YbWaitForSharedCatalogVersionToCatchup(uint64_t version)
11181118
*/
11191119
if (shared_catalog_version == YB_CATCACHE_VERSION_UNINITIALIZED)
11201120
return;
1121-
ereport(LOG,
1122-
(errmsg("waiting for shared catalog version to reach %" PRIu64,
1123-
version),
1124-
errhidestmt(true),
1125-
errhidecontext(true)));
1121+
/* Avoid flooding the log file, but always print for the first time. */
1122+
if (count % 20 == 1)
1123+
ereport(LOG,
1124+
(errmsg("waiting for shared catalog version to reach %" PRIu64,
1125+
version),
1126+
errhidestmt(true),
1127+
errhidecontext(true)));
11261128
/* wait 0.1 sec */
11271129
pg_usleep(100000L);
11281130
shared_catalog_version = YbGetSharedCatalogVersion();
11291131
}
1132+
if (shared_catalog_version >= version)
1133+
ereport(LOG,
1134+
(errmsg("shared catalog version has reached %" PRIu64,
1135+
shared_catalog_version),
1136+
errhidestmt(true),
1137+
errhidecontext(true)));
1138+
else
1139+
ereport(WARNING,
1140+
(errmsg("shared catalog version %" PRIu64 " has not reached %" PRIu64,
1141+
shared_catalog_version, version),
1142+
errhidestmt(true),
1143+
errhidecontext(true)));
11301144
}
11311145

11321146
/*---------------------------------------------------------------------------*/

src/yb/master/master_heartbeat_service.cc

+9-2
Original file line numberDiff line numberDiff line change
@@ -362,8 +362,15 @@ void MasterHeartbeatServiceImpl::PopulatePgCatalogVersionInfo(
362362
}
363363
}
364364
} else {
365-
LOG(WARNING) << "Could not get YSQL invalidation messages for heartbeat response: "
366-
<< ResultToStatus(messages);
365+
const auto& s = messages.status();
366+
auto msg = Format("Could not get YSQL invalidation messages for heartbeat response: $0", s);
367+
if (s.IsNotFound()) {
368+
// During upgrade, the table pg_yb_invalidation_messages is created in the finalization
369+
// phase. We do not want to flood the log before that.
370+
YB_LOG_EVERY_N_SECS(WARNING, 60) << msg;
371+
} else {
372+
LOG(WARNING) << msg;
373+
}
367374
}
368375
}
369376

0 commit comments

Comments
 (0)