Closed
Description
There has been a report of an OOM for MySQL CDC. We should stress test the CDC connector to make sure it can handle millions of records. A user reported an OOM while syncing a table with the following schema:
```sql
CREATE TABLE `structured_brief` (
  `pk` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `id` char(16) COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `user_id` int(10) unsigned NOT NULL,
  `state` varchar(30) COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `schema_id` varchar(32) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
  `data` text COLLATE utf8mb4_unicode_520_ci NOT NULL,
  `created_at` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `launched_at` datetime DEFAULT NULL,
  `api_app` varchar(64) COLLATE utf8mb4_unicode_520_ci DEFAULT NULL,
  `external_order_item_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`pk`),
  UNIQUE KEY `id` (`id`),
  KEY `user_id` (`user_id`),
  KEY `schema_id` (`schema_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2007265 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_520_ci
```
The table had 1M records. We should try to sync a similar table with a similar number of records, observe the memory consumption of our Docker instance, and find the bottlenecks.
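To reproduce the setup, we need a table with the same shape and ~1M rows. A minimal sketch of a row generator that emits `INSERT` statements matching the reported schema (the class name, `state` value, and payload size are illustrative; in practice the output would be piped into `mysql` or batched):

```java
import java.util.UUID;

public class GenerateRows {
    // Build one INSERT resembling the reported structured_brief schema.
    // dataSizeBytes controls the TEXT payload size, to stress large records.
    static String insertFor(int i, int dataSizeBytes) {
        String id = UUID.randomUUID().toString().replace("-", "").substring(0, 16);
        String data = "x".repeat(dataSizeBytes);
        return String.format(
            "INSERT INTO structured_brief (id, user_id, state, data) VALUES ('%s', %d, 'draft', '%s');",
            id, i, data);
    }

    public static void main(String[] args) {
        int rows = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        for (int i = 1; i <= rows; i++) {
            System.out.println(insertFor(i, 1024));
        }
    }
}
```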
Airbyte was running on a t3.xlarge instance (4 vCPUs and 16 GB of memory).
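While the stress test runs, container-level stats alone won't tell us whether the JVM heap is the problem. A simple sketch for logging heap usage from inside the connector process (helper names are illustrative; the figures come from the standard `Runtime` API and are approximate since GC timing affects the "used" number):

```java
public class HeapStats {
    // Currently used JVM heap, in MiB.
    static long usedMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    // Maximum heap the JVM may grow to (-Xmx), in MiB.
    static long maxMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("heap used: " + usedMiB() + " MiB / max: " + maxMiB() + " MiB");
    }
}
```

Logging these two numbers periodically during the sync would show whether memory growth tracks the number of records read.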
There are a few things that might be causing the OOM (these are guesses; we need to validate them):
- We maintain an in-memory queue here into which Debezium pushes records from one thread and from which Airbyte reads on another thread. If Debezium pushes records at a much faster pace than Airbyte consumes them, the in-memory data structure will keep growing.
- The records from the database might be large (rows with many VARCHAR/TEXT/BLOB values), which would further inflate the memory footprint of the queue.
- Debezium may be doing work internally that hogs memory.
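If the first guess holds, one common mitigation is to bound the hand-off queue so the producer blocks instead of the heap growing without limit. A minimal sketch of that backpressure pattern using `ArrayBlockingQueue` (the class and capacity are illustrative, not Airbyte's actual implementation):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedHandoff {
    // A bounded queue applies backpressure: when the consumer falls behind,
    // put() blocks the producer instead of letting the queue grow unboundedly.
    private final BlockingQueue<String> queue;

    public BoundedHandoff(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called from the Debezium (producer) thread; blocks when the queue is full.
    public void produce(String record) throws InterruptedException {
        queue.put(record);
    }

    // Called from the Airbyte (consumer) thread; blocks when the queue is empty.
    public String consume() throws InterruptedException {
        return queue.take();
    }

    public int size() {
        return queue.size();
    }
}
```

With a bounded queue, worst-case memory for the hand-off is roughly capacity × average record size, which also makes the second guess (large records) directly tunable via the capacity.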