Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

postgresql deadlock - INSERT INTO user_ips #14199

Closed
hyossing opened this issue Oct 17, 2022 · 12 comments
Closed

postgresql deadlock - INSERT INTO user_ips #14199

hyossing opened this issue Oct 17, 2022 · 12 comments
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@hyossing
Copy link

hyossing commented Oct 17, 2022

Description

Dear Synapse,

Deadlock is happen so often in Posgresql because

synapse workers tries to update lots of client ips on background process.

Also, sometimes it takes over 5 secs to process insert user_ips.

Is there any way to turn off inserting client ips? or any configuration?

Can I block(comment out) some store such as ClientIpBackgroundUpdateStore?

Is it independent from other synapse modules? So, it does not side-effect from that?

Thanks you.

Best Regards
Hyosung

Steps to reproduce

Reproducing

Reproducing is not easy. It is only shown on our synapse homeserver.

  • GET /sync is called somewhat often.

Synapse Deployment

  • Master and 3 workers to handle /sync from clients.

Synapse Usage

  • About 18 hz for /sync at peak time for each worker

Screen Shot 2022-10-17 at 10 17 24 AM

Homeserver

another homeserver

Synapse Version

1.35.0

Installation Method

Docker (matrixdotorg/synapse)

Platform

Operating System

  • vmware, ubuntu, 28cpu, 60gb

Relevant log output

### Synapse Worker3 Log
2022-10-14 14:12:05	
{"log":" 0/5\n"}
2022-10-14 14:12:05	
{"log":"CONTEXT:  while inserting index tuple (21022,25) in relation \"user_ips\"\n"}
2022-10-14 14:12:05	
{"log":"HINT:  See server log for query details.\n"}
2022-10-14 14:12:05	
{"log":"Process 42 waits for ShareLock on transaction 716397986; blocked by process 96.\n"}
2022-10-14 14:12:05	
{"log":"DETAIL:  Process 96 waits for ShareLock on transaction 716398006; blocked by process 42.\n"}
2022-10-14 14:12:05	
{"log":"2022-10-14 13:12:05,483 - synapse.storage.txn - 534 - WARNING - update_client_ips-123492- [TXN OPERROR] {_update_client_ips_batch-96c00a} deadlock detected\n"}

Anything else that would be useful to know?

Posgresql Log

2022-10-14 13:12:05.561 UTC [96] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@wuc_199_b63b8a6b-79a5-452d-83f4-9e00cc1607be:default', '<ACCESS_TOKEN_1_REDACTED>', '51.219.48.16', 'okhttp/3.12.1', '94978d8bf8d35545', 1665753120691) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2022-10-14 13:12:05.561 UTC [96] ERROR:  could not serialize access due to concurrent update
2022-10-14 13:12:05.481 UTC [96] STATEMENT:  INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@wuc_415_904f65fe-bc50-4aa2-97ef-f0d1cf942c62:default', '<ACCESS_TOKEN_2_REDACTED>', '51.241.170.92', 'iPECS%20ONE/2 CFNetwork/1335.0.3 Darwin/21.6.0', '1C17A6BA-6C8B-4FD9-BD6F-A47E93C27145', 1665753121063) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
2022-10-14 13:12:05.481 UTC [96] CONTEXT:  while inserting index tuple (21022,25) in relation "user_ips"
2022-10-14 13:12:05.481 UTC [96] HINT:  See server log for query details.
	Process 42: INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@wuc_199_b63b8a6b-79a5-452d-83f4-9e00cc1607be:default', '<ACCESS_TOKEN_1_REDACTED>', '51.219.48.16', 'okhttp/3.12.1', '94978d8bf8d35545', 1665753120457) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
	Process 96: INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@wuc_415_904f65fe-bc50-4aa2-97ef-f0d1cf942c62:default', '<ACCESS_TOKEN_2_REDACTED>', '51.241.170.92', 'iPECS%20ONE/2 CFNetwork/1335.0.3 Darwin/21.6.0', '1C17A6BA-6C8B-4FD9-BD6F-A47E93C27145', 1665753121063) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
	Process 42 waits for ShareLock on transaction 716397986; blocked by process 96.
2022-10-14 13:12:05.481 UTC [96] DETAIL:  Process 96 waits for ShareLock on transaction 716398006; blocked by process 42.
2022-10-14 13:12:05.481 UTC [96] ERROR:  deadlock detected
@DMRobertson
Copy link
Contributor

Synapse Version

1.35.0

This is over a year old; please upgrade to the latest stable release and see if that resolves your issue.

Posgresql Log

This includes access tokens and user ids. I have redacted the access tokens from your logs, but you should assume these devices are compromised and log them out. If necessary The Admin API can be used to logout all devices of a particular user.

@DMRobertson DMRobertson added A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. O-Uncommon Most users are unlikely to come across this or unexpected workflow X-Needs-Info This issue is blocked awaiting information from the reporter labels Oct 17, 2022
@hyossing
Copy link
Author

Hello,

It is not easy to upgrade Synapse version on production because we have a QA(Lifecycle) process.

I will try to reproduce on my test environment with latest synapse version.

By the way, could you answer for the questions above?

Is there any way to turn off inserting client ips? or any configuration?
Can I block(comment out) some store such as ClientIpBackgroundUpdateStore?
Is it independent from other synapse modules? So, it does not side-effect from that?

Thanks you.
Hyosung.

@DMRobertson
Copy link
Contributor

How are your workers configured? Have you configured run_background_tasks_on setting in the workers' config?

Note that only one worker is supposed to be configured write to the client IP tables---presumably to avoid precisely this kind of deadlock.

if self._update_on_this_worker:
# This is the designated worker that can write to the client IP
# tables.

Additionally:

It is not easy to upgrade Synapse version on production because we have a QA(Lifecycle) process.

Be advised: Synapse 1.35 has known security issues: see https://github.com/matrix-org/synapse/security/advisories.

Is there any way to turn off inserting client ips? or any configuration?

Not that I can see.

Can I block(comment out) some store such as ClientIpBackgroundUpdateStore? Is it independent from other synapse modules? So, it does not side-effect from that?

I wouldn't advise this: the data stores are tightly coupled together and this is liable to cause breakages.

@hyossing
Copy link
Author

hyossing commented Oct 19, 2022

Hello,

There is no config for run_background_tasks_on for each Synapse workers for our environment.

Anyway,

Synapse 1.35 looks like it does not care the run_background_tasks_on config to update client ips.
Screen Shot 2022-10-20 at 4 08 35 PM

Link

So, our Synapse workers are remains logs.

{"log":"2022-10-14 13:12:05,483 - synapse.storage.txn - 534 - WARNING - update_client_ips-123492- [TXN OPERROR] {_update_client_ips_batch-96c00a} deadlock detected\n"}

Maybe, Is it right?

I will upgrade synapse 1.35 to latest.

Best Regards
Hyosung

@DMRobertson
Copy link
Contributor

Synapse 1.35 looks like it does not care the run_background_tasks_on config to update client ips.

True: this was done in #12251 (1.57).

Odd then that you see this deadlock. I would sanity check that there aren't two workers which are both configured to be the master worker.

@DMRobertson
Copy link
Contributor

Process 42: INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@wuc_199_b63b8a6b-79a5-452d-83f4-9e00cc1607be:default', '<ACCESS_TOKEN_1_REDACTED>', '51.219.48.16', 'okhttp/3.12.1', '94978d8bf8d35545', 1665753120457) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen
	Process 96: INSERT INTO user_ips (user_id, access_token, ip, user_agent, device_id, last_seen) VALUES ('@wuc_415_904f65fe-bc50-4aa2-97ef-f0d1cf942c62:default', '<ACCESS_TOKEN_2_REDACTED>', '51.241.170.92', 'iPECS%20ONE/2 CFNetwork/1335.0.3 Darwin/21.6.0', '1C17A6BA-6C8B-4FD9-BD6F-A47E93C27145', 1665753121063) ON CONFLICT (user_id, access_token, ip) DO UPDATE SET user_agent=EXCLUDED.user_agent, device_id=EXCLUDED.device_id, last_seen=EXCLUDED.last_seen

Can you see where these requests come from by looking at Synapse logs? Look for the IP address and user agent, say in the 10 minutes prior to 2022-10-14 14:12:05? In fact, could you attach those logs please? Make sure to check for and redact sensitive information.

@hyossing
Copy link
Author

Hello,

That is from postgresql logs. It does not related to synapse's security.

Thanks you.

@DMRobertson
Copy link
Contributor

That is from postgresql logs.

Yes. I'm asking for Synapse logs corresponding to those postgres logs.

It does not related to synapse's security.

Postgres logs absolutely can be related to Synapse security if those logs contain secret values read by Synapse. As noted in my first response, the postgres logs you provided included Synapse access tokens which grant complete control over their users' accounts. I manually redacted those tokens.

@hyossing
Copy link
Author

Hello,

The transaction logger of the Synapse makes the logs.

Logger

transaction_logger = logging.getLogger("synapse.storage.txn")

logs

2022-10-14 14:12:05	
{"log":"2022-10-14 13:12:05,483 - synapse.storage.txn - 534 - WARNING - update_client_ips-123492- [TXN OPERROR] {_update_client_ips_batch-96c00a} deadlock detected\n"}

Thanks you.

@reivilibre
Copy link
Contributor

as an implementation note: It could be worth ordering the upserted rows so that deadlocks are not possible.

(I wondered if dropping down to READ COMMITTED isolation level might help but thinking about it I don't think it would.)

@reivilibre
Copy link
Contributor

Can I block(comment out) some store such as ClientIpBackgroundUpdateStore?

You could comment out the queries in _update_client_ips_batch so that nothing happens and IP addresses are not stored.

Whether this issue or not is still a problem would need someone to try and reproduce it on a modern version of Synapse. As DMRobertson says, v1.35 is not supported anymore and moreover has known security vulnerabilities.

@H-Shay
Copy link
Contributor

H-Shay commented Nov 29, 2022

I am going to close this for now, if you have upgraded to a modern version of Synapse and still see this issue, feel free to come back and comment and I will re-open.

@H-Shay H-Shay closed this as completed Nov 29, 2022
@H-Shay H-Shay removed the X-Needs-Info This issue is blocked awaiting information from the reporter label Nov 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

4 participants