Hanging forever in replicated table while multiple update/insert on ramchunk flush #3048

Closed
@donhardman

Description

While investigating an issue, I tried to build a minimal reproducible example (MRE) for it but did not succeed. However, I found another bug that might be related: on the second server, the daemon hangs forever on SELECT and all other queries against a replicated table.

Instructions

  1. Run bin/mre and wait until the last insert hangs indefinitely.
  2. Open another console, connect to the server on port 29306, and run the following commands to see that the table is unresponsive:
$ mysql -h0 -P29306
mysql> create table hello;
mysql> show tables from system;
+-------------+------+
| Table       | Type |
+-------------+------+
| system.test | rt   |
+-------------+------+
mysql> show status like 'cluster_c_status';
+------------------+---------+
| Counter          | Value   |
+------------------+---------+
| cluster_c_status | primary |
+------------------+---------+
mysql> select * from system.test;
[ NEVER GET RESPONSE, hangs forever ]
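
The interactive check above can also be scripted so that the hang is reported instead of blocking the console. This is a minimal sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the time limit is hit); the helper name `probe` and the limit values are illustrative and not part of the original scripts.

```shell
#!/usr/bin/env bash
# probe LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD under `timeout` and prints HUNG if the
# time limit was reached, otherwise RETURNED with CMD's exit status.
probe() {
	local limit=$1 rc
	shift
	timeout "$limit" "$@" >/dev/null 2>&1
	rc=$?
	# GNU timeout exits with 124 when it had to stop the command
	if [ "$rc" -eq 124 ]; then
		echo "HUNG after ${limit}s"
	else
		echo "RETURNED rc=$rc"
	fi
}
```

For example, `probe 10 mysql -h0 -P29306 -e 'select * from system.test'` would be expected to print `HUNG after 10s` on the affected node once the reproduction has reached the hanging state.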

Scripts:

#!/usr/bin/env bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
if [ -z "$1" ]; then
	echo "Usage: $0 <num-manticore-nodes>"
	exit 1
fi
nodes=$1
if ((nodes < 2)); then
	echo "Number of nodes must be greater than 1"
	exit 1
fi
if ((nodes > 4)); then
	echo "Number of nodes must be less than or equal to 4"
	exit 1
fi

for n in $(seq 1 $nodes); do
	cat << EOF > $DIR/manticore-${n}.conf
searchd {
	listen = 127.0.0.1:${n}9312
	listen = 127.0.0.1:${n}9306:mysql
	listen = 127.0.0.1:${n}9308:http
	log = /var/log/manticore/searchd-$n.log
	query_log = /var/log/manticore/query-$n.log
	pid_file = /var/run/manticore/searchd-$n.pid
	data_dir = /var/lib/manticore/$n
}
EOF
done

# Function to stop processes
stop_processes() {
	echo "Stopping searchd processes..."

	for n in $(seq 1 $nodes); do
		searchd --config "$DIR/manticore-${n}.conf" --stop
	done
	exit 0
}

# Set up trap to catch Ctrl+C (SIGINT)
trap stop_processes SIGINT

# Start all searchd processes in the background and redirect output to console
for n in $(seq 1 $nodes); do
	test -d /var/lib/manticore/$n && rm -rf $_
	mkdir -p /var/lib/manticore/$n
	searchd --config "$DIR/manticore-${n}.conf" --nodetach > >(sed 's/^/['$n'] /') 2>&1 &
done

# Wait for all searchd processes to start
sleep 2

# Creating cluster
mysql -h0 -P19306 -e 'CREATE CLUSTER c'
for n in $(seq 2 $nodes); do
	mysql -h0 -P"${n}9306" -e "JOIN CLUSTER c at '127.0.0.1:19312'"
done

echo "All searchd processes started. Press Ctrl+C to stop."

# Wait indefinitely
while true; do
	sleep 1
done
Save the script above as bin/cluster.

#!/usr/bin/env bash
set -x

PS4='[$(date "+%Y-%m-%d %H:%M:%S")] '
pgrep -f searchd | xargs kill -9
trap 'kill $(jobs -p)' EXIT
bin/cluster 2 >/dev/null 2>&1 &

sleep 3

mysql -h0 -P19306 -e "create table system.test (id bigint, name string, value json)"
mysql -h0 -P19306 -e "alter cluster c add system.test"

mysql -h0 -P19306 -e "update c:system.test set value = '{\"a\":2}' where name = 'node'" &
mysql -h0 -P29306 -e "insert into c:system.test values (4, 'slave', '{\"a\":1}')" &

echo 'Waiting for all updates to be replicated'
wait

echo 'Expecting to see some results'
mysql -h0 -P19306 -e "select * from system.test"
Save the script above as bin/mre.
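
Both scripts rely on fixed `sleep` calls to wait for searchd to start, which can make a reproduction timing-dependent. Below is a hedged sketch of a readiness check that polls a TCP port instead; it assumes a bash build with `/dev/tcp` redirection support, and the function name `wait_for_port` and its defaults are illustrative, not part of the original scripts.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TRIES] [DELAY]
# Hypothetical helper: returns 0 as soon as a TCP connect to HOST:PORT
# succeeds, or 1 after TRIES failed attempts spaced DELAY seconds apart.
wait_for_port() {
	local host=$1 port=$2 tries=${3:-20} delay=${4:-0.5}
	local i
	for ((i = 0; i < tries; i++)); do
		# The redirection in the subshell succeeds only if the connect does;
		# the descriptor is closed again when the subshell exits.
		if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
			return 0
		fi
		sleep "$delay"
	done
	return 1
}
```

In bin/cluster this could stand in for `sleep 2`, for example `wait_for_port 127.0.0.1 "${n}9306"` for each node before creating the cluster. A listening port only shows that the daemon accepts connections, not that replication is ready, so the hang itself may still depend on timing.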

Manticore Search Version:

Latest dev

Operating System Version:

Ubuntu

Have you tried the latest development version?

None

Internal Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
