We have a MySQL 8.0.33-25.1 Percona wsrep cluster with 6 nodes. The Percona cluster is stable and works as expected.
One of the nodes is also a binlog-enabled replication primary.
On the other side there is a single replica.
Both are AWS instances of comparable size; the replica has more dedicated IOPS than the primary due to the nature of its workload.
This same setup was used on MySQL 5.7, which we migrated away from a few months ago.
Since the migration the replica has been behaving very strangely. I understand that there are probably some under-the-hood changes to MySQL replication, but what we are experiencing is not stable at all.
The workload on that replica is mostly reads. There are databases on the replica that are not replicated and do receive writes, but as far as the replicated databases are concerned, the replica only reads from them.
Here are 2 example workflows that are running on the replica:
DB1 is replicated from the primary; the replica reads data from there, creates “snapshots” of the data, and writes them to DB2 on the replica (which is not replicated to any other server).
Tableau reads data from the replica on a schedule, sequentially once per day, and that data is then used for reports throughout the day.
There are other workloads of smaller scale but of the same type that happen on that replica.
The problem we have is that during any significant read query, replication pauses and waits for the read query to finish before continuing. This often results in timeouts, since those read queries may sometimes run for many hours, and we end up with the following error:
2024-03-21T09:49:22.030113Z 2386083 [Warning] [MY-010584] [Repl] Replica SQL for channel '': Worker 1 failed executing transaction 'ANONYMOUS' at source log mysql-bin.000639, end_log_pos 853032011; Could not execute Update_rows event on table DB1.table1; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's source log mysql-bin.000639, end_log_pos 853032011, Error_code: MY-001205
We can increase the timeout, but that's not what we are after. Prior to the migration, the 5.7 version of the replica kept up with replication throughout the day. Over the past 2 months we have tried tweaking various variables in an effort to get back to the same state as on 5.7, but with no success. These are the variables we are currently using that differ from our previous defaults:
replica_preserve_commit_order = OFF
replica_parallel_type = DATABASE
replica_parallel_workers = 1
innodb_adaptive_hash_index = OFF
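For reference, this is roughly how the same settings can be applied at runtime on the replica (a sketch only; it assumes the applier can be stopped briefly, and the values do not survive a restart unless they are also kept in my.cnf):
-- the applier must be stopped before changing the parallel-replication settings
STOP REPLICA SQL_THREAD;
SET GLOBAL replica_preserve_commit_order = OFF;
SET GLOBAL replica_parallel_type = 'DATABASE';  -- deprecated in 8.0.27+, but still accepted
SET GLOBAL replica_parallel_workers = 1;
SET GLOBAL innodb_adaptive_hash_index = OFF;    -- dynamic, can be changed at any time
START REPLICA SQL_THREAD;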
And this is our default cnf (some values are edited out):
# Template my.cnf for PXC
# Edit to your requirements.
[client]
socket=XXX
default-character-set=utf8
[mysqld]
open-files-limit = 100000
table_open_cache = 20000
innodb_open_files = 20000
character-set-server = utf8
collation-server = utf8_unicode_ci
datadir = XXX
long_query_time = 5
slow_query_log = 1
slow_query_log_file = XXX
log_bin = mysql-bin
log-error = XXX
pid-file = XXX
default_storage_engine = InnoDB
pxc_strict_mode = ENFORCING
sql_mode = ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
early-plugin-load = keyring_file.so
keyring-file-data = XXX
pxc-encrypt-cluster-traffic = ON
socket = XXX
server-id = 9
binlog-ignore-db = information_schema
binlog-ignore-db = mysql
binlog-ignore-db = performance_schema
binlog-ignore-db = XXX
binlog_format = ROW
log_slave_updates = ON
binlog_expire_logs_seconds = 604800
ft_min_word_len = 1
ft_stopword_file = ''
join_buffer_size = 256K
key_buffer_size = 16M
lc-messages-dir = /usr/share/mysql
lower_case_table_names = 0
max_allowed_packet = 512M
max_connections = 400
max_heap_table_size = 16M
memlock = 0
read_buffer_size = 256K
read_rnd_buffer_size = 512K
replica_preserve_commit_order = OFF
replica_parallel_type = DATABASE
replica_parallel_workers = 1
skip-external-locking
skip_name_resolve
sort_buffer_size = 256K
sysdate_is_now = 1
table_definition_cache = 20000
thread_cache_size = 20
thread_stack = 256K
tmp_table_size = 16M
tmpdir = XXX
net_read_timeout = 60
# 5.6 backwards compatibility
explicit_defaults_for_timestamp = 1
# INNODB OPTIONS - You may want to tune the below depending on number of cores and disk sub
innodb_autoinc_lock_mode = 1
innodb_adaptive_hash_index = OFF
innodb_buffer_pool_instances = 8
innodb_buffer_pool_size = 12G
innodb_doublewrite = 1
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 3
innodb_flush_method = O_DIRECT
innodb_io_capacity = 200
innodb_log_buffer_size = 8M
innodb_log_files_in_group = 2
innodb_log_file_size = 128M
innodb_read_io_threads = 4
innodb_thread_concurrency = 0
innodb_write_io_threads = 4
# avoid statistics update when doing e.g show tables
innodb_ft_min_token_size = 0
innodb_stats_on_metadata = 0
[mysqld_safe]
pid-file = XXX
#syslog
malloc-lib = /usr/lib64/libtcmalloc.so.4.4.5
I am assuming that we are missing a key piece here, but so far we have been unable to find it. As a last resort we are considering changing the default transaction isolation from REPEATABLE READ to READ COMMITTED or READ UNCOMMITTED, which may help with the locking, but that will require us to do significantly more research into the implications for our use cases, so we would like to avoid it if at all possible.
Any suggestions are welcome.
innodb_flush_log_at_trx_commit = 3 is an invalid value. I’m surprised MySQL even starts. Change this to 2.
ROW-based replication should not impose any locks on replicas, unless FOREIGN KEYs are involved. Do you have any of these?
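If it helps, a query along these lines should list any foreign keys that still exist (a plain information_schema check; the WHERE clause just excludes the system schemas):
-- list all remaining foreign-key constraints in user schemas
SELECT CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, REFERENCED_TABLE_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE CONSTRAINT_SCHEMA NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema');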
It would be helpful to understand the locks. Find a SELECT query that runs on the replica and reproduces the issue. Run the query in session #1. While it is locked, in session #2, select from sys.innodb_lock_waits and see which query is blocking. This might help determine what is happening.
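For example, something like this in session #2 (columns trimmed for readability) will show who is blocking whom and with what lock mode:
-- blocked vs. blocking session, lock modes, and the ready-made KILL statement
SELECT locked_table, waiting_pid, waiting_query,
       blocking_pid, blocking_query, blocking_lock_mode,
       sql_kill_blocking_connection
FROM sys.innodb_lock_waits\G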
These were all in an effort to get back to MySQL 5.7-style replication and performance, because with LOGICAL_CLOCK and replica_preserve_commit_order we were getting even worse performance and locking.
innodb_flush_log_at_trx_commit reverts to a value of 2 at runtime, but yes, I was surprised it works too.
There are no foreign keys in any database; we removed them years ago due to other performance issues.
As for the locks, we know which table is getting locked, because replication needs to update some of its records. Here is an example:
wait_started: 2024-04-09 08:30:33
wait_age: 00:00:09
wait_age_secs: 9
locked_table: `DB1`.`table1`
locked_table_schema: DB1
locked_table_name: table1
locked_table_partition: NULL
locked_table_subpartition: NULL
locked_index: idx_contact
locked_type: RECORD
waiting_trx_id: 117850381337
waiting_trx_started: 2024-04-09 08:30:33
waiting_trx_age: 00:00:09
waiting_trx_rows_locked: 1
waiting_trx_rows_modified: 1
waiting_pid: 1697121
waiting_query: NULL
waiting_lock_id: 139791340341016:665006:958203:570:139774144774368
waiting_lock_mode: X,GAP,INSERT_INTENTION
blocking_trx_id: 421266317044760
blocking_pid: 2287080
blocking_query: INSERT INTO `DB2`.`table2`.`Value`
blocking_lock_id: 139791340334104:665006:958203:570:139784041354192
blocking_lock_mode: S,GAP
blocking_trx_started: 2024-04-09 08:29:52
blocking_trx_age: 00:00:50
blocking_trx_rows_locked: 16580219
blocking_trx_rows_modified: 0
sql_kill_blocking_query: KILL QUERY 2287080
sql_kill_blocking_connection: KILL 2287080
The blocking thread in question is 2287080:
ID: 2287080
USER: XXX
HOST: XXX
DB: DB1
COMMAND: Query
TIME: 58
STATE: executing
INFO: INSERT INTO `DB2`.`table2`........ SELECT ...... COUNT(DISTINCT `table1`.`ContactId`) ............ FROM `table3` ........ ON DUPLICATE KEY UPDATE `XXXX` = VALUES(`XXXX`)
TIME_MS: 57595
ROWS_SENT: 0
ROWS_EXAMINED: 0
The blocking query is an INSERT … SELECT: it selects from DB1, which is replicated from the primary to the replica, and inserts into DB2, which is not replicated anywhere, exists only on the replica, and is used for aggregation.
While DB1.table1 is locked, replication starts lagging behind with:
ID: 1697121
USER: system user
HOST:
DB: DB1
COMMAND: Query
TIME: 24
STATE: Applying batch of row changes (write)
INFO: NULL
TIME_MS: 23867
ROWS_SENT: 0
ROWS_EXAMINED: 0

ID: 1697120
USER: system user
HOST:
DB: NULL
COMMAND: Query
TIME: 21
STATE: Reading event from the relay log
INFO: NULL
TIME_MS: 20856
ROWS_SENT: 0
ROWS_EXAMINED: 0
Some of those queries can run for several thousand seconds, which then leads replication to stop with the timeout error I shared in the first post.
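For reference, the per-worker applier state, and eventually the 1205 error once the worker gives up, can also be seen with something like this:
-- applier worker state and last error on the replica
SELECT WORKER_ID, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_applier_status_by_worker\G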
hristomitev:
The blocking query is an INSERT … SELECT: it selects from DB1, which is replicated from the primary
This, naturally, imposes (S)hared locks on the DB1.table1 table as it is being SELECTed. This prevents table1 from being modified while the “SELECT *” is working. Without this lock, table1 could be modified and the INSERT would no longer be (C)onsistent, nor (I)solated (eg: ACID compliant). This lock also blocks replication.
Replication always operates in READ-COMMITTED mode and you can’t change that. I would suggest changing your INSERT … SELECT to use READ-COMMITTED, as I can see the query is also GAP-locking, which means it’s not only locking the actual existing rows, it’s also locking the “gaps” of missing rows and the “gaps” at the end of the table. READ-COMMITTED does not take gap locks.
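A minimal sketch of that change, assuming the aggregation job can set its isolation level per session (so the server default stays REPEATABLE READ):
-- run in the session that executes the aggregation job, before its INSERT … SELECT
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- ...then run the existing INSERT INTO DB2.table2 ... SELECT ... FROM DB1.table1 statement unchanged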