mysql replication problems

General support questions
didds1962
Posts: 17
Joined: 2020/01/20 16:19:25

mysql replication problems

Post by didds1962 » 2021/04/29 15:03:22

Im investigating the use of mysql over DRBD as a replacement for replication (long story).

Im following thisn how-to
http://mysqlhk.blogspot.com/2014/10/usi ... mysql.html

But in reality Ive only got as far as "Once the DRBD is configured correctly." and a yum update.

I followed the instructions in Linux Academy to set up DRBD, and had

server1: cat /proc/drbd showing Primary/Secondary
server2: cat /proc/drbd showing Secondary/Primary

and server 1 had /dev/drbd1 as an ext4 FS mounted on /var/lib/mysql. All good.

Some very basic tests then followed...

I restarted server2 (seocndary) ... server 1 then showed as Primary/Unknown. And when server 2 came back up it returned to Primary/Secondary.

All looking god thus far!

Then I restarted server1 (Primary)... oh dear...

Initially server2 still showed Secondary/Primary. then eventually Secondary/Unknown... then Secondary/Secondary. Where it has stayed even though server1 is back up.... and server1 now also shows Secondary/Secondary and wont mount /var/lib/mysql (aka /dev/drbd1 as per above) at all.

So

* does this hint at a configuration issue?
Or
* does it actually mean that if a DRBD pair loses its primary server, then the pair breaks ?

cheers

ian

didds1962
Posts: 17
Joined: 2020/01/20 16:19:25

Re: DRBD "failing" ... ?

Post by didds1962 » 2021/04/29 15:09:19

FWIW both servers have the same /etc/drbd.d/drbd1.res

resource drbd1 {
device /dev/drbd1;
disk /dev/vg-mysql/lv-mysql;
meta-disk internal;
on <server1 hostname> {
address <server1 IP>:7790;
}
on <server2 hostname> {
address <server2 IP>:7790;
}
}

which of course may be wrong, but is as good as i can work out from the docuementation I've followed thus far.

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: DRBD "failing"?

Post by TrevorH » 2021/04/29 17:40:31

You have to run something like drbdadm primary $resource to make it primary. It can only be primary on one at a time if you're using an ordinary filesystem on it. To do this properly you really need to use corosync/pacemaker and set up a cluster with cluster controlled resources. It will then handle making the resource primary on one system at a time and starting mysqld on the right one, in the right order.... as long as you tell it to do so.

I'd just use replication, it's much easier.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

didds1962
Posts: 17
Joined: 2020/01/20 16:19:25

Re: DRBD "failing"?

Post by didds1962 » 2021/04/29 18:01:36

the drbdam has been run when it was initally set up etc etc etc. Are you meaning that it needs to run everytime it boots eg in a rc script or similar?

I'm rather loath to spend more time to test the entire setup all the way through (as you describe) if the whole thing fails/falls apart because we have had to reboot the master

FWIW - and I really want to keep this story short :-) - we are after many years of happy replication running into constant replication woes, and replciation breaking, that even when we have the stage of

Slave_IO_Running: Yes
Slave_SQL_Running: Yes

a simple heartbeat test we use to check replication is OK isnt indicating all is OK. The latest (today's) glitch has still not sorted itself out after about ten hours.

In recent months weve felt we've had to reset replication from scratch when this heartbeat check etc still fails after a few days, just to get replication working again... and the tedious issue with that is the entire end to end process takes over half a day to dump, copy, import, set up etc etc etc - which actually for production reasons means starting at circa 2100 and not completing until after 0100 the next morning. Replication just isnt "working" properly/reliably and as we would expect it to, for no obvious reasons seeing as its the same setup thats been in place for a decade (albeit the two DB servers being migrated a few times as OS becomes EOL etc).

We're just trying to investigate alternatives.

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: DRBD "failing"?

Post by TrevorH » 2021/04/29 23:12:53

You have to set it up properly or not at all. You either have to run the commands manually each time you start up to make the DRBD device primary and mount it/whatever. You have to make sure it's not primary and mounted on the other system even momentarily. Doing so will corrupt the filesystem metadata and need an fsck run. You can't tell it to go primary on boot unless you 100% know that it's not primary on the other one. Then the alternative is to run a cluster and tell the cluster manager about all the reources, like floating ip for the mysql server, the mysql server, the become primary, the mount, all of those, and tell it what order to run them in and what to do if anything fails. It's complicated.

As for replication, I do not understand how it can fail. I have about 30 or 40 slaves off a master-master and rarely, if ever, do I have to do anything unless one of them has been powered off unexpectedly.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

didds1962
Posts: 17
Joined: 2020/01/20 16:19:25

Re: DRBD "failing"?

Post by didds1962 » 2021/05/04 07:08:39

TrevorH wrote:
2021/04/29 23:12:53
You have to set it up properly or not at all. You either have to run the commands manually each time you start up to make the DRBD device primary and mount it/whatever. You have to make sure it's not primary and mounted on the other system even momentarily. Doing so will corrupt the filesystem metadata and need an fsck run. You can't tell it to go primary on boot unless you 100% know that it's not primary on the other one. Then the alternative is to run a cluster and tell the cluster manager about all the reources, like floating ip for the mysql server, the mysql server, the become primary, the mount, all of those, and tell it what order to run them in and what to do if anything fails. It's complicated.
Thansk for that - it certainly shows it cannot be the solution we need. I appreciate your candour.
As for replication, I do not understand how it can fail. I have about 30 or 40 slaves off a master-master and rarely, if ever, do I have to do anything unless one of them has been powered off unexpectedly.
i dont get it either. Its driving me up the bloody wall. Lat time replciation broke was last Wednesday. We "fixed" it by a time old use of

STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;START SLAVE

until eventually we get

mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: mysqldbdynabook.internal.toshiba-europe.com
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.014853
Read_Master_Log_Pos: 95442533
Relay_Log_File: z8lwhippet-relay-bin.000590
Relay_Log_Pos: 321405815
Relay_Master_Log_File: mysql-bin.014736
Slave_IO_Running: Yes <<<<<<<
Slave_SQL_Running: Yes <<<<<<<
...


but the heartbeat check we use that replication is in line has reported not aligned/replicated since then.

hence when ever this happens we inevitably end up spending half a day fixing it from scratch/resetting it from scratch - and this is now happening at least monthly

:-(

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: DRBD "failing"?

Post by TrevorH » 2021/05/04 09:04:02

Post your my.cnf files for review.
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

didds1962
Posts: 17
Joined: 2020/01/20 16:19:25

Re: DRBD "failing"?

Post by didds1962 » 2021/05/04 09:49:49

MASTER:

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
default-storage-engine=MYISAM


symbolic-links=0

bind-address=10.108.248.162
max_connections=1500
max_user_connections = 150
binlog_cache_size = 131072
key_buffer_size = 32M
max_allowed_packet = 1024M
table_definition_cache = 16384
table_open_cache = 16384
sort_buffer_size = 8M
read_buffer_size = 8M
join_buffer_size = 64M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 128M
query_cache_limit = 8M
query_cache_type = 1
query_prealloc_size = 65536
query_alloc_block_size = 131072
query_cache_min_res_unit = 4096
tmp_table_size=384M
max_heap_table_size=512M
lower_case_table_names=1

log-bin=mysql-bin
max_binlog_size = 512M
expire-logs-days = 2
binlog_format = MIXED

relay_log_purge = 1
server-id = 2

innodb_file_per_table
innodb_data_home_dir = /var/lib/mysql/innodb
innodb_data_file_path = ibdata1:16G;ibdata2:32G;ibdata3:10M:autoextend:max:32G
innodb_log_group_home_dir = /var/lib/mysql/innodb
innodb_buffer_pool_size = 12G
innodb_log_file_size = 128M
innodb_log_files_in_group = 3
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 200
innodb_thread_concurrency = 12
innodb_max_dirty_pages_pct = 90

innodb_print_all_deadlocks = 1

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

innodb_buffer_pool_instances = 5

ignore-db-dir=lost+found

max_connect_errors = 100000
max_error_count = 10000

replicate-ignore-db = smart

[mysqldump]
quick
quote-names
max_allowed_packet = 1024M
ignore-table = mysql.events
routines=true

[mysqld]
show_compatibility_56 = ON

================================================

SLAVE:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
default-storage-engine=MYISAM

symbolic-links=0

bind-address=10.108.248.163
max_connections=1500
max_user_connections = 150
binlog_cache_size = 131072
key_buffer_size = 32M
max_allowed_packet = 1024M
table_definition_cache = 16384
table_open_cache = 16384
sort_buffer_size = 8M
read_buffer_size = 8M
join_buffer_size = 64M
read_rnd_buffer_size = 4M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 128M
query_cache_limit = 8M
query_cache_type = 1
query_prealloc_size = 65536
query_alloc_block_size = 131072
query_cache_min_res_unit = 4096
tmp_table_size=384M
max_heap_table_size=512M
lower_case_table_names=1


log-bin=mysql-bin
max_binlog_size = 512M
expire-logs-days = 2
binlog_format = MIXED
relay_log_purge = 1
server-id = 1



innodb_file_per_table
innodb_data_home_dir = /var/lib/mysql/innodb

innodb_data_file_path = ibdata1:16G;ibdata2:10M:autoextend:max:32G
innodb_log_group_home_dir = /var/lib/mysql/innodb
innodb_buffer_pool_size = 12G


innodb_log_file_size = 128M
innodb_log_files_in_group = 3
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout = 200
innodb_thread_concurrency = 12
innodb_max_dirty_pages_pct = 90


innodb_print_all_deadlocks = 1

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

innodb_buffer_pool_instances = 5

ignore-db-dir=lost+found


max_connect_errors = 100000
max_error_count = 10000


replicate-ignore-db = smart


[mysqldump]
quick
quote-names
max_allowed_packet = 1024M
ignore-table = mysql.events
routines=true

[mysqld]
show_compatibility_56 = ON

User avatar
TrevorH
Site Admin
Posts: 33202
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: DRBD "failing"?

Post by TrevorH » 2021/05/04 10:46:40

First thing that springs out at me is that you do not have

auto_increment_increment = 2
auto_increment_offset = x

on either side and you need them. The offset should be even on one side and odd on the other (i.e 1 and 2) so that it never tries to insert rows with duplicate ids. All your replicated tables need auto_increment columns.

Edit: actually this may only be required if you're running in master-master mode
The future appears to be RHEL or Debian. I think I'm going Debian.
Info for USB installs on http://wiki.centos.org/HowTos/InstallFromUSBkey
CentOS 5 and 6 are deadest, do not use them.
Use the FAQ Luke

didds1962
Posts: 17
Joined: 2020/01/20 16:19:25

Re: DRBD "failing"?

Post by didds1962 » 2021/05/04 12:03:05

TrevorH wrote:
2021/05/04 10:46:40
First thing that springs out at me is that you do not have

auto_increment_increment = 2
auto_increment_offset = x

on either side and you need them. The offset should be even on one side and odd on the other (i.e 1 and 2) so that it never tries to insert rows with duplicate ids. All your replicated tables need auto_increment columns.
cheers! this is a inherited setup so I really appreciate the pointer!

So

MASTER
auto_increment_increment = 2
auto_increment_offset = 2

SLAVE
auto_increment_increment = 2
auto_increment_offset = 1

correct?

Post Reply