No FailureDetection read_only=on on all servers in a topology #865

dontstopbelieveing · 2019-04-17T21:59:56Z

I have a case in which all instances in a topology have read_only on, which means there is no writable master available. However, this is not detected as a failure: the OnFailureDetectionProcesses hooks do not kick off, and neither does auto-recovery.

Is there something I am missing in the configuration that would enable this detection and recovery?

~  ./orchestrator -c topology -i 10.176.3.10
10.176.3.10:3306      [0s,ok,5.7.23-23-log,ro,ROW,>>,GTID]
+ 10.176.12.136:3306  [0s,ok,5.7.23-23-log,ro,ROW,>>,GTID]
  + 10.176.14.19:3306 [0s,ok,5.7.23-23-log,ro,ROW,>>,GTID]
+ 10.176.13.44:3306   [0s,ok,5.7.23-23-log,ro,ROW,>>,GTID]


shlomi-noach · 2019-04-18T04:19:39Z

This scenario was not designed for failover. What would the solution be? Perhaps just turning off read_only on the master of the topology?

I'm happy to make this a structure warning analysis, but I'm not sure I want to take it as far as failure detection + recovery.

Do you know at this time how you ended up with a read-only master? Does your monitoring cover such a scenario?

dontstopbelieveing · 2019-04-18T05:02:08Z

We have machines that restart quite frequently, for various reasons that can't be avoided. The my.cnf file sets read_only to ON to avoid scenarios that would lead to writes going to two different servers. When this happens, a DBA currently turns read_only off manually on the servers that are masters.

Currently we don't have monitoring that is "topology-aware". If such a restart happens, I was wondering whether we could use Orchestrator to notify us, since it is topology-aware. The remediation would be to turn read_only off, but I am not relying on Orchestrator to take this action; it would still be a manual step, since deciding which node should have read_only turned off can be complicated. I'm just looking at whether I can configure Orchestrator to detect such a scenario and notify us.

shlomi-noach · 2019-04-18T05:18:36Z

See some ideas at https://github.com/github/orchestrator/blob/master/docs/script-samples.md

dontstopbelieveing · 2019-04-29T19:52:59Z

Yes, those might be helpful; I'm trying to figure out a way of doing this as a script. However, it would help if this could be a structure warning analysis, perhaps as part of the replication-analysis command? A warning would definitely make my use case easier :)
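Until such a warning exists, a topology-aware check can be scripted around the CLI output. Below is a minimal sketch in Python. It assumes the `orchestrator -c topology` output format shown at the top of this issue: the master is printed without a leading `+`, and the 4th bracketed field is `ro` or `rw`. That format may differ between Orchestrator versions. The function names are hypothetical. Sending the notification is left to whatever alerting already wraps the script.

```python
import re
import sys

# Matches one instance line of `orchestrator -c topology` output as shown in this issue, e.g.
#   "  + 10.176.14.19:3306 [0s,ok,5.7.23-23-log,ro,ROW,>>,GTID]"
# Assumption: the bracketed fields keep this order, with the 4th field being "ro" or "rw".
LINE_RE = re.compile(
    r"^(?P<indent>\s*)(?P<plus>\+\s+)?(?P<instance>\S+)\s+\[(?P<fields>[^\]]*)\]"
)


def parse_topology(text):
    """Return a list of (instance, is_master, read_only) tuples (hypothetical helper)."""
    instances = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines, shell prompts, or anything not shaped like an instance
        fields = m.group("fields").split(",")
        read_only = len(fields) > 3 and fields[3] == "ro"
        # The topology root (the master) is printed without a leading "+" marker.
        is_master = m.group("plus") is None
        instances.append((m.group("instance"), is_master, read_only))
    return instances


def read_only_masters(text):
    """Return the masters (topology roots) that have read_only on."""
    return [inst for inst, is_master, ro in parse_topology(text) if is_master and ro]


def main(text=None):
    """Print one warning per read-only master and return a shell-style exit code."""
    if text is None:
        text = sys.stdin.read()
    masters = read_only_masters(text)
    for inst in masters:
        print(f"WARNING: master {inst} has read_only=ON; no writable master in this topology")
    # A non-zero exit code lets cron or a monitoring wrapper turn this into an alert.
    return 1 if masters else 0
```

To use it, append `raise SystemExit(main())` to the file and pipe the CLI into it, e.g. `orchestrator -c topology -i 10.176.3.10 | python3 check_ro_master.py`. The check looks only at the master rather than at every instance, because a read-only master is what leaves the topology without a writable server. Read-only replicas are the expected state.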

shlomi-noach · 2019-04-30T06:26:25Z

yes, a structure warning would be part of replication-analysis.

shlomi-noach · 2019-04-30T08:31:56Z

@jfudally is this something you might be interested in?

jfudally · 2019-06-20T15:10:59Z

@dontstopbelieveing PR #878 was just merged into master. This PR introduces the NoWriteableMasterStructureWarning structure warning when the master node is read-only. Please take a look and let me know if this addresses your use-case.

liortamari · 2021-04-07T11:39:46Z

@shlomi-noach sorry for bumping this; I happened to notice that Vitess handles the read-only master case:
https://github.com/vitessio/vitess/blob/25762d88e50d77352dfd5cc0bb41902f7215a3d9/go/vt/orchestrator/logic/topology_recovery.go#L1563
Do you think that Vitess code could be useful here too? I would be happy to try to write a PR if you see no reason not to.

liortamari · 2021-04-07T11:41:05Z

@dontstopbelieveing may I ask how you eventually solved this scenario?

shlomi-noach · 2021-04-07T11:59:22Z

@liortamari good timing. Please look at #1332, which is not merged yet. It solves the issue.

shlomi-noach added the contribution-friendly good for new contributors label Apr 30, 2019

jfudally self-assigned this May 1, 2019

jfudally mentioned this issue May 8, 2019

Add structure warning to replication-analysis when all masters are read_only #878

Merged


shlomi-noach closed this as completed Jun 23, 2019

shlomi-noach reopened this Jun 23, 2019

jfudally closed this as completed Nov 8, 2019

shlomi-noach mentioned this issue Apr 7, 2021

Introducing RecoverNonWriteableMaster flag #1332

Merged
