No FailureDetection read_only=on on all servers in a topology #865
This scenario was not designed for failover. What would the solution be? Perhaps just turn off. I'm happy to make this a structure warning analysis, but I'm not sure I want to take it as far as failure detection + recovery. Do you know at this point how you ended up with a read-only master? Does your monitoring cover such a scenario?
We have machines that restart quite frequently for various reasons that can't be avoided. The my.cnf file sets read_only to ON so that writes cannot end up going to two different servers. Currently, in these cases a DBA manually turns read_only off on whichever server is the master. We don't have monitoring that is "topology-aware", so I was wondering whether we could use Orchestrator, which is topology-aware, to notify us when such a restart happens. The remediation would be to turn read_only off, but I am not relying on Orchestrator to take that action; it would remain a manual step, since deciding which node should have read_only turned off can be complicated. I'm just looking at whether I can configure Orchestrator to detect such a scenario and send a notification.
Here's one way of doing it: see some ideas at https://github.com/github/orchestrator/blob/master/docs/script-samples.md
Yes, those might be helpful; I'm trying to figure out a way of doing this as a script. However, it would help if this could be a structure warning analysis, perhaps as part of the replication-analysis command? A warning would definitely make my use case easier :)
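For the script route, here is a minimal sketch of the detection side only (no remediation, in keeping with the manual step described above). It assumes you already have the topology's instances as a JSON array whose elements carry "Key", "MasterKey" and "ReadOnly" fields, loosely modelled on Orchestrator's instance JSON; those field names are an assumption to verify against your version, and `Instance`, `parse_instances` and `read_only_master_warnings` are hypothetical names, not part of Orchestrator.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    """Minimal view of one MySQL server in a replication topology (hypothetical shape)."""
    host: str
    master_host: Optional[str]  # None when the server does not replicate from anything
    read_only: bool


def parse_instances(payload: str) -> List[Instance]:
    """Parse a JSON array of instances.

    ASSUMPTION: each element has "Key", "MasterKey" and "ReadOnly" fields,
    loosely modelled on Orchestrator's instance JSON; check your version.
    """
    instances = []
    for item in json.loads(payload):
        # MasterKey may be missing, null, or carry an empty Hostname on a topology root.
        master = (item.get("MasterKey") or {}).get("Hostname") or None
        instances.append(
            Instance(item["Key"]["Hostname"], master, bool(item["ReadOnly"]))
        )
    return instances


def read_only_master_warnings(instances: List[Instance]) -> List[str]:
    """Return one warning per topology root (a server with no master) that is read-only."""
    return [
        f"{inst.host}: master has read_only=ON; no writable master in this topology"
        for inst in instances
        if inst.master_host is None and inst.read_only
    ]
```

A notification hook or cron job could run this against each cluster and alert only when the returned list is non-empty. Treating "no master" as "is the master" is a simplification: a replica whose master is missing from the snapshot would not be flagged here.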
Yes, a structure warning would be part of
@jfudally is this something you might be interested in?
@dontstopbelieveing PR #878 was just merged into
@shlomi-noach sorry for bumping this; I happened to notice that Vitess handles the read-only master case.
@dontstopbelieveing may I ask how you eventually solved this scenario?
@liortamari good timing. Please look at #1332, which is not merged yet. It solves the issue.
I have a case in which all instances in a topology have read_only on, which means that there is no writable master available. However, this is not detected as a failure, so OnFailureDetectionProcesses does not kick off, and neither does auto-recovery.
Is there something I am missing in the configuration that would enable this detection and recovery?