“Learn 3 things A Day – Learning 3”

Sentinel: Automatic failover system in Redis

High Availability options in Redis include the “Clustering” and “Master Slave” topologies. Clustering extends the “Master Slave” topology to multiple sets: data is partitioned into mutually exclusive data sets, and each data set is stored in a dedicated “Master Slave(s)” set.

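In Redis, the three tasks that high availability requires (monitoring, failure detection, and automatic corrective action such as failover) are all performed by a separate process called “Sentinel”. Sentinels can run on the same server as Redis or on independent servers.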

Master – Slave Topology: In the Master – Slave topology, applications connect to the master Redis instance to perform read and write operations. The data thus generated is asynchronously replicated to the slave Redis instance. To persist data to disk, Redis provides features such as Snapshot and AOF (Append Only File). Such on-disk persistence is meant primarily for disaster recovery, though it could also be used for High Availability.

Data written by multiple clients goes to the master and eventually (because replication is asynchronous) reaches the slave. One or more Sentinel processes can be configured to monitor multiple master sets. The notation used for this topology:

  • M1…Mn = Masters
  • S1…Sn = Slaves
  • C1..Cn = Sentinels
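To make the "asynchronous" part concrete, here is a minimal toy sketch in Python. It does not talk to Redis at all. The `Master` and `Slave` classes and the `backlog` queue are hypothetical names invented for illustration. It only shows that a write is acknowledged by the master before the slave has it, which is why a failover at the wrong moment can lose recent writes.

```python
from collections import deque


class Master:
    """Toy master: acknowledges writes immediately, replicates later."""

    def __init__(self):
        self.data = {}
        self.backlog = deque()  # writes not yet applied on the slave

    def write(self, key, value):
        self.data[key] = value
        self.backlog.append((key, value))
        return "OK"  # acknowledged before the slave has seen the write


class Slave:
    """Toy slave: applies whatever the master has queued so far."""

    def __init__(self):
        self.data = {}

    def replicate_from(self, master):
        while master.backlog:
            key, value = master.backlog.popleft()
            self.data[key] = value


master, slave = Master(), Slave()
master.write("user:1", "alice")

# Right after the write, the slave is still behind the master.
lag_before = "user:1" not in slave.data

# Once replication catches up, both hold the same data.
slave.replicate_from(master)
in_sync = slave.data == master.data

print(lag_before, in_sync)
```

If the master crashed between the `write` call and `replicate_from`, a slave promoted at that moment would not have the key.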


Redis Clustering: Redis clustering extends the idea to multiple Master – Slave sets, each storing a portion of the data. In a simple clustering topology, the data is split into multiple disjoint sets, and each set is stored in a different Master – Slave set.
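As a rough illustration of how keys can be split into disjoint sets: Redis Cluster assigns each key to one of 16384 hash slots using a CRC16 checksum, and each master owns a range of slots. The sketch below reproduces that calculation in Python, including the "hash tag" rule where only the text inside the first `{...}` is hashed. The three-way slot split and the names M1 to M3 are assumptions for illustration only. A real cluster decides its own slot assignment.

```python
import binascii

HASH_SLOTS = 16384

# Hypothetical even split of slots across three master sets.
SLOT_RANGES = {
    "M1": range(0, 5461),
    "M2": range(5461, 10923),
    "M3": range(10923, 16384),
}


def key_slot(key: str) -> int:
    """Return the hash slot for a key (CRC16/XMODEM modulo 16384)."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag counts
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16/XMODEM variant.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def master_set_for(key: str) -> str:
    """Look up which (hypothetical) master set owns the key's slot."""
    slot = key_slot(key)
    for name, slots in SLOT_RANGES.items():
        if slot in slots:
            return name
    raise ValueError(f"slot {slot} is not assigned")


print(key_slot("foo"), master_set_for("foo"))
```

Because every slot belongs to exactly one range, every key lands in exactly one Master – Slave set, which is what makes the data sets mutually exclusive.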


As per Wikipedia (http://en.wikipedia.org/wiki/High_availability), the three principles of high availability engineering are:

  • Elimination of Single point of failure
    • This means adding redundancy to the system so that failure of a component does not mean failure of the entire system.
  • Reliable crossover
    • In redundant systems, the crossover point itself tends to become a single point of failure. High availability engineering must provide for reliable crossover.
  • Detection of failures as they occur
    • If the two principles above are observed, then a user may never see a failure. But the maintenance activity must.

More practical requirements for a system to be highly available:

  • Monitoring: health monitoring must be enabled
  • Failure Detection: detect failures, or scenarios perceived as failures
  • Notification and Action: notify and automatically correct the situation (generally by failover)

Example of High Availability in SQL Server on Windows Clustering:

  • Monitoring: The SQL Resource executable, installed as part of the cluster installation, monitors the health of the SQL Server instance.
    • In earlier versions of SQL Server it performs IsAlive and LooksAlive checks.
  • Failure Detection: If multiple IsAlive and LooksAlive checks fail, this is treated as a failure of the SQL Server instance.
  • Decisive Action: Fail over SQL Server from the Active to the Passive Windows node.

Nature of failures that systems have to withstand:

  • Executable crash
    • Redis may crash (possibly due to improper exception handling)
    • Sentinel may crash
  • System crash due to H/W or OS related issues.
    • In such a scenario, the Redis instances and any monitoring Sentinels on that system will fail.
  • Network disruptions
    • A set of systems on the same network may become unavailable (even though the Redis and Sentinel instances on them may still be working).
    • Network disruptions (also termed network partitions) can span multiple networks.

Sentinels should detect failures of Redis and take appropriate actions.

How does Sentinel monitor and fail over a Redis instance?

  • Sentinel instances ping the Redis instance and query the port of the master Redis instance. A failure to contact or query the master is considered a failure of the master.
  • To avoid false positives, multiple Sentinels monitor the same master. A configured number of Sentinels (the quorum) must agree that the master is unreachable before it is declared failed and a failover is triggered.
  • Before the failover can actually proceed, a majority of the Sentinels must be reachable to authorize it.
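The two conditions above (enough Sentinels agreeing that the master is down, and a majority of Sentinels being present) can be sketched as a small decision function. This is a simplified model in Python, not Sentinel's actual implementation. `SentinelView`, `master_is_down`, and `failover_allowed` are hypothetical names, and the real protocol also involves leader election among Sentinels, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    name: str
    reachable: bool          # can the other Sentinels talk to this one?
    master_seems_down: bool  # did its own ping to the master fail?


def master_is_down(views, quorum):
    """True if at least `quorum` reachable Sentinels report the master down."""
    votes = sum(1 for v in views if v.reachable and v.master_seems_down)
    return votes >= quorum


def failover_allowed(views, quorum):
    """Failover also needs a majority of all known Sentinels to be reachable."""
    reachable = sum(1 for v in views if v.reachable)
    majority = len(views) // 2 + 1
    return master_is_down(views, quorum) and reachable >= majority


# Five Sentinels, quorum of 2. Three are reachable, two of them see the
# master as down: the quorum is met and a majority (3 of 5) is present.
healthy_group = [
    SentinelView("C1", True, True),
    SentinelView("C2", True, True),
    SentinelView("C3", True, False),
    SentinelView("C4", False, False),
    SentinelView("C5", False, False),
]

# Same quorum, but only two Sentinels are reachable (for example, after a
# network partition): the quorum is met, yet there is no majority.
minority_group = [
    SentinelView("C1", True, True),
    SentinelView("C2", True, True),
    SentinelView("C3", False, False),
    SentinelView("C4", False, False),
    SentinelView("C5", False, False),
]

print(failover_allowed(healthy_group, quorum=2))
print(failover_allowed(minority_group, quorum=2))
```

The second case shows why the majority rule matters: Sentinels stranded on the small side of a partition can agree the master looks down, but they cannot start a failover on their own.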

This is just the tip of the iceberg on Redis failover. Need to learn more.