Difference between revisions of "System Failover"

 
(23 intermediate revisions by the same user not shown)
Line 1: Line 1:
System failover keeps two devices in an Active-Passive configuration so that service continues if either device fails.<br>
+
__FORCETOC__
One device starts in Active mode and the other starts in Passive mode.
+
=== System Failover ===
[[File:failover.png|400px|thumb|Failover process]]
+
[[File:failover.png|400px|thumb|Failover Flow]]
The virtual IP address is always bound to the Active device, which guarantees service continuity even after a failover.
 
  
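The Passive device checks the Active device every few seconds and switches to Active mode if the Active device does not respond for longer than the specified time (deadtime). Real-time database synchronization also runs, so data continuity is preserved after a failover.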
+
System failover switches service to a passive system when the active system fails (e.g. hardware fault, network problem).
  
The roles of the Active and Passive devices are shown in the following table.
+
A passive system uses Database Replication<ref>https://mariadb.com/kb/en/replication-overview/</ref> to synchronize all data from the master database running on the active system, and it monitors the active system.
 +
 
 +
A passive system sends a heartbeat to the active system about every 10 seconds. If the passive system receives no response from the active system for the [[CLI - Configuring System Failover | Deadtime]], it switches its mode to "active".
 +
 
 +
System failover keeps the service running, and you can reach the management page without changing the URI because the [[ImRAD services(daemons) | failover service]] automatically configures a virtual IP on the active system.
 +
 
 +
The failover service in an active system listens on UDP port 6010 to receive a heartbeat from a passive system.
 +
 
 +
All services in the active and passive devices work as shown in the table below.
 
{| class="wikitable"
 
{| class="wikitable"
! Role !! Active !! Passive
+
! Service !! Active System !! Passive System
 
|-
 
|-
| dhcpv4 || O || O
+
| dhcpv4 || running || running
 
|-
 
|-
| dhcpv6 || O || O
+
| dhcpv6 || running || running
 
|-
 
|-
| radius || O || O
+
| radius || running || running
 
|-
 
|-
| logexp || All roles || Stores local syslog only
+
| logexp || running || running, but only saves its local Syslog
 
|-
 
|-
| failover || Configures the virtual IP || Monitors the Active device
+
| failover || Configuring a Virtual IP || Monitoring an active system and replicating the database
 
|-
 
|-
| Database || Master || Slave
+
| Database || Master || Slave
 
|-
 
|-
 
|}
 
|}
  
Failback is not supported in failover mode. If the Active device fails and another device takes over Active mode, the failed device switches to Passive mode once it recovers (the last step in the figure).
+
{{note|If an active system recovers from a fault after another system has switched to active mode, the recovered system switches to passive mode. In other words, failback<ref>https://en.wikipedia.org/wiki/Failover</ref> does not occur.}}
 +
 
 +
===== Configuration =====
 +
'''To apply failover to your devices, you must configure System Failover via the [[CLI - Configuring System Failover | CLI]]''' and start the failover service on both devices.
 +
 
 +
 
 +
=== System Failover Switch-Over ===
 +
The following table shows when a system switches its mode. The case numbers identify the Switch-Over condition, and you can see them while monitoring the [[CLI - Log | logs]] of the failover service.
 +
{{note|To display these logs, you must enable the [[CLI - Services(daemons) | "runtime log"]] for the failover service.
 +
}}
 +
 
 +
{| class="wikitable"
 +
! Init mode !! Peer Response !! Current mode (switched mode) !! Case Number
 +
|-
 +
| rowspan='4' | active || no-response || active || C7
 +
|-
 +
| zero(initializing) || active || C1, C5
 +
|-
 +
| passive || active || C5
 +
|-
 +
| active || passive || C4
 +
|-
 +
| rowspan='4' | passive || no-response || active || C6
 +
|-
 +
| zero(initializing) || passive || C1, C3
 +
|-
 +
| passive || active || C4
 +
|-
 +
| active || passive || C5
 +
|-
 +
|}
 +
 
 +
===== System Failover Case Numbers and Conditions =====
 +
{| class="wikitable"
 +
! Case Number !! Description
 +
|-
 +
| C1 || If DEVICE#1 is in an initialization state, and it gets a response from DEVICE#2 that is also initializing, DEVICE#1 switches to its configured Initial mode.
 +
|-
 +
| C2 || If DEVICE#1 is in an initialization state, and it gets a response from DEVICE#2 that is in either an "active" or "passive" state, DEVICE#1 switches to the opposite mode from DEVICE#2.
 +
|-
 +
| C3 ||  If DEVICE#1 is in a "passive" state, and it gets a response from DEVICE#2 that is initializing, DEVICE#1 keeps its current state.
 +
|-
 +
| C4 || If DEVICE#1 is in either an "active" or "passive" state, and it gets a response from DEVICE#2 that is in the same state, DEVICE#1 switches to the opposite mode from DEVICE#2.<br>
 +
This case is rare, but it can result from misconfiguring system failover (e.g. configuring the same initial mode on both devices).
 +
|-
 +
| C5 || If DEVICE#1 is in either an "active" or "passive" state, and it gets a response from DEVICE#2 that is in the opposite state, DEVICE#1 keeps its current state.
 +
|-
 +
| C6 || If DEVICE#1 is in an initialization state, the initial mode is "passive", and it does not get a response from DEVICE#2, DEVICE#1 tries to connect again to DEVICE#2 without switching its mode.
 +
If DEVICE#2 does not respond within the INIT-DEADTIME (60 seconds), DEVICE#1 switches its mode to "active".<br>
 +
 
 +
For this reason, start the failover service on the device whose Initial mode is "passive" only after the service is running on the device whose Initial mode is "active".
 +
|-
 +
| C7 || If DEVICE#1 is in an initialization state, the initial mode is "active", and it does not get a response from DEVICE#2, DEVICE#1 switches its mode to "active".
 +
|-
 +
| C8 || If DEVICE#1 is in a "passive" state, and it does not get a response from DEVICE#2, DEVICE#1 tries to connect again to  DEVICE#2 without switching its mode.
 +
If DEVICE#2 does not respond within the configured deadtime, DEVICE#1 switches its mode to "active".
 +
|-
 +
| C9 || DEVICE#1 is in an "active" state and does not get a response from DEVICE#2.<br>
 +
{{note|This case never occurs because a device in an "active" state does not send messages to check the health of the other device.}}
 +
|-
 +
| C10 || If the device's network interface is down, the device switches its mode to the "zero" state, which indicates that it is initializing. If the device was "active", it removes the virtual IP address that was set before.
 +
|-
 +
| C11 || If DEVICE#1 is in a "passive" or initialization state and it gets a response from DEVICE#2 on which System failover is disabled, DEVICE#1 keeps its current state. To fix this problem, enable System failover on DEVICE#2.
 +
|-
 +
| C12 || If DEVICE#1 is in a "passive" or initialization state and it gets a response with an incorrect shared secret from DEVICE#2, DEVICE#1 keeps its current state. To fix this problem, [[CLI - System Failover | verify the shared secret]] and correct the wrong one.
 +
|-
 +
| C13 || This case may occur when DEVICE#1 is in an initialization state and the configured initial mode is neither "active" nor "passive". You should configure the failover again.
 +
|-
 +
|}
  
Failover is configured through the CLI; see "[[CLI - 시스템 이중화 설정|Failover configuration]]".
+
=== References ===

Latest revision as of 14:21, 14 May 2021

System Failover

Failover Flow

System failover switches service to a passive system when the active system fails (e.g. hardware fault, network problem).

A passive system uses Database Replication[1] to synchronize all data from the master database running on the active system, and it monitors the active system.

A passive system sends a heartbeat to the active system about every 10 seconds. If the passive system receives no response from the active system for the Deadtime, it switches its mode to "active".
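The deadtime rule can be sketched in Python as follows. This is a minimal illustration, not the failover service's actual code: the `PassiveMonitor` class, its method names, and the use of `>=` for the comparison are assumptions made for the example. Only the roughly 10-second heartbeat interval and the deadtime concept come from the text.

```python
from dataclasses import dataclass

# "About every 10 seconds" per the description; the constant name is hypothetical.
HEARTBEAT_INTERVAL = 10.0


@dataclass
class PassiveMonitor:
    """Hypothetical model of the passive system's view of the active system."""

    deadtime: float              # seconds of silence tolerated (configured via the CLI)
    last_reply_at: float = 0.0   # timestamp of the most recent heartbeat reply
    mode: str = "passive"

    def on_reply(self, now: float) -> None:
        """Record that the active system answered a heartbeat at time `now`."""
        self.last_reply_at = now

    def tick(self, now: float) -> str:
        """Re-evaluate the mode; switch to "active" once the silence reaches deadtime."""
        if self.mode == "passive" and now - self.last_reply_at >= self.deadtime:
            self.mode = "active"
        return self.mode
```

Calling `tick` once per heartbeat interval is enough for this model. Because the check only applies while the mode is "passive", a later reply does not switch the device back, which matches the absence of failback.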

System failover keeps the service running, and you can reach the management page without changing the URI because the failover service automatically configures a virtual IP on the active system.

The failover service in an active system listens on UDP port 6010 to receive a heartbeat from a passive system.

All services in the active and passive devices work as shown in the table below.

{| class="wikitable"
! Service !! Active System !! Passive System
|-
| dhcpv4 || running || running
|-
| dhcpv6 || running || running
|-
| radius || running || running
|-
| logexp || running || running, but only saves its local Syslog
|-
| failover || Configuring a Virtual IP || Monitoring an active system and replicating the database
|-
| Database || Master || Slave
|}

Note: If an active system recovers from a fault after another system has switched to active mode, the recovered system switches to passive mode. In other words, failback[2] does not occur.

Configuration

To apply failover to your devices, you must configure System Failover via the CLI and start the failover service on both devices.


System Failover Switch-Over

The following table shows when a system switches its mode. The case numbers identify the Switch-Over condition, and you can see them while monitoring the logs of the failover service.

Note: To display these logs, you must enable the "runtime log" for the failover service.

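{| class="wikitable"
! Init mode !! Peer Response !! Current mode (switched mode) !! Case Number
|-
| rowspan='4' | active || no-response || active || C7
|-
| zero(initializing) || active || C1, C5
|-
| passive || active || C5
|-
| active || passive || C4
|-
| rowspan='4' | passive || no-response || active || C6
|-
| zero(initializing) || passive || C1, C3
|-
| passive || active || C4
|-
| active || passive || C5
|}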

System Failover Case Numbers and Conditions

{| class="wikitable"
! Case Number !! Description
|-
| C1 || If DEVICE#1 is in an initialization state, and it gets a response from DEVICE#2 that is also initializing, DEVICE#1 switches to its configured Initial mode.
|-
| C2 || If DEVICE#1 is in an initialization state, and it gets a response from DEVICE#2 that is in either an "active" or "passive" state, DEVICE#1 switches to the opposite mode from DEVICE#2.
|-
| C3 || If DEVICE#1 is in a "passive" state, and it gets a response from DEVICE#2 that is initializing, DEVICE#1 keeps its current state.
|-
| C4 || If DEVICE#1 is in either an "active" or "passive" state, and it gets a response from DEVICE#2 that is in the same state, DEVICE#1 switches to the opposite mode from DEVICE#2.<br>This case is rare, but it can result from misconfiguring system failover (e.g. configuring the same initial mode on both devices).
|-
| C5 || If DEVICE#1 is in either an "active" or "passive" state, and it gets a response from DEVICE#2 that is in the opposite state, DEVICE#1 keeps its current state.
|-
| C6 || If DEVICE#1 is in an initialization state, the initial mode is "passive", and it does not get a response from DEVICE#2, DEVICE#1 tries to connect again to DEVICE#2 without switching its mode.<br>If DEVICE#2 does not respond within the INIT-DEADTIME (60 seconds), DEVICE#1 switches its mode to "active".<br>For this reason, start the failover service on the device whose Initial mode is "passive" only after the service is running on the device whose Initial mode is "active".
|-
| C7 || If DEVICE#1 is in an initialization state, the initial mode is "active", and it does not get a response from DEVICE#2, DEVICE#1 switches its mode to "active".
|-
| C8 || If DEVICE#1 is in a "passive" state, and it does not get a response from DEVICE#2, DEVICE#1 tries to connect again to DEVICE#2 without switching its mode.<br>If DEVICE#2 does not respond within the configured deadtime, DEVICE#1 switches its mode to "active".
|-
| C9 || DEVICE#1 is in an "active" state and does not get a response from DEVICE#2.<br>Note: This case never occurs because a device in an "active" state does not send messages to check the health of the other device.
|-
| C10 || If the device's network interface is down, the device switches its mode to the "zero" state, which indicates that it is initializing. If the device was "active", it removes the virtual IP address that was set before.
|-
| C11 || If DEVICE#1 is in a "passive" or initialization state and it gets a response from DEVICE#2 on which System failover is disabled, DEVICE#1 keeps its current state. To fix this problem, enable System failover on DEVICE#2.
|-
| C12 || If DEVICE#1 is in a "passive" or initialization state and it gets a response with an incorrect shared secret from DEVICE#2, DEVICE#1 keeps its current state. To fix this problem, verify the shared secret and correct the wrong one.
|-
| C13 || This case may occur when DEVICE#1 is in an initialization state and the configured initial mode is neither "active" nor "passive". You should configure the failover again.
|}
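The response-driven cases C1 to C5 can be read as a small transition function. The Python sketch below is one hypothetical rendering of those descriptions, not the failover service's implementation: the function name, the mode constants, and the return shape are assumptions. The case descriptions do not say what an "active" device does when the peer is initializing, so the sketch keeps the current mode and returns no case number for that input. The no-response cases (C6 to C9) and the error cases (C10 to C13) are left out.

```python
from typing import Optional, Tuple

# Mode names as they appear in the tables; "zero" means initializing.
ZERO, ACTIVE, PASSIVE = "zero", "active", "passive"
OPPOSITE = {ACTIVE: PASSIVE, PASSIVE: ACTIVE}


def on_peer_response(current: str, peer: str, initial_mode: str) -> Tuple[str, Optional[str]]:
    """Return (new_mode, case_number) for DEVICE#1 after a response from DEVICE#2."""
    if current == ZERO:
        if peer == ZERO:
            # C1: both devices are initializing, so use the configured Initial mode.
            return initial_mode, "C1"
        # C2: the peer already has a role, so take the opposite one.
        return OPPOSITE[peer], "C2"
    if peer == ZERO:
        # C3 is documented for a passive device only; for an active device
        # this sketch assumes the mode is kept and reports no case number.
        return current, ("C3" if current == PASSIVE else None)
    if peer == current:
        # C4: both devices claim the same role (usually a misconfiguration).
        return OPPOSITE[peer], "C4"
    # C5: the roles are already complementary, so nothing changes.
    return current, "C5"
```

For example, `on_peer_response("zero", "active", "active")` yields the "passive" mode under C2 even though the configured Initial mode is "active", because the peer's existing role takes precedence over the configuration.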

References

1. https://mariadb.com/kb/en/replication-overview/
2. https://en.wikipedia.org/wiki/Failover