A computer cluster is a logical unit consisting of multiple systems connected
through a LAN. A cluster provides high processing speed, more storage
capacity, high reliability and resource availability. Organisations use
clusters to maximise processing speed, increase storage space and implement
data storage and retrieval techniques. There are 4 principles that should be
noted for computer clusters. The principles I focus on are as follows:

1) High availability through redundancy

2) Fault-tolerant cluster configurations

I High Availability through redundancy

An important strategy for maintaining high availability is having redundant
components. If a component fails, a backup of that component can take over and
keep serving the applications. If a component of the system is not redundant,
that component is a single point of failure. The computer system works
normally until a component fails. When a failure occurs, the redundant
component takes over until the failed one is recovered.

System failure can be:

1. Planned failure

2. Unplanned failure

3. Transient failure

4. Permanent failure

5. Partial failure

6. Total failure

Planned failure occurs when the system is taken offline for periodic upgrades,
maintenance or other technical work. Unplanned failure results from hardware
failure, a system crash, a network problem, a power outage and so on. Transient
failure is temporary: it occurs and then disappears without any component
being replaced, for example when the system is restarted because of a frozen
window. Permanent failure requires some hardware or software to be replaced
before the system works again. Partial failure is when some parts of the
system are affected or not working but the system is still usable. Total
failure is when the whole system is damaged and cannot be used further. One
aspect of high availability is to reduce as many system failures as possible
to partial failures by removing single points of failure. For example, suppose
a cluster has several nodes, each holding some data. If a node fails, its data
is unavailable until the node is recovered. A shared disk solves this problem:
every node stores its data on the shared disk, so that if one node fails the
data can still be used by another node.
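The shared-disk example above can be sketched in a few lines of Python. This
is only an illustration under simple assumptions: the Node class, the
read_from_cluster helper and the dictionary standing in for the shared disk
are all hypothetical names, not part of any real cluster software.

```python
class Node:
    """A cluster node that keeps data locally or on a shared disk."""

    def __init__(self, name, shared_disk=None):
        self.name = name
        self.alive = True
        self.local = {}            # unreachable while this node is down
        self.shared = shared_disk  # a dict standing in for a shared disk

    def _store(self):
        return self.shared if self.shared is not None else self.local

    def write(self, key, value):
        self._store()[key] = value

    def read(self, key):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        return self._store().get(key)


def read_from_cluster(nodes, key):
    """Return the value from the first live node that has it, else None."""
    for node in nodes:
        if node.alive:
            value = node.read(key)
            if value is not None:
                return value
    return None


# Without a shared disk: the data is unreachable while node a is down.
a, b = Node("a"), Node("b")
a.write("order-1", "paid")
a.alive = False
local_result = read_from_cluster([a, b], "order-1")

# With a shared disk: node d can still serve the data written by node c.
disk = {}
c, d = Node("c", disk), Node("d", disk)
c.write("order-1", "paid")
c.alive = False
shared_result = read_from_cluster([c, d], "order-1")
```

In this sketch the node failure stays a partial failure only in the second
case, because the data no longer lives on a single node. The shared disk
itself would then need its own redundancy to avoid becoming the new single
point of failure.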

Choosing a redundancy plan to ensure high availability:

The redundancy and replication techniques to choose from are:

RAID redundancy
– In a mirrored configuration, this type copies one disk onto another so that
if one disk fails, the other is still available to serve in its place.

Block-Level redundancy
– This type mirrors the entire block structure, so that if a system fails,
work can continue as usual without pausing because a copy of that application
exists on another machine.

SQL redundancy
– This type relies on built-in database features so that the data remains
usable even when some systems fail.

Master-Slave replication
– In this type of replication there is a master server where all the data is
stored and a slave server to which a copy of all data from the master is
replicated. When a failure happens, the data can still be fetched from the
copy.

Master-Master replication
– This type has two master servers with equal access abilities, so when one
master goes down, operations can be handled by the other master.
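As a rough illustration of master-slave replication, the sketch below copies
every write from a master to a slave and falls back to the slave for reads.
The Server and MasterSlavePair classes are hypothetical, and the synchronous
copy on every write is an assumption made to keep the example short; real
databases often replicate asynchronously.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class MasterSlavePair:
    """Writes go to the master and are copied to the slave."""

    def __init__(self):
        self.master = Server("master")
        self.slave = Server("slave")

    def write(self, key, value):
        if not self.master.alive:
            raise RuntimeError("master is down; writes are rejected")
        self.master.data[key] = value
        if self.slave.alive:
            # Assumption: the copy is made synchronously on every write.
            self.slave.data[key] = value

    def read(self, key):
        # Prefer the master, fall back to the slave's replica.
        for server in (self.master, self.slave):
            if server.alive:
                return server.data.get(key)
        raise RuntimeError("no server available")


pair = MasterSlavePair()
pair.write("sku-42", 7)
pair.master.alive = False
value_after_failure = pair.read("sku-42")  # served from the slave's copy
```

In this sketch the slave can only serve reads once the master is down. In a
master-master setup both servers would accept writes, which is why operations
can continue unchanged on the surviving master.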

Example showing availability through redundancy:

Companies like Flipkart, Amazon and Snapdeal sell their products to customers
through websites, so the website or application should be available to users
all the time. Even if the system has some issues, they should not affect the
customers viewing it. To achieve this, such companies run a number of servers
formed into a cluster, so that if one server fails, another that is redundant
to it continues processing the customers' requests. The company should also
have several backup internet connections, separate power lines and separate
backup centres grouped as clusters in different locations. All these measures
keep the website or application available to customers 24x7 without any
trouble.

A sample topology using redundancy for high availability:

The third master is used only when a failure occurs; all other operations are
performed by the other 2 masters. In case of a failure, the directory proxy
server reroutes requests to the remaining master servers. For example,
requests for master 1 are rerouted to master 2 or master 3 by the directory
proxy server until master 1 is repaired. Any updates made on the master that
receives the rerouted requests are replicated to the others, since the masters
are redundant to each other.
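The rerouting behaviour of this topology can be sketched as follows. The
Master and DirectoryProxy classes are hypothetical stand-ins, not the API of
any real directory server, and the sketch assumes updates are copied to every
live master immediately. Catching up a repaired master is not modelled.

```python
class Master:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.entries = {}


class DirectoryProxy:
    """Routes requests to a preferred master, rerouting on failure."""

    def __init__(self, active, standby):
        self.active = active    # masters used in normal operation
        self.standby = standby  # masters used only after a failure

    def route(self, preferred):
        if preferred.alive:
            return preferred
        # Try the other active masters first, then the standby.
        for candidate in self.active + self.standby:
            if candidate.alive:
                return candidate
        raise RuntimeError("no master available")

    def update(self, preferred, key, value):
        target = self.route(preferred)
        target.entries[key] = value
        # Replicate the update to every other live master.
        for master in self.active + self.standby:
            if master is not target and master.alive:
                master.entries[key] = value
        return target.name


m1, m2, m3 = Master("master1"), Master("master2"), Master("master3")
proxy = DirectoryProxy(active=[m1, m2], standby=[m3])

m1.alive = False
first = proxy.update(m1, "uid=alice", "updated")   # rerouted to master2

m2.alive = False
second = proxy.update(m1, "uid=bob", "updated")    # rerouted to master3
```

Here the third master only receives client requests once both active masters
are down, but it still receives replicated updates, so it is ready to take
over.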

II Fault-Tolerant Cluster configurations

The cluster solution supports availability at three levels:

1) Hot standby

2) Active takeover

3) Fault tolerant

Hot standby is a redundancy method in which a standby system runs in parallel
with an identical primary system, so that both systems hold the same data.
When a failure occurs, the hot standby system immediately replaces the
primary. A hot standby system may be located close to the primary system, in
the same building, or in another building, state or even country. Examples of
components that can be kept on hot standby include network printers, hard
drives, and audio or visual switches. Active takeover is when a node fails and
the application fails over to an available node in the cluster. Implementing
the failover may take some time, so the user will experience some delay in the
application. In a failover cluster, when a component fails, the remaining
components take over the job of the failed component in order to maintain
availability. To identify whether a node or system is active, a heartbeat
technique is used: a stream of heartbeat messages is sent from one node to
another, and if the system does not receive the heartbeat, it can conclude
that the node has failed.
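A minimal sketch of heartbeat-based failure detection is shown below. The
HeartbeatMonitor class, the node names and the 3-second timeout are
assumptions chosen for illustration. Timestamps are passed in explicitly so
that the example is deterministic; a real monitor would read a clock.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node."""

    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence before a node is suspect
        self.last_seen = {}

    def record(self, node, now):
        """Note that a heartbeat from `node` arrived at time `now`."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node
            for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


monitor = HeartbeatMonitor(timeout=3.0)
monitor.record("node-a", now=0.0)
monitor.record("node-b", now=0.0)
monitor.record("node-a", now=2.0)   # node-b has gone silent

suspects = monitor.failed_nodes(now=4.0)
```

At time 4.0 only node-b has been silent for longer than the timeout, so it is
the only node reported. The timeout trades detection speed against false
alarms: a short timeout detects failures quickly but may flag a node that is
merely slow.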

A cluster uses multiple networks to connect its nodes. One node is the master
node and the remaining nodes are slave nodes. Each slave node sends heartbeat
messages to the master node, and the master node detects a failure if the
message from a slave node arrives over neither network. Once the failure is
identified, the system sends a notification to the failed node, and the node
then resumes sending its messages to the master node. A failed component has
to be recovered using one of two recovery schemes – backward recovery and
forward recovery. Suppose the drive holding the data has crashed and a backup
taken on a previous day has to be restored. The log file can then be used to
reapply all the transactions that were completed but lost because of the
crash. This process is forward recovery. Undoing transactions that were not
completed, so that they can be run again later, is backward recovery.
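The two recovery schemes can be illustrated with a small transaction log. This
is a sketch under assumptions: the log record layout (txn, op, key, old, new)
and the function names are hypothetical, and the state is a plain dictionary
rather than a real database.

```python
def forward_recover(backup, log):
    """Rebuild state by reapplying committed transactions to a backup."""
    state = dict(backup)
    committed = {rec["txn"] for rec in log if rec["op"] == "commit"}
    for rec in log:
        if rec["op"] == "write" and rec["txn"] in committed:
            state[rec["key"]] = rec["new"]
    return state


def backward_recover(state, log):
    """Undo writes of transactions that never committed, newest first."""
    state = dict(state)
    committed = {rec["txn"] for rec in log if rec["op"] == "commit"}
    for rec in reversed(log):
        if rec["op"] == "write" and rec["txn"] not in committed:
            if rec["old"] is None:
                state.pop(rec["key"], None)   # key did not exist before
            else:
                state[rec["key"]] = rec["old"]
    return state


log = [
    {"txn": 1, "op": "write", "key": "balance", "old": 100, "new": 80},
    {"txn": 1, "op": "commit"},
    {"txn": 2, "op": "write", "key": "balance", "old": 80, "new": 50},
    # Transaction 2 never committed: the system crashed here.
]

# Forward recovery: start from an old backup and replay committed work.
restored = forward_recover({"balance": 100}, log)

# Backward recovery: start from the crashed state and undo incomplete work.
rolled_back = backward_recover({"balance": 50}, log)
```

Both paths end at the same consistent state in this example, a balance of 80,
which reflects transaction 1 but not the incomplete transaction 2.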

Fault tolerance in the real world:

System failure:

A system failure can be caused by either hardware or software. The software
issue could be a bug that makes the system hang or crash, or an operating
system crash. The hardware issue could be a hard disk failure and so on.
Suppose the location where your application's server runs suffers a power
supply problem. If the application runs only on a server at that location, a
separate provider is needed to keep the application running. If the system is
fault tolerant and hosted in multiple locations, then when one node goes down
another continues the tasks that were handled by the failed node, so that
users do not experience any issue while accessing the application.

Security breaches:

A security breach can be any of a number of issues that cause a fault in the
system. For instance, if the administrator sets a weak password for one of the
installed software packages and forgets to change it, the system can be
compromised by a third party. If you are running other software on the same
hardware, this means there is a security breach of the customer data. In a
fault-tolerant system, the third party's access is limited to just the service
offered by the
application. This fault tolerance is not a complete solution for the security