
DRBD-Cookbook

How to create your own cluster solution, without SAN or NAS!




Joerg Christian Seubert



1. Edition




Contents



1 Introduction
  1.1 Syntax of this book
  1.2 Built-in bugs
  1.3 Hostnames
2 Installation
  2.1 Software
  2.2 Requirements
3 Preliminary considerations
  3.1 Disk drive – physically vs. LVM
  3.2 Filesystem on the disk device
    3.2.1 UFS / ext2
    3.2.2 ext3 / ext4
    3.2.3 xfs
    3.2.4 BtrFS
    3.2.5 OCFS2
    3.2.6 Conclusion
4 Configuration basics on a two-node cluster array
  4.1 SSH-Configuration
    4.1.1 SSH-Key-Types
    4.1.2 Recipe: Generate and distribute SSH keys
  4.2 Configuration of the local disk devices
    4.2.1 Recipe: Create an LVM volume via YaST
    4.2.2 Recipe: Create an LVM volume via shell
  4.3 DRBD configuration
    4.3.1 Performance
    4.3.2 /etc/drbd.conf
    4.3.3 /etc/drbd.d/global_common.conf
    4.3.4 Resource configuration
    4.3.5 Resource description
    4.3.6 Alternative notation
    4.3.7 Ports
    4.3.8 Recipe: Command sequence for the basic configuration
5 Data transmission in the backbone LAN
  5.1 Recipe: Backbone LAN in the DRBD configuration
6 Multi-node cluster
  6.1 Stacking device
    6.1.1 Recipe: Commissioning a stacking device with DRBD 8 under SLES 11 SP4
    6.1.2 Recipe: Commissioning a stacking device with DRBD 9 under OpenSuSE 15.1
  6.2 RAID 1 over at least three nodes
7 Hardened cluster
  7.1 Detecting the state of the firewall
  7.2 Firewall zones
  7.3 Build a new firewall service
  7.4 Build a new zone
  7.5 Basics of a two-stage hardening concept
    7.5.1 Hardening with the firewall
    7.5.2 Hardening with the Secure-Shell options
8 Increase / decrease the size of the DRBD device
  8.1 Unofficial method
    8.1.1 Concept
    8.1.2 Command sequence
  8.2 The official way
9 Program your own cluster solution
  9.1 Configuration file
    9.1.1 Content of the configuration file
  9.2 Virtual IP address
    9.2.1 Script explanation for my_virt_ip.pl
  9.3 Make database switchable
    9.3.1 Script explanation for my_dev_switch.pl
  9.4 Communication between cluster nodes - the Horcher
    9.4.1 Script explanation for my_horcher.pl
  9.5 Control script
    9.5.1 Script explanation for my_control.pl
  9.6 Service control scripts for systemd
    9.6.1 Explanation for mycluster_horcher.service
    9.6.2 Explanation for mycluster_control.service
  9.7 Initialization script for the cluster controller
    9.7.1 Explanation for my_service.pl
  9.8 Maintenance
  9.9 General information about the scripts
10 Include DRBD in Veritas Cluster
  10.1 Include DRBD as Veritas agent
  10.2 Increase or decrease a DRBD device which is included in the Veritas cluster
11 DRBD and Docker
  11.1 Preparation and first start of the container
  11.2 Work with the container
12 Win-DRBD
13 SSH-Configuration on SLE 15 / OpenSuSE Leap 15.x
14 Creating an LVM volume group
  14.1 YaST
  14.2 Shell
15 Stop, start, enable and disable services - a little tutorial
  15.1 Copying service files
  15.2 Use services via YaST
  15.3 Operate services with systemctl
16 Sources and disclaimer
  16.1 Sources
    16.1.1 Internet
    16.1.2 Books
  16.2 Disclaimer
17 About…
  17.1 …the book…
  17.2 …the author…
  17.3 Legal notice






1 Introduction



If you want to build a cluster, sooner or later you face the problem that the data must be usable on all participating servers. This problem can be solved by transporting the data once per minute from the active cluster node to the passive cluster node.

But what if this "copy job" takes longer than one minute?

In this case, you either have the situation that the copy jobs overtake each other and never finish - because the cluster node in question does nothing else but ’copy’, and ’nothing else’ really means ’nothing else’ - or the data is outdated every time.

Neither makes sense, nor is it desirable.

If you have the additional situation that all cluster nodes must not only read the data but also write it, ’practical copying’ no longer makes any sense at all. Usually, this problem is solved by using a SAN or NAS.

For a data center, where there are usually more than two machines running at 24 * 7 uptime, it may not be a problem to run one more machine per cluster group - this can be a ’disk pot’, known as a true SAN (storage area network), or it can be a network file server, known as a NAS (network-attached storage).

However, small businesses and home users face the problem of having to pay for a SAN or NAS.

That’s where the DRBD - Distributed Replicated Block Device - product from LinBit (www.linbit.com) comes in. DRBD gives you the ability to connect two or more cluster nodes together without using a SAN or NAS as a data device. DRBD works like a local RAID controller creating a mirror device (RAID 1) - but with "local disks" connected via a LAN. You can also use this variant in a large data center if your cluster needs to be independent of a SAN or NAS. Think, for example, of a monitoring server that monitors the SAN or NAS and has to run highly available - especially when the SAN or NAS is not running.

This cookbook teaches the basics of a DRBD active-passive cluster, extended by further possibilities (three-node cluster, backbone LAN, deployment of DRBD on a Veritas cluster, creation of your own cluster solution in Perl, cluster configuration on hardened systems and many more) and demonstrates the procedures in the form of ’listings’.

All examples are based on a test configuration with OpenSuSE Leap 15.1 (except 6.1.1) and can - with the necessary background knowledge - also be implemented in other Linux distributions. In section 6.1.1 the listing is done with SLES 11 SP4 to show the commands and screen outputs of DRBD version 8 compared to DRBD version 9, because there are some differences. For using DRBD on Windows servers, see chapter 12.

 






1.1 Syntax of this book



To distinguish keyboard inputs and on-screen outputs from the explanations, the commands and on-screen outputs are displayed as follows:








Listing 1.1: example of a session

hostname:~ # echo "This is an example!"
This is an example!



In the scripts, the individual lines are numbered consecutively and are briefly explained in tabular form in the text following the respective listing.

This means that the commands of the "recipes" can be entered on the shell exactly as shown in the examples, and the screen output should also look as shown. I explicitly point to the disclaimer (16.2) here, because your systems do not have to match my systems.






1.2 Built-in bugs



In the course of creating this book, I made various mistakes while working out the recipes, which, after careful consideration, I simply carried over into the recipes.

The reason for this is that these mistakes can also happen to you during operation.

In the context of the respective cooking recipe, I then corrected these mistakes again - also to show you how to save the situation, and which factors - not clearly visible at first - had an influence on the respective error situation.

In this way, you can learn from my mistakes to avoid or solve similar mistakes in your systems.






1.3 Hostnames



In an old Siemens-Nixdorf UNIX manual, the configuration was explained using hostnames like Jupiter and Saturn.

Because the dwarf planet Pluto and Charon (Charon is the largest moon of Pluto) have their common center of gravity, around which they both circle, outside of either body, these names seemed to me suitable to represent a cluster. Consequently Nix, the second largest moon of Pluto, forms the third host in the three-node cluster array.
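In the listings of this book, the cluster nodes therefore appear as pluto, charon and - in the three-node setups - nix. If you want to reproduce the examples, an /etc/hosts fragment for such a test setup could look like this (the IP addresses are, of course, assumptions and have to match your own network):

192.168.1.10   pluto
192.168.1.11   charon
192.168.1.12   nix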







2 Installation

2.1 Software



The DRBD software is provided with the server or enterprise editions of the following Linux distributions and is updated accordingly (as of summer 2020):





 Red Hat Enterprise Linux (RHEL), versions 6, 7 and 8



 SUSE Linux Enterprise Server (SLES), versions 11 SP4, 12 and 15



 Debian GNU/Linux, versions 8 (Jessie) and 9 (Stretch)



 Ubuntu Server Edition LTS 14.04 (Trusty Tahr), LTS 16.04 (Xenial Xerus), and LTS 18.04 (Bionic Beaver)





In addition, OpenSuSE provides the DRBD packages starting with version Leap 42.1.

When using the command zypper, it looks like this (the output lines have been shortened, because the type is "package" in all cases):








Listing 2.1: zypper search drbd

pluto:~ # zypper search drbd
Loading repository data...
Reading installed packages...

S | Name             | Summary                                                     |
--+------------------+-------------------------------------------------------------+-
  | drbd             | Linux driver for the "Distributed Replicated Block Device"  |
  | drbd-formula     | DRBD deployment salt formula                                 |
  | drbd-kmp-default | Kernel driver                                                |
  | drbd-kmp-preempt | Kernel driver                                                |
  | drbd-utils       | Distributed Replicated Block Device                          |
  | drbdmanage       | DRBD distributed resource management utility                 |
  | yast2-drbd       | YaST2 - DRBD Configuration                                   |
pluto:~ #
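If you want to install the packages right away, a zypper call along the following lines should do - a sketch: whether you need drbd-kmp-default or drbd-kmp-preempt depends on the kernel flavour your system is running, and yast2-drbd is only needed if you want to configure DRBD via YaST:

pluto:~ # zypper install drbd drbd-utils drbd-kmp-default yast2-drbd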






2.2 Requirements



"The system have to run!“



Admittedly, this sentence sounds rather stupid with regard to the minimum requirements of an installation - especially in a technical book. However, the fact is that the DRBD software does not have any special minimum requirements for the equipment of the cluster nodes, because the DRBD function is integrated into the Linux kernel.

Equip your cluster nodes to meet your requirements and ensure that the high availability application runs properly on the deployed platform. With regard to synchronization, there are a few more notes.

To create this book, I installed two virtual machines on a laptop that had the "fabulous" memory size of 4 GB and a quad-core processor running at 2.16 GHz.

This might be enough for a workstation, laptop or desktop, but for a server or host this computer is a bit tight.

The two “VMs” on this laptop each have 1 CPU and 1 GB RAM.

In our case, the LAN connection is established by a single LAN adapter with a speed of 10 Mb/s - sufficient for home use, for a server …well ….

As I said, for a server environment that has a little more work to do than present "It works" via apache, this hardware configuration would be considered lean. But to show that it basically works, this configuration is still sufficient.

Depending on the size of the disk partitions you want to include in this RAID 1, you should consider setting up a separate LAN for disk synchronization (backend LAN). However, you should be careful about the speed of synchronization, otherwise your computers will be busy with disk synchronization only. But more about that later.
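As a small foretaste of the DRBD configuration discussed from chapter 4 onwards: the bandwidth used for resynchronization can be capped in the disk section of a resource, roughly like this (a sketch, assuming the DRBD 8.4/9 configuration syntax; the value is only an example and should match your LAN and disk speed):

disk {
   resync-rate   30M;   # limit background resynchronization so that normal operation is not starved
}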

I also don’t want to write a novel about the minimum configuration of hosts here; others have done that before me. I also realize that some users consider 10 Mb/s to be clearly too slow for their home network.

I want to show that DRBD works even with absolutely minimal equipment. Which brings us back to cost savings, especially for small businesses.







3 Preliminary considerations



Before we take a closer look at the basic configuration of a two-node cluster, there are some preliminary considerations. If you have already made your selection or have special requirements, you can safely skip this chapter - but you do so at your own risk.

I myself am not one for reading through endless introductions, and I know colleagues who read the introductions very carefully and then didn’t know what to do when it came time to implement them.

The important thing for me in the preliminary considerations is to avoid unnecessary work, so that you don’t have any downtime at the end of a test run or even in a running, productive cluster.

And nothing is more deadly to a cluster than not being available.

That’s why the old do-it-yourself motto applies here, too:





Measure first, then cut!






3.1 Disk drive – physically vs. LVM



So, let’s first take a look at how the disk drive should be "designed".

Let’s start with a "physical disk device", i.e. an additional disk partition alongside the "classical" partitions such as swap, root (/) and /home.

This solution has the advantage that there is no additional "virtualization layer" holding up processing operations, which might otherwise lead to performance degradation.

The disadvantage is that a subsequent increase or decrease in size can only be carried out with considerable effort if hardware actually has to be replaced.

Using the Logical Volume Manager, or LVM for short, gives you more headroom, but adds a virtualization layer, which on very tight systems can lead to the aforementioned performance degradation.

Both types of disks work with DRBD!

In the systems I have set up, I generally use Logical Volume Manager because the advantage of adding disks after the fact outweighs the disadvantage of performance degradation.
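If you go the LVM route, preparing a logical volume that will later carry the DRBD device boils down to a few commands, roughly like this (a sketch - the partition /dev/sdb1, the volume group and volume names and the size are assumptions; the detailed recipes follow in chapter 4.2 and chapter 14):

pluto:~ # pvcreate /dev/sdb1
pluto:~ # vgcreate vg_drbd /dev/sdb1
pluto:~ # lvcreate -n lv_drbd -L 10G vg_drbd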






3.2 Filesystem on the disk device



In principle, a DRBD could also be used as a RAW device. Whether and which file system "runs" on the DRBD device does not really matter. Nevertheless, I would like to take a closer look at the different working methods of the file systems used, to help you decide. All file systems have their specific advantages and disadvantages based on the way they work. For perhaps understandable reasons, I won’t go into more detail about tree structures or the like at this point. If you are interested in these specific points, you should consult the relevant technical literature or www.wikipedia.com.






3.2.1 UFS / ext2



The good old ’UNIX File System’ - because that’s what UFS stands for - was developed in the early 1980s and was the standard file system for all UNIX derivatives until the early 1990s. Today, however, it is only used in isolated cases.

However, the basic concept was passed on to the following file system generations:





 all data is stored in blocks on the hard drive and



 to get to a data block, the address of the memory block is stored in an area called "superblock", which is accessed first by the operating system.





In this way a tree structure is obtained, because each stored file is assigned a specific "inode number".

If a search is made for a specific file within the file system, the entire file tree must always be searched, which can take a comparatively long time for larger file trees with many substructures.
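You can display these inode numbers on any Linux system with ls -i; the number printed in front of the file name will of course differ on your system:

pluto:~ # ls -i /etc/hosts
42 /etc/hosts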

The "second extended file system" (ext2) essentially adopts this structure, but so-called "plugins" - i.e. extensions - can be added to handle fragmentation, compression and recovery of deleted data.






3.2.2 ext3 / ext4



The ext3 and ext4 file systems have evolved from ext2 with the addition of a so-called journal and the ability to change the size of the file system while it is in use.

In a journaling file system, all changes are recorded in a special memory area called journal before the actual write to the selected block takes place. This makes it easier to reconstruct the writes if, for example, the system crashes or the power goes out during the write operation.

Another improvement of ext3 and ext4 over ext2 was the increase of the maximum file system size from 16 TB to 32 TB for ext3 and 1 EB (= exabyte) for ext4. Such device sizes could not have been imagined when UFS was developed.

In addition, there are extensions regarding the number of files and directories as well as the maximum size of a single file, which was still limited to 2 TB for ext2, can be between 16 GB and 2 TB for ext3, and for ext4 is finally only limited by the size of the disk partition.
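To illustrate the online resizing: on an ext4 file system that sits on an LVM volume and is currently mounted, growing it comes down to two commands - a sketch, the volume names and the size being assumptions (shrinking, by contrast, still requires the file system to be unmounted):

pluto:~ # lvextend -L +5G /dev/vg_drbd/lv_drbd
pluto:~ # resize2fs /dev/vg_drbd/lv_drbd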






3.2.3 xfs



The file system xfs, originally developed by Silicon Graphics (SGI) exclusively for the in-house UNIX system "IRIX", is one of the oldest file systems. But just because something is getting on in years doesn’t mean it has to be "bad". It sets standards with maximum values of 16 EB per file system, a maximum number of 2^63 files and a size per file of 8 EB.

 

It also has significant advantages over ext3 and BtrFS, especially in terms of speed.

Some time ago, I had a case where about 100 GB needed to be copied from one host to another. The source file system was a BtrFS, and the copy ran - to save LAN resources - as a tar archive that was compressed on the source machine, pushed through an SSH tunnel and decompressed again on the target machine.
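Such a transfer can be written as a single pipeline, roughly like this (a sketch - the directory /data and the target host charon are assumptions):

pluto:~ # tar -czf - -C /data . | ssh charon "tar -xzf - -C /data"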

This work took a little over an hour - probably because the file system had many subdirectories.

After the work on the source file system was finished - among other things, it was enlarged to 200 GB - I spontaneously decided to use xfs as the new file system.

The recovery time was 20 minutes!

Needless to say, I have been a fan of xfs ever since.