3-node MongoDB replica set with SystemD and metrics in Telegraf / Grafana

A replica set is a form of data replication in which data is stored on more than one node, which ensures data durability. We will set the first node as the primary and the second and third as secondaries. A replica set should always have an odd number of nodes, and more than 2. Both requirements have their reasons.

What gives you database replication?

High availability: when a node fails, even the primary, another node can take over.

Increased read performance: read queries can be served by secondary nodes (in MongoDB, when the client's read preference allows it). The more nodes of this type, the greater the read throughput of the entire cluster.

Greater scalability (horizontal scaling): We are not limited to one machine (vertical scaling) and can scale almost indefinitely.

Higher data durability guarantees: Data is kept on multiple physical nodes.*

* Some very specific tests show that under certain conditions this guarantee does not hold, although they run under stringent requirements. I recommend reading Aphyr's blog.

Odd number of nodes

An odd number of nodes ensures that a majority, and hence a consensus, can be reached when the nodes of the replica set vote. With an even number there is a chance that the votes are split evenly and no consensus is reached. Each round of voting takes time for communication, and in high-performance database systems a delay in electing a new primary may have disastrous consequences.
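
The arithmetic behind this can be sketched in a few lines of shell. The function names majority and fault_tolerance are made up for this illustration; the only rule assumed is that a majority means more than half of the voting nodes.

```shell
#!/bin/sh
# Votes needed for a majority among n voting nodes (more than half).
majority() {
    echo $(( $1 / 2 + 1 ))
}

# How many nodes can fail while the remaining ones still form a majority.
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 4 5; do
    echo "nodes=$n majority=$(majority "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

For 3 and 4 nodes the tolerated number of failures is the same (1), while the required majority grows from 2 to 3, so the fourth node adds cost without adding resilience.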

Number of nodes over 2

This requirement comes from two practical reasons. First, when one node of a 2-node replica fails, the whole set becomes very sensitive to the next failure until the failed node recovers. Having only one copy of the data can be dangerous from the application's point of view. Second, with 2-node replicas and master-slave configurations the "network partition" phenomenon becomes dangerous: both nodes remain connected to the network, but for some reason they lose the connection to each other. Each node then believes that it is the only "healthy" one in the replica and appoints itself the main node. From that moment, applications writing to the database cause the data on the two nodes to diverge (records from one node do not appear on the other and vice versa). Adding a third node reduces the probability that such a partition affects all nodes at once: if two nodes lose communication with each other, there is always a third one that can confirm which of them is still healthy.

MongoDB supports arbiter nodes, which play the role of the third (or next odd) node. If we put a replica on two expensive machines, there is no need to buy a third one just to act as the arbiter: this instance does not need many resources, and a small VPS is enough.

In the next part of the post I will show how to install MongoDB 4.0.11 on Ubuntu Server, run it as a SystemD service, and connect a Telegraf process that reads telemetry from the database processes and sends it to InfluxDB for cluster monitoring. Everything will then be shown on a dashboard in Grafana.

We start by downloading the MongoDB and Telegraf binaries onto the server with wget:

useradd -m mongodb;
cd /home/mongodb;
wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1804-4.0.11.tgz
tar --strip-components=1 -xvzf mongodb-linux-x86_64-ubuntu1804-4.0.11.tgz

useradd -m telegraf;
cd /home/telegraf;
wget https://dl.influxdata.com/telegraf/releases/telegraf-1.11.3_linux_amd64.tar.gz
tar xf telegraf-1.11.3_linux_amd64.tar.gz

The next step is to register both processes as SystemD services. To do that, create two files in /etc/systemd/system:

## /etc/systemd/system/mongodb.service
[Unit]
Description=MongoDB
After=network.target

[Service]
User=mongodb
Group=mongodb
Restart=on-failure
ExecStart=/home/mongodb/bin/mongod --quiet --config /etc/mongodb/mongod.conf

[Install]
WantedBy=multi-user.target

## /etc/systemd/system/telegraf.service
[Unit]
Description=Reporting metrics into InfluxDB
Documentation=https://github.com/influxdata/telegraf
After=network.target

[Service]
ExecStart=/home/telegraf/telegraf/usr/bin/telegraf -config /home/telegraf/telegraf/etc/telegraf/telegraf.conf
Restart=on-failure
KillMode=control-group

[Install]
WantedBy=multi-user.target

We enable the SystemD services so that they start at boot:

systemctl enable mongodb.service
systemctl enable telegraf.service

The final step is the proper configuration of both MongoDB and Telegraf. For MongoDB we will use the WiredTiger engine with basic settings for data and index compression. It is important to start the processes in replica set mode. In this mode MongoDB should be bound to 0.0.0.0, because it has to communicate with the other machines; more demanding users will probably add firewall rules that allow traffic on the MongoDB ports only from the appropriate IP addresses. In addition, access to the replica set should require authentication: with the keyFile setting, the members authenticate to each other using a shared key file, and client access control is switched on as well.
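
The shared key file that the members use to authenticate to each other is not created anywhere in this post, so here is a minimal sketch of one way to do it. It assumes the path /home/mongodb/mongo.key from the configuration file and writes 756 random bytes as base64 text (the MongoDB manual uses openssl rand -base64 756 for the same purpose). The helper name make_keyfile is made up for this example.

```shell
#!/bin/sh
# make_keyfile is a hypothetical helper name, not part of MongoDB.
# It writes 756 random bytes, base64-encoded, to the path given as $1.
make_keyfile() {
    key_path="$1"
    head -c 756 /dev/urandom | base64 > "$key_path"
    # mongod refuses key files that are readable by group or others.
    chmod 400 "$key_path"
}

# Usage on a node (as root), assuming the path from mongod.conf:
#   make_keyfile /home/mongodb/mongo.key
#   chown mongodb:mongodb /home/mongodb/mongo.key
```

Generate the key once and copy the same file to every node, because all members of the replica set must share an identical key.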

Below is the MongoDB configuration file:

storage:
    dbPath: "/var/mongodb_data"
    directoryPerDB: true
    journal:
        enabled: true
    engine: "wiredTiger"
    wiredTiger:
        engineConfig: 
            cacheSizeGB: 2
            journalCompressor: zlib
            directoryForIndexes: true
        collectionConfig: 
            blockCompressor: zlib
        indexConfig:
            prefixCompression: true
systemLog:
   destination: file
   path: "/var/log/mongod.log"
   logAppend: true
   logRotate: rename
processManagement:
   fork: false
replication:
   replSetName: "rs0"
net:
   bindIp: 0.0.0.0
   port: 27017
   unixDomainSocket:
       enabled : true
security:
   keyFile: /home/mongodb/mongo.key
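
The paths named in this configuration have to exist and be writable by the mongodb user before the service starts, which the earlier steps do not cover. Below is a minimal sketch, assuming the dbPath and log path above and the /etc/mongodb directory from the unit file. The helper name prepare_mongodb_paths and its optional prefix argument are made up for this example; the prefix only makes it possible to try the function in a scratch directory.

```shell
#!/bin/sh
# prepare_mongodb_paths is a hypothetical helper, not a MongoDB tool.
# $1 is an optional prefix; leave it empty to create the real paths.
prepare_mongodb_paths() {
    prefix="${1:-}"
    # Data directory (storage.dbPath), config directory (ExecStart --config)
    # and the directory holding the log file.
    mkdir -p "$prefix/var/mongodb_data" "$prefix/etc/mongodb" "$prefix/var/log"
    # Log file (systemLog.path); created up front so its owner can be set.
    touch "$prefix/var/log/mongod.log"
}

# Usage on a node (as root):
#   prepare_mongodb_paths
#   chown -R mongodb:mongodb /var/mongodb_data
#   chown mongodb:mongodb /var/log/mongod.log
```

The configuration file itself then goes to /etc/mongodb/mongod.conf, which is the path passed to mongod in the unit file.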

Below is a basic Telegraf configuration, in which the process sends only the basic server resource metrics and the MongoDB metrics:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false
###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################
[[outputs.influxdb]]
  urls = ["XXXXXXXXXXXXXXX"] # InfluxDB address
  database = "xxxxxxx" # database name
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"
  username = "xxxxxxxx" # user
  password = "xxxxxxxx" # password

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################
[[inputs.system]]
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]
[[inputs.mem]]
[[inputs.diskio]]
[[inputs.net]]

[[inputs.mongodb]]
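  # With keyFile authentication enabled, unauthenticated stats queries are
  # rejected; in that case put credentials in the URI, for example
  # "mongodb://user:password@127.0.0.1:27017" (user and password are placeholders).
  servers = ["mongodb://127.0.0.1:27017"]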
  gather_perdb_stats = true

The time has come to launch the services and verify the configuration. We start the services with the commands:

service mongodb start
service telegraf start

And then we verify the operation:

service mongodb status
service telegraf status

If everything worked, we should see something like this:

● mongodb.service - MongoDB
   Loaded: loaded (/etc/systemd/system/mongodb.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-06-04 13:10:30 CEST; 2 months 1 days ago
 Main PID: 16935 (mongod)
    Tasks: 84 (limit: 4915)
   CGroup: /system.slice/mongodb.service
           └─16935 /home/mongodb/bin/mongod --quiet --config /home/mongod.conf

Now that we know everything is working, we repeat all the steps on the other two machines. Remember the firewall settings: nothing may block communication between the machines on the MongoDB port (27017 in our example), because MongoDB uses this port to send replication data. Log in to any machine and run mongo to connect to the database. Then execute the commands below, replacing each mongodb-node-X entry with that machine's IP address:

rs.initiate()  
rs.add('mongodb-node-2:27017')
rs.add('mongodb-node-3:27017')

rs.status()

rs.status() in the mongo console returns the status of the replica set along with details about each of its nodes. Congratulations, you have just launched a MongoDB replica set! You still have to configure the dashboard in Grafana, which will give you an overview of the entire database.
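
If a node added with rs.add() never reaches the SECONDARY state in rs.status(), one thing worth ruling out is a firewall blocking the port between the machines. A rough sketch of such a check follows. The helper name port_open is made up for this example, and the sketch assumes that bash (for its /dev/tcp redirection) and timeout from GNU coreutils are installed.

```shell
#!/bin/sh
# port_open is a hypothetical helper name. It returns 0 if a TCP connection
# to host ($1) and port ($2) succeeds within 3 seconds, non-zero otherwise.
# Relies on bash's /dev/tcp redirection and on timeout from GNU coreutils.
port_open() {
    # Inside bash -c, $0 is the host and $1 is the port passed after the script.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from the first node, with the placeholder host names from rs.add():
#   port_open mongodb-node-2 27017 && echo reachable || echo blocked
```

Run the check in both directions between every pair of machines, since a firewall rule may allow traffic one way only.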
