
Mongodb Vol1 OS Tuning

 ·  🎃 kr0m

This article describes a series of recommendations to consider when deploying a Mongo server on Linux. It covers the kernel, the file system, OS limits, NTP, transparent huge pages, NUMA and SELinux/GrSec considerations.

These are only basic considerations; each scenario has its own particularities.

Kernel
2.6.36 or later.
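To check the running kernel against that minimum, a minimal sketch follows. It assumes GNU `sort -V` for version ordering (some BusyBox builds lack it) and strips the distribution suffix from `uname -r`; the helper name `version_ge` is hypothetical.

```bash
#!/bin/bash
# version_ge A B: succeeds if version A >= version B (GNU sort -V ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required=2.6.36
# Drop the distribution suffix, e.g. "5.15.80-gentoo" -> "5.15.80"
current=$(uname -r | cut -d- -f1)

if version_ge "$current" "$required"; then
    echo "kernel $current OK (>= $required)"
else
    echo "kernel $current is older than $required"
fi
```

A plain string comparison would get cases like 2.6.4 vs 2.6.36 wrong, which is why the version-aware sort is used.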

File system
Use XFS or Ext4, preferably XFS. Mongo requires a file system that supports the fsync() call on directories, and some exotic file systems, such as HGFS or VirtualBox shared folders, do not support it.

Mount the partition where the DB files are located with the noatime option, so that reads do not trigger access-time updates (extra writes).
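The sketch below reports whether a mount point has `noatime` by reading `/proc/mounts` (field 2 is the mount point, field 4 the options). The function name `has_noatime` and the default path `/var/lib/mongodb` are assumptions; pass the mount point that actually holds your data files. The commented fstab line is a hypothetical example of making the option permanent.

```bash
#!/bin/bash
# Persistent setting, /etc/fstab example (device and path are hypothetical):
#   /dev/sdb1  /var/lib/mongodb  xfs  defaults,noatime  0 0
# Apply it without rebooting: mount -o remount,noatime /var/lib/mongodb

# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Returns 0 if MOUNTPOINT is mounted with noatime, 1 if not, 2 if not a mount point.
has_noatime() {
    local mountpoint=${1:-/var/lib/mongodb}
    local mounts=${2:-/proc/mounts}
    local opts
    # Keep the options of the last matching entry (the mount stacked on top).
    opts=$(awk -v mp="$mountpoint" '$2 == mp { o = $4 } END { print o }' "$mounts")
    if [ -z "$opts" ]; then
        echo "$mountpoint is not a mount point" >&2
        return 2
    fi
    case ",$opts," in
        *,noatime,*) echo "$mountpoint: noatime"; return 0 ;;
        *)           echo "$mountpoint: atime updates enabled ($opts)"; return 1 ;;
    esac
}
```

Usage: `has_noatime /var/lib/mongodb`. Wrapping the options in commas avoids false matches on options that merely contain the text.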

Readahead is beneficial when reading contiguous areas of the disk, but Mongo does not follow this pattern; it performs random reads instead, so we lower the readahead value (expressed in 512-byte sectors):

blockdev --setra 32 /dev/sda
blockdev --getra /dev/sda
32

To apply the readahead setting at every OS startup, we create an OpenRC local.d script:

vi /etc/local.d/mongo.start
#! /bin/bash
blockdev --setra 32 /dev/sda

chmod 700 /etc/local.d/mongo.start
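Since `blockdev` counts 512-byte sectors, 32 means 16 KiB. The kernel also exposes the same setting in KiB under `/sys/block/<dev>/queue/read_ahead_kb`. The sketch below lists it for every block device, which helps spot disks still on the default. The function name and its optional sysfs-root argument (useful for testing) are illustrative.

```bash
#!/bin/bash
# list_readahead [SYSFS_BLOCK_DIR]: print each block device's readahead
# in KiB and in 512-byte sectors (the unit used by blockdev --setra/--getra).
list_readahead() {
    local sysblock=${1:-/sys/block}
    local f dev kb
    for f in "$sysblock"/*/queue/read_ahead_kb; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        dev=${f#"$sysblock"/}            # "sda/queue/read_ahead_kb"
        dev=${dev%%/*}                   # "sda"
        kb=$(<"$f")
        printf '%s %s KiB (%s sectors)\n' "$dev" "$kb" "$((kb * 2))"
    done
}

list_readahead "$@"
```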

OS Limits
The limits recommended by Mongo are:

-f (file size): unlimited  
-t (cpu time): unlimited  
-v (virtual memory): unlimited  
-n (open files): 64000  
-m (memory size): unlimited  
-u (processes/threads): 64000
vi /etc/security/limits.conf
mongodb    soft    fsize     unlimited
mongodb    hard    fsize     unlimited
mongodb    soft    cpu       unlimited
mongodb    hard    cpu       unlimited
mongodb    soft    as        unlimited
mongodb    hard    as        unlimited
mongodb    soft    nofile    64000
mongodb    hard    nofile    64000
mongodb    soft    nproc     64000
mongodb    hard    nproc     64000
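`limits.conf` is applied by `pam_limits` when a session is opened, so it only affects new logins of the mongodb user (assuming your PAM configuration loads it for that login path, e.g. `su`). A minimal sketch that compares the soft limits of the current shell with the table above; run it as that user, for example `su mongodb -s /bin/bash -c ./checkUlimits.sh` (the script name is hypothetical). It needs bash 4+ for associative arrays.

```bash
#!/bin/bash
# Recommended values, keyed by the ulimit flag (same table as above).
declare -A recommended=(
    [f]=unlimited [t]=unlimited [v]=unlimited
    [n]=64000     [m]=unlimited [u]=64000
)

# Print every soft limit below its recommendation; return 1 if any is.
check_ulimits() {
    local flag current wanted rc=0
    for flag in f t v n m u; do
        current=$(ulimit -S -"$flag")
        wanted=${recommended[$flag]}
        if [ "$wanted" = unlimited ]; then
            [ "$current" = unlimited ] && continue
        elif [ "$current" = unlimited ] || [ "$current" -ge "$wanted" ]; then
            continue
        fi
        echo "ulimit -$flag is $current, recommended $wanted"
        rc=1
    done
    return "$rc"
}

check_ulimits && echo "all limits match the recommendations"
```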

With this small script we can check the limits that actually apply to the running processes:

vi mongoLimitsChecker.sh

#! /bin/bash
# Usage: ./mongoLimitsChecker.sh mongod [mongos ...]
for process in "$@"; do
    # pid= prints only the PIDs, without a header
    process_pids=$(ps -C "$process" -o pid=)

    if [ -z "$process_pids" ]; then
        echo "[no $process running]"
    else
        for pid in $process_pids; do
            echo "[$process #$pid -- limits]"
            cat "/proc/$pid/limits"
        done
    fi
done

NTP
Keep the server clocks synchronized with NTP; this is especially important in sharded clusters.
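Assuming the classic ntpd is used (chrony or systemd-timesyncd need different commands), synchronization can be checked with `ntpq -pn`: the peer currently selected for synchronization is marked with `*` in the first column. A minimal sketch; the helper name `ntp_synced` is hypothetical.

```bash
#!/bin/bash
# ntp_synced: reads `ntpq -pn` output on stdin and succeeds if a line
# starts with '*', i.e. ntpd has selected a system peer.
ntp_synced() {
    grep -q '^\*'
}

if command -v ntpq >/dev/null 2>&1; then
    if ntpq -pn 2>/dev/null | ntp_synced; then
        echo "ntpd is synchronized"
    else
        echo "ntpd has no system peer yet (clock not synchronized)"
    fi
else
    echo "ntpq not found; is ntpd installed?"
fi
```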

Transparent Huge Pages
The translation between virtual and physical memory addresses is performed by the CPU's MMU. This translation is relatively slow, so a cache of recent translations, the TLB (Translation Lookaside Buffer), was created to mitigate it. However, the TLB is small, so its entries are constantly being replaced.

Because the TLB is limited, one option is to use larger memory pages, so that the same number of TLB entries covers more physical memory.

  • Huge pages are appropriate for applications that access large contiguous memory regions.
  • They perform poorly with applications that access small portions of memory scattered across different positions, as is the case with Mongo.
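To put numbers on the previous point: the memory a TLB can map (its "reach") is simply entries × page size. The sketch uses a hypothetical 1536-entry TLB; real sizes vary by CPU and TLB level. With 4 KiB pages it covers 6 MiB, while with 2 MiB huge pages it covers 3072 MiB.

```bash
#!/bin/bash
# tlb_reach_kib ENTRIES PAGE_KIB: memory (KiB) mapped by ENTRIES TLB entries.
tlb_reach_kib() {
    local entries=$1 page_kib=$2
    echo $(( entries * page_kib ))
}

entries=1536   # hypothetical TLB size; check your CPU's documentation
echo "4 KiB pages: $(( $(tlb_reach_kib "$entries" 4) / 1024 )) MiB"
echo "2 MiB pages: $(( $(tlb_reach_kib "$entries" 2048) / 1024 )) MiB"
```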

We add the commands that disable transparent huge pages to the startup script:

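vi /etc/local.d/mongo.start

#! /bin/bash
# Lower the readahead of the DB disk (value in 512-byte sectors)
blockdev --setra 32 /dev/sda

# Locate the THP sysfs directory (upstream kernels vs. RHEL 6)
if [ -d /sys/kernel/mm/transparent_hugepage ]; then
    thp_path=/sys/kernel/mm/transparent_hugepage
elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
    thp_path=/sys/kernel/mm/redhat_transparent_hugepage
else
    # No THP support in this kernel: nothing else to do
    exit 0
fi

echo 'never' > "${thp_path}/enabled"
echo 'never' > "${thp_path}/defrag"

# khugepaged/defrag takes 0/1 on newer kernels and yes/no on older ones
if [ -f "${thp_path}/khugepaged/defrag" ]; then
    re='^[0-1]+$'
    if [[ $(cat "${thp_path}/khugepaged/defrag") =~ $re ]]; then
        # RHEL 7
        echo 0 > "${thp_path}/khugepaged/defrag"
    else
        # RHEL 6
        echo 'no' > "${thp_path}/khugepaged/defrag"
    fi
fi

unset re
unset thp_path

chmod 700 /etc/local.d/mongo.start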
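After a reboot (or after running the script by hand) the active mode can be verified. The kernel marks the selected value with brackets, e.g. `always madvise [never]`, and the sketch below extracts it. `thp_mode` is a hypothetical helper, and only the upstream sysfs path is checked.

```bash
#!/bin/bash
# thp_mode FILE: print the bracketed (active) value of a THP sysfs file,
# e.g. "always madvise [never]" -> "never".
thp_mode() {
    local file=$1 line
    line=$(<"$file") || return 1
    line=${line#*\[}      # drop everything up to the opening bracket
    echo "${line%%\]*}"   # drop everything from the closing bracket
}

for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    [ -r "$f" ] && echo "$f: $(thp_mode "$f")"
done
```

Both files should report `never` once the startup script has run.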

NUMA

The NUMA architecture was designed to overcome scalability problems in SMP architectures, where all CPUs compete for access to the RAM memory bus, causing a bottleneck. NUMA instead gives each group of CPUs (a node) its own memory bus, and accessing this local RAM is faster than accessing RAM attached to another group of CPUs.

On paper NUMA looks very good, but Mongo is known to suffer problems in this type of environment, such as periods of slow performance. First, we find out whether our hardware is NUMA:

emerge sys-process/numactl
numactl --hardware
available: 1 nodes (0) --> There is only one node, it is NOT NUMA: Intel Xeon CPU E5-1660 v3 @ 3.00GHz

numactl --hardware
available: 2 nodes (0-1) --> It is NUMA: AMD Opteron Processor 4386
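The same check can be scripted. This sketch parses the `available: N nodes` line of `numactl --hardware`, and falls back to counting the `/sys/devices/system/node/node*` directories when numactl is not installed. Both function names are hypothetical.

```bash
#!/bin/bash
# Extract N from the "available: N nodes (...)" line read on stdin.
numa_nodes_from_numactl() {
    awk '/^available:/ { print $2; exit }'
}

# Print the number of NUMA nodes, preferring numactl, else sysfs.
numa_node_count() {
    if command -v numactl >/dev/null 2>&1; then
        numactl --hardware 2>/dev/null | numa_nodes_from_numactl
    else
        local n=0 d
        for d in /sys/devices/system/node/node[0-9]*; do
            [ -d "$d" ] && n=$((n + 1))
        done
        echo "$n"
    fi
}

nodes=$(numa_node_count)
if [ "${nodes:-0}" -gt 1 ]; then
    echo "$nodes nodes: NUMA system"
else
    echo "${nodes:-0} node(s): not NUMA"
fi
```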

On NUMA hardware, we disable zone reclaim. The two commands below are equivalent; either one is enough:

echo 0 | tee /proc/sys/vm/zone_reclaim_mode
sysctl -w vm.zone_reclaim_mode=0
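Both commands only change the running kernel, so the value is lost on reboot. A minimal sketch that makes it persistent by appending it to `/etc/sysctl.conf` (the default; pass another file to override it) unless an equivalent line is already present. The function name is hypothetical. Since sysctl applies the lines in order, an appended line takes precedence over an older value in the same file.

```bash
#!/bin/bash
# persist_zone_reclaim [FILE]: add "vm.zone_reclaim_mode = 0" to FILE
# (default /etc/sysctl.conf) unless an equivalent line already exists.
persist_zone_reclaim() {
    local conf=${1:-/etc/sysctl.conf}
    if ! grep -qsE '^[[:space:]]*vm\.zone_reclaim_mode[[:space:]]*=[[:space:]]*0[[:space:]]*$' "$conf"; then
        echo 'vm.zone_reclaim_mode = 0' >> "$conf"
    fi
}

# Usage (as root): persist_zone_reclaim && sysctl -p
```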

We then start Mongo through numactl, so that its memory is interleaved across all NUMA nodes:

su mongodb -s /bin/bash
cd
screen -S mongo
numactl --interleave=all /usr/bin/mongod --config /etc/mongodb.conf

SELinux
It can prevent Mongo from working unless it is configured accordingly; see the official instructions: https://docs.mongodb.com/manual/tutorial/install-mongodb-on-red-hat/#install-rhel-configure-selinux

GrSec
https://jira.mongodb.org/browse/SERVER-12991

V8, the JavaScript engine used by MongoDB, requires the ability to write to executable memory pages for Just-In-Time (JIT) compilation.
If the operating system is configured so that executable memory regions cannot be written to, as grsecurity/PaX can enforce, V8 cannot function.
