Backing up and restoring a Cassandra cluster using Medusa
If you are running a Cassandra cluster on Linux for the DataMiner system storage, you can use Medusa to back up and restore your data.
Medusa is an Apache Cassandra backup system with a command-line interface for backing up and restoring either a single Cassandra node or an entire cluster. It supports various storage options, including local storage as detailed below.
When you initiate a backup with Medusa, all nodes in the Cassandra cluster must be up and running, which you can verify with the nodetool status command. This prerequisite guarantees data integrity and consistency during the backup process.
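For example, before starting a backup, you can run the following quick check on any node; every node in the cluster should be listed with status UN (Up/Normal):
$ nodetool status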
Note
- We recommend Ubuntu LTS as the preferred Linux distribution. The commands mentioned below will work on any Debian-based system, including Ubuntu.
- Medusa also supports cloud services such as Google Cloud Storage (GCS), Azure Blob Storage, and AWS S3. For detailed instructions and guidance on utilizing these storage providers, please consult the Cassandra Medusa documentation.
Configuring the firewall and generating SSH keys
The backup is configured on one of the nodes in the cluster. That node will then connect to the other nodes when a backup is performed. Therefore, it is crucial that all nodes within the cluster can communicate through the SSH port (default port 22).
See below for detailed instructions on how to enable SSH access.
Important
This documentation assumes that you have activated the firewall, and that port 22 has been opened following the recommendation found in Installing Cassandra on a Linux machine. If the firewall is disabled, please refer to step 2 of Installing Cassandra on a Linux machine first, and ensure that the firewall permits traffic on ports 7000, 7001 and 9042 before proceeding with the backup configuration.
Configure each node to allow access to port 22 from every other node in the cluster.
Example of commands on node 1:
$ sudo ufw allow from [IP node 2] to [IP node 1] port 22 proto tcp
$ sudo ufw allow from [IP node 3] to [IP node 1] port 22 proto tcp
Example of commands on node 2:
$ sudo ufw allow from [IP node 1] to [IP node 2] port 22 proto tcp
$ sudo ufw allow from [IP node 3] to [IP node 2] port 22 proto tcp
Example of commands on node 3:
$ sudo ufw allow from [IP node 1] to [IP node 3] port 22 proto tcp
$ sudo ufw allow from [IP node 2] to [IP node 3] port 22 proto tcp
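To confirm that the rules have been added correctly, you can display the firewall status on each node:
$ sudo ufw status numbered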
Generate SSH keys.
You will need to generate SSH keys in PEM format, and add the path of the private key to the SSH section in the medusa.ini configuration file.
On the node where the backup will run (Node 1 in this example), generate a 4096-bit RSA key pair using the following command:
$ ssh-keygen -t rsa -b 4096 -m PEM -f <file_name>
Example:
$ ssh-keygen -t rsa -b 4096 -m PEM -f id_rsa
The command above creates a private key (id_rsa) and its corresponding public key (id_rsa.pub) in PEM format in the directory from which you run the command (the home folder in this example).
Copy the public key to the other nodes by running the following command for each node:
$ scp id_rsa.pub username@<node_IP>:/home/<username>/
Example in which the keys are copied to Node 2:
$ scp id_rsa.pub myUser@10.10.10.12:/home/myUser/
Example in which the keys are copied to Node 3:
$ scp id_rsa.pub myUser@10.10.10.13:/home/myUser/
Write the public key to the authorized_keys file on all nodes in the cluster by running the following command:
$ cat [Path to file]/<file_name>.pub >>~/.ssh/authorized_keys
Example:
$ cat /home/myUser/id_rsa.pub >>~/.ssh/authorized_keys
Important
If the backup is initiated from one of the nodes in the cluster, the public key should also be appended to the authorized_keys file on this node.
After the last step, you should be able to connect via SSH from the node where the backup will run to each of the other nodes in the cluster without entering a password; specifying the IP address should be enough.
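For example, from Node 1 you should now be able to open a connection like the one below without being prompted for a password (adjust the username, key path, and IP address to your setup):
$ ssh -i ~/id_rsa myUser@10.10.10.12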
Configuring the NFS share
To store the backups, a shared folder is needed, and that folder must be mounted at the same path on all nodes in the cluster.
Set up an NFS share by following the instructions in How to set up an NFS mount on Ubuntu 20.04 or Installing and configuring Network File System (NFS) on Ubuntu.
Make a note of the path, as you will need to store it in the base_path property in the medusa.ini configuration file.
Note
If you opt for a local path rather than a network share, only the backups of the local node will be accessible. We strongly recommend utilizing a shared folder for improved visibility and centralized access to backups across all nodes in the cluster.
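As an illustration, assuming the NFS server exports a directory called /var/nfs/medusa (a hypothetical export path), mounting it on each node could look like this:
$ sudo apt install nfs-common
$ sudo mkdir -p /mnt/medusa-backups
$ sudo mount -t nfs <NFS_server_IP>:/var/nfs/medusa /mnt/medusa-backups
In this example, /mnt/medusa-backups would be the value to use for the base_path property.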
Installing and configuring Medusa
Execute the following steps on each node in the cluster:
Install python3-pip by running the following commands:
$ sudo apt update
$ sudo apt install python3-pip
Install Medusa. For detailed instructions, see the installation guide on GitHub.
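At the time of writing, Medusa is typically installed through pip; the exact command may change, so check the GitHub installation guide first:
$ sudo pip3 install cassandra-medusa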
Create the /etc/medusa directory if it does not exist yet:
$ sudo mkdir -p /etc/medusa/
Copy the example provided in Configure Medusa on GitHub into a new file /etc/medusa/medusa.ini.
Edit the file to ensure the following properties are configured:
Cassandra:
- config_file
- CQL credentials:
  - cql_username
  - cql_password
- nodetool credentials:
  - nodetool_username
  - nodetool_password
- User certificate (if TLS encryption is configured in Cassandra):
  - certfile (path to the root CA certificate)
  - usercert (path to the user certificate)

Storage:
- storage_provider
- bucket_name
- base_path
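As an illustration, with local storage on an NFS mount, the relevant parts of /etc/medusa/medusa.ini could look like the example below. All paths, names, and credentials shown are placeholders that you need to adapt to your setup.
[cassandra]
config_file = /etc/cassandra/cassandra.yaml
cql_username = <cql_username>
cql_password = <cql_password>
nodetool_username = <nodetool_username>
nodetool_password = <nodetool_password>
; Only if TLS encryption is configured in Cassandra:
; certfile = /path/to/rootCa.crt
; usercert = /path/to/user.crt
[storage]
storage_provider = local
bucket_name = cassandra_backups
base_path = /mnt/medusa-backups
[ssh]
username = myUser
key_file = /home/myUser/id_rsa
The key_file property in the ssh section points to the private key generated earlier.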
Taking a backup using Medusa
Choose whether to take a single-node backup or a cluster backup.
To take a full backup of a single node, run the following command:
$ medusa backup --backup-name=<name of the backup> --mode=full
To take a full backup of a cluster, run the following command:
$ medusa backup-cluster --backup-name=<name of the backup> --mode=full
Verify that a backup has been taken for every node in the cluster. The backups are stored in base_path/bucket_name.
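You can also use Medusa itself to check which backups are available in the configured storage location:
$ medusa list-backups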
Important
If a cluster backup fails for some reason, take separate backups of the individual nodes instead (by connecting locally to each node and running a single-node backup).
Restoring a backup using Medusa
You can restore a full cluster or a single node.
- To restore a full cluster, see Restoring a full cluster.
- To restore a single node, see Restoring a single node.
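For reference, the restore commands typically take the following form; the linked sections describe the full procedure and any additional options that may be required:
$ medusa restore-node --backup-name=<name of the backup>
$ medusa restore-cluster --backup-name=<name of the backup>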