Rebalance Your Kafka Cluster Using Cruise Control

Cluster skew is a common problem in Kafka deployments. Uneven partition load across broker nodes can cause unnecessary disk and CPU pressure on some brokers, or even force you to add another broker just to handle unexpected traffic.

To solve this problem, LinkedIn built Cruise Control, an open-source tool that helps automate the management of Kafka clusters up to a certain level. It adds a REST interface for administering Kafka clusters remotely, and at LinkedIn it serves as a centralised dashboard for operating and checking the status of any Kafka cluster. The diagram below shows where Cruise Control sits in the architecture.

Photo from https://softwareengineeringdaily.com/2020/02/20/linkedin-kafka/

Cruise Control features include:

  1. Kafka broker resource utilization tracking
  2. The ability to query the latest replica state from brokers (offline, under-replicated (URP), and out-of-sync replicas)
  3. Goal-based resource distribution
  4. Anomaly detection with self-healing
  5. Admin operations on Kafka (add, remove, and demote brokers, rebalance the cluster, run preferred leader election (PLE))
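The utilization, replica-state and admin features above are exposed through Cruise Control's REST API under the /kafkacruisecontrol/ prefix. As a minimal sketch, the Python below only builds the requests without sending them. The host and port (localhost:9091, the UI port used later in this article) are an assumption about your setup, and broker ID 3 is a made-up example. Admin operations are sent with dryrun=true, so Cruise Control only returns a proposal and moves nothing.

```python
# Sketch: build Cruise Control REST requests for the operations listed above.
# Assumptions: Cruise Control is reachable at localhost:9091 (hypothetical) under
# the default /kafkacruisecontrol prefix; broker id 3 is a made-up example.
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9091/kafkacruisecontrol"  # assumed host/port


def cc_request(endpoint: str, method: str = "GET", **params) -> Request:
    """Return an (unsent) urllib Request for a Cruise Control endpoint."""
    # Python booleans become "true"/"false" in the query string.
    query = {k: str(v).lower() if isinstance(v, bool) else v
             for k, v in params.items()}
    url = f"{BASE_URL}/{endpoint}"
    if query:
        url += "?" + urlencode(query)
    return Request(url, method=method)


# Read-only queries: broker resource utilization and replica state.
load = cc_request("load", json=True)
cluster_state = cc_request("kafka_cluster_state", json=True)

# Admin operations: with dryrun=true Cruise Control only returns a proposal.
rebalance = cc_request("rebalance", "POST", dryrun=True, json=True)
add_broker = cc_request("add_broker", "POST", brokerid=3, dryrun=True, json=True)
remove_broker = cc_request("remove_broker", "POST", brokerid=3, dryrun=True, json=True)
demote_broker = cc_request("demote_broker", "POST", brokerid=3, dryrun=True, json=True)

for req in (load, cluster_state, rebalance, add_broker, remove_broker, demote_broker):
    print(req.get_method(), req.full_url)

# Sending a request needs a running Cruise Control, e.g.:
#   from urllib.request import urlopen
#   with urlopen(cluster_state) as resp:
#       print(resp.read().decode())
```

Once a dry-run proposal looks right, the same request with dryrun=false executes it.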

For more details, refer to https://github.com/linkedin/cruise-control. The example below shows how Cruise Control redistributes load when a broker is under-utilized.

Photo: Example of partition reassignment
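To make the idea of redistribution concrete, here is a toy sketch in Python. It is not Cruise Control's algorithm, which optimizes a prioritized list of goals covering CPU, disk, network and replica placement. It is a plain greedy loop over one made-up load number per partition: keep moving the smallest partition from the busiest broker to the idlest one while that narrows the gap between them.

```python
# Toy sketch of load redistribution; NOT Cruise Control's goal-based optimizer.
# Partition loads are made-up positive numbers (think MB/s of traffic).
from typing import Dict, List, Tuple

Assignment = Dict[int, Dict[str, float]]  # broker id -> {partition: load}


def broker_loads(assignment: Assignment) -> Dict[int, float]:
    return {broker: sum(parts.values()) for broker, parts in assignment.items()}


def greedy_rebalance(assignment: Assignment) -> List[Tuple[str, int, int]]:
    """Move partitions in place from the busiest to the idlest broker.

    Returns the moves as (partition, source broker, destination broker).
    """
    moves = []
    while True:
        loads = broker_loads(assignment)
        src = max(loads, key=loads.get)
        dst = min(loads, key=loads.get)
        if not assignment[src]:
            break
        # Candidate: the smallest partition on the busiest broker.
        part = min(assignment[src], key=assignment[src].get)
        size = assignment[src][part]
        # Moving `size` narrows the src/dst gap only if 0 < size < gap.
        if size <= 0 or size >= loads[src] - loads[dst]:
            break
        assignment[dst][part] = assignment[src].pop(part)
        moves.append((part, src, dst))
    return moves


cluster: Assignment = {
    0: {"t-0": 40.0, "t-1": 30.0, "t-2": 20.0},  # overloaded broker
    1: {"t-3": 25.0, "t-4": 15.0},
    2: {},                                        # new, under-utilized broker
}
print("before:", broker_loads(cluster))  # {0: 90.0, 1: 40.0, 2: 0.0}
moves = greedy_rebalance(cluster)
print("moves:", moves)                   # [('t-2', 0, 2), ('t-1', 0, 2)]
print("after:", broker_loads(cluster))   # {0: 40.0, 1: 40.0, 2: 50.0}
```

Every accepted move strictly lowers the sum of squared broker loads, so the loop always terminates.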

Cruise Control Deployment

The flow chart below shows what we are trying to achieve:

Photo: Cruise-control with Kafka cluster and Prometheus

Dependencies needed

  1. Kafka cluster: a fully operational cluster with the Prometheus Node Exporter and JMX Exporter enabled on the brokers.
  2. Prometheus server: a running Prometheus server whose targets file is configured to scrape the Kafka broker nodes.
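Before pointing Cruise Control at Prometheus, it helps to confirm that Prometheus actually reports the broker targets as healthy. A minimal sketch using the Prometheus HTTP API endpoint /api/v1/targets follows; the server address and the scrape job name "kafka" are assumptions to adjust to your prometheus.yml, and the sample response uses made-up hosts so the parsing can be tried offline.

```python
# Sketch: check which Kafka scrape targets Prometheus reports as healthy.
# Uses the Prometheus HTTP API (/api/v1/targets). PROMETHEUS_URL and the job
# name "kafka" are assumptions; match them to your prometheus.yml.
import json
from typing import Dict
from urllib.request import urlopen

PROMETHEUS_URL = "http://1.2.3.4:9090"  # placeholder address
KAFKA_JOB = "kafka"                     # hypothetical scrape job name


def fetch_targets(base_url: str) -> Dict:
    """Download the target list from a running Prometheus server."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def target_health(payload: Dict, job: str) -> Dict[str, str]:
    """Map each active target of `job` (its instance label) to up/down/unknown."""
    return {
        t["labels"]["instance"]: t["health"]
        for t in payload["data"]["activeTargets"]
        if t["labels"].get("job") == job
    }


# Trimmed example in the shape of an /api/v1/targets response (made-up hosts).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "kafka", "instance": "broker-1:7071"}, "health": "up"},
            {"labels": {"job": "kafka", "instance": "broker-2:7071"}, "health": "down"},
            {"labels": {"job": "node", "instance": "broker-1:9100"}, "health": "up"},
        ],
        "droppedTargets": [],
    },
}
print(target_health(sample, KAFKA_JOB))

# Against a live server:
#   print(target_health(fetch_targets(PROMETHEUS_URL), KAFKA_JOB))
```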

We tried two ways of deploying Cruise Control:

  1. As a tool on AWS EC2
  2. As a Docker image

Deploying on AWS EC2

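  1. Create an EC2 instance. The commands below use yum, so choose an Amazon Linux (or other RHEL-based) AMI; on an Ubuntu AMI, use the apt equivalents instead.
  2. Install git:
sudo yum install -y git

3. Install Java 8 (the JDK is needed for the Gradle build) and set JAVA_HOME:

sudo yum install -y java-1.8.0
sudo yum install -y java-1.8.0-openjdk-devel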

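4. Clone the Cruise Control repository:

git clone https://github.com/linkedin/cruise-control.git

Alternatively, download and unzip a release archive:

wget https://github.com/linkedin/cruise-control/archive/refs/tags/2.5.42.zip

5. Build Cruise Control (use cruise-control-2.5.42 as the directory if you unzipped the archive):

cd cruise-control
./gradlew jar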

6. For a simple connection, edit only the options below in config/cruisecontrol.properties:

# If using TLS encryption, use 9094; use 9092 if using plaintext
bootstrap.servers=<bootstrap servers endpoint string from Kafka>
zookeeper.connect=<ZooKeeper endpoint string from Kafka>
# Use the Prometheus Metric Sampler
metric.sampler.class=com.linkedin.kafka.cruisecontrol.monitor.sampling.prometheus.PrometheusMetricSampler
# Prometheus Metric Sampler specific configuration
# Replace with your Prometheus IP and port
prometheus.server.endpoint=1.2.3.4:9090
# Change the capacity config file and specify its path; details below
capacity.config.file=config/capacityCores.json
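Because only a handful of keys change while the rest of the shipped file stays as-is, you can script the edit. The following Python is a minimal sketch that assumes a simplified file layout, one key=value (or key: value) per line with no line continuations. It replaces the listed keys in place and appends any missing ones. It also rejects values that look like they carry an inline comment: in a .properties file, # starts a comment only at the beginning of a line, so a trailing "# ..." would become part of the value. The example text and endpoint values are placeholders.

```python
# Sketch: set the step-6 keys in an existing cruisecontrol.properties text while
# keeping every other line. Simplified parser: one key=value (or key: value) per
# line, no line continuations. Endpoint values are placeholders.
import re
from typing import Dict

_KEY_RE = re.compile(r"^\s*([^#!\s=:]+)\s*[=:]")


def set_properties(text: str, updates: Dict[str, str]) -> str:
    """Return `text` with `updates` applied; missing keys are appended."""
    for key, value in updates.items():
        # '#' starts a comment only at the beginning of a line in .properties,
        # so "1.2.3.4:9090 # note" would be read as part of the value.
        if " #" in value:
            raise ValueError(f"{key}: value seems to contain an inline comment")
    pending = dict(updates)
    out = []
    for line in text.splitlines():
        match = _KEY_RE.match(line)
        if match and match.group(1) in pending:
            key = match.group(1)
            line = f"{key}={pending.pop(key)}"
        out.append(line)
    # Keys that were not in the file are added at the end.
    out.extend(f"{key}={value}" for key, value in pending.items())
    return "\n".join(out) + "\n"


example = (  # made-up excerpt, not the real shipped file
    "# Kafka brokers to connect to\n"
    "bootstrap.servers=localhost:9092\n"
    "zookeeper.connect=localhost:2181/\n"
    "capacity.config.file=config/capacityJBOD.json\n"
)
updates = {
    "bootstrap.servers": "broker-1:9092,broker-2:9092",
    "zookeeper.connect": "zk-1:2181",
    "metric.sampler.class": "com.linkedin.kafka.cruisecontrol.monitor.sampling."
                            "prometheus.PrometheusMetricSampler",
    "prometheus.server.endpoint": "1.2.3.4:9090",
    "capacity.config.file": "config/capacityCores.json",
}
print(set_properties(example, updates))
```

To apply it to the real file, read config/cruisecontrol.properties with pathlib's read_text(), pass it through set_properties, and write the result back.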

Also update config/capacityCores.json so that DISK, CPU, NW_IN, and NW_OUT match the actual capacity of your broker nodes:

{
  "brokerCapacities": [
    {
      "brokerId": "-1",
      "capacity": {
        "DISK": "10000",
        "CPU": { "num.cores": "2" },
        "NW_IN": "5000000",
        "NW_OUT": "5000000"
      },
      "doc": "This is the default capacity. Capacity unit used for disk is in MB, cpu is in number of cores, network throughput is in KB."
    }
  ]
}
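If your brokers are not identical, it is easier to generate the capacity file than to edit it by hand. Here is a minimal Python sketch, assuming the capacityCores.json layout shown above: brokerId "-1" is the default entry and an entry with an explicit brokerId overrides it for that broker. Values are taken directly in the file's units (MB for disk, cores for CPU, KB for network throughput), and the larger broker 0 is hypothetical.

```python
# Sketch: generate a capacityCores.json document. Broker id -1 is the default
# entry; other ids override it. Broker 0's larger values are hypothetical.
import json
from typing import Dict, NamedTuple


class BrokerCapacity(NamedTuple):
    disk_mb: int    # DISK, in MB
    cores: int      # CPU, number of cores
    nw_in_kb: int   # NW_IN, network throughput in KB
    nw_out_kb: int  # NW_OUT, network throughput in KB


def build_capacity(brokers: Dict[int, BrokerCapacity]) -> Dict:
    """Build the {"brokerCapacities": [...]} document, default entry first."""
    entries = []
    for broker_id, cap in sorted(brokers.items()):
        entries.append({
            "brokerId": str(broker_id),
            "capacity": {
                # The sample file stores every value as a string.
                "DISK": str(cap.disk_mb),
                "CPU": {"num.cores": str(cap.cores)},
                "NW_IN": str(cap.nw_in_kb),
                "NW_OUT": str(cap.nw_out_kb),
            },
            "doc": "default capacity" if broker_id == -1
                   else f"override for broker {broker_id}",
        })
    return {"brokerCapacities": entries}


capacity = build_capacity({
    -1: BrokerCapacity(10000, 2, 5000000, 5000000),  # values from the example above
    0: BrokerCapacity(20000, 4, 5000000, 5000000),   # hypothetical larger broker
})
print(json.dumps(capacity, indent=2))
```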

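7. Set up the Cruise Control frontend by downloading the UI release and moving it into your Cruise Control directory (here ~/cruise-control, the directory created by git clone):

wget https://github.com/linkedin/cruise-control-ui/releases/download/v0.3.4/cruise-control-ui-0.3.4.tar.gz
tar -xvzf cruise-control-ui-0.3.4.tar.gz
mv cruise-control-ui ~/cruise-control/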

8. The security group of the EC2 instance running Cruise Control must allow traffic between the instance and:

• Kafka brokers (9092 for plaintext, 9094 for TLS)
• Prometheus server (9090)
• ZooKeeper (2181 by default)
• Your IP, for SSH into the host (22) and for UI access (9091)
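A quick way to confirm the outbound rules is to open a TCP connection from the Cruise Control instance to each dependency. This is a minimal sketch using only the standard library. The host names are placeholders, and the ports follow this article. It checks reachability from the instance only, not the inbound SSH and UI rules for your own IP.

```python
# Sketch: verify from the Cruise Control host that Kafka, ZooKeeper and
# Prometheus ports are reachable. Host names are placeholders; ports follow
# this article (Kafka 9092/9094, ZooKeeper 2181 by default, Prometheus 9090).
import socket
from typing import Dict, Tuple

ENDPOINTS: Dict[str, Tuple[str, int]] = {
    "kafka": ("broker-1.example.internal", 9092),  # hypothetical host
    "zookeeper": ("zk-1.example.internal", 2181),  # hypothetical host
    "prometheus": ("1.2.3.4", 9090),               # placeholder from step 6
}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_all(endpoints: Dict[str, Tuple[str, int]],
              timeout: float = 3.0) -> Dict[str, bool]:
    """Map each endpoint name to whether its port accepted a connection."""
    return {name: can_connect(host, port, timeout)
            for name, (host, port) in endpoints.items()}


# On the Cruise Control instance:
#   print(check_all(ENDPOINTS))
```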

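9. Start Cruise Control from its directory, then open the UI:

./kafka-cruise-control-start.sh config/cruisecontrol.properties

http://<instance dns>:9091

Port 9091 assumes webserver.http.port=9091 in cruisecontrol.properties; the shipped file uses 9090, which clashes with Prometheus if both run on the same host.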

Deploying as a Docker image

The Docker image is built from the steps above and published to Docker Hub.

  1. Pull the Docker image:
docker pull 11nehas/cruise-control:latest

2. Provide your own config files (e.g. capacityCores.json and cruisecontrol.properties). Create a directory for them:
mkdir -p ~/cc/config

3. Download the config files from https://github.com/linkedin/cruise-control/tree/master/config into ~/cc/config and edit them as described in step 6 of the EC2 deployment.

4. Make sure the Prometheus server is running. If it is not, you can run the Prometheus Docker image with a config file that lists the Kafka broker nodes as targets:
mkdir ~/prometheus

Create ~/prometheus/prometheus.yml and ~/prometheus/targets.json with the appropriate values (for more details, see Prometheus monitoring for Kafka).
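The two files can be generated with a few lines of Python. This sketch assumes Prometheus file-based service discovery (file_sd_configs), with prometheus.yml pointing at /etc/prometheus/targets.json, the container path mounted by the docker run command in step 5. The broker host names and the JMX Exporter port 7071 are hypothetical; 9100 is the Node Exporter's default port.

```python
# Sketch: generate prometheus.yml and targets.json using Prometheus file-based
# service discovery (file_sd_configs). Host names and the JMX Exporter port
# are hypothetical; 9100 is the Node Exporter default.
import json
from typing import List


def prometheus_yml(job: str, targets_file: str, interval: str = "15s") -> str:
    """Minimal Prometheus config that loads its targets from `targets_file`."""
    return (
        "global:\n"
        f"  scrape_interval: {interval}\n"
        "scrape_configs:\n"
        f"  - job_name: {job}\n"
        "    file_sd_configs:\n"
        "      - files:\n"
        f"          - {targets_file}\n"
    )


def targets_json(hosts: List[str], ports: List[int]) -> str:
    """file_sd target list: every broker host combined with every exporter port."""
    targets = [f"{host}:{port}" for host in hosts for port in ports]
    return json.dumps([{"targets": targets}], indent=2)


# The container path matches the -v mount used for targets.json in step 5.
yml = prometheus_yml("kafka", "/etc/prometheus/targets.json")
targets = targets_json(["broker-1", "broker-2"], [7071, 9100])  # JMX / node exporter
print(yml)
print(targets)

# To write the files:
#   from pathlib import Path
#   d = Path.home() / "prometheus"
#   d.mkdir(exist_ok=True)
#   (d / "prometheus.yml").write_text(yml)
#   (d / "targets.json").write_text(targets)
```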

5. From your home directory (the volume paths use $PWD), run the Prometheus Docker image and then the Cruise Control image:

sudo docker run -d -p 9090:9090 --name=prometheus -v $PWD/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml -v $PWD/prometheus/targets.json:/etc/prometheus/targets.json prom/prometheus --config.file=/etc/prometheus/prometheus.yml
sudo docker run -p 9091:9091 --link prometheus:prometheus -v $PWD/cc/config/cruisecontrol.properties:/cc/config/cruisecontrol.properties -v $PWD/cc/config/capacityCores.json:/cc/config/capacityCores.json 11nehas/cruise-control:latest

Because of --link, the Cruise Control container can reach Prometheus at prometheus:9090, which is the value to use for prometheus.server.endpoint.

The web UI location is set in cruisecontrol.properties:
webserver.ui.diskpath=./cruise-control-ui

To adjust the web UI configuration, add /cruise-control-ui/ui-config.csv.

6. The security group rules are the same as for the EC2 deployment. Open http://<instance dns>:9091 to see the Cruise Control home screen.

Cruise Control with SASL/SCRAM Authentication

If you have enabled Kafka authentication with SASL/SCRAM, add the following options to cruisecontrol.properties:

security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
    username="<username>" \
    password="<password>";
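If you template these settings per environment, a small generator avoids the quoting mistakes that are easy to make by hand, such as typographic quotes instead of straight double quotes. This is a minimal sketch that writes the JAAS line on a single line rather than with continuations. To keep it simple, it rejects credentials containing double quotes or backslashes, since those would need escaping at both the JAAS and the .properties level. The credentials in the example are hypothetical.

```python
# Sketch: render the SASL/SCRAM client settings for cruisecontrol.properties.
# Credentials with '"' or '\' are rejected rather than escaped (simplification).
def scram_client_properties(username: str, password: str,
                            mechanism: str = "SCRAM-SHA-512") -> str:
    """Return the SASL/SCRAM lines to append to cruisecontrol.properties."""
    for value in (username, password):
        if '"' in value or "\\" in value:
            raise ValueError("username/password must not contain double quotes or backslashes")
    jaas = ("org.apache.kafka.common.security.scram.ScramLoginModule required "
            f'username="{username}" password="{password}";')
    return "\n".join([
        "security.protocol=SASL_SSL",
        f"sasl.mechanism={mechanism}",
        f"sasl.jaas.config={jaas}",
    ]) + "\n"


print(scram_client_properties("cc-user", "change-me"))  # hypothetical credentials
```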

Software Engineer by profession and artist by weekend