Getting Started With Kafka & Basic Commands

Mentor Konnect
7 min read · Jul 22, 2021

Kafka is an event streaming platform. It’s a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It combines three key capabilities so you can implement your use cases for event streaming end-to-end with a single battle-tested solution:

  1. To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
  2. To store streams of events durably and reliably for as long as you want.
  3. To process streams of events as they occur or retrospectively.

All this functionality is provided in a distributed, highly scalable, elastic, fault-tolerant, and secure manner. Kafka can be deployed on bare-metal hardware, virtual machines, and containers, on-premises as well as in the cloud.

Kafka is used by thousands of companies including over 60% of the Fortune 100.

Servers:

  • Kafka is run as a cluster of one or more servers that can span multiple datacentres or cloud regions.
  • Some of these servers form the storage layer, called the brokers.
  • Other servers run Kafka Connect to continuously import and export data as event streams to integrate Kafka with your existing systems such as relational databases as well as other Kafka clusters.
  • To let you implement mission-critical use cases, a Kafka cluster is highly scalable and fault-tolerant: if any of its servers fails, the other servers will take over their work to ensure continuous operations without any data loss.

Clients:

  • They allow you to write distributed applications and microservices that read, write, and process streams of events in parallel, at scale, and in a fault-tolerant manner even in the case of network problems or machine failures.
  • Kafka ships with some such clients included, which are augmented by dozens of clients provided by the Kafka community: clients are available for Java and Scala including the higher-level Kafka Streams library, for Go, Python, C/C++, and many other programming languages as well as REST APIs.

All of the tools reviewed in this section are available under the bin/ directory of the Kafka distribution and each tool will print details on all possible command line options if it is run with no arguments.

Main Concepts and Terminology

  • An event records the fact that “something happened” in the world or in your business.
  • When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headers. Here’s an example event:
    Event key: “Alice”
    Event value: “Made a payment of $200 to Bob”
    Event timestamp: “Jun. 25, 2020 at 2:06 p.m.”
  • Producers are those client applications that publish (write) events to Kafka.
  • Consumers are those clients that subscribe to (read and process) these events.
  • In Kafka, producers and consumers are fully decoupled and agnostic of each other, which is a key design element to achieve the high scalability that Kafka is known for.
  • Events are organized and durably stored in topics.
  • Very simplified, a topic is similar to a folder in a filesystem, and the events are the files in that folder.
  • An example topic name could be “orders”.
  • Topics in Kafka are always multi-producer and multi-subscriber: a topic can have zero, one, or many producers that write events to it, as well as zero, one, or many consumers that subscribe to these events.
  • Events in a topic can be read as often as needed — unlike traditional messaging systems, events are not deleted after consumption. Instead, you define how long Kafka should retain your events through a per-topic retention configuration, after which old events are discarded.
  • Kafka’s performance is effectively constant with respect to data size, so storing data for a long time is perfectly fine.
  • Topics are partitioned, meaning a topic is spread over a number of "buckets" located on different Kafka brokers.
  • When a new event is published to a topic, it is actually appended to one of the topic’s partitions. Events with the same event key (e.g., a customer or vehicle ID) are written to the same partition.
  • Kafka guarantees that any consumer of a given topic-partition will always read that partition's events in exactly the same order as they were written.

Let’s try setting up Kafka locally and try out some commands.

Prerequisite

Install Java 8 or a later version on your local system.

Setup
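
If you don’t already have the Kafka archive, download the release first. The URL below points to the 2.7.0 release used in this post and assumes the Apache archive mirror; adjust the version as needed:

$ wget https://archive.apache.org/dist/kafka/2.7.0/kafka_2.13-2.7.0.tgz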

$ tar -xzf kafka_2.13-2.7.0.tgz
$ cd kafka_2.13-2.7.0

Starting Kafka

Before starting Kafka, you need to start ZooKeeper:

# Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties

Open another terminal session and run:

# Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties

Once all services have successfully launched, you will have a basic Kafka environment running and ready to use. You will see a few hundred lines of logs on the console.
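
Both start scripts also accept a -daemon flag if you prefer to run the services in the background instead of keeping two terminal sessions open:

# Start both services in the background
$ bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
$ bin/kafka-server-start.sh -daemon config/server.properties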

Let’s try some basic operations

  • Kafka is a distributed event streaming platform that lets you read, write, store, and process events (also called records or messages) across many machines.
  • Example events are payment transactions, geolocation updates from mobile phones, shipping orders, sensor measurements from IoT devices or medical equipment, and much more.
  • These events are organized and stored in topics, as introduced above.

So before you can write your first events, you must create a topic. Open another terminal session and run:

$ bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092

All of Kafka’s command line tools have additional options: run the kafka-topics.sh command without any arguments to display usage information. For example, --describe shows details such as the partition count of the new topic:

$ bin/kafka-topics.sh --describe --topic test-topic --bootstrap-server localhost:9092
Topic: test-topic  PartitionCount: 1  ReplicationFactor: 1  Configs:
    Topic: test-topic  Partition: 0  Leader: 0  Replicas: 0  Isr: 0
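
To list all topics in the cluster:

$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092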

Let’s write some events into the topic [ Console Producer ]

  • A Kafka client communicates with the Kafka brokers via the network for writing (or reading) events.
  • Once received, the brokers will store the events in a durable and fault-tolerant manner for as long as you need — even forever.
  • Run the console producer client to write a few events into your topic.
  • By default, each line you enter will result in a separate event being written to the topic.

$ bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
This is test event 1
This is test event 2

You can stop the producer client with Ctrl-C at any time.
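
By default the console producer sends events with a null key. To exercise the key-to-partition behaviour described earlier, it can also parse a key from each line; parse.key and key.separator are standard console producer properties, and keyed-topic is a hypothetical topic created here with several partitions:

$ bin/kafka-topics.sh --create --topic keyed-topic --partitions 3 --bootstrap-server localhost:9092
$ bin/kafka-console-producer.sh --topic keyed-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
customer-1:placed order 42
customer-1:paid order 42

Both lines share the key customer-1, so Kafka appends them to the same partition, and any consumer of that partition reads them in exactly this order.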

Let’s read the events from the topic [ Console Consumer ]

Open another terminal session and run the console consumer client to read the events you just created:

$ bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
This is test event 1
This is test event 2

You can stop the consumer client with Ctrl-C at any time.

  • Feel free to experiment: for example, switch back to your producer terminal (previous step) to write additional events, and see how the events immediately show up in your consumer terminal.
  • Because events are durably stored in Kafka, they can be read as many times and by as many consumers as you want.
  • You can easily verify this by opening yet another terminal session and re-running the previous command. You can also spread the work across several consumers by putting them in the same consumer group, as sketched below.
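
A sketch of group-based consumption (my-group is an arbitrary group name): each event is delivered to only one member of the group, and Kafka rebalances the topic’s partitions across members as they join or leave.

$ bin/kafka-console-consumer.sh --topic test-topic --group my-group --bootstrap-server localhost:9092

Run the same command in a second terminal to add a second member. Since test-topic was created with a single partition, only one member will receive events at a time; with a multi-partition topic, the members split the partitions between them.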

Adding topics

bin/kafka-topics.sh --bootstrap-server broker_host:port --create --topic my_topic_name --partitions 20 --replication-factor 3 --config x=y

Adding partitions to a topic

Be aware that adding partitions does not move existing data, and it changes which partition new events with a given key are written to:

bin/kafka-topics.sh --bootstrap-server broker_host:port --alter --topic my_topic_name --partitions 40

To add configs:

bin/kafka-configs.sh --bootstrap-server broker_host:port --entity-type topics --entity-name my_topic_name --alter --add-config x=y
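
For example, to retain events on my_topic_name for seven days (retention.ms is a standard per-topic config, specified in milliseconds):

bin/kafka-configs.sh --bootstrap-server broker_host:port --entity-type topics --entity-name my_topic_name --alter --add-config retention.ms=604800000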

To remove a config:

bin/kafka-configs.sh --bootstrap-server broker_host:port --entity-type topics --entity-name my_topic_name --alter --delete-config x
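
To view the configs currently set on a topic:

bin/kafka-configs.sh --bootstrap-server broker_host:port --entity-type topics --entity-name my_topic_name --describe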

Deleting a topic:

bin/kafka-topics.sh --bootstrap-server broker_host:port --delete --topic my_topic_name

List all consumer groups across all topics

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

View Offsets / Describe Consumer Groups

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group

TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG     CONSUMER-ID  HOST   CLIENT-ID
topic3  0          241019          395308          154289  -            host1  consumer2
topic2  1          520678          803288          282610  -            host2  consumer1
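
LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET, i.e. how far the group is behind the newest events in that partition: for topic3 above, 395308 - 241019 = 154289.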

List all active members in a consumer group

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-group --members [--verbose]

Reset offsets of a consumer group to the latest offset (the group must have no active members, and the --execute flag is required to actually apply the reset):

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group consumergroup1 --topic topic1 --reset-offsets --to-latest --execute
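
Other reset strategies work the same way. For example, a dry run that would rewind the group to the earliest offsets across all of its topics, printing the resulting offsets without applying them:

bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group consumergroup1 --all-topics --reset-offsets --to-earliest --dry-run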

The commands covered in this blog are the most frequently used ones. There are many more Kafka commands, which you can find in the official documentation.


Written by Mentor Konnect

Mentor Konnect is a learning platform to upskill one's technical expertise specially in the areas of Big Data, Data science, ML & AI and other niche domains.
