CSYE7245 - Lab 3 Apache Kafka Solved

This lab demonstrates how to set up and use Kafka services with static data as well as real-time Twitter streaming.

Apache Kafka is a streaming messaging platform. It is a durable messaging system based on the publish-subscribe model. Kafka is designed to be high-performance, highly available, and redundant. It is used to collect, process, store, and integrate data at scale. A messaging system sends messages between processes, applications, and servers.

Its basic use cases include:

➢    Stream Processing

➢    Messaging

➢    Website Activity Tracking

➢    Log aggregation

➢    Event Sourcing

➢    Application health monitoring

There are four main parts in a Kafka system:

●      Broker: Handles all requests from clients (producer, consumer and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster.
●      Zookeeper: Keeps track of the status of the Kafka cluster (brokers, topics, users)
●      Producer: Sends records to a broker
●      Consumer: Consumes batches of records from the broker
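
The way these parts interact can be sketched with a toy in-memory model. This is only an illustration of the publish-subscribe idea, not Kafka itself: the class names ToyBroker, ToyProducer and ToyConsumer are hypothetical, there is no replication or networking, and Zookeeper's coordination role is left out entirely.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self.logs = defaultdict(list)

    def append(self, topic, record):
        """Store a record and return its offset in the topic log."""
        self.logs[topic].append(record)
        return len(self.logs[topic]) - 1

    def fetch(self, topic, offset, max_records=10):
        """Return a batch of records starting at the given offset."""
        return self.logs[topic][offset:offset + max_records]


class ToyProducer:
    """Sends records to the broker."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, record):
        return self.broker.append(topic, record)


class ToyConsumer:
    """Reads batches and remembers its own position (offset) in the log."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.broker.fetch(self.topic, self.offset, max_records)
        self.offset += len(batch)
        return batch


broker = ToyBroker()
producer = ToyProducer(broker)
consumer = ToyConsumer(broker, "events")

for i in range(3):
    producer.send("events", f"event-{i}")

first_batch = consumer.poll(max_records=2)   # two oldest records
second_batch = consumer.poll(max_records=2)  # the remaining record
print(first_batch)
print(second_batch)
```

The records stay in the broker's log after they are read; only the consumer's offset moves, which is why a second consumer starting at offset 0 would see the same events again.
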

Experiment setup

Prerequisites:

1.          Installing Oracle VM VirtualBox

 

Specifications:

●           4 GB RAM

●           25 GB hard drive

●           Ubuntu ISO file (downloaded separately)

Figure: Oracle VM VirtualBox Manager

Figure: Login page

Requirements:

1.     Installing VirtualBox Guest Additions on Ubuntu

 

sudo apt install build-essential dkms linux-headers-$(uname -r)

 

With the Guest Additions installed:

➢         Contents can be copied and pasted easily between host and guest

➢         Full-screen mode is available

➢         Certain built-in headers and packages are available for additional functionality

 

 

2.     Installing Python

 

Installing the latest version of Python available from the Ubuntu repositories, together with pip, and verifying the installed version

 

sudo apt install python3

 

sudo apt install python3-pip

 

python3 --version

 

 

 

 

3.     Installing AWS CLI

 

The AWS CLI provides access to multiple AWS services and functionalities from the command line.

 

sudo apt install curl

 

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"

 

unzip awscliv2.zip

 

sudo ./aws/install

 

/usr/local/bin/aws --version

 

 

 

 

4.     Connecting with AWS

 

Connecting the server to the AWS account by entering the access key and secret key when prompted, then listing the S3 buckets to verify the connection

 

aws configure 

 

aws s3 ls

 

 

 

 

 

5.      Installing the Java JDK

The Java JDK is required to start the Kafka broker and services

 

sudo apt update

 

sudo apt list

 

sudo apt install default-jre

 

sudo apt install default-jdk

 

javac --version

 

 

 

6.     Installing PyCharm on Ubuntu

 

 

Test Results
 

1.     Installing Kafka

 

Download Apache Kafka from here 

 

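Extract the Kafka binaries by running tar -xzvf on the downloaded archive

Install the kafka-python client library, which the Python scripts in this lab use:

pip3 install kafka-python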

 

 

2.     Starting the Zookeeper service and Kafka broker

 

Navigate to the directory where the downloaded files are unzipped and start the Zookeeper service

 

bin/zookeeper-server-start.sh config/zookeeper.properties

 

 

 

 

 

Start the Kafka broker in a new terminal

 

bin/kafka-server-start.sh config/server.properties
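
Once both services are started, a quick way to confirm they are listening is to probe their ports. The sketch below assumes the default ports from the stock configuration files (2181 for Zookeeper, 9092 for the Kafka broker); the helper name is_port_open is hypothetical, and a successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Assumed defaults from config/zookeeper.properties and config/server.properties.
for name, port in [("Zookeeper", 2181), ("Kafka broker", 9092)]:
    state = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

If either port is reported as not reachable, the corresponding terminal output is the first place to look for startup errors.
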

 

 

 

 

 

 

 

 

Use Cases
 

Collecting real-time sampled tweets from Twitter and publishing them to our Kafka broker

 

 

1.     producer.py

 

Running the producer.py script to generate events
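
The lab's producer.py is not reproduced here, so the following is only a minimal sketch of what such a script might look like with kafka-python. The topic name lab3-events, the event layout, and the helper names are assumptions made for illustration; localhost:9092 is the default address of a locally started broker.

```python
import json
import time

TOPIC = "lab3-events"                 # hypothetical topic name
BOOTSTRAP_SERVERS = "localhost:9092"  # default address of a local broker


def serialize(event):
    """Encode a dict as UTF-8 JSON bytes, the form in which it is sent."""
    return json.dumps(event).encode("utf-8")


def make_event(i):
    """Build a small illustrative event payload."""
    return {"id": i, "message": f"event number {i}"}


def run_producer(count=10):
    """Publish `count` events to TOPIC; needs a running broker."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        value_serializer=serialize,
    )
    for i in range(count):
        producer.send(TOPIC, value=make_event(i))
        time.sleep(0.5)  # pace the events so they are easy to follow
    producer.flush()     # block until buffered records have been sent
    producer.close()
```

Calling run_producer() with the broker running publishes ten JSON events. The call is deliberately left out of the block so that importing it has no side effects.
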

 

 

 

2.     consumer.py

 

Running the consumer.py script to consume the events published by the producer
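
As with the producer, the lab's consumer.py is not shown, so this is a hedged sketch of a matching consumer using kafka-python. The topic name and helper names are assumptions. Setting auto_offset_reset to "earliest" without a consumer group is one way to read the topic from the beginning on every run; whether the original script does this is not stated.

```python
import json

TOPIC = "lab3-events"                 # hypothetical topic name
BOOTSTRAP_SERVERS = "localhost:9092"  # default address of a local broker


def deserialize(raw):
    """Decode UTF-8 JSON bytes received from Kafka into a Python object."""
    return json.loads(raw.decode("utf-8"))


def format_record(topic, partition, offset, value):
    """Render one consumed record as a single printable line."""
    return f"{topic}[{partition}]@{offset}: {value}"


def run_consumer():
    """Print every record on TOPIC; needs a running broker."""
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BOOTSTRAP_SERVERS,
        auto_offset_reset="earliest",  # start from the oldest available record
        value_deserializer=deserialize,
    )
    # Iterating blocks and yields records as they arrive; stop with Ctrl+C.
    for record in consumer:
        print(format_record(record.topic, record.partition,
                            record.offset, record.value))
```

Calling run_consumer() in a second terminal while the producer is running prints each event along with its partition and offset.
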

 

 

 

 

 

 

 

 

 

 

 

 

3.     twitter-stream.py

 

Using the twitter-stream.py script to fetch tweets from Twitter's API in real time.

 

 

 

 

 

 

 

Entering our bearer token in the twitter-stream.py script under the BEARER_TOKEN parameter.

 

Tweets are published to the Kafka Broker.
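
The flow from the Twitter stream to the broker can be sketched roughly as follows. The lab's script is not shown, so several things here are assumptions: the v2 sampled-stream endpoint URL, the topic name tweets, the helper names, and the use of the requests library. Access to this endpoint depends on the Twitter API plan attached to the bearer token.

```python
import json

# Assumed endpoint; the lab only says tweets come from Twitter's API.
STREAM_URL = "https://api.twitter.com/2/tweets/sample/stream"
BEARER_TOKEN = "<your-bearer-token>"  # placeholder, replace with a real token
TOPIC = "tweets"                      # hypothetical topic name


def auth_headers(token):
    """Build the HTTP header used for bearer-token authentication."""
    return {"Authorization": f"Bearer {token}"}


def parse_tweet(line):
    """Return the tweet dict from one stream line, or None for keep-alives."""
    if not line:
        return None  # the stream sends empty lines to keep the connection open
    payload = json.loads(line)
    return payload.get("data")


def stream_to_kafka():
    """Forward streamed tweets to the broker; needs network and a broker."""
    # Imported here so the helpers above work without these packages installed.
    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    with requests.get(STREAM_URL, headers=auth_headers(BEARER_TOKEN),
                      stream=True) as response:
        response.raise_for_status()  # fail early on authentication errors
        for line in response.iter_lines():
            tweet = parse_tweet(line)
            if tweet is not None:
                producer.send(TOPIC, value=tweet)
```

Keeping the parsing in parse_tweet separate from the network loop makes it possible to check the handling of keep-alive lines without a live stream.
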

 

 

On running consumer.py again, we can see all the published events that are collected by the consumer.

 

 

 

 

Lessons learned
 

Learned how to configure Oracle VM VirtualBox with the Ubuntu operating system
Learned the fundamentals of Apache Kafka
Implemented real-time data streaming with the Twitter API and Apache Kafka
