Introduction
This project focuses on sending and reading messages with Amazon Kinesis Data Streams. Kinesis Data Streams is a fully managed service that helps stream large amounts of data in real time. Once data is ingested, various real-time processing applications or services can consume it.
Why is it useful?
Prerequisites
Before starting, ensure we have:
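AWS Account (note that Kinesis Data Streams has historically not been covered by the AWS Free Tier, so even a single shard may incur a small hourly charge; check the current Kinesis pricing before starting).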
AWS CLI Installed (Command-Line Interface tools to automate tasks).
Permissions in the AWS Account:
Ability to create and manage Kinesis streams (e.g., AmazonKinesisFullAccess or admin privileges).
Permission to view and create IAM roles/policies (if needed).
Kinesis Data Streams available in the chosen AWS Region (AWS services need no separate API activation; the account only needs the permissions listed above).
Step-by-Step Implementation
We cover two approaches: manual steps in the AWS Management Console (GUI) and the equivalent AWS CLI commands. Both lead to the same result.
A. Manual Steps (Graphical User Interface - GUI)
Sign in to the AWS Management Console
Go to https://aws.amazon.com/console/ and sign in.
Command/Action Explanation: No CLI command here; we are using the console to access AWS.
Navigate to Amazon Kinesis
In the “Services” dropdown, look for “Analytics” or type Kinesis in the search bar and select “Kinesis”.
Command/Action Explanation: We are locating the Kinesis service to create/manage streams.
Create a Kinesis Data Stream
Click “Create data stream”.
Provide a Stream name (e.g., my-sample-stream).
If the console asks for a capacity mode, choose Provisioned, then set the number of shards (e.g., 1 for a basic setup).
Click “Create data stream”.
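Command/Action Explanation: We are creating a new Kinesis Data Stream with the specified shard count. One shard is enough for small-scale testing: it accepts writes of up to 1 MB or 1,000 records per second. Shards are billed for as long as the stream exists, so delete the stream once testing is done.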
Send a Test Message (PutRecord) via Console
After the stream is active (status shows “Active”), select the stream to open its details page.
Look for a “Put data” or “Send data” section in the console (the UI may vary slightly over time).
Input a Partition key (e.g., partitionKey123) and a Data field (e.g., HelloKinesis).
Click “Send data”.
Command/Action Explanation: We are using the console’s built-in test feature to send a sample record to the stream.
Set Up a Simple Consumer to Read Messages
Within the Kinesis stream details, you may see options to connect a consumer application (e.g., a Kinesis Data Analytics application, since renamed Amazon Managed Service for Apache Flink, or another AWS service). For a basic test, we verify the message later with the CLI (see the CLI section for reading data).
Note: The console is not designed as a consumer. Newer console versions may offer a Data viewer tab for inspecting a shard's records, but typically we use the CLI, a Kinesis Client Library (KCL) application, or a stream-processing service to consume the stream.
B. Command-Line Interface (CLI) Steps
Below are the equivalent steps using AWS CLI. Each command is explained for clarity:
Configure AWS CLI
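The configuration command itself is not shown here. As a minimal sketch, assuming AWS CLI v2 is installed: `aws configure` is run once by hand, and the two helper functions below (hypothetical names, not part of the AWS CLI) check what was configured.

```shell
# Interactive setup, run once by hand. It prompts for the access key ID,
# secret access key, default region, and output format:
#   aws configure

# Print the default region the CLI will use, so it can be compared
# with the region where the stream is created.
show_cli_region() {
  aws configure get region
}

# Print the account ID behind the configured credentials.
# The call fails if the credentials are missing or invalid.
show_cli_account() {
  aws sts get-caller-identity --query Account --output text
}
```

Calling `show_cli_region` and `show_cli_account` after `aws configure` is a quick way to confirm that the CLI points at the intended region and account.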
Create a Kinesis Data Stream
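A sketch of what this step might look like, assuming the stream name `my-sample-stream` from the console walkthrough and a single provisioned shard. The wrapper function name is hypothetical.

```shell
# Assumed stream name, taken from the console walkthrough.
STREAM_NAME="my-sample-stream"

# Create a provisioned stream with one shard.
# The command returns immediately; the stream starts in CREATING state.
create_stream() {
  aws kinesis create-stream \
    --stream-name "$STREAM_NAME" \
    --shard-count 1
}
```

Running `create_stream` prints nothing on success; the stream usually needs a short while before it becomes usable.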
Check Stream Status
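One way to check the status, sketched under the assumption that the stream is named `my-sample-stream`. The function names are hypothetical; `describe-stream-summary` and `wait stream-exists` are existing `aws kinesis` subcommands.

```shell
# Assumed stream name, taken from the console walkthrough.
STREAM_NAME="my-sample-stream"

# Print only the stream status (CREATING, ACTIVE, UPDATING, or DELETING).
stream_status() {
  aws kinesis describe-stream-summary \
    --stream-name "$STREAM_NAME" \
    --query 'StreamDescriptionSummary.StreamStatus' \
    --output text
}

# Block until the stream is ACTIVE; the CLI polls on our behalf
# and exits non-zero if the stream does not become active in time.
wait_for_stream() {
  aws kinesis wait stream-exists --stream-name "$STREAM_NAME"
}
```

Records can only be written once `stream_status` reports `ACTIVE`.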
Put a Record into the Stream
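A sketch of the write step, assuming AWS CLI v2 and the sample values from the console walkthrough. In v2, binary parameters are expected to be Base64 by default, so the `--cli-binary-format raw-in-base64-out` option is passed to send plain text; AWS CLI v1 does not need (or accept) that option. The function name is hypothetical.

```shell
# Assumed stream name, taken from the console walkthrough.
STREAM_NAME="my-sample-stream"

# Write one record to the stream.
#   $1 = partition key (decides which shard receives the record)
#   $2 = message text
put_message() {
  aws kinesis put-record \
    --stream-name "$STREAM_NAME" \
    --partition-key "$1" \
    --data "$2" \
    --cli-binary-format raw-in-base64-out
}
```

For example, `put_message partitionKey123 HelloKinesis` should return a JSON document containing the `ShardId` and `SequenceNumber` assigned to the record.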
Read Records from the Stream
Step A: Get Shard Iterator
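A possible form of this step, assuming the stream `my-sample-stream` and the shard ID `shardId-000000000000`, which is what the first shard of a new stream is normally called. The `TRIM_HORIZON` iterator type starts at the oldest record still retained, so a message sent earlier is included. Function names are hypothetical.

```shell
# Assumed stream name, taken from the console walkthrough.
STREAM_NAME="my-sample-stream"

# List the shard IDs of the stream, to confirm which ID to read from.
list_shard_ids() {
  aws kinesis list-shards \
    --stream-name "$STREAM_NAME" \
    --query 'Shards[].ShardId' \
    --output text
}

# Print a shard iterator positioned at the oldest available record.
#   $1 = shard ID (optional; defaults to the assumed first shard)
get_iterator() {
  aws kinesis get-shard-iterator \
    --stream-name "$STREAM_NAME" \
    --shard-id "${1:-shardId-000000000000}" \
    --shard-iterator-type TRIM_HORIZON \
    --query 'ShardIterator' \
    --output text
}
```

The iterator is a long opaque string that expires after a few minutes, so it is best requested right before reading.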
Step B: Get Records using the Shard Iterator
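A minimal sketch of the read call. It takes the iterator string returned by `get-shard-iterator` as its only argument; the function name and the limit of 10 records are illustrative assumptions.

```shell
# Fetch up to 10 records from the position the iterator points at.
#   $1 = shard iterator string returned by get-shard-iterator
read_records() {
  aws kinesis get-records \
    --shard-iterator "$1" \
    --limit 10
}
```

The response is a JSON document with a `Records` list and a `NextShardIterator` value that can be passed to the next call to continue reading.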
If our test message was put recently, it should appear in the output, with its Data field in Base64-encoded form.
Verifying and Testing the Project
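One way to verify the round trip from the command line is to read the shard and check whether any record decodes to the message that was sent. The sketch below assumes single-line text messages and a valid shard iterator; the function name is hypothetical.

```shell
# Succeed (exit status 0) if any record read from the iterator decodes
# to the expected text; fail otherwise.
#   $1 = shard iterator string
#   $2 = expected message text (single line)
stream_contains() {
  aws kinesis get-records \
    --shard-iterator "$1" \
    --query 'Records[].Data' \
    --output text |
  tr '\t' '\n' |
  while IFS= read -r encoded; do
    # Each value is the Base64 form of one record's payload.
    printf '%s' "$encoded" | base64 --decode
    echo
  done |
  grep -xF -- "$2" >/dev/null
}
```

For instance, `stream_contains "$ITERATOR" HelloKinesis && echo "message found"` would confirm that the sample record is in the shard, where `ITERATOR` holds a freshly obtained shard iterator.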
Common Issues and Troubleshooting
Stream Not Active:
If the stream status is CREATING for too long, refresh the console or wait a few minutes. Ensure you have the correct permissions.
Insufficient Permissions:
Verify your IAM policy allows managing Kinesis. If not, attach the necessary policy to your user or role.
Empty Records on get-records:
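Make sure you used the correct shard ID and iterator type: a LATEST iterator only returns records written after the iterator was created, so use TRIM_HORIZON to read from the oldest available record. A single call can also return an empty Records list even when data exists; call get-records again with the returned NextShardIterator. Finally, data older than the retention period (24 hours by default) has expired.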
AWS CLI Misconfiguration:
If commands fail, run aws configure again and confirm the correct region (the same region in which you created the stream).
Data Encoding:
The Data field of each record is Base64-encoded in the CLI output. This is normal; decode it for readability.
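Decoding can be done with the standard `base64` utility. A small sketch, with a hypothetical function name:

```shell
# Decode one Base64-encoded Data value copied from the get-records output.
#   $1 = Base64 string
decode_data() {
  printf '%s' "$1" | base64 --decode
}
```

For example, `decode_data SGVsbG9LaW5lc2lz` prints `HelloKinesis`, the sample message used in this guide.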
Conclusion
We have created a Kinesis Data Stream, sent a message to it, and retrieved that message. Along the way we used both the AWS Management Console and the CLI to manage streams, send data, and read data in near real time. These steps give a foundational understanding of how AWS handles streaming data and of the scalable environment it provides for real-time data processing.