Send and read a message in a Kinesis stream

Introduction

This project focuses on sending and reading messages with Amazon Kinesis Data Streams. Kinesis Data Streams is a fully managed service that helps stream large amounts of data in real time. Once data is ingested, various real-time processing applications or services can consume it.

Why is it useful?

  • Real-Time Data Processing: Ideal for streaming logs, IoT data, or clickstreams.
  • Scalable: Automatically scales to handle varying amounts of data.
  • Low Maintenance: AWS manages availability and fault tolerance, so we can focus on building our applications rather than managing servers.

Prerequisites

Before starting, ensure you have:

  • AWS Account: Free Tier usage is sufficient; no credits are required if usage remains within Free Tier limits.
  • AWS CLI Installed: command-line interface tools to automate tasks.
  • Permissions in the AWS Account: the ability to create and manage Kinesis streams (e.g., AmazonKinesisFullAccess or admin privileges), plus permission to view and create IAM roles/policies (if needed).
  • Kinesis Data Streams API Enabled in the AWS account (it is typically enabled by default).

Step-by-Step Implementation

We will provide two approaches: Manual Steps (GUI) using the AWS Management Console, and Command-Line Interface (CLI) steps. Both lead to the same result.

A. Manual Steps (Graphical User Interface - GUI)

Sign in to the AWS Management Console
Go to https://aws.amazon.com/console/ and sign in.
Command/Action Explanation: No CLI command here; we are using the console to access AWS.

Navigate to Amazon Kinesis
In the “Services” dropdown, look for “Analytics” or type Kinesis in the search bar and select “Kinesis”.
Command/Action Explanation: We are locating the Kinesis service to create/manage streams.

Create a Kinesis Data Stream
Click “Create data stream”.
Provide a Stream name (e.g., my-sample-stream).
Set the Number of shards (e.g., 1 for a basic setup); if the console asks for a capacity mode, choose Provisioned so you can specify the shard count.
Click “Create data stream”.
Command/Action Explanation: We are creating a new Kinesis Data Stream with the specified shard count. One shard is enough for small-scale testing and falls within the Free Tier if usage is minimal.

Send a Test Message (PutRecord) via Console
After the stream is active (status shows “Active”), select the stream to open its details page.
Look for a “Put data” or “Send data” section in the console (the UI may vary slightly over time).
Input a Partition key (e.g., partitionKey123) and a Data field (e.g., HelloKinesis).
Click “Send data”.
Command/Action Explanation: We are using the console’s built-in test feature to send a sample record to the stream.

Set Up a Simple Consumer to Read Messages
Within the Kinesis stream details, you may see options to create a consumer application (e.g., a Kinesis Data Analytics application or another AWS service). For a basic test, we can later verify messages by using the CLI. (See CLI section for reading data.)

Note: The AWS console often doesn’t provide a direct “read messages” feature for Kinesis Data Streams. Typically, we use the CLI, a Kinesis Client Library (KCL) application, or Kinesis Data Analytics to consume the stream.

B. Command-Line Interface (CLI) Steps

Below are the equivalent steps using AWS CLI. Each command is explained for clarity:

Configure AWS CLI
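For example (the region shown is only an example; use the region where you plan to create the stream):

```shell
aws configure
# AWS Access Key ID [None]: <your-access-key-id>
# AWS Secret Access Key [None]: <your-secret-access-key>
# Default region name [None]: us-east-1
# Default output format [None]: json
```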

  • Explanation: Prompts for AWS Access Key, Secret Access Key, Region, and Output format. This ensures your CLI commands run under your AWS credentials.


Create a Kinesis Data Stream
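Using the example stream name from the console walkthrough:

```shell
aws kinesis create-stream \
    --stream-name my-sample-stream \
    --shard-count 1
```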

  • Explanation:
    create-stream: Creates a new Kinesis Data Stream.
    --stream-name: The name of the stream (we chose my-sample-stream).
    --shard-count 1: Specifies the number of shards (1 is enough for a basic test).


Check Stream Status
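For example (the --query flag is optional; it uses a JMESPath expression to extract just the status field from the JSON response):

```shell
aws kinesis describe-stream --stream-name my-sample-stream

# Or show only the status:
aws kinesis describe-stream \
    --stream-name my-sample-stream \
    --query 'StreamDescription.StreamStatus' \
    --output text
```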

  • Explanation:
    describe-stream: Describes the status and details of the specified stream.
    We want to see if StreamStatus is ACTIVE before sending data.


Put a Record into the Stream
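A sketch of the command, assuming AWS CLI v2 (the --cli-binary-format flag tells v2 to accept the plain-text payload; CLI v1 does not need it):

```shell
aws kinesis put-record \
    --stream-name my-sample-stream \
    --partition-key partitionKey123 \
    --data HelloKinesis \
    --cli-binary-format raw-in-base64-out
```

A successful call returns JSON containing the ShardId and a SequenceNumber for the record.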

  • Explanation:
    put-record: Sends a data record to a Kinesis stream.
    --partition-key: Helps determine which shard the data is stored on.
    --data: The actual message content (in this case “HelloKinesis”). Note: AWS CLI v2 expects this value to already be Base64-encoded; add --cli-binary-format raw-in-base64-out to send plain text (CLI v1 encoded it automatically). The record is stored and later returned in Base64-encoded form.

Read from the Stream

To read records, we first need the shard iterator. We can do this in two steps:


Step A: Get Shard Iterator
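One way to capture the iterator in a shell variable, using --query to extract the ShardIterator field:

```shell
SHARD_ITERATOR=$(aws kinesis get-shard-iterator \
    --stream-name my-sample-stream \
    --shard-id shardId-000000000000 \
    --shard-iterator-type TRIM_HORIZON \
    --query 'ShardIterator' \
    --output text)
```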

  • Explanation:
    get-shard-iterator: Retrieves a pointer (iterator) to read data from the specified shard.
    --shard-id: Usually shardId-000000000000 for a single shard.
    --shard-iterator-type TRIM_HORIZON: Starts reading from the earliest data in the shard.
    We capture the ShardIterator in a variable named SHARD_ITERATOR.


Step B: Get Records using the Shard Iterator
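Using the variable captured in Step A:

```shell
aws kinesis get-records --shard-iterator "$SHARD_ITERATOR"
```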

  • Explanation:
    get-records: Reads data records using the shard iterator.
    $SHARD_ITERATOR: The value we obtained from the previous step, which points to where we start reading.


If our test message was recently put, we should see it in the output, possibly in Base64-encoded form.

Verifying and Testing the Project

  • Console Verification: Check the stream status is ACTIVE and that the “Put data” action succeeded (the console might show a success message or the data’s sequence number).
  • CLI Verification:
    aws kinesis describe-stream should show the stream in ACTIVE state.
    aws kinesis put-record should return a SequenceNumber.
    aws kinesis get-records should display the message data, confirming successful end-to-end transmission.

Common Issues and Troubleshooting

Stream Not Active:
If the stream status is CREATING for too long, refresh the console or wait a few minutes. Ensure you have the correct permissions.

Insufficient Permissions:
Verify your IAM policy allows managing Kinesis. If not, attach the necessary policy to your user or role.

Empty Records on get-records:
Make sure you used the correct shard ID. Also, if the data is older than the retention period, it may have expired.

AWS CLI Misconfiguration:
If commands fail, run aws configure again and confirm the correct region (the same region in which you created the stream).

Data Encoding:
The data might appear Base64-encoded in the CLI output. This is normal; decode if needed for readability.
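For example, the test message from earlier decodes like this (using the base64 tool from GNU coreutils; on some systems the decode flag is -d or -D):

```shell
echo 'SGVsbG9LaW5lc2lz' | base64 --decode
# HelloKinesis
```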

Conclusion

We have successfully created a Kinesis Data Stream, sent a message to the stream, and retrieved that message, all while using AWS’s Free Tier. We learned how to configure both the AWS Management Console and the CLI to manage streams, send data, and read data in near real-time. These steps form a foundational understanding of how AWS handles streaming data, providing a scalable and robust environment for real-time data processing.

What is Cloud Computing?

Cloud computing delivers computing resources (servers, storage, databases, networking, and software) over the internet, allowing businesses to scale and pay only for what they use, eliminating the need for physical infrastructure.


  • AWS: The most popular cloud platform, offering scalable compute, storage, AI/ML, and networking services.
  • Azure: A strong enterprise cloud with hybrid capabilities and deep Microsoft product integration.
  • Google Cloud (GCP): Known for data analytics, machine learning, and open-source support.