Amazon Kinesis Crash Course Guide

Welcome to part 7 of a multi-part course on passing your AWS Architect, Developer and SysOps Associate exams. The best part: this course is totally free of charge!

By this point in our free course on AWS certifications you will have covered lots of cool topics. In case you forgot, here’s a little reminder of those articles:

AWS Identity and Access Management (IAM) – Certification Guide

AWS Route53 – Associate Certification Guide & Exam Questions

AWS Certification RDS Guide – With Exam Tips

AWS Simple Queue Service Guide (SQS)

AWS Simple Notification Service & Simple Workflow Service Guides

AWS API Gateway For Dummies

If you’re coming into this article about Kinesis and haven’t read my previous posts, then I definitely recommend checking them out.

What is Amazon Kinesis? Amazon Kinesis is a service for processing massive amounts of data from hundreds of sources in real time. Kinesis is split into 3 products: Kinesis Streams, Kinesis Firehose and Kinesis Analytics.

The article will take just 5 minutes to read and after doing so you’ll have all you need to know to answer any question around Amazon Kinesis in the AWS certifications.

Who should read this?

If you are studying for the:

  • AWS Associate Architect
  • AWS Associate Developer
  • AWS Associate SysOps

Or you are using AWS and want to learn more about Amazon Kinesis, then this is the article for you.

What is streaming data?

Ok, before we get ahead of ourselves and dive into Kinesis, let’s start with the basics of what data streams are. Otherwise you can lose track of the problem that this service is trying to solve. I find that a lot of guides just dive straight into the technology without addressing the real world problem it solves.

People tend to use the analogy of a stream to describe this concept, simply because the water in a stream is constantly moving. If you stand at any one place in a stream then you’ll just watch a constant flow of water passing you by.

Now all we have to do is imagine we’re standing by a stream, but instead of water flowing by, it is information: a constant stream of new information passing you by in real time.

Make sense?

So now you know the basics of what a stream is. You’re probably asking where it can be used. There’s a whole bunch of data streams that could be fed into Kinesis.

For instance:

Weather data is an obvious one. Your stream could be constantly changing weather data, which you then analyse to form real time weather models.

Stock prices are constantly fluctuating. You could analyse this data stream and produce predictions that stock traders can use to decide whether to buy a share or not.

Social network data can be used for any number of purposes. Just imagine if you could monitor the data feeds from Twitter and react in real time to events happening on the platform.

E-commerce sales: want to track how your Black Friday sale is performing? Hook your sales data streams into Kinesis and you could make fine adjustments to the sale as it happens to increase efficiency.

The list goes on and on!

What is Kinesis?

You’re now well versed in data streams and examples of how they can be used in real life scenarios. Now it’s time to see where Kinesis comes in and how it helps cater for these real world use cases.

Kinesis is in my opinion a really easy concept to get your head around. Basically it sits between Producers and Consumers. These are terms that get used all the time when referring to Kinesis.

Producers: A producer is something that, well…produces data. This could be anything from your laptop, mobile phone, social network etc. Basically anything that “produces” data and can then send that data to an external source is classified as a producer.

Consumers: These sit behind Kinesis. After it’s done its work amalgamating multiple data streams and potentially analysing them, the data is then sent on to consumers. A consumer could be an S3 bucket, an Elastic MapReduce (EMR) cluster, Redshift, a DynamoDB table etc.

Kinesis sits between these two players and acts as the intermediary.
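To make the producer and consumer roles concrete, here is a minimal sketch in plain Python. It does not talk to AWS at all: ToyStream, the producer names and the payloads are all made up for illustration, and the real service does far more (durability, scaling, ordering per shard).

```python
from collections import deque


class ToyStream:
    """In-memory stand-in for a stream. Hypothetical, not the Kinesis API."""

    def __init__(self):
        self._records = deque()

    def put(self, source, payload):
        # A producer calls this to push one record into the stream.
        self._records.append({"source": source, "payload": payload})

    def drain(self):
        # A consumer calls this to read every waiting record in arrival order.
        while self._records:
            yield self._records.popleft()


stream = ToyStream()

# Two hypothetical producers sending data into the same stream.
stream.put("weather-station", {"temp_c": 18.5})
stream.put("checkout-service", {"order_total": 42.0})

# A hypothetical consumer that simply collects what it reads.
consumed = [record["source"] for record in stream.drain()]
print(consumed)  # ['weather-station', 'checkout-service']
```

The point to take away is the shape: producers only know how to put data in, consumers only know how to read it out, and the stream in the middle decouples the two.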

3 Core Kinesis Types

Kinesis is broken down into 3 core products. Each product fulfils a particular use case and you’re expected to know which one to choose depending on your circumstances. I’ve broken down those 3 types and given a quick description of what they do. Honestly you won’t need to know much more than this, so there’s not much to learn. Happy days!

Kinesis Streams

Kinesis Firehose

Kinesis Analytics

Exam tip: The Architect Associate exam is likely to throw scenarios at you, and you then have to figure out which of the Kinesis types is the most appropriate match for the scenario.

Kinesis Streams in a nutshell

Kinesis Streams is the original, default Kinesis option. The main point to note about Streams is that when data is received by Kinesis, it is stored in things called shards. It’s not highly important to understand the inner mechanisms of shards. You just have to be aware that the data is stored there.
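If you are curious how a record ends up in a particular shard, here is a rough sketch. Each record is sent with a partition key, Kinesis hashes that key with MD5 into a 128-bit number, and every shard owns a range of those numbers. The helper shard_for is hypothetical, and it assumes the shards split the hash space evenly, which a real stream is not required to do.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a partition key would land on.

    Assumption: the hash space is divided evenly across the shards.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    range_size = HASH_SPACE // shard_count
    # Clamp so the top of the hash space still maps to the last shard
    # when shard_count does not divide the space exactly.
    return min(hash_key // range_size, shard_count - 1)


# The same key always maps to the same shard, so records for one
# sensor or one customer stay together.
for key in ["sensor-1", "sensor-2", "sensor-1"]:
    print(key, "-> shard", shard_for(key, shard_count=4))
```

This is also why a sensible choice of partition key matters: if every record uses the same key, they all pile into a single shard.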

With Kinesis Streams, the data is retained for 24 hours by default, meaning that if it isn’t processed within that time, the data will be erased. You can however extend the retention period up to 7 days if desired (AWS has since raised the maximum to 365 days).
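The effect of the retention period can be sketched in a few lines of Python. This is only an illustration of the rule, not how Kinesis implements it: readable_records, the record layout and the dates are all invented for the example.

```python
from datetime import datetime, timedelta

DEFAULT_RETENTION = timedelta(hours=24)


def readable_records(records, now, retention=DEFAULT_RETENTION):
    """Keep only the records that are still inside the retention window."""
    cutoff = now - retention
    return [r for r in records if r["arrived_at"] > cutoff]


now = datetime(2024, 1, 8, 12, 0)
records = [
    {"id": "a", "arrived_at": now - timedelta(hours=1)},
    {"id": "b", "arrived_at": now - timedelta(hours=30)},
    {"id": "c", "arrived_at": now - timedelta(days=6)},
]

# With the default 24 hours, only the most recent record survives.
print([r["id"] for r in readable_records(records, now)])  # ['a']

# With retention extended to 7 days, all three are still readable.
print([r["id"] for r in readable_records(records, now, timedelta(days=7))])  # ['a', 'b', 'c']
```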

Consumers (typically applications running on EC2 instances) then consume that data and transfer it to storage options such as S3, DynamoDB, Redshift or Elastic MapReduce for further analysis.

Kinesis Firehose for the uninitiated

Kinesis Firehose is a more opinionated/automated way of using Kinesis. Using this option you don’t have to worry about shards at all. However there is a drawback: where Kinesis Streams retains data for 24 hours or longer, Firehose has no retention period at all. This means that as the data comes in, it must be processed.

You can, however, run Lambda functions as the data is sent through Firehose, making it much more lightweight and easy to set up compared to attaching EC2 instances to Kinesis Streams.
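As a rough idea of what such a Lambda function looks like, here is a sketch of a Firehose transformation handler. Firehose hands the function a batch of base64-encoded records and expects each one back with the same recordId, a result of "Ok" and the transformed data. The "city" field and the upper-casing are made up purely for illustration.

```python
import base64
import json


def lambda_handler(event, context):
    """Upper-case a hypothetical 'city' field on every incoming record."""
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))
        payload["city"] = payload.get("city", "").upper()
        # Newline-delimit the records so they are easy to split once delivered.
        transformed = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],  # must match the incoming ID
            "result": "Ok",
            "data": base64.b64encode(transformed).decode("utf-8"),
        })
    return {"records": output}


# Local try-out with a hand-built event; no AWS account needed.
sample = {
    "records": [
        {"recordId": "1", "data": base64.b64encode(b'{"city": "leeds"}').decode("utf-8")}
    ]
}
result = lambda_handler(sample, None)
print(base64.b64decode(result["records"][0]["data"]))  # b'{"city": "LEEDS"}\n'
```

Being able to call the handler locally with a fake event like this is a handy way to test the transformation before wiring it up to a real delivery stream.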

Lastly, Firehose can only deliver data to a fixed set of destinations, such as S3 buckets, Redshift and Elasticsearch. If you want the data to end up anywhere else, you can add a step afterwards to send it on, but you would have to configure that yourself.

Kinesis Analytics for the data nerds

The final option, Kinesis Analytics, allows you to run SQL queries against Kinesis Streams and Firehose in real time. This means that as data is received by Kinesis you can run queries to extract metadata about the data that is streaming through your service.

This could be useful when you want to make manual real time changes to systems, for instance in the Black Friday sales example I gave earlier in the What is streaming data section.
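To give a feel for the kind of answer such a query produces, here is a plain Python stand-in that totals sales per minute. It is not SQL and not the Kinesis Analytics API; sales_per_minute and the event format are assumptions made for the example, showing the "group the stream into time windows and aggregate" idea that these queries are built around.

```python
from collections import defaultdict


def sales_per_minute(events):
    """Sum order totals into one-minute windows.

    Each event is a (epoch_seconds, order_total) pair.
    """
    totals = defaultdict(float)
    for timestamp, amount in events:
        # Round the timestamp down to the start of its minute.
        window_start = timestamp - (timestamp % 60)
        totals[window_start] += amount
    return dict(sorted(totals.items()))


events = [(0, 10.0), (30, 5.0), (61, 20.0), (119, 1.5), (120, 7.0)]
print(sales_per_minute(events))  # {0: 15.0, 60: 21.5, 120: 7.0}
```

With numbers like these arriving minute by minute, you could spot a slow patch in the sale and react while it is still running.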

Conclusion

Ok, so that took about 5 minutes and now you know Kinesis at a high level. You can hold a conversation with a colleague about the uses of Kinesis and how data streams can be used in powerful ways to react in real time to changing situations.

In our next post, we’ll start diving into VPC (Virtual Private Cloud). Unlike Kinesis, VPC plays a major role in the AWS certifications, and knowing it inside out is vital to passing the exams. So tune in and check out the VPC for dummies guide next!

 
