Create a single node Hadoop cluster

Starting out in Data Engineering

Hadoop on EC2

When I cut my teeth in Data Engineering in 2018, Apache Spark was all the rage. Spark's in-memory processing made it lightning-fast and made older frameworks such as Apache Pig obsolete. You couldn't call yourself a Data Engineer without knowing Spark. I was a fledgling Data Engineer …

How to create a multi-node Presto cluster on AWS EC2

What's wrong with a single-node Presto cluster?

In a previous post, I created a single-node Presto cluster where the coordinator and worker processes run on the same node. That's a bad idea in large clusters. Processing work on the coordinator can starve the coordinator process of resources and negatively impact scheduling work and monitoring …

How I passed the AWS Certified Solutions Architect Professional exam

This is a continuation of "What is the AWS Certified Solutions Architect - Professional exam". If you don't know what an AWS Certified Solutions Architect is or why you should become one, read part 1 or check out the AWS website.

How did I prepare for it?

The AWS Certified Solutions Architect exam asks 75 multiple-choice questions in …

Creating a Presto Cluster on EC2

Today, I am going to create a Presto cluster on an AWS EC2 instance. I am aware of Amazon EMR (Elastic MapReduce), Amazon's managed Hadoop offering, but since this is a technical exercise to learn about Presto internals, we're going to do things the hard way 🙂

Prerequisites

I assume you have some technical knowledge, namely:

Working …