This is the beginning of a new series centering on the use of Kubernetes to host Big Data infrastructure. In this article, I will run a single-node Trino cluster on a local Kubernetes cluster using minikube.
Tag: ec2
Create a JupyterLab notebook for Spark
Imagine this: you've created a pipeline to clean your company's raw data and enrich it according to business requirements. You've documented each table and column in excruciating detail. Finally, you've built a dashboard brimming with charts and insights that tell a compelling narrative of the business's health and direction. How do you share and present your work?
How to create a multi-node Presto cluster on AWS EC2
What's wrong with a single-node Presto cluster? In a previous post, I created a single-node Presto cluster where the coordinator and worker processes run on the same node. That's a bad idea in large clusters. Processing work on the coordinator can starve the coordinator process of resources and negatively impact scheduling work and monitoring …
Creating a Presto Cluster on EC2
Today, I am going to create a Presto cluster on an AWS EC2 instance. I am aware of Amazon EMR (Elastic MapReduce), Amazon's managed Hadoop offering, but since this is a technical exercise to learn about Presto internals, we're going to do things the hard way 🙂 Prerequisites: I assume you have some technical knowledge, namely Working …
Welcome to Norman’s Presto Adventures
I work in the tech industry at a company where Presto is used extensively for both interactive data analysis and ETL. It's a fantastic tool that performs well for interactive data analysis and scales to processing petabytes of data (if you build it right).