Imagine this: you've created a pipeline to clean your company's raw data and enrich it according to business requirements. You've documented each table and column in excruciating detail. Finally, you've built a dashboard brimming with charts and insights that tell a compelling narrative of the business's health and direction. How do you share and present your work?
Create a single-node Hadoop cluster
When I cut my teeth in Data Engineering in 2018, Apache Spark was all the rage. Spark's in-memory processing made it lightning-fast and rendered older frameworks such as Apache Pig obsolete. You couldn't call yourself a Data Engineer without knowing Spark. I was a fledgling Data Engineer …