Set up Spark and Hive for data warehousing and processing

This is the second article in a series on building a Big Data development environment in AWS. If you haven't read the first article, you'll likely be confused. Please read "Create a single node Hadoop cluster" first.

Setup Spark and Hive in Hadoop cluster

We've set up the storage service HDFS and the resource manager …

Create a single node Hadoop cluster

Starting out in Data Engineering

Hadoop on EC2

When I cut my teeth in Data Engineering in 2018, Apache Spark was all the rage. Spark's in-memory processing made it lightning-fast and made older frameworks such as Apache Pig obsolete. You couldn't call yourself a Data Engineer without knowing Spark. I was a fledgling Data Engineer …