In this article I will discuss how to build a cloud-agnostic Big Data processing and storage solution running entirely in Kubernetes. This design avoids vendor lock-in by using only open-source technologies, replacing cloud-managed products such as S3 and Amazon Elastic MapReduce (EMR) with MinIO and Apache Spark.
In this article I will show how to create a Data Lake in AWS using S3 as the storage layer, Glue as the metastore, and Trino on Kubernetes as the query engine.
This is the beginning of a new series centering on the use of Kubernetes to host Big Data infrastructure. In this article I will run a single-node Trino cluster in a local Kubernetes cluster created with minikube.