Hi Internet, I’m Norman Lim Xing Kang.
I work in the tech industry, at a company where Presto is used extensively for both interactive data analysis and ETL.
It’s a fantastic tool: fast enough for interactive data analysis, yet able to scale to processing petabytes of data (if you build it right).
At the same time, Presto can feel like a black box. My interactions with it are usually limited to sending SQL to the coordinator and waiting for the results to come back.
A few months ago, I came across O’Reilly’s Presto: The Definitive Guide by Matt Fuller, Manfred Moser, and Martin Traverso.
It’s a beginner-friendly introduction to Presto, something that has been sorely lacking in the Presto community for a while.
I will dedicate a large portion of this blog to replicating the book’s exercises, tweaking them to run better in the cloud or to scale to a production environment.
So if you’re a fellow Big Data or Presto enthusiast, stick around and join me on my journey to learn more about Presto!
Want to get your hands dirty? Check out my first article, Create a Presto Cluster on EC2.