I drive change by forming partnerships with both business and tech leads.
My specialty is building data platforms that accelerate an organization’s ability to turn data into insights.
Key skills: Data Engineering, Platform Management, Project Management
Tech: Trino/Presto, Spark, Kubernetes, Airflow, Terraform
Vice President, Data Engineer @ DBS
2022 Nov – Present
Building a next-generation AI and Data Platform on Kubernetes using GitOps principles and ArgoCD
Kubernetes Cluster Administration
• Deploy a cluster and CNI using kubeadm
• Create platform observability using Prometheus and Grafana
• Automate database administration tasks using Argo Workflows
• Generate a cluster scorecard using Popeye and Polaris
Database Management on Kubernetes
• Deploy TiDB on EKS using Helm, Kustomize, and ArgoCD
• Design and deploy active-passive database replication using Change Data Capture (CDC)
• Test database resiliency via Chaos Engineering using Chaos Mesh
• Run load tests on TiDB using TPC-C and Argo Workflows
• Deploy Locust in Kubernetes using ArgoCD
• Deploy Chaos Mesh in Kubernetes using ArgoCD
Big Data Architecture
• Run Trino on Kubernetes
• Run the Hive metastore on Kubernetes
Certified Kubernetes Administrator (CKA)
Certified Kubernetes Application Developer (CKAD)
Certified AWS Solutions Architect – Professional
Senior Engineer @ Versent
2022 May – 2022 Nov
Specialist in Cloud, Data and Kubernetes
Data Engineering Consultant @ Accenture
2021 Apr – 2022 May
Enterprise data architect
Subject matter expert in Spark, Presto, and Hive frameworks
Data Engineer at Standard Chartered (Dec ’21 – May ’22)
• Wrote Spark pipelines in Scala and built with Gradle
• Deployed Airflow using Kubernetes and Helm on EKS
• Built a data sharing microservice using Spring Boot, WebFlux, and S3
Data Engineer at Julius Baer (Aug – Nov ’21)
• Productionized a Hive-to-MySQL pipeline using Docker, Jenkins, and Kubernetes
• Designed the app’s MySQL data model in an E-R diagram
• Tech: Hive, Docker, Kubernetes, Jenkins, Python
Data Architect at Singapore Airlines (May – Aug ’21)
• Created a POC of Airflow in Kubernetes
• Benchmarked Spark Streaming load times into Greenplum and MySQL
• Drove adoption of DevOps practices such as CI/CD and Git flow
• Tech: Spark, PySpark, Airflow, Spark Streaming, Greenplum, MySQL
Data Engineering Manager @ Grab
2019 – 2021
Led Grab’s 4-person Query Platform team.
Built Grab’s in-house Presto SQL scheduler and orchestrator, which supported over 200 daily active users and 5,000 data pipelines. Tech: ReactJS, Golang, Airflow, Python, Docker, Kubernetes
Administered Grab’s Tableau infrastructure. Maintained over 99.9% uptime and supported over 2,000 active users. Tech: Python, Bash, AWS
Administered Alation Data Catalog. Ingested metadata such as query logs and Hive metadata into Alation to create a catalog of Grab’s data lake.
Data Engineer @ Grab
2018 – 2019
Engineering owner for Grab’s data platform, which supported 200+ daily active users. The frontend was built with Bootstrap, jQuery, and Django; scheduling was written in Python; MySQL and MongoDB served as the databases. Deployed with Docker and Kubernetes on AWS.
Provisioned Grab’s cloud infrastructure on AWS using Terraform, including Spark and Presto on EMR clusters. Migrated a Presto EMR job to Athena, saving $100,000 per month.
Migrated EMR clusters to Qubole’s autoscaling Presto clusters to save the company $450,000 per month
Wrote and maintained billion-record ETL pipelines in Scala for Spark, scheduled with Airflow.
Built dashboards on Datadog to monitor the health of ETL jobs
Insights Analyst @ Grab
2017 – 2018
Documented user stories from operational teams across Southeast Asia to build an incentives design and execution product. The final product cut operational time from 7 days to 2 days (about 70%).
Provided consulting services to business leads regarding spending to acquire and retain drivers
Research Assistant @ National University of Singapore
2016 – 2017
Scraped two commercial databases, Capital IQ and fDi Markets, into more than 1 million structured records of supplier-customer relations and warehoused them in a MySQL server.
Tech: Selenium, MySQL, Python, PowerShell
Code: https://github.com/frenoid/capiq_crawler and https://github.com/frenoid/FDI_crawler
Worked with academics to understand their data requirements, then wrote SQL queries on demand against patent databases to study knowledge flows across networks.
Researched China’s trade patterns by running instrumental-variable and fixed-effects regressions on Chinese customs data to understand China’s place in global supply chains. Tech: Stata
Built a data warehouse to store accumulated data. Ensured there was sufficient replication by maintaining on-site backups.
Tech: SQL Server 2016, Windows Backup