About

I drive change by forming partnerships with both business and technology leads.

My specialty is building data platforms that accelerate an organization’s ability to turn data into insights.

Key skills: Data Engineering, Platform Management, Project Management

Tech: Trino/Presto, Spark, Kubernetes, Airflow, Terraform

Experience

Vice President, Data Engineer @ DBS

2022 Nov – Present

Building a next-generation AI and Data Platform on Kubernetes using GitOps principles and ArgoCD
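
A minimal sketch of this GitOps pattern, assuming the Kubernetes Python client and ArgoCD's Application CRD; the application name, repo URL, and paths below are hypothetical placeholders:

  # Register an ArgoCD Application so the platform syncs from Git.
  from kubernetes import client, config

  config.load_kube_config()  # authenticate using the local kubeconfig

  app = {
      "apiVersion": "argoproj.io/v1alpha1",
      "kind": "Application",
      "metadata": {"name": "data-platform", "namespace": "argocd"},
      "spec": {
          "project": "default",
          "source": {
              "repoURL": "https://example.com/platform-manifests.git",  # hypothetical repo
              "targetRevision": "main",
              "path": "clusters/dev",
          },
          "destination": {
              "server": "https://kubernetes.default.svc",
              "namespace": "data-platform",
          },
          "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
      },
  }

  client.CustomObjectsApi().create_namespaced_custom_object(
      group="argoproj.io",
      version="v1alpha1",
      namespace="argocd",
      plural="applications",
      body=app,
  )

With selfHeal enabled, ArgoCD continuously reconciles the cluster back to whatever the Git repository declares.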

Kubernetes Cluster Administration
• Deploy a cluster and CNI using kubeadm
• Create platform observability using Prometheus and Grafana (see the sketch after this list)
• Automate database administration tasks using Argo Workflows
• Generate a cluster scorecard using Popeye and Polaris
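
As a sketch of the observability work: Prometheus exposes an HTTP query API, so platform health can be checked programmatically. The in-cluster service URL and the kube-state-metrics query below are illustrative assumptions:

  # Pull a platform-health metric from the Prometheus HTTP API.
  import requests

  PROM_URL = "http://prometheus.monitoring.svc:9090"  # hypothetical in-cluster address

  resp = requests.get(
      f"{PROM_URL}/api/v1/query",
      params={"query": "sum(kube_pod_status_phase{phase='Failed'})"},
      timeout=10,
  )
  resp.raise_for_status()
  for result in resp.json()["data"]["result"]:
      print(result["metric"], result["value"])  # labels plus a [timestamp, value] sample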

Database Management on Kubernetes
• Deploy TiDB on EKS using Helm, Kustomize, and ArgoCD
• Design and deploy active-passive database replication using Change Data Capture (CDC)
• Test database resiliency via Chaos Engineering using Chaos Mesh (see the sketch after this list)
• Run load tests on TiDB using TPC-C and Argo Workflows
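
A sketch of one such chaos experiment, assuming the Kubernetes Python client and Chaos Mesh's PodChaos CRD; the namespaces and label selector are hypothetical:

  # Kill one TiDB pod at random to verify the cluster heals itself.
  from kubernetes import client, config

  config.load_kube_config()

  experiment = {
      "apiVersion": "chaos-mesh.org/v1alpha1",
      "kind": "PodChaos",
      "metadata": {"name": "tidb-pod-kill", "namespace": "chaos-testing"},
      "spec": {
          "action": "pod-kill",
          "mode": "one",  # target a single randomly chosen pod
          "selector": {
              "namespaces": ["tidb-cluster"],
              "labelSelectors": {"app.kubernetes.io/component": "tidb"},
          },
      },
  }

  client.CustomObjectsApi().create_namespaced_custom_object(
      group="chaos-mesh.org",
      version="v1alpha1",
      namespace="chaos-testing",
      plural="podchaos",
      body=experiment,
  )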

Testing
• Deploy Locust in Kubernetes using ArgoCD (see the sketch after this list)
• Deploy Chaos Mesh in Kubernetes using ArgoCD
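
A minimal locustfile for this setup; the target host and endpoint are hypothetical placeholders:

  # Simulate users hitting a service; run with: locust -f locustfile.py
  from locust import HttpUser, task, between

  class PlatformUser(HttpUser):
      host = "http://platform.example.com"  # hypothetical target service
      wait_time = between(1, 3)  # pause 1-3 seconds between tasks

      @task
      def health_check(self):
          self.client.get("/healthz")  # hypothetical endpoint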

Big Data Architecture
• Run Trino on Kubernetes (see the sketch after this list)
• Run the Hive metastore on Kubernetes
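
A sketch of querying this stack with the trino Python client; the service host, user, and table name are illustrative assumptions:

  # Query Trino (tables resolved through the Hive metastore).
  import trino

  conn = trino.dbapi.connect(
      host="trino.data-platform.svc",  # hypothetical in-cluster service
      port=8080,
      user="platform",
      catalog="hive",
      schema="default",
  )
  cur = conn.cursor()
  cur.execute("SELECT COUNT(*) FROM orders")  # hypothetical table
  print(cur.fetchall())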

Certified Kubernetes Administrator (CKA)
Certified Kubernetes Application Developer (CKAD)
AWS Certified Solutions Architect – Professional

Senior Engineer @ Versent

2022 May – 2022 Nov

Specialist in Cloud, Data, and Kubernetes

Data Engineering Consultant @ Accenture

2021 Apr – 2022 May

Enterprise data architect

Subject matter expert in Spark, Presto, and Hive frameworks

Data Engineer at Standard Chartered (Dec ’21 – May ’22)
• Wrote Spark pipelines in Scala and built them with Gradle (see the sketch after this list)
• Deployed Airflow using Kubernetes and Helm on EKS
• Built a data-sharing microservice using Spring Boot, WebFlux, and S3
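
The production pipelines were written in Scala and built with Gradle; purely as an illustration of their shape, a PySpark sketch with hypothetical paths and columns:

  # Batch Spark pipeline: read raw trades, aggregate, write curated output.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("trade-aggregation").getOrCreate()

  trades = spark.read.parquet("s3://bucket/raw/trades/")  # hypothetical input

  daily = (
      trades
      .withColumn("trade_date", F.to_date("trade_ts"))
      .groupBy("trade_date", "desk")
      .agg(F.sum("notional").alias("total_notional"))
  )

  daily.write.mode("overwrite").partitionBy("trade_date").parquet(
      "s3://bucket/curated/daily_notional/"  # hypothetical output
  )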

Data Engineer at Julius Baer (Aug – Nov ’21)
• Productionized a Hive-to-MySQL pipeline using Docker, Jenkins, and Kubernetes
• Designed the app’s MySQL data model in an E-R diagram
• Tech: Hive, Docker, Kubernetes, Jenkins, Python

Data Architect at Singapore Airlines (May – Aug ’21)
• Created a POC of Airflow in Kubernetes (see the sketch after this list)
• Benchmarked Spark Streaming load times into Greenplum and MySQL
• Drove adoption of DevOps practices such as CI/CD and Git flow
• Tech: Spark, PySpark, Airflow, Spark Streaming, Greenplum, MySQL
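
A sketch of the POC pattern: an Airflow DAG whose task runs in its own Kubernetes pod. The image and names are hypothetical, and the provider import path varies by Airflow version:

  from datetime import datetime

  from airflow import DAG
  from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

  with DAG(
      dag_id="k8s_poc",
      start_date=datetime(2021, 5, 1),
      schedule="@daily",  # `schedule_interval` on older Airflow versions
      catchup=False,
  ) as dag:
      KubernetesPodOperator(
          task_id="hello_pod",
          name="hello-pod",
          image="python:3.10-slim",  # hypothetical task image
          cmds=["python", "-c", "print('hello from a pod')"],
      )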

Data Engineering Manager @ Grab

2019 – 2021

Led Grab’s four-person Query Platform team.

Built Grab’s in-house Presto SQL scheduler and orchestrator, which supported over 200 daily active users and 5,000 data pipelines. Tech: ReactJS, Golang, Airflow, Python, Docker, Kubernetes

Administered Grab’s Tableau infrastructure, maintaining over 99.9% uptime and supporting over 2,000 active users. Tech: Python, Bash, AWS

Administered the Alation Data Catalog. Ingested metadata such as query logs and Hive metadata into Alation to create a catalog of Grab’s data lake

Data Engineer @ Grab

2018 – 2019

Engineering Owner for Grab’s data platform, supporting 200+ daily active users. The frontend was built using Bootstrap, jQuery, and Django; scheduling was done in Python; MySQL and MongoDB were used as databases; and deployment was done using Docker and Kubernetes on AWS.

Provisioned Grab’s cloud infrastructure on AWS using Terraform, including Spark and Presto on EMR clusters. Migrated a Presto EMR job to Athena, saving $100,000 per month.
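
A sketch of the Athena side of that migration, submitting the same SQL serverlessly with boto3; the database, query, and result bucket are hypothetical:

  import boto3

  athena = boto3.client("athena", region_name="ap-southeast-1")

  execution = athena.start_query_execution(
      QueryString="SELECT city, COUNT(*) AS rides FROM rides GROUP BY city",
      QueryExecutionContext={"Database": "analytics"},  # hypothetical database
      ResultConfiguration={"OutputLocation": "s3://bucket/athena-results/"},
  )
  print(execution["QueryExecutionId"])  # poll get_query_execution for status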

Migrated EMR clusters to Qubole’s autoscaling Presto clusters to save the company $450,000 per month

Wrote and maintained billion-record ETL pipelines in Scala for Spark, scheduled using Airflow.
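
A sketch of the scheduling side: an Airflow DAG submitting one of these Spark jobs. The jar, entry-point class, and connection name are hypothetical:

  from datetime import datetime

  from airflow import DAG
  from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

  with DAG(
      dag_id="daily_etl",
      start_date=datetime(2019, 1, 1),
      schedule="@daily",  # `schedule_interval` on older Airflow versions
      catchup=False,
  ) as dag:
      SparkSubmitOperator(
          task_id="run_etl",
          application="s3://bucket/jars/etl-pipeline.jar",  # hypothetical artifact
          java_class="com.example.DailyEtl",  # hypothetical entry point
          conn_id="spark_default",
      )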

Built dashboards on Datadog to monitor the health of ETL jobs
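
A sketch of one such health feed, assuming the datadogpy client; the API keys and metric name are hypothetical placeholders:

  # Emit a custom ETL-health metric for a Datadog dashboard to graph.
  import time

  from datadog import initialize, api

  initialize(api_key="<DD_API_KEY>", app_key="<DD_APP_KEY>")  # hypothetical keys

  api.Metric.send(
      metric="etl.rows_processed",  # hypothetical metric name
      points=[(int(time.time()), 1_250_000)],
      tags=["pipeline:daily_etl"],
  )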

Insights Analyst @ Grab

2017 – 2018

Documented user stories from operational teams across Southeast Asia to build an incentives design and execution product. The final product cut operational time from 7 days to 2 days, a roughly 70% reduction

Provided consulting services to business leads regarding spending to acquire and retain drivers

Research Assistant @ National University of Singapore

2016 – 2017

Scraped and ingested two commercial databases, Capital IQ and fDi Markets, into structured records of more than 1 million supplier-customer relations, and warehoused them in a MySQL server.
Tech: Selenium, MySQL, Python, PowerShell
Code: https://github.com/frenoid/capiq_crawler and https://github.com/frenoid/FDI_crawler
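
A sketch of the crawl-then-warehouse pattern those repos implement; the page URL, CSS selectors, table schema, and credentials are all hypothetical:

  # Scrape supplier-customer rows from a listing page, then load them into MySQL.
  from selenium import webdriver
  from selenium.webdriver.common.by import By
  import pymysql

  driver = webdriver.Chrome()
  driver.get("https://example.com/supplier-relations?page=1")  # hypothetical page
  rows = [
      (cells[0].text, cells[1].text)
      for row in driver.find_elements(By.CSS_SELECTOR, "table.relations tr")
      if len(cells := row.find_elements(By.TAG_NAME, "td")) >= 2  # skips header rows
  ]
  driver.quit()

  conn = pymysql.connect(
      host="localhost", user="etl", password="secret", database="research"
  )
  with conn.cursor() as cur:
      cur.executemany(
          "INSERT INTO supplier_customer (supplier, customer) VALUES (%s, %s)",
          rows,
      )
  conn.commit()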

Understood academics’ data requirements, then wrote SQL queries on demand against patent databases to study knowledge flows across networks
Tech: SQL

Researched China’s trade patterns by running instrumental-variable and fixed-effects regressions on Chinese customs data to understand China’s place in global supply chains. Tech: Stata

Built a data warehouse to store the accumulated research data and ensured sufficient redundancy by maintaining on-site backups.
Tech: SQL Server 2016, Windows Backup