Data Engineer building scalable systems

Welcome to my digital garden, where I share learnings and ideas on data and software engineering.

Recent Posts

Optimizing Spark Timestamp Columns in Parquet Files

How to improve compression and query performance for high-cardinality timestamp columns in Apache Spark by switching from INT96 to INT64 encoding

#apache-spark #parquet #performance #data-engineering

Effective use of Spark driver

... and stop crushing it!

#apache-spark #data-engineering #pandas #pyspark #performance

Schema Evolution in Databricks Delta Lake

Adapting to changing data...

#delta-lake #databricks #schema-evolution #data-engineering #apache-spark

Deequ - An Open Source Data Quality Library

Unit testing the data...

#data-quality #apache-spark #deequ #data-engineering #testing

Spark Chained Transformations

In typical data engineering tasks, we often work with procedural-style code for data transformations.

#apache-spark #data-engineering #scala #databricks
View all posts →