Optimizing Spark Timestamp Columns in Parquet Files
How to improve compression and query performance for high-cardinality timestamp columns in Apache Spark by switching from INT96 to INT64 encoding
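As a quick illustration of the switch the subtitle describes: since Spark 2.3, the `spark.sql.parquet.outputTimestampType` setting controls how timestamp columns are physically encoded when writing Parquet. A minimal sketch, assuming a Spark version where the legacy `INT96` encoding is still the default:

```
# spark-defaults.conf (sketch): write timestamps as INT64 with microsecond
# precision (TIMESTAMP_MICROS) instead of the legacy INT96 encoding
spark.sql.parquet.outputTimestampType  TIMESTAMP_MICROS
```

The same option can also be set per session, e.g. `spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")`, so it can be applied to individual write jobs without changing cluster-wide defaults.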