Data Engineering

40 articles on data engineering — SQL, Apache Spark, Kafka, ETL pipelines, and database architecture.

Part of the xbe.at knowledge base.


What's inside

SQL

  • Query execution order, window functions, CTEs, subqueries
  • Grouping Sets, Rollup, Cube
  • Optimizing SQL joins, text-to-SQL with local LLMs
  • SQL vs Pandas comparison
  • PostgreSQL fundamentals, PandaSQL

Apache Spark & PySpark

  • PySpark fundamentals, Spark vs MapReduce
  • Spark window functions for time-series analysis
  • User-defined functions (UDFs) in Spark
  • Pandas UDFs in PySpark, aggregate/transform functions
  • Pandas, Dask, PySpark — choosing the right tool

Data pipelines & ETL

  • ETL, ELT, and EtLT approaches compared
  • Building end-to-end data pipelines with Python
  • Kafka for real-time data pipelines
  • Data engineering workflow stages

Databases

  • Database sharding: splitting and distributing databases
  • Concurrency control and data consistency
  • Evolution of databases: from file systems to modern architectures
  • Data Mesh vs Data Fabric architectures
  • Vector databases and search

Translations

  • Pandas / Polars / SQL / PySpark syntax translations (15 common operations)

Highlights

  • Kafka for Real-Time Data Pipelines in Python
  • 15 Common Pandas Polars SQL PySpark Translations
  • Building End-to-End Data Pipelines with Python
  • Spark Window Functions for Time-Series Analysis in PySpark
  • The Evolution of Databases: From File Systems to Modern Marvels