40 articles on data engineering — SQL, Apache Spark, Kafka, ETL pipelines, and database architecture.
Part of the xbe.at knowledge base.
- Query execution order, window functions, CTEs, subqueries
- GROUPING SETS, ROLLUP, CUBE
- Optimizing SQL joins, text-to-SQL with local LLMs
- SQL vs Pandas comparison
- PostgreSQL fundamentals, PandaSQL
- PySpark fundamentals, Spark vs MapReduce
- Spark window functions for time-series analysis
- User-defined functions (UDFs) in Spark
- Pandas UDFs in PySpark, aggregate/transform functions
- Pandas, Dask, PySpark — choosing the right tool
- ETL, ELT, and EtLT approaches compared
- Building end-to-end data pipelines with Python
- Kafka for real-time data pipelines
- Data engineering workflow stages
- Database sharding: splitting and distributing databases
- Concurrency control and data consistency
- Evolution of databases: from file systems to modern architectures
- Data Mesh vs Data Fabric architectures
- Vector databases and search
- Pandas / Polars / SQL / PySpark syntax translations (15 common operations)
- Kafka for Real-Time Data Pipelines in Python
- 15 Common Pandas Polars SQL PySpark Translations
- Building End-to-End Data Pipelines with Python
- Spark Window Functions for Time-Series Analysis in PySpark
- The Evolution of Databases: From File Systems to Modern Marvels