It's summer time ☀️ (at least in Europe). First, I want to thank all the new subscribers: it will help me improve the newsletter in the coming months! This week's digest features a poetic painting of Kafka concepts, a proposal for unit testing with dbt, PayPal's migration to BigQuery, the end of the plumber age, and more.
Data fundraising 💰
- Rudderstack announces a $21M Series A and launches a Customer Data Platform for developers. We already mentioned the age of the CDP in last week's digest. Rudderstack is an open-source data platform to manage all your data pipelines in one app.
- Transform, a centralized metrics store for data analysts, raised $24.5M in funding. They aim to build a single source of truth for business data. Let's see where it goes.
If Kafka was writing poetry 🌸
What a nice way to start your day, or to start anything: Round Robin Publishing wrote a book about Kafka and, to promote it, drew a wonderful interactive illustration of all the Kafka concepts. It's called A Gentle Introduction to Apache Kafka.
Last week we talked about the Snowflake Summit announcements. This week they wrote an article about Java UDFs and the new Scala DataFrames API to explain in more detail what they want to achieve in order to build a complete data transformation platform. As a glimpse, below is an example of how you could use a Java score function from a compiled jar.
My two cents here: Snowflake now wants to compete with Cloud Data Platforms as a whole (like Databricks, for instance) and not only be a Cloud Data Warehouse.
Unit testing in dbt
This is something we all struggle with. How can we unit test SQL queries? Five years ago I saw a talk at Spark Summit Europe about how a company built a complete framework to unit test SQL queries. I was amazed, but it was too deep for me at the time.
In this article Betsy Varghese shares a first proposal on how you can use dbt seeds and an evaluate macro to add unit tests to your models. This is inspiring.
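The general idea (seed a small known input, run the query, compare with the expected output) can be sketched outside dbt. The snippet below is only an illustration and is not the approach from the article: it uses Python's built-in sqlite3 as a stand-in for the warehouse, and the `orders` table, its columns, and the `run_model` helper are all hypothetical names.

```python
import sqlite3

# Hypothetical "model" under test: total revenue per customer.
MODEL_SQL = """
SELECT customer_id, SUM(amount) AS revenue
FROM orders
GROUP BY customer_id
ORDER BY customer_id
"""


def run_model(seed_rows):
    """Load seed rows into an in-memory table and run the model query on them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", seed_rows)
    result = conn.execute(MODEL_SQL).fetchall()
    conn.close()
    return result


def test_revenue_per_customer():
    # The seed plays the role of a dbt seed file; expected is the known answer.
    seed = [(1, 10), (1, 5), (2, 7)]
    expected = [(1, 15), (2, 7)]
    assert run_model(seed) == expected


test_revenue_per_customer()
```

Because the input is tiny and fixed, a failing assertion points directly at the SQL logic rather than at the data.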
PayPal to the cloud
...to the cloud, but the Google one. PayPal shares the long journey (1 year) of their migration to BigQuery. They explain why they chose BigQuery and all the milestones along the way to move away from Teradata.
The age of plumber is over
With the Data Mesh philosophy incoming, we data engineers are asking ourselves how our work will evolve in the coming years. We started as plumbers building pipelines, but that time is over. The article lists what remains to be done if we no longer write ETL or do plumbing.
On the same topic you can co-read this proposal regarding metrics to measure data engineering teams. How can we gauge ourselves to be better at our DE work?
This week I'm sharing two links that could help you improve your skills 📚. First, Awesome Data Engineering: a well-presented path with resources in many domains: SQL, Programming, Databases, Warehouses, Data processing, etc.
Second, as dbt is increasingly becoming the state of the art in terms of SQL transformations, the Analytics Engineer role is picking up. Here is a blog post with a lot of resources to help you become an Analytics Engineer in 90 days! Wow.
Build or buy? The infinite question ♾️
The Secoda team detailed the pros and cons of building vs. buying a data cataloging tool. At the same time, to help you choose between tools (buy, build or open-source), the Castor team wrote a benchmark of all data catalog solutions.
Python pattern matching is coming
To finish this week's newsletter, let's talk about Python 3.10 and the incoming pattern matching. This article shows what's already available in the beta. Below is an overview of what's coming next. I personally love (like the author) the dict structure matching!