Data News — Week 24

Data News #24 — Gentle introduction to Kafka, Snowpark, dbt unit tests, no more data plumbers, PayPal migration to BigQuery and more.

Christophe Blefari
3 min read
A data engineer who tries to carry the data platform (credits)

It's summer time ☀️ (at least in Europe). First, I want to thank all the new subscribers to the newsletter; it will help me improve it in the coming months! This week's digest features a poetic painting of Kafka concepts, a proposal for unit testing with dbt, PayPal's migration to BigQuery, the end of the plumber age, and more.

Data fundraising 💰

  • Rudderstack announces a $21M Series A and launches a Customer Data Platform for developers. We already mentioned the age of the CDP in last week's digest. Rudderstack is an open-source data platform to manage all your data pipelines in one app.
  • Transform, a centralized metrics store for data analysts, raised $24.5M in funding. They aim to build a single source of truth for business data. Let's see where it goes.

If Kafka was writing poetry 🌸

What a nice way to start your day, or to start anything. Round Robin Publishing wrote a book about Kafka and, to promote it, drew a wonderful interactive illustration of all the Kafka concepts. It's called A Gentle Introduction to Apache Kafka.

Otters Topics over the Kafka river (capture from slide 16)

Snowpark 🏂

Last week we talked about the Snowflake Summit announcements. This week they published an article about Java UDFs and the new Scala DataFrames API that explains more about what they want to achieve: a complete data transformation platform. As a glimpse, below is an example of how you could use a Java score function from a compiled jar.

Future is not about autonomous cars but Java inside SQL queries (capture from the article).

My two cents: Snowflake now also wants to compete with cloud data platforms as a whole (like Databricks, for instance) and not only be a cloud data warehouse.

Unit testing in dbt

This is something we all struggle with. How can we unit test SQL queries? Five years ago I saw a talk at Spark Summit Europe about how a company built a complete framework to unit test SQL queries. I was amazed, but it was too deep for me.

In this article Betsy Varghese shares a first proposal on how you can use dbt seeds and an evaluate macro to add unit tests to your models. This is inspiring.
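To make the idea concrete, here is a minimal sketch of the general pattern: load a small fixed input (the equivalent of a seed), run the query under test, and compare the result with the rows you expect. It is not dbt's API and not the macro from the article. It uses Python's built-in sqlite3 as a stand-in warehouse, and the orders table, the query and the seed rows are all hypothetical.

```python
import sqlite3

# Hypothetical "model": the SQL query under test (revenue per customer, paid orders only).
MODEL_SQL = """
    SELECT customer_id, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer_id
    ORDER BY customer_id
"""

# Hypothetical seed data: a tiny, hand-written input table.
SEED_ORDERS = [
    (1, "paid", 10.0),
    (1, "paid", 5.0),
    (1, "refunded", 99.0),  # must be ignored by the model
    (2, "paid", 7.5),
]

# The rows we expect the model to return for the seed above.
EXPECTED = [(1, 15.0), (2, 7.5)]


def run_model(seed_rows):
    """Load the seed into an in-memory database and return the model's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (customer_id INTEGER, status TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", seed_rows)
        return conn.execute(MODEL_SQL).fetchall()
    finally:
        conn.close()


def test_model():
    """The unit test: actual output must equal the expected rows."""
    assert run_model(SEED_ORDERS) == EXPECTED
```

The same three pieces (seed, model, expected output) are what a dbt-based setup would need too; only the way they are loaded and compared would change.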

PayPal to the cloud

...to the cloud, but Google's. PayPal shares the long journey (1 year) of their migration to BigQuery. They explain why they chose BigQuery and all the milestones along the way to move away from Teradata.

The age of the plumber is over

With the Data Mesh philosophy incoming, we data engineers are asking ourselves how our work will evolve in the coming years. We started as plumbers building pipelines, but that time is over. The article lists what remains to be done if we no longer write ETL or do plumbing.

On the same topic you can co-read this proposal regarding metrics to measure data engineering teams. How can we gauge ourselves to be better at our DE work?

Learning resources

This week I'm sharing two links that could help you improve your skills 📚. First, Awesome Data Engineering: a well-presented path with resources in many domains: SQL, Programming, Databases, Warehouses, Data processing, etc.

And second, as dbt is becoming more and more the state of the art in terms of SQL transformations, the Analytics Engineer role is picking up. Here is a blog post with a lot of resources to help you become an Analytics Engineer in 90 days! Wow.

Build or buy? The infinite question ♾️

The Secoda team detailed the pros and cons of building vs. buying a data cataloging tool. At the same time, to help you choose between tools (buy, build or open-source), the Castor team wrote a benchmark of all data catalog solutions.

Finally, the DataHub team explained how the lineage explorer works in their open-source tool. You can also try a demo.

Python pattern matching is coming

To finish this week's newsletter, let's talk about Python 3.10 and the incoming pattern matching. This article shows what's already available in the beta. Below is an overview of what's coming next. I personally love (like the author) the dict structure matching!

Python pattern matching. Applause for the new match/case structure 👏

Christophe Blefari

Data Engineering Coach who enjoys all kinds of data platforms.