trees with wind photo
Mistral (credits)

Hello all, this is the Data News. This week's edition might be lighter than usual in terms of comments, as I'm working on a Data News related project that takes up a bit of my time and will probably lead to a series of articles.

Before I forget: I appeared on The Joe Reis Show. Joe and I chatted about teaching data engineering, why it is hard, and how generative AI will change education forever. It's a one-hour podcast; I hope you will enjoy listening to it.

Final reminder: La Conférence MLOps takes place next week in Paris, on March 7th. If you want to register, I still have a 40% promo code: mlops-blef-40. I'll give a talk (in French) about how to put machine learning in production at a small scale, a topic related to the Data News project 😬.

AI News 🤖

Extract and load, still unsolved 🤭

I started writing data pipelines in 2014, and moving data from sources to destinations has always been one of the most discussed topics in my data engineering circles. Personally, I'm the kind of guy who likes to build it custom, because I don't think an out-of-the-box solution exists. In the end you finish with a composable solution mixing 2 or 3 technologies to extract and load your data into your central storage, ready for transformations.

In 2024 we have more tools than ever to move data from sources to destinations. But the field has taken a new direction.

Until now, solutions were mainly full platforms (often in the cloud) promising to do everything, in an attempt to rebundle the data platform (cf. The unbundling of Airflow). Recently, things have gone a step further: what if extract and load were just a small library layer that integrates with whatever you're already doing? For people reading me carefully, this is what I was calling for in using Airflow the wrong way, but the fun way.

Enter the new kids on the block:

We see a pattern here: when we talk about extract and load, there are 2 kinds of sources, databases and APIs, and being able to do both correctly is the key.
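To make the "small library layer" idea concrete, here is a minimal sketch. It is not how any of the tools mentioned here work; it only uses the Python standard library. Every name in it (`extract_from_database`, `extract_from_api`, `load`, the `raw_users` and `raw_events` tables) is hypothetical, the API is faked with in-memory pages instead of real HTTP calls, and SQLite stands in for both the source database and the central storage.

```python
import sqlite3
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def extract_from_database(conn: sqlite3.Connection, query: str) -> Iterator[Row]:
    """Yield rows from a SQL source as dictionaries."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


def extract_from_api(pages: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Yield records from a paginated API.

    `pages` is a stand-in for real HTTP calls (assumption): one item per page.
    """
    for page in pages:
        yield from page


def load(conn: sqlite3.Connection, table: str, rows: Iterable[Row]) -> int:
    """Create the table if needed, append the rows, return how many were loaded."""
    rows = list(rows)
    if not rows:
        return 0
    # The schema is naively inferred from the first row (assumption).
    columns = list(rows[0].keys())
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    conn.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


# Source 1: a database (in-memory SQLite with made-up data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER, name TEXT)")
source.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "linus")])

# Source 2: an API, faked as two pages of records.
fake_api_pages = [[{"id": 10, "event": "signup"}], [{"id": 11, "event": "login"}]]

# Destination: the central storage, here another in-memory SQLite database.
warehouse = sqlite3.connect(":memory:")
users_loaded = load(
    warehouse, "raw_users", extract_from_database(source, "SELECT id, name FROM users")
)
events_loaded = load(warehouse, "raw_events", extract_from_api(fake_api_pages))
```

Both kinds of sources end up as plain iterators of dictionaries, so the load step does not care where the rows come from. A real library would also have to deal with schema inference, incremental loading and retries, which this sketch ignores.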

On the other side of the movement there is a new open-source reverse-ETL technology called Multiwoven/multiwoven. This is built in Ruby (haha). At the moment it can sync to Facebook, Salesforce and Slack.

green trees and plants under blue sky and white clouds during daytime
Rare footage of a roman extract and load pipeline (credits)

Fast News ⚡️

Tech stuff


See you next week ❤️