Scratching the surface (credits)

Hey you, a new Friday means data news. This week feels a bit like old data news, with a variety of articles on different cool topics, while I navigate through the current data trends.

Next Monday I'll present "How to build a data dream team" at the Y42 meetup. I'll share a written version of my talk in next week's edition. But this week, as an appetizer, there are 2 articles I really liked about data team composition.

Last but not least, if you are in Paris on the 6th of December you can join us for the reboot of the Apache Airflow meetups—I'm the organizer. Talks will be given in French. The agenda:

I organised the Paris Apache Airflow Meetup — 6th of Dec. — JOIN US

My two cents about DuckDB

Ok, right now the data world on LinkedIn and Twitter is heading one way down the Rust and DuckDB street. I don't have any opinion on Rust, except that it looks like one more eternal programming language debate that I'm bored of, but I do have one on DuckDB.

Here is a short description of DuckDB I wrote 2 newsletters ago:

If you missed it, DuckDB is a single-node in-memory OLAP database. In other words, DuckDB runs on a single server, loads the data in a columnar format into memory (RAM) and applies transformations to it. DuckDB natively integrates with SQL and Python, which means you can query your data with either Python or SQL.

First, let's decode the marketing. MotherDuck, the company building a cloud service on top of DuckDB, says stuff like "Big Data is dead" or "Your laptop is faster than your data warehouse", which theoretically opens the door back to single-instance processing for your data. This is brilliantly good, tbh. I buy it. Plus they add this fun tone with ducks, which creates sympathy for the product.

But is it really something?

I think it is, but I might have already been influenced by the marketing. When I think about DuckDB's simplicity, it's exhilarating.

You run pip install duckdb, then import duckdb, and you are good to go. You don't need to run a server. A database is available to you: you can read files (CSV or Parquet) and run SQL or DataFrame operations on them seamlessly.

I can imagine a list of use cases that will help improve the data engineering workflow, but at the same time I don't believe Duck can become the main processing engine of a data platform. I mean, by its single-node nature the technology will for sure serve decentralised teams with a central lake with brio, but I mostly see edge use cases like: running data processing in the CI/CD to quickly validate stuff, providing a great local dev experience to every data developer or powering small data analytics products.

I don't think it can replace the current data warehouse vision or technologies and, in my view, it shouldn't be sold as a data warehouse or compared with one. It's more of a cool sidekick to the current modern data stack. Still, with the huge amount of money invested and the current course of things, where everyone wants to try the hype, I'm afraid it'll turn out differently.

Oliver also shared deeper views on the hype.

Ducks on the horizon (credits)

Data teams need to break out of their bubble

Mary MacCarthy published a great post. It's a wake-up call for data teams. In the current economic situation, all the intellectual discussions about the vision of the field are fun, but this is not really what data teams are built for. In most companies, data teams exist to empower other teams. I also bet that the semantic layer, DuckDB, Rust or other trendy stuff is not something that will empower your stakeholders.

According to Mary, the best move you can make right now to empower your stakeholders is to break out of your bubble and really work hand in hand with them. In the article she takes the example of the relationship between the marketing team and the data team, which often looks like shadow IT. Martech solutions are often yet another all-in-one data platform.

Read the article

On the same topic, Mikkel Dengsøe came back with a great article about data people outside of the data team. He shares a few tips, and pitfalls to avoid, to make this setup work.

Fast News ⚡️

⚖️ Learn how to prepare for new European data privacy requirements: a rare article about data privacy requirements. Atlassian shares legal material that might resonate with your legal team if you do international data transfers.

Data Fundraising 💰


See you next week ❤️.