Me yesterday (credits)

Dear members, I hope this edition finds you well. I'm sorry to be late once again, but this is coming to an end. I've been giving classes almost every Friday since January and I have only 2 left in the coming weeks, so the original schedule will be back soon.

Data fundraising 💰

Let's talk about Excel. Excel has been around for years, and it has been a saviour for a lot of people and a lot of companies. The Excel ecosystem is huge, maybe wider than the data ecosystem. Despite all the efforts we have all made, Excel is still the reference: people will still want to export your lovely Tableau dashboard into an Excel spreadsheet. We could call this Shadow data, by analogy with Shadow IT.

Data platforms future

This week Petr and Benn wrote about data platforms. They do not take the same perspective, but to me they are both saying the same thing: data stacks are built on top of a few core tools and concepts, and everything else should end up encapsulated inside them at some point.

Petr proposed a different way to bundle data platforms. Or, to rephrase it, a different way to categorise data tools. In a nutshell, today we have more than 20 categories of tools in the data ecosystem. This is a big number and we should relabel everything. On top of the core layers (ingestion, storage, transformation, visualization and discovery) we need to provide cross-layer features like scheduling and orchestration.

Petr's vision of data platform cross-layers (credits)
The experience of teams running data platforms depends on their ability to handle the above problems cohesively across the stack. These problems are hard to solve in isolation within each tool.

I really like Petr's conclusion to the unbundling Airflow conversation that started a few weeks ago.

For his part, Benn followed up on the very big deal Snowflake made last week by acquiring Streamlit. Benn is placing a bet on the data app store concept, and Snowflake's acquisition could lead to this strategy. If we consider the warehouse as the main piece, everything will exist through its marketplace, or its data app store.

I also learned from Benn's post that Google laid off Looker's department of Customer Love (more detail here). Once again Google's strategy with Looker is hard to follow (Week 41: Google partnership with Tableau). Looker was perfectly positioned in the Modern Data Stack, and from the outside it seems they are tearing it apart.

7 antifragile principles for a successful data warehouse

Iliana wrote a series of articles about data engineering. In the last part (the 5th one) she detailed 7 antifragile principles to build your data warehouse. I think this post is a great resource to think about the place of your data warehouse and to review the processes around it.

Source systems are accountable and responsible for resolving data issues

The second principle states that source systems are accountable for data issues. Amen. This is so true. But it is also so difficult to put in place in the real world, because product teams hate data migration. You should also check out the 6 other principles.

Solving concurrency in event-driven microservices

If you are trying to understand concurrency in event-driven architectures, this post is for you. Hugo also proposes a solution on top of Kafka to deal with concurrency by design rather than by implementation.

Event-driven architecture (credits)
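
To make "by design" a bit more concrete, here is a minimal sketch of one common approach on Kafka. It is not necessarily the exact design from Hugo's post. The idea is to key every event by the entity it concerns, so that all events for that entity land in the same partition and are processed in order by a single consumer. The sketch simulates this in plain Python without any Kafka client. The names `NUM_PARTITIONS`, `partition_for`, `route` and `consume` are hypothetical, and CRC32 stands in for the hash a real partitioner would use.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, for illustration only


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministically map a key to a partition.

    CRC32 is an assumption for this sketch; a real Kafka producer
    uses its own partitioner for keyed records.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events):
    """Append each (key, payload) event to its partition, keeping arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key)].append((key, payload))
    return partitions


def consume(partitions):
    """Process every partition sequentially, like one consumer per partition.

    Returns the ordered list of payloads seen for each key.
    """
    history = defaultdict(list)
    for partition_id in sorted(partitions):
        for key, payload in partitions[partition_id]:
            history[key].append(payload)
    return dict(history)


# Events for two orders, interleaved as they might arrive from several services.
events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "cancelled"),
    ("order-1", "shipped"),
]

partitions = route(events)
history = consume(partitions)
print(history)
```

Because each key is owned by exactly one partition, the state transitions of a given order are never processed concurrently, so no lock is needed around per-order updates. The trade-off is that ordering is only guaranteed per key, not across different keys.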

Saturday ML 🦾

Two articles this week in this category. This is food for thought.

Fast News ⚡️


See you next week.