F1 (credits)

Hello 🏎, this weekend the French Formula 1 Grand Prix is taking place. As I'm going there, this is a slightly shorter edition. But I also tried to curate some content about how F1 teams are using data to maximize performance.

Data fundraising 💰

How are F1 teams using data?

To be honest, it has been very hard to find public information on this topic. All the teams are secretive about the matter, and I think we can understand why. This is sad, because F1 is a performance sport where every second counts; we could learn a lot from it when it comes to real-time data use-cases.

So while looking for information I mainly waded through marketing material, but I found some really interesting YouTube videos about how Formula 1 teams are using data:

Your next favourite dbt browser extension

Yesterday I worked on an experiment I've had in mind since last year. It is a browser extension that helps you work with BigQuery and Snowflake when using dbt. The extension overrides the default clipboard behaviour to replace table names with the corresponding ref or source.
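To give you the idea, here is a minimal Python sketch of the substitution the extension performs when you copy a query; the mapping and table names below are hypothetical, and the real extension of course runs inside the browser rather than in Python.

```python
import re

# Hypothetical mapping from fully qualified warehouse tables to dbt calls,
# the kind of lookup you could build from a dbt manifest (illustrative values).
TABLE_TO_REF = {
    "my-project.analytics.orders": "{{ ref('orders') }}",
    "my-project.raw.stripe_payments": "{{ source('stripe', 'payments') }}",
}

def rewrite_clipboard(sql: str) -> str:
    """Replace fully qualified table names with their dbt ref/source call."""
    for table, ref in TABLE_TO_REF.items():
        # Match the table name with or without surrounding backticks (BigQuery style).
        pattern = re.compile(rf"`?{re.escape(table)}`?")
        sql = pattern.sub(ref, sql)
    return sql

print(rewrite_clipboard("select * from `my-project.analytics.orders`"))
# select * from {{ ref('orders') }}
```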

You can find a demo of the extension on my LinkedIn post.

A glimpse of the extension.

Stop using so many CTEs

Claire, one of the greatest thinkers about the analytics engineering job, took a position on CTEs. You know CTEs: the syntax everyone adopted to avoid subqueries and write more linear queries.

For Claire, we should stop using so many CTEs. I agree: in today's data world we've reached a point where CTEs are so deeply embedded in our data stacks that they can bring more downsides than perks. The proposed solution is Chained SQL, a feature in Hex, but philosophically it can apply everywhere: smaller SQL pieces to bring modularity.
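To make the philosophy concrete, here is a toy sketch with DuckDB and made-up tables of what "smaller SQL pieces" can look like: each step becomes a named object you can query and debug on its own, instead of a CTE buried inside one big statement. This is just an illustration of the idea, not how Hex implements Chained SQL.

```python
import duckdb

# Toy data: two orders in an in-memory database.
con = duckdb.connect()
con.execute(
    "CREATE TABLE orders AS "
    "SELECT * FROM (VALUES (1, 'FR', 10.0), (2, 'US', 25.0)) AS t(id, country, amount)"
)

# Instead of stacking CTEs in a single query, each step is a named,
# individually queryable piece: smaller SQL fragments, more modularity.
con.execute("CREATE VIEW french_orders AS SELECT * FROM orders WHERE country = 'FR'")
con.execute("CREATE VIEW french_revenue AS SELECT sum(amount) AS revenue FROM french_orders")

print(con.execute("SELECT revenue FROM french_revenue").fetchall())  # [(10.0,)]
```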

Airflow is still the king

Once in a while I sing Airflow's praises because it's like my first data love. This week Jarek, who is an active Airflow committer and PMC member, also shared this love. He describes how generic transfers are designed in Airflow and how simple they are to use. Personally, I've always been convinced that writing a transfer DAG in Airflow is so simple that the need for dedicated EL tooling is not that strong. Jarek shows the way.
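To give you a feel for that simplicity, here is a minimal sketch using Airflow's GenericTransfer operator; the connection ids, tables, and schedule are placeholders of mine, not taken from Jarek's post.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.generic_transfer import GenericTransfer

# A minimal transfer DAG: read from one database connection, write to another.
with DAG(
    dag_id="mysql_to_postgres_transfer",
    start_date=datetime(2022, 7, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transfer = GenericTransfer(
        task_id="copy_orders",
        sql="SELECT * FROM orders WHERE order_date = '{{ ds }}'",
        source_conn_id="mysql_source",
        destination_conn_id="postgres_dwh",
        destination_table="analytics.orders",
        # Make the daily run idempotent by clearing the partition first.
        preoperator="DELETE FROM analytics.orders WHERE order_date = '{{ ds }}'",
    )
```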

If you are just starting with Airflow, David started a series with an introduction post. On the other side, Seattle Data Guy put words on this never-ending debate: why data engineers love/hate Airflow. Jumping to the real world, Jellysmack explains how they used Airflow to orchestrate data science jobs in production.

And finally, if you still think you need to develop your own orchestrator, you can get inspiration from how Criteo developed BigDataflow, their internal DAG-based scheduler/orchestrator.

Holidays ☀️

Like in every good newspaper during the holidays, there are some games, especially crosswords. So I tried to make you a small crossword to enjoy data while at the beach, or at the office while others are enjoying the sand. Try the grid online.

DATA HOLIDAYS CROSSWORD

If you feel the crossword is too easy and you have more time to play, I recommend SQLordle, a Wordle but only with SQL keywords.

PS: if you want more crosswords, just tell me and I'll try to make more until the end of the summer.

Fast News ⚡️

Do not hesitate to protect yourself from the sun (credits)