Data News — mid-2023 popular articles

July 28, 2023 — Data News

two gray and black boats near dock — 🧜‍♂️ (credits)

Hey, this is a mid-2023 edition with some of my favourite articles and the most popular articles shared in the newsletter this year. There isn't any fancy calculation behind the popularity ranking. Here is how it's done.

Every link sent in each newsletter is tracked in 2 ways:

when you click on a link it first redirects you to my blog, so I know that you've clicked on it
it adds ref=blef.fr to the URL, so the author of the original article knows that the traffic comes from me; mainly it's a great way to support me by making me discoverable to others
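
For the curious, here is a minimal sketch of what those two steps could look like in Python. It is not the actual code behind the newsletter: the redirect endpoint (example.com/r), the "to" parameter and the function names are assumptions made up for illustration. Only the ref=blef.fr parameter comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

REF_VALUE = "blef.fr"  # from the newsletter: outgoing links carry ref=blef.fr
# Hypothetical redirect endpoint and parameter name, for illustration only.
REDIRECT_BASE = "https://example.com/r"


def add_ref(url: str, ref: str = REF_VALUE) -> str:
    """Return url with a ref query parameter, replacing any existing one."""
    parts = urlsplit(url)
    # Keep every existing query parameter except a previous ref.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "ref"
    ]
    query.append(("ref", ref))
    return urlunsplit(parts._replace(query=urlencode(query)))


def tracked_link(url: str) -> str:
    """Wrap the tagged URL in a redirect link so the click can be counted first."""
    return f"{REDIRECT_BASE}?{urlencode({'to': add_ref(url)})}"


if __name__ == "__main__":
    print(add_ref("https://example.org/post?a=1"))
    print(tracked_link("https://example.org/post?a=1"))
```

The redirect page would log the click and then send the reader to the URL stored in the "to" parameter.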

I've used the click data to sort articles by popularity. Obviously it has a few biases, for example recent editions get more clicks because I have more subscribers, but the impact is minimal.
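
As a rough sketch of the sorting, and of the subscriber bias mentioned above, here is a small Python example. The click log is made-up sample data, and the per-subscriber rate is only one possible way to look at the bias, not something I actually applied.

```python
from collections import Counter

# Made-up click log: one entry per click, as (article_url, edition_subscribers).
CLICKS = [
    ("https://example.org/a", 1000),
    ("https://example.org/b", 2000),
    ("https://example.org/b", 2000),
    ("https://example.org/a", 1000),
    ("https://example.org/b", 2000),
]


def rank_by_clicks(log):
    """Sort articles by raw click count, most clicked first."""
    counts = Counter(url for url, _ in log)
    return counts.most_common()


def rank_by_click_rate(log):
    """Sort articles by clicks per subscriber of the edition they appeared in."""
    counts = Counter(url for url, _ in log)
    subscribers = {url: subs for url, subs in log}
    rates = {url: counts[url] / subscribers[url] for url in counts}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(rank_by_clicks(CLICKS))
    print(rank_by_click_rate(CLICKS))
```

In this toy data article b wins on raw clicks, while article a wins once the smaller audience of its edition is taken into account.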

A few numbers. Since the beginning of the year I've shared around 500 articles, which generated at least 22k views on creators' articles. I say at least because this is a low estimate; based on my own projections, I think the real number is about twice that.

If you have travel time I also recommend the first episode of Data Minds, my podcast, with Joe Reis.

Popular articles

I have grouped the articles into buckets. The order does not really mean anything, they were all popular.

General

💰 Because we all love money, Mikkel's Europe data salary benchmark was the most viewed. In the article he shares salaries extracted from job listings, using dimensions like seniority, location and companies.
📃 In every data team it is super important to write documentation, and Marie wrote an awesome 101 about data documentation. The article gives best practices for establishing complete and reliable data documentation.
🎰 Reducing the lottery factor. The lottery factor, also called the bus factor, is a risk measurement about knowledge sharing. In data teams a lot of work has to be done in the early days to avoid knowledge being lost later on. The article gives ~10 pieces of advice to lower the risks. Among them I like the changelog, pair-programming, pre-recorded videos and stable credentials.
🌎 The data journey manifesto sets out principles for the data journey to avoid a mess in production. There are 11 principles and 11 new ideas to create a healthy platform. For instance, you should not trust your data providers, and what worked last week will not work today.

Modern data stack

🔮 The future of data by Pedram. 3 takes on the future of data teams. I really like Pedram, he tweets a lot (or should we say xs) and gives great advice with humour. Mainly the article says that we are finally addressing ops teams, that the semantic layer is the next big battle and that business logic management is a mess. He also recently joined the Dagster team in DevRel.
🔥 Matt gives 5 hot takes on the modern data stack. I don’t totally agree with everything. This is about Redshift, Airflow, Airbyte, dbt and production.
🧱 A good summary of the required blocks composing the modern data stack.

Technical deep-dive

🏗️ Simon wrote an excellent 3-part data modeling deep-dive: an introduction to data modeling, the different techniques, and the tools and future.
📑 Data contracts were very trendy this year. I also think they are quite useful. PayPal released their template for data contracts. It is an exhaustive list of what you can expect in a contract: schema, quality, SLAs, security and custom properties.
👨‍🏫 Count.co designed 2 amazing boards. You can learn SQL or follow a guide to hire your data team.
🐍 Finally a few useful code patterns in Python.

See you next week ❤️ and I wish you great holidays.