A nightmare for people on holiday (credits)

Hello dear members. For those of you on holiday, I hope you're enjoying it as much as you can. For the others, I feel you, and here is the usual Data News to keep you up to date on this Friday afternoon.

Let's start a conversation this week. What is your biggest problem right now?

Mine is Superset virtual datasets. I'm working on a Superset project and the textarea for writing the SQL queries that build the data model is so small that it makes me unproductive. Apart from this, I'm surprised by how much Superset can deliver.

Data fundraising 💰

Data job market status

It's well known that the data job market is highly stressed. Every company has engineering and analytics positions open. Last week at the Data Analytics Careers Summit, Dustin revealed data about the data analytics market. This is super interesting: demand for SQL grew by 27 points since 2020, while Tableau, Power BI and Excel are each mentioned in about a third of job postings.

On the same topic, someone analysed all the jobs posted in the dbt Slack community (~3k) and computed some statistics. We can clearly see that analytics engineering has picked up while demand for data engineering has stayed the same.

The best data team

There are many ways to create great data teams, and this is the kind of article I'm really into. What if we could create a partnership between data and the other teams, letting data be more than a support role? This is the Data Business Partnership. The post provides great guidelines to try it out.

We often say that it's a bad idea to look for data unicorns. But what if, instead, we were looking for data heroes? Mikkel wrote another great post about data teams. So yeah, how can you find, activate and retain data heroes? You know, those people who add that little extra to your team.

Find your data heroes (credits)

It's summer, so let's talk about snowflakes

What a boring category title, but since I got 3 articles about Snowflake I wanted to group them here.

First, you can try doing machine learning in Snowflake with the recently released Snowpark. Then it's time to master the query profiler: thanks to Teej you'll be able to get started reading query graphs. And now that you've had fun playing with ML and queries, you should have a look at your Snowflake bill.

7 best practices for data ingestion

Saikat wrote a small wrap-up of the basic best practices everyone should follow when writing data ingestion pipelines. Guess which one is my favourite.

On the same topic, Matt wrote about data backfilling. To me, backfilling is one topic that really shows the difference between a data engineer and a great data engineer, probably because backfilling requires experience and patience. It's easy to run a pipeline, but when your pipeline has to recompute or re-ingest data from the last 4 years, the stress it puts on the system will be heavy.

That said, I have to disagree with Matt's post on how to handle backfilling. In the past I made the mistake of creating dedicated backfilling pipelines, but I think this is a bad idea. If pipelines are idempotent and deterministic you don't need another branch; at least, that's the Airflow way to do it (see the sketch below).
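To make the idea concrete, here is a minimal sketch of what I mean by an idempotent, date-partitioned Airflow DAG; the DAG name, table name and load logic are placeholders I made up, not something from Matt's post. Because each run only overwrites its own date partition, backfilling is just re-running the regular pipeline over past dates.

```python
# A minimal sketch (placeholder names) of an idempotent daily pipeline in Airflow.
# Each run rewrites exactly one date partition, so re-running any past date
# gives the same result: no dedicated backfill branch needed.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def load_partition(ds: str, **_) -> None:
    # Overwrite the partition for the logical date `ds` (delete-then-insert or
    # INSERT OVERWRITE). Re-running this task is safe because it always
    # produces the same partition from the same inputs.
    print(f"Reloading partition events_{ds.replace('-', '')}")  # placeholder for the real load


with DAG(
    dag_id="daily_events",  # hypothetical DAG name
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_events_partition",
        python_callable=load_partition,
    )
```

With a DAG like this, reprocessing the last 4 years is just `airflow dags backfill -s 2018-01-01 -e 2022-01-01 daily_events`, using the exact same code path as the daily run.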

Cool findings 🔎

ML Friday — Reinforcement Learning at Netflix

Reinforcement learning is still a bit mystical to me. Every year when I teach I try to give examples, and this one from Netflix is quite interesting. They picked RL models to find optimal recommendations under a constraint on our most limited resource: our time.

More ML: 4 essential steps for building a simulator.

Don't let the copilot take the wheel (credits)

Fast News ⚡️


See you next week ❤️