
Data News — Week 32

Data News #32 — Fundraising and transfers, modern data stack drawback, being a data manager, data quality frameworks, CDC and more

Christophe Blefari
People reading my newsletter in a nice office (credits)

Hi, I hope you're having a good time at work. I know you'll get well-deserved holidays soon! As last week was a special edition, this week I'll feature articles from the last two weeks. Don't forget to give me feedback on the weekly digest; it helps me a lot.

Data fundraising 💰

This is not about fundraising but transfers. After Felipe Hoffa, who moved from Google to Snowflake a year ago, Mosha Pasumansky has also left Google, to become CTO at Firebolt. Firebolt also hired Octavian Zarzu as a developer advocate. Octavian previously shared daily tips about Snowflake.

As a reminder, Firebolt is a new competitor in the cloud data warehouse space, promising a huge performance uplift and big cost savings.

Data Surveys

Two 2021 surveys were published recently: the first one about the state of production machine learning and the second one about the state of data engineering. Do not hesitate to fill in the surveys to help the community.

Conan making surveys for the data ecosystem (credits)

The unspoken gerrymandering of the modern data stack

I feature a lot of articles about the modern data stack (MDS) and the tools that gravitate around it. But last week Benn shared thoughts on the modern data stack and on why the explosion of tools and tool categories is not a good evolution of the landscape.

If you want to go further on the matter, I suggest the transcript of a Tristan Handy interview about why dbt was started: to fill the gap between engineers and analysts.

We the purple people

Following the whole discussion about the MDS and the related creation of the analytics engineering discipline, dbt Labs (more precisely Anna Filippova) wrote about the real need for people who translate business into technical terms: the purple people.

Play in SQL with Reddit comments

Felipe loaded 261GB of Reddit comments into Snowflake and played with them. If you want to see Java UDFs, data loading and recursive queries in action, this post is for you.

Playing with Snowflake at the beach (credits)

Stream Your Database Changes with Change Data Capture

A nice article about CDC, with illustrations of database events and some examples of database triggers that could be used. If you want to go deeper into CDC or implement it, this should be on your reading list before you start.
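To give a feel for the trigger approach, here is a minimal sketch in Python with the built-in sqlite3 module. It is not taken from the article: the `users` and `users_changes` tables and the trigger names are hypothetical, and a production setup would more likely read the database log than rely on triggers. Every insert, update or delete on `users` is copied as an event into `users_changes`, which a downstream consumer could then poll.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical source table plus a change table filled by triggers.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);

CREATE TABLE users_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT,
    user_id   INTEGER,
    old_email TEXT,
    new_email TEXT
);

CREATE TRIGGER users_after_insert AFTER INSERT ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_email, new_email)
    VALUES ('INSERT', NEW.id, NULL, NEW.email);
END;

CREATE TRIGGER users_after_update AFTER UPDATE ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_email, new_email)
    VALUES ('UPDATE', NEW.id, OLD.email, NEW.email);
END;

CREATE TRIGGER users_after_delete AFTER DELETE ON users BEGIN
    INSERT INTO users_changes (op, user_id, old_email, new_email)
    VALUES ('DELETE', OLD.id, OLD.email, NULL);
END;
""")

# Regular application writes; the triggers record each one as an event.
conn.execute("INSERT INTO users (id, email) VALUES (1, 'a@example.com')")
conn.execute("UPDATE users SET email = 'b@example.com' WHERE id = 1")
conn.execute("DELETE FROM users WHERE id = 1")

# A consumer would read the events in the order they were captured.
changes = conn.execute(
    "SELECT op, user_id, old_email, new_email "
    "FROM users_changes ORDER BY change_id"
).fetchall()

for change in changes:
    print(change)
```

The change table keeps both the old and the new value, so a consumer can replay the history of a row and not only its latest state.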

Top-notch data quality frameworks

In this digest we are lucky because we have two posts from top teams. Airbnb shared their Wall framework, which prevents bugs and thereby improves data quality. On the other side, Uber shared how they achieved operational excellence in data quality.

Being a data engineering manager

Tiffany Jachja gave us feedback on her first 3 weeks as a data engineering manager and on what she set up to map skills, roles and ownership. She also detailed the vision of the data team.

Data engineering guides

This week I share with you one BIG guide and a smaller one. The big one is The Data Engineering Cookbook; the repo has almost 10k stars on GitHub, it's well written and contains all the information you need to start your DE journey.

The second one is a guide to preparing for data engineering interviews. I'd add to this guide that you should prepare according to the company's technologies and that Spark and Hadoop aren't that mandatory today.

Fast News

The BInosaurus (credits)

Thanks, see you next week ❤️.


Christophe Blefari

Data engineering coach who enjoys all kinds of data platforms.