My view from the train window (credits)

Dear Data News readers, it's a joy to write this newsletter every week, and we are slowly approaching its second birthday. To celebrate this together, I'd love to receive your stories about data; they can be short or long, anonymous or not. This is an open box: just write to me with whatever is on your mind and I'll bundle an edition with it.

This is fun because I'm usually not someone who's good at having habits. Every week, to be honest, I get hit by Friday. I don't write in advance. Every week you get a taste of my current mood. I often try to sync my travels with Fridays; even if the internet is terrible on the train, this is still a good way to fill the 8+ hours of travel time I'm used to.

Today I make the following commitment: I will never use any generative algorithm to write anything in the newsletter. Fun story, because one year ago I had an intern working with me on the blog, to whom I had given the task of writing code that could learn from my writing to generate a Data News edition. One year later, different views. In ChatGPT times, my idea is just boring.

On the other hand, at the moment I'm not really organised to check whether the articles I share have been entirely written by humans, but same shit, I'll do as much as I can to avoid sharing empty articles, like I always have. It might be a good use-case for GPTZero.

As a data professional, not wanting to use AI is probably the height of irony. But right now the field feels like when cryptocurrencies arrived: awesome raw ideas, with sharks circling around waiting for a new productivity high.

PS: last week I made a (bad) joke about Apache naming, and a reader pointed me to an article about the ASF and non-Indigenous appropriation.

That's enough about my life, let's jump to the news.

Back to the roots, a few engineering articles

I did not know how to group these articles, so here are a few loose ones. In my manage and schedule dbt guide, in a nutshell, I say that dbt projects have 2 lifecycles. The first one is the development experience and the second is the dbt runtime. It means you have to run dbt somewhere.

In terms of data modeling, ThoughtSpot wrote about the best data modeling methods, and Chad (the pope of Data Contracts) wrote about data contracts for the warehouse. Mainly, the idea shifts responsibility to data producers in order to enforce schema and semantics, but in the data world this is sometimes rather a utopia. Producers are often software teams that, sadly, do not care about data teams.

Finally, Noah shared how he improved data quality by removing 80% of the tests, and Ronald proposed a framework to create data products in Airflow.

Data people are creatives 🪄

This is a new category that will appear in the next Data News edition. In this category I'll share stuff that we can do with data. The idea is to inspire others by promoting the end use-case rather than just the technology. I'll be more than happy to share what you do.

This is us (credits)

Fast News ⚡️

Data Economy 💰


See you next week ❤️.