😥 (credits)

Hey. It's already the end of August and back-to-school season is approaching. This is a feeling that never left me growing up: you watch the summer holidays come to an end while the stress of the new year builds up.

Who's starting a new job soon? Would you like more content on this topic?

Regarding blog housekeeping, we are slowly approaching 1400 members. This week I did a small style refresh of the blog (mainly changing the main color) and updated Ghost, so you get 2 cool new features:

This week I have also released the Data Explorer private API. The Explorer will be your hub to search through the more than 1000 data links I've shared in the Data News, with bookmarking, full-text search and recommendations. Ping me and I'll give you beta access. Right now I just need to finish the content categorisation.

The Explorer: Data News links hub (soon to be released, ask for beta)

Data fundraising 💰

SQL optimisations

Everyone knows that migrating a dbt model to incremental can save you time and money, but we are often too lazy to do it. In this article, the dbt data team explains how they saved $1800/month by migrating to incremental models. On the same topic, Péter shows why generated window functions in dbt can lead to degraded performance. Fixing this can save $340 per query!
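If the incremental idea is still fuzzy, here is a toy Python sketch of it. It is not dbt code: the function names, the `id` key and the `updated_at` column are all made up for illustration. The point is only that an incremental run touches the rows newer than what the target already holds, while a full refresh touches everything.

```python
from datetime import date


def full_refresh(source_rows):
    """Rebuild the target from scratch: every source row is processed."""
    target = {row["id"]: row for row in source_rows}
    return target, len(source_rows)


def incremental_run(source_rows, target):
    """Process only the rows newer than the newest row already in the target."""
    # High-water mark: latest updated_at in the target (hypothetical column name).
    high_water_mark = max(
        (row["updated_at"] for row in target.values()), default=date.min
    )
    new_rows = [row for row in source_rows if row["updated_at"] > high_water_mark]
    for row in new_rows:
        target[row["id"]] = row  # upsert on the unique key
    return target, len(new_rows)
```

The strict `>` filter is an assumption of this sketch: rows that arrive late with a timestamp equal to or older than the high-water mark would be skipped, which is the usual trade-off to think about before going incremental.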

Data products/contracts

The rise of Data Contracts: I've always been a huge fan of the schema registry concept (yes, to me it's the same thing). I think companies should first try to fix their schema management before adding any tool to their stack. A schema registry done correctly fixes everything. But it may be one of the hardest things to do. It requires collaboration between tech and data and forces SE teams to like database schemas.
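To give a feel for what a contract check boils down to, here is a minimal Python sketch. Everything in it is hypothetical (the `ORDERS_CONTRACT` fields, the `contract_violations` helper), and a real schema registry does much more (versioning, compatibility rules), but the core is a shared definition that both producers and consumers validate against.

```python
# Hypothetical contract for an "orders" record: field name -> expected Python type.
ORDERS_CONTRACT = {"order_id": int, "amount": float, "currency": str}


def contract_violations(record, contract):
    """Return a list of problems; an empty list means the record complies."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    # Fields the contract does not know about are reported too (sorted for stable output).
    for field in sorted(record.keys() - contract.keys()):
        problems.append(f"unexpected field: {field}")
    return problems
```

A producer could run such a check before publishing, and a consumer before loading, so a breaking change shows up as a list of violations instead of a broken dashboard.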

Once you have the contracts/schema you can start thinking in terms of products/domains.

Hudi vs Delta vs Iceberg: the definitive comparison?

I think I may have shared at least 3 posts in the past comparing these 3 technologies. This is probably the last time because this one is really exhaustive. Onehouse compared Hudi, Delta and Iceberg on what they do in terms of read/write features, table commodities and platform support. They also explain some key concepts of table storage.

Their opinion should be treated cautiously because Onehouse sells a platform powered by Hudi, so I feel they might be biased, at least when it comes to platform support.

To balance opinions, there is a post written by James (Product at Snowflake) on why Apache Iceberg will rule data in the cloud. And another one written by Vladimir on how you can use Delta with Spark.

You might still be lost after reading these two posts. My personal advice, as someone who has never tried any of the 3: pick one, do stuff with it, learn a lot while using it. Once you become better at identifying what you need, challenge the initial choice.

ML Friday 🤖

This week we have an ML Friday of a decent size. I really like this category because, even if I technically understand only 10% of what I share, I feel attached to it.

Me writing the newsletter during the Roman Empire (generated with dreamstudio.ai)

As an appetizer, let's chat a bit about generative AI. I really like what these AIs are doing: DALL-E, Midjourney, dreamstudio and Imagen have built impressive stuff that may change the creative process forever. What will be the future of journalism if we can generate unique images per article? Will artists use AI to avoid the blank page syndrome?

Does it mean we'll live through an AI Art Apocalypse in the coming years? The author of the article covers the topic very well: economics, why the art, AI as a tool. As in every revolution, jobs will be transformed or, worse, lost, and we should have empathy for the people whose jobs may disappear. Still on the generative side, Google opened a waitlist for their experimental AI chatbot.

Other articles are:

Fast News ⚡️


See you later 👋