
Data News — Week 27

Data News #27 — Dysfunctional data team, Data Mesh mess, understand data storage, Tableau impersonation, can we trust recommendation algorithms?

Christophe Blefari
3 min read
How I depict myself each week reading community articles (credits)

Bonjour ! 🥐 I really want to thank all of you. We are already 150 after less than 6 weeks, thanks again! To introduce myself, I'm a French engineer. I started working with data 7 years ago and it's been a pleasure ever since. In this newsletter I want to share what I enjoy the most about our field: discovery.

Data fundraising 💰

This week I discovered a tool that decided to increase its seed round from $3.6m to $4.7m: Obviously AI. They describe themselves as the fastest and easiest data prediction tool in the world. What I find interesting here is that they target analysts who need ML.

Dysfunctional data team

Creating a data team is not an easy task. I've seen a lot of data people and data teams in an identity crisis. Defining team priorities is often difficult because everyone in the company should be data empowered.

Erik Bernhardsson wrote a well-written story about building a data team in a mid-stage startup. On the other hand Jesse Anderson, a pioneer in data team building, explains that adding a data engineer to the data science team will not fix your team.

Data Mesh mess

In 2021 we got 2 trendy concepts. The first one is dbt and everything around Analytics Engineering. The second one is Data Mesh. Everyone started writing articles about Data Mesh. Moussa Taifi has listed all the articles that have come out since the founding post.

Number of Data Mesh articles per month since 2019.

In case you want a more pragmatic take on this Data Mesh hype, I invite you to read this write-up defining "data products" vs. "data as a product".

Another Mesh (credits; sorry, it's a French language joke)

Force Tableau impersonation with Snowflake

Everyone knows Tableau. If you are a data engineer like me, you probably want to stay away from it because it means headaches. It's a complicated relationship. But now it's time to start a new one! The Pinterest team successfully set up Tableau to force impersonation when connecting to Snowflake.

French Open Data platform

The French open data platform rolled out its new design in beta 2 days ago. The platform is a really great tool and contains more than 37,000 datasets. I love this platform for two reasons. First, when you are a data person you can find a lot of cool project ideas there. Second, the platform is also open source (mainly Python), which means you could use it internally for your data sharing projects or contribute to it.

97+1 concepts every data engineer should know

Last week we had 10 tips and now we have 97 concepts! This is a new O'Reilly book written by Tobias Macey. I've ordered the book but have not read it yet! For sure I'll write a summary for you once I'm done.

On another level, if you want to understand distributed data storage, I recommend this awesome post by Quentin Truong (Google).

Typeclasses in Python

3 weeks ago we saw pattern matching in Python 3.10 and now we have a huge (really huge) article about a typeclasses proposal in Python through the dry-python library.
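To give a feel for what a typeclass is, here is a minimal sketch. It does not use the dry-python library or its API: it swaps in the standard library's `functools.singledispatch`, which offers a similar "one function, one implementation per type" idea. The `to_json` function and its instances are hypothetical names made up for illustration.

```python
from functools import singledispatch


@singledispatch
def to_json(value) -> str:
    """Fallback used when no instance is registered for the value's type."""
    raise NotImplementedError(f"no to_json instance for {type(value).__name__}")


@to_json.register
def _(value: int) -> str:
    # Instance for integers.
    return str(value)


@to_json.register
def _(value: str) -> str:
    # Instance for strings (no escaping, to keep the sketch short).
    return '"' + value + '"'


@to_json.register
def _(value: list) -> str:
    # Instance for lists: delegates to the instance of each element.
    return "[" + ", ".join(to_json(item) for item in value) + "]"
```

The point is that behavior is attached to existing types from the outside, without touching their class definitions: `to_json([1, "a"])` dispatches on `list`, then on `int` and `str` for the elements.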

Machine Learning Friday — Can we trust recommendation algorithm?

It's summer time and holidays are coming, but would you trust an algorithm to pick your next destination? With this question the author writes a superb explanation of decision trees.
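As a rough illustration of the core idea behind a decision tree, here is a minimal sketch of how a single split can be chosen using Gini impurity. This is not the code from the article: the helper names and the tiny travel dataset are made up for the example, and a real tree would repeat this step recursively on each side of the split.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 means every label in the group is identical."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) with the lowest weighted impurity."""
    best = None
    best_score = float("inf")
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put at least one row on each side
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best_score = score
                best = (feature, threshold)
    return best


# Made-up toy data: (budget in euros, days off) -> destination type.
rows = [(300, 3), (400, 10), (1500, 4), (2000, 14)]
labels = ["city trip", "city trip", "beach", "beach"]
```

On this toy data, `best_split(rows, labels)` returns `(0, 400)`: splitting on budget at 400 separates the two destination types perfectly, while no split on days off does.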

On the other side, the LinkedIn AI team detailed their process to deliver value to members. This article is worth checking to understand how LinkedIn recommendations work.

Spotify recommendation after it detected you broke up (credits)

Fast news

  • Apple is reportedly the largest Google Cloud Storage consumer (a predicted $300m in 2021); Apple also seems to be a big AWS client.
  • GitHub Copilot, in case you missed it or live in a cave without internet, is an AI-driven companion that helps you write code faster (and also writes code for you).
  • Beneath is a serverless tool where you can host data and query it in real time; it includes a free community tier.
  • Airflow Summit has started.

See you next week and do not hesitate to pm me if anything comes to your mind ❤️.


Christophe Blefari

Data Engineering Coach who enjoys all kinds of data platforms.