Bonjour ! 🥐 I really want to thank all of you. We are already 150 in less than 6 weeks, thanks again! To introduce myself, I'm a French engineer. I started working with data 7 years ago and it's been a pleasure ever since. In this newsletter I want to share what I enjoy most about our field: discovery.
Data fundraising 💰
This week I discovered a tool that increased its seed round from $3.6m to $4.7m: Obviously AI. They describe themselves as the fastest and easiest data prediction tool in the world. What I find interesting here is that they target analysts who need ML.
Dysfunctional data team
Creating a data team is not an easy task. I've seen a lot of data people and data teams going through an identity crisis. Defining team priorities is often difficult because everyone in the company should be data empowered.
Erik Bernhardsson wrote a story-style post about building a data team at a mid-stage startup, and it is well written. On the other hand, Jesse Anderson, a pioneer in data team building, explains that adding a data engineer to the data science team will not fix your team.
Data Mesh mess
In 2021 we got 2 trendy concepts. The first one is dbt and everything around Analytics Engineering. The second one is Data Mesh. Everyone started writing articles about Data Mesh. Moussa Taifi referenced all the articles that have come out since the founding post.
In case you want a more pragmatic take on this Data Mesh hype, I invite you to read this write-up that defines "data products" vs. "data as a product".
Force Tableau impersonation with Snowflake
Everyone knows Tableau. If you are a data engineer like me, you probably want to stay away from it because it means headaches. It's a complicated relationship. But now it's time to start a new relationship with it! The Pinterest team successfully set up Tableau to force impersonation when connecting to Snowflake.
French Open Data platform
The French open data platform rolled out its new design in beta 2 days ago. The platform is a really great tool and contains more than 37,000 datasets. I love this platform for two reasons. First, when you are a data person you can find a lot of cool project ideas there. Second, the platform is also open source (mainly Python), which means you could use it internally for your data sharing projects or contribute to it.
97+1 concepts every data engineer should know
On another level, if you want to understand distributed data storage, I recommend this awesome post by Quentin Truong (Google).
Typeclasses in Python
Machine Learning Friday — Can we trust recommendation algorithms?
It's summer time and holidays are coming, but would you trust an algorithm to pick your next destination? Starting from this question, the author writes a superb explanation of decision trees.
On the other side, the LinkedIn AI team detailed their process to deliver value to members. This article is worth checking out to understand how LinkedIn recommendations work.
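To give a feel for what a decision tree does at each node, here is a minimal sketch in plain Python. It is not taken from the linked article: the holiday data, the feature names and the helper functions `gini` and `best_split` are all made up for illustration, and Gini impurity is only one of several possible split criteria.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue  # a split must put at least one sample on each side
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical past holidays: budget in euros, and where the traveller went.
trips = [
    {"budget": 300, "temperature": 18},
    {"budget": 500, "temperature": 22},
    {"budget": 1200, "temperature": 30},
    {"budget": 1500, "temperature": 28},
]
destinations = ["city", "city", "beach", "beach"]

threshold, impurity = best_split(trips, destinations, "budget")
print(f"split on budget <= {threshold} (weighted impurity {impurity})")
```

A full tree simply repeats this search recursively on each side of the split, over every feature, until the groups are pure or too small to split further.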
- Apple is reported to be the largest Google Cloud Storage customer (a predicted $300m in 2021). Apple also seems to be a big AWS client.
- GitHub Copilot, in case you missed it or live in a cave without internet: an AI-driven companion that helps you write code faster (and also writes code for you).
- Beneath, a serverless tool where you can host data and query it in real time. It includes a free community tier.
- Airflow Summit has started.
See you next week, and do not hesitate to pm me if anything comes to mind ❤️.