
Data News — Week 15

Data News #15 — Union.ai (Flyte) fundraising, Datadogs of tomorrow, Kafka and Presto at Uber, Feathr a new feature store, a lot of fast news. 

Christophe Blefari
Easter weekend (credits)

Bonjour Data News readers. To help me prepare the anniversary community special edition, could you send me, if you have time, your 3 favourite articles you read recently, whenever they were written? And for fun, can you also send me the place where you are while reading this edition of the newsletter? On my side I'm enjoying the sun in the mountains ☀️.

Data fundraising 💰

Union.ai raised $10m in a seed round for another workflow orchestration tool built on top of Kubernetes. They are the team behind Flyte, the workflow orchestration tool initially developed at Lyft and chosen by Spotify to replace Luigi. It is impressive how much of a startup's soft power today comes from open-source frameworks. Back in the day Luigi lost the battle against Airflow, with Airbnb vs. Spotify in the background. And now Spotify is coming back for round 2. With a lot of money and more competitors.

Then, if we look at the marketing and how Union.ai positions the product in the market, we see that they sell an ML and Data Science tool rather than a generic pipeline management system. This is something I've also noticed while chatting with the Prefect team: companies do not want to compete with Airflow's generic capabilities but rather address Airflow's flaws, particularly in the ML space. Even though the Apache project, by its generic nature, can cover everything. In the end it's just about writing Python.

As a side note Flyte is written in Go.

The Datadogs of tomorrow

This is clearly the line drawn by data observability tools: they want to become the Datadog of the data field, following the success of the company, valued at $50b. Which is a bit ironic, because why can't we use the original Datadog rather than a copy?

Data Discovery Tool: why you absolutely need one!

Anas from HiPay shared what made his team pick Amundsen as the discovery tool for their data platform. If you are still working out whether your company needs this kind of tool, it'll help you for sure.

Kafka analytics at massive scale at Uber

Uber data teams rely heavily on Kafka for their data infrastructure. In short, they are event-driven and everything goes through Kafka. Downstream of Kafka, a lot of different tools play their part. Presto has a big role here: they operate 15 clusters with 7000 weekly active users. This is massive. They detailed how Presto interacts with Kafka.

If you want an entry-level post, Khandelwal explained step by step how you can query Kafka from Presto.

Massive Kafka (credits)

Feathr, a new feature store, entering the game

LinkedIn open-sourced their feature store, Feathr. It is written in Scala. For people not familiar with the matter, a feature store is a centralized data store dedicated to machine learning features. The idea behind it is to factor out the computation of ML features and share the results. Thanks to it, we can avoid repeating the same feature engineering in each micro-service.

Feathr is built out of multiple components: an offline store (object and SQL), an online store, a feature registry and a compute engine. The online store proposed in the GitHub README is Redis.

The second piece of news in the post is that Feathr will also be available to Azure cloud users.
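
To make the components listed above a bit more concrete, here is a minimal sketch in Python of how a registry, an offline store, an online store and a compute step could fit together. This is a toy in-memory model, not Feathr's API: the `FeatureStore` class, its methods and the sample features are all hypothetical names I made up for illustration, and a plain dict stands in for Redis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model for illustration only; this is NOT Feathr's API.


@dataclass
class FeatureStore:
    # Feature registry: feature name -> function computing it from raw rows.
    registry: Dict[str, Callable[[List[dict]], float]] = field(default_factory=dict)
    # Offline store: historical rows per entity (stand-in for object/SQL storage).
    offline: Dict[str, List[dict]] = field(default_factory=dict)
    # Online store: precomputed values for fast lookups (stand-in for Redis).
    online: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[List[dict]], float]) -> None:
        """Declare a feature once so every consumer shares the same definition."""
        self.registry[name] = fn

    def materialize(self) -> None:
        """Compute step: evaluate every registered feature for every entity."""
        for entity_id, rows in self.offline.items():
            for name, fn in self.registry.items():
                self.online[(entity_id, name)] = fn(rows)

    def get_online(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        """Serve precomputed features to a model or a micro-service."""
        return {name: self.online[(entity_id, name)] for name in names}


store = FeatureStore()
store.offline["user_1"] = [{"amount": 10.0}, {"amount": 30.0}]
store.offline["user_2"] = [{"amount": 5.0}]

# Feature logic is written once here instead of in each micro-service.
store.register("order_count", lambda rows: float(len(rows)))
store.register("avg_amount", lambda rows: sum(r["amount"] for r in rows) / len(rows))

store.materialize()
features = store.get_online("user_1", ["order_count", "avg_amount"])
```

The point of the sketch is the separation of concerns: definitions live in the registry, raw history lives offline, and consumers only read precomputed values from the online side.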

Three tips to save BigQuery costs with immediate effect

I have to admit that I'm ashamed I didn't know the second tip. Montadhar wrote 3 BigQuery tips to save costs. Which means saving query time. Which means in the end saving company money.

Fast News ⚡️



No comments 💬

New category where I just share bare links (and also I have nothing to say but I like the articles).


