Data News — Week 23.12

Data News #23.12 — Mage and Kestra takeaways, OpenAI plugins system and impact on job market, Reddit outage post mortem, etc.

Christophe Blefari
The Earth can also generate great images (credits)

Dear readers, I hope this new edition finds you well. It seems that you really liked the recent editions, which is perfect because they were fun to write. I feel that this week all the articles I found relevant for the newsletter are either AI-related or technical. I really don't know how to deal with the news overflow about the Gen AI landscape. Do you like all the GenAI hype? 👍 or 👎

Airflow alternatives meetup

The first part of the Airflow alternatives meetup, with Mage and Kestra, took place this Tuesday. It was an awesome online meetup. I really liked the presentations from Mage and Kestra, and even if I was focused on hosting the event, it was great to see 2 other visions of the future of orchestration, which, to be honest, are not that far from Airflow.

Here are my takeaways about the event:

  • Mage and Kestra have both been developed with Airflow's flaws in mind, especially deployment complexity, reusability and data sharing between tasks.
  • The tagline "Modern replacement for Airflow" on Mage's side makes sense. Out of the box Mage provides an all-in-one web editor to write data pipelines with a great UX. In small browser text areas you can write Python, SQL or R code and orchestrate these transformations with drag-n-drop. I personally hate developing in the browser, but the promise looks good. In practice, though, Mage and the current Airflow version are almost the same; the only difference is the UX when developing pipelines.
  • Tommy, Mage's CEO, said that for the moment they will focus on building the best open-source data pipeline tool. They have enough funding for the next 2 years.
  • In reality, even if worker management seems easier in Mage, the deployment is not yet ready to go. Either you use a Terraform script that launches elastic containers, or you use Helm, which requires Kubernetes.
  • Now Kestra, one of the newest kids on the block. Ludovic, the CTO who presented Kestra at the event, said that he started the development while on an assignment at Leroy Merlin, where people were very unhappy with Airflow. Kestra is a YAML-based data pipeline tool mixed with string templating. The YAML approach allows less-technical users to write pipelines.
  • Kestra's vision is also very open: everything is accessible through APIs, which opens up a variety of uses within a company. Under the hood Kestra is developed in Java, which sets it apart from the other alternatives.
  • In the future Kestra could easily look like Mage, YAML being the intermediate step before a drag-n-drop UI.
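
To make the "declarative pipeline mixed with string templating" idea more concrete, here is a minimal sketch in Python. It is not Kestra's syntax or API: the pipeline layout, the task names, the commands and the `render_pipeline` helper are all hypothetical, and a plain dict stands in for the YAML file so that the example only needs the standard library.

```python
from string import Template

# Hypothetical declarative pipeline. A plain dict stands in for a YAML file;
# this is NOT Kestra's real format, only an illustration of the idea.
PIPELINE = {
    "id": "daily_sales",
    "tasks": [
        {"id": "extract", "command": "extract --date $run_date"},
        {"id": "load", "command": "load --table sales_$env --date $run_date"},
    ],
}


def render_pipeline(pipeline, variables):
    """Return a copy of the pipeline with every task command templated."""
    rendered_tasks = [
        # substitute() raises KeyError if a placeholder has no value.
        {**task, "command": Template(task["command"]).substitute(variables)}
        for task in pipeline["tasks"]
    ]
    return {**pipeline, "tasks": rendered_tasks}


rendered = render_pipeline(PIPELINE, {"run_date": "2023-03-24", "env": "prod"})
for task in rendered["tasks"]:
    print(task["id"], "->", task["command"])
```

The point of the sketch is that the person writing the pipeline only edits the declaration and its placeholders, while the rendering and execution logic lives in the tool.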

It was so fun to organise this event and I'd love to do more live events in the future with blef.fr. In the meantime, part 2 of the event, with Prefect and Dagster, will take place in 2 weeks, on April 4th. I hope I'll see you there.

You should register for part 2

Gen AI 🤖

The newsletter is already too big for today so I'll try to keep it short, especially on Gen AI, a topic that is already spammed everywhere.

OpenAI is slowly starting to create a gigantic ecosystem and could become the next GAFA-like company. The non-profit research company manifesto already feels far away. OpenAI released a study about the impact of Large Language Models (LLMs) on the job market (sorry, I wanted to read the PDF but my brain is already grilled) and announced ChatGPT plugins. In a nutshell, OpenAI has created an AI interface that everyone likes and will add on top of it an App Store experience with plugins. It reminds me of something.

Because OpenAI is not everything, here is some news from the alternative world. Mozilla announced Mozilla.ai, a community-based open-source AI ecosystem. Stanford researchers released Alpaca, a model that behaves similarly to OpenAI's text-davinci-003 but costs a lot less ($600 to train). There is also a list of open alternatives to ChatGPT.

I'd have loved to talk about tools that translate human language to SQL, like sequel.ai or SQL translator, but it would open the Pandora's box of self-service analytics, and that is for next week.

Fast News ⚡️

Pi? (credits)

Data Economy 💰

  • Sifflet raises €12.8m Series A. Sifflet started as a data observability tool, but they have since added features like lineage and cataloging. These are often needed to better contextualise alerts, and also to avoid multiplying tools when working with big corporations.
  • Hex raises $28m in a Venture Round[1]. Hex is a notebook-based analytics application. Cells are at the center of the analytics: they produce outputs that can be used later in other cells or in visualisations. The visualisations can be organised in a Notion-like document but with live data. I recently tried Hex, the UX is neat and I think the tool is worth it for production-ready explorations[2]. Here is an example with the MAD data (I made no presentation effort).
  • DragonflyDB raises $21m Series A. Dragonfly is a replacement for Redis claiming to outperform it in many ways (throughput, snapshotting speed, scaling). I don't have a lot to say except that we are heading into a future with a lot of database choices.

  1. A venture round is when the series has not been specified.
  2. I just coined the term. I mean an exploration good enough that you want to share a professional-looking result with your stakeholders.

See you next week ❤️.
