Data News — Week 24.15

Data News #24.15 — MDSFest quick recap, LLM news, Airbnb Chronon, AST, Beam YAML, WAP and more.

Christophe Blefari
The fest we deserve (credits)

I hope this Data News finds you well. In today's edition we have a large selection of links; I think you will enjoy them.

But first, I want to welcome all the new members who joined this week after my new episode on DataGen with Robin Conquet. The episode is in French, and we mainly talked about the possible end of the modern data stack, which I already condensed into a post a few weeks ago (in English).

MDS Fest 🥳

As announced last week, I took part in MDS Fest 2.0 this Thursday. I shared my journey with Apache Superset and why I consider Superset the best open-source alternative when it comes to building BI applications.

Yes, because you should stop building dashboards and build BI apps instead. This is part of the productisation of data, but mainly I think you should treat your BI tools as a way for your users to interact with data, not only to monitor metrics. Given the customisation it allows, Superset is the best tool for that.

You can have a look at my slides or watch the replay on YouTube.

Many other talks took place at the same conference; here is a small selection you should check out:

PS: Apache Superset is going 4.0 this week with a lot of new features.

AI News 🤖

  • Open-source LLMs for everyone: A great post from the Siemens AI team about open LLM initiatives that bring new usages into the dev workflow. Whether it's code completion or pull request / crash report summarisation, it looks neat.
  • Building LLMs for code repair: Replit is an AI-driven workspace for developers; think of a supercharged IDE. They wrote a blog post about what they built to create LLM-driven fix suggestions on top of LSP (Language Server Protocol), a protocol between your IDE and a server that understands and analyses the code to find errors or highlight it.
  • LLM DataGen: A small demo of an LLM based on Gemma that generates JSONL from a given name. It does not work super well, and it would be better if we could specify the column names and types, for instance, but it showcases another great use of generative algorithms.
  • Meta, building an infrastructure for the future: It explains how Meta is partnering with GPU vendors to design new chips, and how incredibly hard it is to connect thousands of GPUs in a cluster where everything can fail at any moment.
  • Can Gemini 1.5 actually read all the Harry Potter books at once? A nice Graphviz chart, spotted on Twitter, that maps all the Harry Potter relationships on one poster. It was generated by Gemini from the content of all the books. Obviously Gemini already knows some of the Hogwarts lore from its training, but this is still impressive. Sadly we don't have the complete prompt / code.
  • Speaking of prompts, PromptLayer organised a tournament and blogged about their favourite prompts from the competition. Once again, speaking to an LLM is like speaking to children: USE CAPITAL LETTERS TO CAPTURE THEIR ATTENTION.
  • OpenAI open-sourced a light library to evaluate language models: you can use 7 different evals and check the results on OpenAI or Claude models.
  • Mixtral-8x22B is out: a new model that probably does something awesome.
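
To make the LSP part of the Replit item a bit more concrete, here is a minimal sketch of what a language server sends to an IDE when it finds an error. This is not Replit's code. It only shows the general shape of an LSP message: a JSON-RPC 2.0 payload preceded by a Content-Length header. The file URI, the error message and the "example-linter" source name are made up for illustration.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC payload with the Content-Length header used by LSP."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def parse_lsp_message(raw: bytes) -> dict:
    """Decode one framed LSP message back into a Python dict."""
    header, _, rest = raw.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# Hypothetical diagnostic: the server tells the IDE that line 3
# (0-based line 2), characters 4 to 9, contains an error.
diagnostic_notification = {
    "jsonrpc": "2.0",
    "method": "textDocument/publishDiagnostics",
    "params": {
        "uri": "file:///tmp/example.py",  # made-up file
        "diagnostics": [
            {
                "range": {
                    "start": {"line": 2, "character": 4},
                    "end": {"line": 2, "character": 9},
                },
                "severity": 1,  # 1 means Error in the LSP specification
                "message": "Undefined name 'total'",
                "source": "example-linter",  # made-up source name
            }
        ],
    },
}

raw = frame_lsp_message(diagnostic_notification)
decoded = parse_lsp_message(raw)
print(decoded["params"]["diagnostics"][0]["message"])
```

A diagnostic like this one (a location plus a message) is the kind of input an LLM-driven fix suggestion can work from.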

Fast News ⚡️

✨ s/o to Hugo, who runs a weekly data round-up. This week he published before me, so you can also check his great selection of links.


See you next week ❤️
