Skip to content

Data News — Week 23.27

Data News #23.27 — My new French podcast, New vision for dbt Core semantic layer, langchain explained, carbon footprint of pizza.

Christophe Blefari
Christophe Blefari
6 min read

group of cyclists marching on highway
Who's leading the data peloton? (credits)

Hey you, this is the Saturday Data News edition 🥲. Time flies. I'm working for the Series of articles in advance for August about "creating data platforms" and I'm looking for ideas about the data I could use for this. Having some kind of simulated real-time data would be the best. But it requires to write a simulation. Which is enough complicated. What would you use?

Small French aside 🇫🇷

(A small part in French, jump to next section)

Cette semaine j'ai lancé mon podcast en français nommé À l'heure des données. Dans ce podcast, qui sera mensuel, je vais discuter avec des experts francophones qui font l'écosystème. On discutera du présent mais aussi du futur.

Dans le premier épisode j'ai discuté avec Benoit Pimpaud qui a été data scientist à l'Olympique de Marseille et qui s'est reconverti plus tard chez Deezer en data engineer. Aujourd'hui il s'occupe du produit chez Kestra, un orchestrateur open-source développé en France.

🎧 Pour nous écouter : AppleSpotifyDeezerAmazon

Sue un tout autre sujet, Stéphane Bortzmeyer a participé au colloque du CNRS sur Penser et Créer avec les IA génératives et il a écrit un rapport sur ces 2 jours.

PS : est-ce qu'une version française de mon contenu t'intéresse ?

The new dbt Semantic Layer

Following the acquisition of Transform by dbt Labs a few months ago, dbt Core integrates MetricsFlow. MetricsFlow was the semantic layer of the acquired company. This week, Nick Handel, co-founder of ex-Transform, wrote about how dbt Core specs will adapt.

As a reminder a semantic layer is a definition on top of your models meant to be reusable. The idea, is then, to use the semantics to generate SQL queries. You can read my article on the semantic layer.

In the new vision it will be possible to define multiple things:

  • entities —It defines the nodes of your business models. In a dbt model, you can define primary and foreign entities. A foreign entity defines an edge between models, hence a join in the final query.
  • measures — A value aggregation.
  • dimensions — A categorical or a time field than can be used either in a group by either in a filter.
  • metrics — A pre-defined object that combines entities, measures and dimensions.
Semantics and metrics in dbt Core explained. (credits: the example is reworked from Nick's examples)

Just ahead I gave you an precise example of how the new nomenclature will behave for a simple case with a fact_transaction model. This is important to notice that the semantic layer is something that sits on top of you current dbt models definitions.

To complete the picture this is important to notice that the revenue_usd metrics can be queried at the moment either with a CLI, either via the API dbt Labs will release through their dbt Cloud offering.

As an extension I've seen 2 things this week that I feel makes sense here:

  • VulcanSQL — A data API framework for DuckDB, Snowflake, BigQuery, PostgreSQL. Actually Vulcan let's you define in a blink parametrise SQL that you can expose through an API. It comes then with a catalog, a documentation and a way to connect downstream consumers tools (e.g. CSV exports, Excel, Sheets, etc.)
  • A Rill Data dashboard about DuckDB commits — DuckDB commits is just an example. What I want to show here is Rill Data UI, while being relatively simple offers a standardise way to explore a dataset. On the left you get the metrics, on the right the dimensions, everything can be clickable and allows you to drill down. Under the hood it's "BI-as-code", YAML defining this dashboard can be found on Github.

These two examples are not really semantic layers in the strict sense, but revolve around the concept.

Gen AI 🤖

cooked food on round white ceramic plate
Now you want to think twice before eating a pizza (credits)

Fast News ⚡️

Data Economy 💰

  • DigitalOcean acquires Paperspace. Paperspace is an all-in-one SaaS product to develop, train and deploy AI applications. With a custom Notebook UI based on Jupyter you can develop your models while checking at ressources, when the models is reading you can deploy it within containers.
  • Redpanda raises $100m in Series C. Redpanda is a great product for developers. The best way to describe it is: this is a Kafka alternative. Built for modern times it removes most of the Kafka complexity by implementing all Kafka APIs.

See you next week ❤️

Data News

Data Explorer

The hub to explore Data News links

Search and bookmark more than 2500 links

Explore

Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.

Comments


Related Posts

Members Public

Data News — Week 24.16

Data News #24.16 — Llama the Third, Mistral probable $5B valuation, structured Gen AI, principal engineers, big data scale to count billions and benchmarks.

Members Public

Data News — Week 24.15

Data News #24.15 — MDSFest quick recap, LLM news, Airbnb Chronon, AST, Beam YAML, WAP and more.