Skip to content

Data News — Week 24.04

Data News #24.04 — Let's talk AI podcast interview, data & AI products conference, Disney VR floor, dbt awesome community projects.

Christophe Blefari
Christophe Blefari
4 min read
single perspective of pathway leading to house
Hey (credits)

Hey, new week new email. This is already end of January but I took time to travel and see people I did not see for a long time so I'm super happy how this new year is starting.

Next week, I'll be wrapping up my DataOps lecture by incorporating how to deploy machine learning models. This is a fun part where students learn how to serve a simple classifier in production. Building a custom HTTP API, Docker image and CI/CD processes making it accessible on internet. For the modern part this year, I'm going to integrate an LLM "classifier" part, it might attract their curiosity. We'll see.

Yesterday an interview I did for the podcast Let's talk AI has been published. Available everywhere. We talke about data engineering, freelancing and career stuff.

Let's Talk AI podcast new episode

Data & AI products

Yesterday I went to 5h conference organised in Paris about Data & AI products—in French, the idea of the conference was to mix people coming from data and product ecosystem which is, let's be honest, the key enabler for AI in production. The recording will be online in a few days / weeks and I'll share them once online.

Here a few takeaways in a messy way:

  • Data products and organisational impacts
    • Data engineers are still the limiting human resource.
    • Data mesh by the book will not work, if you want to scale you can't just add more people in a central team.
    • Data mesh means decentralisation but more importantly ownership and responsibilities to team (esp. data producers)—if every team has to be responsible you need to have a easy-to-use platform and you have to explicitly give them responsibilities.
  • UX for data products — This is a presentation I really enjoyed by Claire Lebarz, VP data at Malt. Without the voice you will miss a lot of things, still it contains great practicals tips.
    • Before jumping to AI projects you need first to start with words and to define metrics reflecting [your] values. You don't want your AI to give bad product experience. So define—as a metric—what you don't want to have.
    • Then Claire schematised human interaction with models (slide 8) via an interface with inputs and outputs. Inputs and outputs can be instrumented with multiple techniques that will empower people in their interaction with AI algorithm.
      • Inputs — This is what you ask from the users to feed your algorithm. It can be done with calibration, implicit or explicit feedback and corrections.
      • Outputs — Product design choices where you give power over the algorithm. It can be done with multiple options (like trips alternative on Google Maps), attributions (why something has been recommended), confidence interval (weather) or limitations.
    • As data people you need to build relationships with designers to converge on common terms about human-AI interactions
  • And other bits I got from the others talks
    • OKR means metrics alignement across the company which lead to team autonomy—AI teams should be autonomous in finding solutions to move indicators
    • It critical to have dashboards measuring success when AB testing models
    • "Product is about people crafting together to best solutions and experiences to solve a customer problem" — Anne-Claire Baschet.
How to interact with AI models — Claire Lebarz, UX for data products

AI News 🤖

Fast News ⚡️

  • Disney Holotile VR floor — Disney developed a "dynamic" floor for VR use-cases. With it you can walk without really moving. This is a bit disturbing but it can unlock the metaverse future.
  • ClickHouse and the one billion row challenge — ClickHouse proposed a SQL solution with ClickHouse local to the a challenge consisting in aggregating 1B rows in a text file. Initially this challenge has to be answered in Java. The leader submitted a solution running in less than 2s—have fun—while ClickHouse took 19s.
  • SQLGlot switch to Rust — I really like SQLGlot, this is a SQL parser that gives you back the AST to do stuff. They ported the parser from Python to Rust and got 30-40% performance improvement.
  • 2024 data engineering trends — We are still in January so it's still valid, Anna captured a few things that will make data teams busy this year. Firstly the reducing in resources leading teams to do more with less (or at least doing the same with less).
  • Snowflake batch data loading — A good explanation of the Snowflake kCOPY INTO command and what you need to setup around it to make it work.
  • Effective pandas 2 is outI did not read the book. As pandas is, still, everywhere, it can be a good ressource if you need to learn the 2.0 version.
  • Introducing girls to code, one flower at a time — An awesome initiative to introduce girls to code and data visualisation through creative coding projects, there is a Notion guidebook to do data visualisations with p5.js.
  • The open-source enterprise data platform in a single portal — Bern local community open-sourced an data platform blueprints to launch an all-in-one data platform with dbt, Airflow and Superset on top of Postgres and K.
  • Github/dbt-assertions — A dbt package to write dbt tests at row-level and to save exceptions alongside your failing rows (cf. example).
  • PR comment template for dbt data projects — Great stuff. This is a proposal of Github pull request template when modifying dbt models. It includes description, lineage diff, illustration of model changes or impacts and more.

See you next week ❤️.

Data News

Data Explorer

The hub to explore Data News links

Search and bookmark more than 2500 links

Explore

Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.

Comments


Related Posts

Members Public

Data News — Week 24.16

Data News #24.16 — Llama the Third, Mistral probable $5B valuation, structured Gen AI, principal engineers, big data scale to count billions and benchmarks.

Members Public

Data News — Week 24.15

Data News #24.15 — MDSFest quick recap, LLM news, Airbnb Chronon, AST, Beam YAML, WAP and more.