
Data News — Week 23.29

Data News #23.29 — Hightouch and Unstructured fundraising, data as a game, dbt and ChatGPT, OpenHouse the new warehouse.

Christophe Blefari
See you on the road (credits)

Hey, I hope this newsletter finds you well. This is a short blog post to give you a few reads while you wait for your next trip. We can already feel summer: fewer articles made the selection this week.

Also, get ready for the Data News: Summer Edition. The next 5 releases will be a bit different than usual, with less curation and more original articles written in advance so that I can take a break.

You'll—probably—get:

  • A list of 2023 must-read articles
  • How to create a batch data platform—using Tour de France data—from ingestion to visualisation using all the fancy tools the data world can offer (in 2 or 3 parts)
  • Docker for data people
  • The disappearance of the data engineer

Fast News ⚡️

Give a controller to your stakeholders (credits)
  • For the love of the game — Winnie from dbt Labs wrote a great post about seeing data as a game, with analytics being the game design. What if we conceived data as a game for our consumers and not as a linear tool for boring actions? In the article the author also shares dbt is jQuery, not Terraform, which nicely describes how dbt helps you enter a flow state for data work.
  • How an acquisition fails — It's been a long time since I've shared Benn's articles, but as always I can't recommend him enough. This time it's about tech acquisitions and what can be done to fail—or succeed.
  • Microsoft Fabric: An end to end implementation — A first, blurry glimpse of Microsoft Fabric capabilities: Jordan reads data from SharePoint and Azure Storage, then transforms it using PySpark to visualise it in Power BI. Classically boring stuff.
  • How to chat with data in Snowflake using ChatGPT, dbt, and Streamlit — Less boring. Obviously, when you put ChatGPT and dbt in the same sentence it creates buzz instantly. This is an interesting demo of how you can quickly build a chat experience, using OpenAI, on top of your data models.
  • LLM based pipelines with PostgresML and dbt — For me this is mainly a discovery of PostgresML, an open-source extension that brings ML functions to the database. As cloud databases like Snowflake and BigQuery brought this years ago, it was mandatory for the Postgres stack. The article shows that you can then run transformers or embeddings directly from dbt.
  • Taking charge of tables: introducing OpenHouse for big data management — New data product at LinkedIn: OpenHouse. OpenHouse sits on top of the LakeHouse to bring a control plane to managed Iceberg files. It reminds me of something... We used to call it a warehouse back in the day.
  • Models on HuggingFace — Clement, the CEO of HuggingFace, congratulates the community and himself because a lot of public models are hosted on HuggingFace. It shows how fast and deep things are going.
  • Plot Gallery on Observable — I'm not often a fan, but Mike Bostock is different. He created d3.js while at the New York Times and brought something unique to digital data visualisation. More recently he co-founded Observable, which is an awesome tool for visualisations, and the plot gallery makes me envious, even though it is quite simple.

Data Economy 💰

  • Hightouch raises $38m in a Venture round. Hightouch has been primarily known for its reverse ETL solution. With the money the team announced a new suite of tools to activate customers in the warehouse. You can see it as a CDP (customer data platform) in your warehouse. It means you get a unified view of customers across all your tables.
  • Polar Analytics raises $9m Series A. Polar Analytics is a vertical SaaS that provides analytics for Shopify vendors. This is less data engineering oriented, but I still find it interesting to see a "reporting" product raising money. Also, a vertical product like this can give marketplaces ideas about what great reporting looks like.
  • Unstructured raises $25m Series A to build ETL for LLMs. Unstructured wants to give you the ETL toolkit to use complex company data like HTML, PDF, CSV, PNG, PPTX, as they say on their site. Personally I did not know that CSV was a complex source of data, but ok. To be honest, at the moment it looks like a fancy text extractor.

See you next week ❤️.

