Data News — Week 22.27

Data News #22.27: Whaly raised, Databand acquired by IBM, cloud and data warehouse incompatibility, future of data platforms.

Christophe Blefari
Me enjoying the data engineering playlist while everything is good now (credits)

Hey, this will probably be one of the shorter editions of the year. I feel that summer is coming and fewer articles are being written, while LinkedIn posts keep flourishing with uneven quality. Sadly, I miss the good ol' web.

While reading this edition, listen to the Spotify data engineering playlist made by Barr Moses. 🎶 EVERYTHING IS BROKEN.

Data Fundraising 💰

  • Whaly raised a $1.9m seed round to provide an all-in-one BI tool. The YC company offers a way to sync data from dozens of sources directly into your warehouse and, on top of that, a visual way to transform your data before plugging it into their Report Builder. They approach the modern data stack from a BI perspective, providing all the tools needed in one platform.
  • IBM acquires Databand. Is it already time for consolidation in the data observability space? As mentioned by IBM, this is their fifth acquisition since the beginning of the year. It will be interesting to follow how Databand evolves once in contact with IBM customers.

How to make great schemas

In data engineering we often draw schemas to present architectures, projects or other stuff. Information visualisation is the best way to simplify the complex world we live in. Benoit described what to do to make great schemas, from using a sheet of paper (yes, you know, the white rectangle you may have somewhere in a drawer) and a pencil to digital tools.

To be honest I hate Benoit right now because I deeply want the $350 e-ink tablet he's using to draw.

Speaking of schemas, he also featured in his monthly newsletter a great way to visualize SQL joins. It is way better than the traditional one with circles.
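To make the point about circles concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the visualization from the newsletter; the `authors` and `posts` tables and their rows are made up for illustration. It only shows that a join produces combinations of rows (a matching row can appear several times, and a LEFT JOIN pads missing matches with NULL), which overlapping circles do not convey well.

```python
import sqlite3

# Hypothetical tables, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER, name TEXT);
CREATE TABLE posts (author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
INSERT INTO posts VALUES (1, 'Joins'), (1, 'Shuffles'), (2, 'Schemas');
""")

# INNER JOIN: one output row per matching (author, post) pair.
# Ada has two posts, so she appears twice; Linus has none, so he disappears.
inner_rows = conn.execute("""
SELECT a.name, p.title
FROM authors a
JOIN posts p ON p.author_id = a.id
ORDER BY a.name, p.title
""").fetchall()

# LEFT JOIN: same pairs, plus authors without posts padded with NULL (None).
left_rows = conn.execute("""
SELECT a.name, p.title
FROM authors a
LEFT JOIN posts p ON p.author_id = a.id
ORDER BY a.name, p.title
""").fetchall()

for row in inner_rows:
    print("inner:", row)
for row in left_rows:
    print("left: ", row)
```

The inner join returns three rows from three authors and three posts, but not because of any "intersection": Ada is duplicated and Linus is dropped. The left join brings Linus back with an empty title.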

2 technical deep dives that will make you dizzy

Uber uses Spark at a scale few companies have ever imagined, which means they shuffle a lot. The shuffle is the operation that happens every time data has to move between job stages. So they decided to develop a Remote Shuffle Service that handles all shuffles efficiently. This is a crazy deep technical post.

Canva is a platform to create graphic designs online, which means they have a lot of visual content, which means they need GPUs if they want to apply machine learning to that content. They developed an awesome encapsulation of their applications combining Docker, Kubernetes and Nix for ML. This is a crazy deep technical post.
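For readers who have never looked at what a shuffle does, here is a toy sketch in plain Python. It is neither Spark's implementation nor Uber's Remote Shuffle Service; the function names (`map_stage`, `shuffle`, `reduce_stage`) and the byte-sum partitioner are assumptions chosen for readability. The idea it illustrates: between two stages, every record with the same key has to end up in the same partition, so outputs from all upstream tasks get redistributed.

```python
from collections import defaultdict

def map_stage(lines):
    """Upstream task: emit one (word, 1) pair per word."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(map_outputs, num_partitions):
    """Redistribute pairs so that each key lands in exactly one partition.

    Uses a simple deterministic partitioner (sum of the key's bytes modulo
    the partition count) instead of Python's randomized hash().
    """
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for output in map_outputs:
        for key, value in output:
            index = sum(key.encode()) % num_partitions
            partitions[index][key].append(value)
    return partitions

def reduce_stage(partition):
    """Downstream task: aggregate the values of each key it received."""
    return {key: sum(values) for key, values in partition.items()}

# Three upstream tasks, each reading its own chunk of input.
mapper_inputs = [["a b a"], ["b c"], ["a c c"]]
map_outputs = [map_stage(chunk) for chunk in mapper_inputs]

# The shuffle: every upstream task may send data to every downstream task.
partitions = shuffle(map_outputs, num_partitions=2)

word_counts = {}
for partition in partitions:
    word_counts.update(reduce_stage(partition))

print(word_counts)
```

In a real cluster the `shuffle` step means writing intermediate files and moving them over the network between machines, which is why it becomes the expensive part at scale.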

Me after reading the 2 previous articles (credits, cropped)

THE CLOUD AND DATA WAREHOUSE - ARE THEY COMPATIBLE?

First, you don't need to yell at me. Second, this is a good question I ask myself every time I wake up. Thankfully Bill Inmon (one of the 2 popes of the data warehouse) also had this question in mind 2 days ago. To him the cloud is not totally compatible with data warehouses, mainly because of data movement, which is a big cost in a cloud environment.

Data platforms future

Speaking of cloud costs, this week Kris wrote down his thoughts on data platform costs, which are driven by the underlying cloud costs, and on how hard it will be for companies to keep up. Pay-as-you-go has some limits.

On the other side, Alexandre finished his 3-post series about Data Platforms: Past, Present, Future. In a well-written Medium piece he tries to guess where we are going and which mutations the data field will face.

As a side note, in the 3rd mutation he mentions the Data Mesh, but the Gartner hype cycle already considers the concept obsolete. What a fun world.

Product News 🎚

This is a category I sometimes have in mind, but I usually blend it into the Fast News. Here I want to try splitting it out.

  • Preset announced their dbt integration. This is interesting to see: Preset is a BI tool (the cloud offering of Apache Superset) and they decided to develop a deep integration with dbt that works in both directions. Preset is able to read sources, models and metrics from dbt, and dbt can access dashboards in order to fill exposures. This is something Preset developed on their end with their CLI, but it still paves the way for other tools.
  • Discover dolt. Dolt is another SQL database, but with a key differentiator: you can manage your data with git-like commands. All the commands you know from Git work exactly the same in Dolt. I want to try dolt cherry-pick.

Legends (credits)

Fast News ⚡️


Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.
