Painted ladies in SF (credits)

Hey there. What's up? While you're all data vibing, I'm sliding into your inbox with fresh Data News from the last month.

I have moved to San Francisco for the next 3 months, so if you're in town and wanna talk data or go for a run, you know where to find me. It's been a week since the nao Labs team and I arrived in SF, and it has been a blast. We will be pitching at the AI Launchpad at Data Council on the 22nd.

I'm planning to create content so you can follow Data Council from the inside, as I've always enjoyed writing takeaways about the talks in previous years (2023 and 2024).

AI News 🤖

4 lama (credits)

If you've been on the internet lately, you've probably seen the massive MCP hype: everyone is building an MCP server, an MCP registry, or even a registry of registries.

But what's an MCP?

MCP stands for Model Context Protocol, an open protocol created by folks at Anthropic. Most of the time, "an MCP" refers to a server that exposes discoverable tools, prompts and data for an LLM to use. MCP clients live on the LLM side and send requests to MCP servers.
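To make the client/server split a bit more concrete, here is a minimal toy sketch of what an MCP server does with two requests. It assumes MCP's JSON-RPC 2.0 messages and the `tools/list` (discovery) and `tools/call` (invocation) methods; it is not the official MCP SDK, it skips initialization and transports (stdio, HTTP) entirely, and the `get_table_metadata` tool and its fake output are hypothetical.

```python
import json

# Hypothetical tool registry: name -> (description, JSON Schema of inputs, handler).
TOOLS = {
    "get_table_metadata": (
        "Return column names for a table (fake data, for illustration only)",
        {
            "type": "object",
            "properties": {"table": {"type": "string"}},
            "required": ["table"],
        },
        lambda args: f"{args['table']}: id INT, name STRING",
    ),
}


def _error(req_id, code: int, message: str) -> str:
    """Build a JSON-RPC 2.0 error response."""
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


def handle(raw_request: str) -> str:
    """Answer one JSON-RPC 2.0 request roughly the way an MCP server would (toy)."""
    req = json.loads(raw_request)
    method, params = req["method"], req.get("params", {})

    if method == "tools/list":
        # Discovery: the client learns which tools exist and which inputs they take.
        result = {
            "tools": [
                {"name": name, "description": desc, "inputSchema": schema}
                for name, (desc, schema, _) in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        # Invocation: run the named tool and wrap its output as text content.
        name = params.get("name")
        if name not in TOOLS:
            return _error(req.get("id"), -32602, f"Unknown tool: {name}")
        _, _, fn = TOOLS[name]
        result = {"content": [{"type": "text", "text": fn(params.get("arguments", {}))}]}
    else:
        return _error(req.get("id"), -32601, f"Method not found: {method}")

    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})


if __name__ == "__main__":
    print(handle(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})))
    print(handle(json.dumps({
        "jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "get_table_metadata", "arguments": {"table": "orders"}},
    })))
```

The LLM client never needs to know the tool in advance: it discovers it through `tools/list`, reads the input schema, then decides to call it.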

For instance, there are a few Snowflake MCP servers: if you add one to Claude, you will be able to query Snowflake or fetch a table's metadata straight from a Claude prompt.

Fast News ⚡️

A rare Iceberg table in real life (credits)

A lot also happened in the data engineering space over the last month, especially around Iceberg, which is taking over much of the discussion when it comes to data storage.

Why is Iceberg so important right now?

Iceberg is a way to escape data warehouses and build your own DIY warehouse on top of bucket storage. Because Iceberg is open source, it enables interoperability between all systems while supporting some form of transactions on top of Parquet files.
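How do you get transactions on top of plain Parquet files sitting in a bucket? Roughly: files are never modified in place, every write produces a new immutable snapshot listing the table's data files, and a catalog atomically swaps a pointer to the current snapshot, rejecting a commit if someone else moved the pointer first (optimistic concurrency). The sketch below is a conceptual toy under those assumptions, not the Iceberg or PyIceberg API: `Snapshot`, `ToyCatalog`, `commit_append` and the `s3://bucket/...` paths are all hypothetical, and real Iceberg tracks snapshots through metadata files and manifests.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Immutable table state: which Parquet files make up the table (toy)."""
    snapshot_id: int
    parent_id: Optional[int]
    data_files: Tuple[str, ...]


@dataclass
class ToyCatalog:
    """Hypothetical catalog holding one 'current snapshot' pointer per table."""
    current: Dict[str, Snapshot] = field(default_factory=dict)

    def compare_and_swap(self, table: str, expected: Optional[Snapshot], new: Snapshot) -> bool:
        # The commit only succeeds if the pointer still references the snapshot
        # the writer started from; otherwise the writer must retry on a fresh base.
        if self.current.get(table) is not expected:
            return False
        self.current[table] = new
        return True


def commit_append(catalog: ToyCatalog, table: str, base: Optional[Snapshot],
                  new_files: Iterable[str]) -> bool:
    """Build a new snapshot on top of `base` and try to commit it atomically."""
    files = (base.data_files if base else ()) + tuple(new_files)
    new = Snapshot(
        snapshot_id=(base.snapshot_id + 1) if base else 1,
        parent_id=base.snapshot_id if base else None,
        data_files=files,
    )
    return catalog.compare_and_swap(table, base, new)


if __name__ == "__main__":
    catalog = ToyCatalog()
    base = catalog.current.get("orders")  # None: the table is empty
    print(commit_append(catalog, "orders", base, ["s3://bucket/orders/a.parquet"]))  # True
    # Writer B started from the same (now stale) base, so its commit is rejected...
    print(commit_append(catalog, "orders", base, ["s3://bucket/orders/b.parquet"]))  # False
    # ...and succeeds once it retries on top of the latest snapshot.
    print(commit_append(catalog, "orders", catalog.current["orders"], ["s3://bucket/orders/b.parquet"]))  # True
```

Readers never see a half-written table: they either read the old snapshot or the new one, because the only mutable thing is the pointer.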

Because we have to unify all the trends, someone has of course developed an Iceberg MCP server.

Examples and thoughts

Just to go further and connect everything, here are a few posts about the relationship between Iceberg and the lakehouse, where all this fuss is going, and what it could mean for your actual data stack.

My two cents: this is mainly experimental and not yet relevant at the scale most companies operate at. Warehouse + native tables is the easiest user experience you can find, and as data engineers, what we want is users using our platforms, right?

Data Economy 💰


See you soon ❤️