
Data News — Week 23.38 (late)

Data News #23.38 — Usual data news with Microsoft Copilot, DALL·E 3, Postgres 16, the fast news and a lot of money spent.

Christophe Blefari
3 min read
Early like my run (credits)

Hey. This is a super late Data News; I wanted to send it earlier, but I was travelling and then enjoying time with friends and family. I'm still struggling a bit to write as fast as I would like, but 🤷‍♂️.

So, sorry for the late edition and enjoy.

Gen AI 🤖

  • Announcing Microsoft Copilot — Having everything under a common brand is great, and Copilot is a great name. Microsoft announced that Copilot, your AI companion, will be everywhere in the next Windows 11 update: in Paint, Photos and your web search (Edge and Bing), for instance.
  • At the same time Microsoft leaked 38TB of data — through a GitHub repository containing a link to an Azure storage account left open to public access.
  • OpenAI announced DALL·E 3 — natively built with ChatGPT to create more impressive images from user prompts.
  • I recommend following Oliver on LinkedIn if you don't want to miss anything related to Gen AI. He writes the best takeaways multiple times a week.

Fast News ⚡️

  • Postgres 16 has been released — featuring performance improvements such as parallel execution of the string_agg and array_agg aggregates, as well as faster SELECT DISTINCT and an improved COPY command (a quick illustration follows this list).
  • Astronomer released Ask Astro — An LLM application that understands the Astro docs to answer most Apache Airflow questions. The source code is on GitHub.
  • The implications of scaling Airflow — Sarah, who works at Prefect, wrote a post about Airflow's downsides at scale and how Prefect mitigates them. I wouldn't say that all the downsides are real blockers, but it still highlights one of the biggest Airflow issues: everything is implicit. Airflow is a framework that allows a wide range of code, which easily leads to debt.
  • dbt pattern, test-transform-publish — Often called the staging pattern. The idea is to publish data only once tests have validated it. What Leo proposes is an incremental transformation with tests on top; if the tests pass, a view runs and selects the latest validated data (see the dbt sketch after this list).
  • A guide to the Snowflake results cache — Caching is a critical piece of every data warehouse, whether for reusing data between runs or between stages in the same run. This article details what you need to understand to optimise the way you write Snowflake queries.
  • Use the new SQL commands MERGE and QUALIFY in Redshift — Redshift still exists and tries to catch up with the competition. MERGE lets you deduplicate data by declaring what to keep when rows match, and QUALIFY filters the results of a window function computed in the same query (examples after this list).
  • Real-time analytics with Snowflake dynamic tables & Redpanda — A good showcase of Snowflake dynamic tables with Wikipedia data (a minimal dynamic table definition is sketched after this list).
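
A quick, hypothetical illustration for the Postgres 16 item: an aggregate query that the planner can now run with parallel workers. Table and column names are made up, and whether it actually parallelises depends on table size and your parallel settings, so check the plan yourself.

    -- Hypothetical table: events(user_id, event_name, created_at)
    -- On Postgres 16 string_agg and array_agg can be computed by parallel workers;
    -- EXPLAIN shows whether a Gather / Partial Aggregate plan was chosen.
    EXPLAIN (ANALYZE)
    SELECT user_id,
           string_agg(event_name, ',') AS event_list
    FROM events
    GROUP BY user_id;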
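
For the test-transform-publish idea, here is a minimal dbt sketch of the pattern, not Leo's exact implementation; model and column names are hypothetical. The staging model is built incrementally and tested, and with dbt build a failing test skips the downstream "publish" view.

    -- models/staging/stg_orders.sql (incremental transform, tested before publishing)
    {{ config(materialized='incremental', unique_key='order_id') }}
    select order_id, customer_id, amount, updated_at
    from {{ source('shop', 'orders') }}
    {% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
    {% endif %}

    -- models/marts/orders.sql (the publish step: a view on top of the validated staging model)
    {{ config(materialized='view') }}
    select * from {{ ref('stg_orders') }}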
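
Two small examples for the Redshift item, with hypothetical table names: a MERGE that decides what to keep when staged rows match existing ones, and a QUALIFY that keeps only the latest row per key.

    -- MERGE: upsert staged rows into the target, declaring what to keep when keys match
    MERGE INTO orders
    USING orders_staging s ON orders.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET amount = s.amount, updated_at = s.updated_at
    WHEN NOT MATCHED THEN INSERT VALUES (s.order_id, s.customer_id, s.amount, s.updated_at);

    -- QUALIFY: filter on a window function without wrapping it in a subquery
    SELECT order_id, amount, updated_at
    FROM orders_history
    QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1;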
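
And for dynamic tables, a minimal sketch (hypothetical names) of what the declaration looks like: you write the query and a target lag, and Snowflake keeps the table refreshed for you.

    -- Hypothetical dynamic table: kept fresh by Snowflake within the target lag
    CREATE OR REPLACE DYNAMIC TABLE page_edits_per_minute
      TARGET_LAG = '1 minute'
      WAREHOUSE = transform_wh
    AS
    SELECT date_trunc('minute', edit_ts) AS minute, count(*) AS edits
    FROM raw_wikipedia_edits
    GROUP BY 1;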

Data Economy 💰

  • Cisco acquired Splunk for $28b in cash. Crazy amount. Splunk has been around for a while, providing an all-in-one observability platform that ingests logs and events to give insights into a tech stack.
  • Secoda raises a $14m Series A. Secoda is a data catalog tool with lineage and monitoring capabilities. The fresh money will help them add AI features to the product and expand its monitoring capabilities.
  • MotherDuck raises a $52.5m Series B. In total they have raised $100m and announced that the MotherDuck product is open to everyone, no longer behind a waitlist. In short, MotherDuck is the company offering DuckDB as a cloud product, but they are not developing DuckDB itself. The product is quite young but works as expected: with a simple connection string you get an analytical cloud database that just works, and it can be instantly swapped for a local one if needed.
  • Tabular raised a $26m Series B. Tabular is the company providing a cloud platform on top of Apache Iceberg, founded by Iceberg's creators. I'd say that Iceberg (or table formats in general) is probably one of the technologies that will incrementally change the way we write data pipelines for the better, giving us more control over data storage. Yet I think Iceberg is not ready for wide adoption yet (Python write support is still missing, you need Spark).
  • Anthropic could get $4b from Amazon. Amazon made a first $1.3b investment in a corporate round, bringing a lot of money to one of OpenAI's biggest competitors. The ChatGPT alternative, Claude, is already out there.

See you on Friday ✨.
