Data News — Week 23.22

Data News #23.22 — Japan's views on copyright for AI, a new AI camera, what's the hype behind DuckDB?

Christophe Blefari
3 min read
Sun is coming in Berlin (credits)

Hey, I've been sick longer than I expected, but I'm finally well. I hope this email finds you all well, as well. I've had to catch up on almost 3 weeks of content. When I step back, the number of articles shared each week is insane, and countless articles cover things that have already been written about. Sometimes I feel like I'm trying to find a needle in a haystack. Or several needles.

I wanted to write more about Microsoft Fabric and the "state of data" reports that were published last week, but I'll do it another time.

Gen AI 🤖

As always, the pace of innovation in this field is incredibly fast, so here are a few news items I found worth sharing:

  • Japan goes all in: copyright doesn’t apply to AI training — I'm far from being a law expert, but it looks like something that will set a precedent. The article says this is in line with Japan's new strategy to become a leader in AI technologies: by removing barriers on training data, they hope to open doors. Obviously artists (especially mangakas) were not happy about it.
  • Sam Altman, OpenAI CEO, did a Europe tour — Sam went to Europe recently (Spain, France, Poland, Germany and the UK) to meet country representatives. I guess he was lobbying around the AI Act, but he was also there to look at real estate because OpenAI wants a European office.
  • New Nvidia 144TB GPU — Nvidia is the clear winner of the AI race. They announced an insanely crazy new GPU, and Google, Meta and Microsoft are already customers. Surprising.
  • How DoorDash uses XcodeGen to eliminate project merge conflicts — OK, now I don't want to resolve a Git conflict anymore 😅.
  • US researchers developed an LLM-powered Minecraft agent: Voyager. Minecraft is a survival game and the agent has been designed to learn Minecraft life skills incrementally. In the end it generates code that is used to control the agent in the cubic world.
  • A new kind of camera — An artist developed an AI camera, the Paragraphica, which is a context-to-image camera. The camera uses location data to feed context to a generative algorithm.
A dynamic prompt — (Paragraphica camera)

Fast News ⚡️

  • Meltano announced their Cloud — Meltano is an open-source data integration project that was started at GitLab. With a bit of configuration and a CLI you can write data pipelines using hundreds of connectors (using the Singer spec). The pricing is based on the number of runs and not the volume of data. This is a major difference from the competition (Airbyte, Fivetran, Stitch).
  • A ridesharing app simulation — Over the last few months Juraj developed a complete simulation of a ridesharing app (like Uber). He shared everything he did in blog posts and the result is kind of amazing. I recently spent hours on Mini Motorways, so this is the kind of side project I like.
  • Breaking into data engineering as a self-taught developer — Some advice from a fellow data engineer who was a data analyst before.
  • What's the hype behind DuckDB? — This is a great post from Matt Palmer about DuckDB. If you want a quick intro to the tool, this is the place to start. In the article Matt also showcases how you could use DuckDB to write a transfer pipeline, like moving a Parquet file from disk to S3.
  • How Instacart Ads modularized data pipelines with Spark — A great deep dive on a Lakehouse architecture for streaming. The article describes a migration from "thousands of complex SQL lines" to composable Spark SQL.
  • dbt at Zendesk: setting foundations for scalability.
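
To make the DuckDB item above a bit more concrete, here is a minimal sketch of what a "Parquet file from disk to S3" transfer could look like with DuckDB's Python client. This is not the code from Matt's article. The function names, the file path and the bucket are hypothetical, and the sketch assumes the `duckdb` package is installed and that S3 credentials and region are already configured for DuckDB.

```python
def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_copy_statement(source_path: str, target_uri: str) -> str:
    """Build a DuckDB COPY statement that reads a Parquet file and writes it elsewhere."""
    src = quote_literal(source_path)
    dst = quote_literal(target_uri)
    return f"COPY (SELECT * FROM read_parquet({src})) TO {dst} (FORMAT PARQUET);"


def transfer_parquet(source_path: str, target_uri: str) -> None:
    """Copy a local Parquet file to the target URI (hypothetical helper)."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect()  # in-memory database, nothing is persisted locally
    # The httpfs extension adds support for s3:// paths.
    con.execute("INSTALL httpfs;")
    con.execute("LOAD httpfs;")
    # Assumption: S3 credentials and region are already configured for DuckDB.
    con.execute(build_copy_statement(source_path, target_uri))
    con.close()
```

Calling `transfer_parquet("data/trips.parquet", "s3://my-bucket/trips.parquet")` would run a single `COPY ... TO ...` statement. The statement is built in a separate function so it can be inspected without touching S3, and the inner `SELECT` is where you would add filters or column selection if the pipeline needs to transform data on the way.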

Data Economy 💰

  • Databricks acquires bit.io — bit.io was "the fastest way to get a Postgres database". To get started you just had to send data and your database was already set up. Looking at the press release, Databricks' acquisition is a team acquisition to improve their own developer experience.

Now I'm going back to Diablo — See you next week ❤️.

Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.
