
Data News — Week 23.19

Data News #23.19: Minds of Data, my new podcast, Google I/O takeaways, HuggingFace releases, Salesforce GPT and the Fast News ⚡️.

Christophe Blefari
Sorting the news (credits)

Hey you, new Friday means Data News. This week is pretty stacked in terms of content, especially video and audio content. I hope you will enjoy it as much as I did.

Let's start with my newly created podcast, Minds of Data. In Minds of Data I'll meet people from the data ecosystem in order to learn more about them. In the first episode I sat down with Joe Reis and we discussed his professional journey before he became the thought leader he is today; we also chatted about data engineering. You can listen to the episode on Spotify, Apple Podcasts and Deezer.

PS: this is my first episode ever so feedback is more than welcome.

At the same time, last Tuesday in Paris we organised the May Airflow meetup. We had 3 talks, which you can find on YouTube. I really liked Benoit and Samy's presentation about Cloud Composer (managed Airflow on GCP). They shared good practices on how to manage Composer in the cloud, things like:

  • Use the same configuration for staging and prod
  • Use a secret manager to manage your Airflow connections
  • Use IAM restrictions in the DAGs bucket
  • Use operators and define the company policy around them
  • Define clear policies to govern your Airflow
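To make the secret manager advice a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not the code shown in the talk and it does not call Airflow or GCP: the dictionary stands in for a real secret manager, the connection id and URI are made up, and the `airflow-connections-<conn_id>` naming is my assumption of a common convention, so check what your own backend expects.

```python
from urllib.parse import urlsplit

# Stand-in for a real secret manager (hypothetical name and value).
# In practice the URI would live in the managed service, never in the DAG code.
FAKE_SECRET_STORE = {
    "airflow-connections-warehouse_db": "postgres://etl_user:s3cret@db.internal:5432/analytics",
}


def secret_name(conn_id: str, prefix: str = "airflow-connections", sep: str = "-") -> str:
    """Build the secret name for a connection id (the naming scheme is an assumption)."""
    return f"{prefix}{sep}{conn_id}"


def get_connection(conn_id: str, store: dict = FAKE_SECRET_STORE) -> dict:
    """Look up a connection URI in the store and split it into its parts."""
    name = secret_name(conn_id)
    uri = store.get(name)
    if uri is None:
        raise KeyError(f"No secret named {name!r}")
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "login": parts.username,
        "password": parts.password,
        "schema": parts.path.lstrip("/"),
    }


conn = get_connection("warehouse_db")
print(conn["host"], conn["port"])
```

The point of the design is that DAGs only ever reference `warehouse_db`. The credentials can then differ between staging and prod while the code and the configuration stay the same.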

Airflow 2.6 also came out this week with a new parameterizable trigger DAG UI, a new notifications framework for alerts (used through callbacks) and a new graph interface in the grid view.

Gen AI 🤖

The pace of innovation and announcements in the (Gen) AI field isn't slowing down. I can't really cover the whole field because it moves so fast that I can't even keep up. This week the Google I/O Keynote was a major milestone.

Google I/O Keynote takeaways

What amazed me about the Google Keynote is that Generative AI is treated like a product, like the 2007 iPhone (look at this ad). When you think about it, AI has always been something hidden, like an API call, a score or a recommendation in a larger UI. In Google's Keynote AI gets a 26-minute segment, followed by all the derived announcements, which last for 2 hours.

Bold tagline & Google ego speaking (screenshot from the Keynote)

To me, Google's annual conference is a sign that the party is over, especially for OpenAI. Actually, OpenAI's deal with Microsoft was probably the best deal they could have gone for. As humans we like to send models into the arena to find the most performant one, or to brag about who has the biggest parameter count, but in the end the best-integrated models will win. And Google has a head start, as does Microsoft. As Google reminds us in the Keynote, they have 15 products used by billions of people: they have our e-mails, our photos, our maps and more. AI is just a feature in their products; even if it needs a UI rethink, it is just a feature.

So in the end Google, an AI-first company from the beginning, wants to put AI everywhere and wants to offer you an AI collaborator. Here are the major takeaways from the Keynote:

  • They released PaLM 2, their latest foundation model. It will exist in 4 sizes: Gecko, Otter, Bison and Unicorn, each requiring different hardware resources to run.
  • PaLM 2 will be natively integrated in Google products. Gmail will get enhanced smart reply features, Maps will offer an immersive view of a route and Photos will have a magic editor that will let you edit a picture in a single drag-and-drop.
  • Google will create a sidekick called Duet AI that will be available in Workspace (Sheets, Docs and Slides). You'll be able to ask the AI to create content for you, unlocking productivity gains. Duet AI will also work in GCP (in the console and within the web IDE).
  • According to the announcement, PaLM 2 will particularly shine when fine-tuned (e.g. for IT security or medicine). You'll be able to do it yourself within your own GCP instance in Vertex AI. They also released Imagen, Codey and Chirp, respectively for image generation, code generation and speech-to-text.
  • Bard, the conversational model and ChatGPT equivalent, is now open to everyone (actually not in all countries). Bard works great for code generation, debugging and code explanation.
  • Bard might also be the Zero-ETL solution we were all waiting for. In the demo the speaker asks Bard to find schools in an area, then asks for the result to be saved in a Google Sheet, then asks for a new column in the sheet saying whether each school is public or private. To be honest, what would prevent Bard from doing the same in a database in the future?
  • Finally, Google teased their next-gen model Gemini, which will obviously be awesome to hear them tell it, and announced an evolution of the search interface with Gen AI as a new interactive way to search.

In the end I really liked the keynote because it sets a new milestone for the kind of integration we can expect in the products we use daily.

Other stuff

  • Hugging Face released an open model called StarCoder that has been trained on GitHub code and is meant to act as a Copilot. Still, the model is not yet ready to be used as an instruction model, the ChatGPT way.
  • At the same time HF also introduced an open-source Chat UI.
  • After Bill Gates, it is Steve Wozniak, Apple co-founder, who gives his take on the AI breakthroughs in a BBC interview. Mainly: we can't stop the march of progress, AI will be used to scam people, and we still have to put guardrails in place, but human guardrails.
  • Salesforce does not want to be left behind in the battle. They announced Slack GPT, natively integrated in Slack to summarise or compose messages, but also a way for partners to bring new kinds of Gen AI apps.
  • Salesforce also gave Tableau a makeover with Tableau GPT, a way to provide AI-powered analytics. In Tableau Pulse you'll have access to auto-generated insights on your data, with a "For You" tab as if you were on TikTok.
The StarCoder (credits)

Fast News ⚡️

👋
The newsletter is much longer than expected (I got lost today watching fascinating videos), so I'll be sending out a second part over the weekend or early next week with a recap of the best talks from Data Council 2023. If you want to get a head start, my favourite talk was Lloyd's demonstration of Malloy, an experimental language for data.

See you in a few days with Data Council takeaways ❤️.


Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.
