Hey you, I hope you had a great week. On my side, I'm slowly starting to get on top of the things I had in my queue. But, sadly, I work in LIFO, so I feel that I'm never done. For people who are not used to it, LIFO means last in, first out. This means that I get easily distracted by a notification (or even a thought) and do something that I did not plan to do at first. It probably explains why you always get the newsletter late on Fridays, or on Saturdays.
Thank you for the feedback on last week's issue; it seems you liked it. I'll try to continue doing deep dives on articles from time to time.
Airflow alternatives meetup
Next week, together with the Paris Apache Airflow Meetup group, we are organising an online event to discuss Airflow alternatives. At every Airflow meetup we get questions about Airflow's competition, so we decided to give a voice to the alternatives in order to understand how they compare with Airflow, and more.
The first event will take place next week, on March 21st at 7PM CET (UTC+1), and we invited Mage and Kestra. We will host another event soon after with others. You can either register on LinkedIn or join the meetup event.
How lucky you are: I will host the event, so you'll hear my awesome French accent. It also means that if you have any questions you want me to ask, you can send them to me beforehand.
Gen AI 🤖
I will create a specific category for generative AI.
If you live in a cave, or if you only read my newsletter to get news about the data world, you might have missed that GPT-4 was announced and released this week. I even had a hard time navigating between data engineering memes and GPT-4 tips on LinkedIn, and my Twitter is divided between GPT-4 threads and protests in France. What a time to be alive. Politicians think we should work longer while we are slowly starting to discover new AI capabilities that will for sure impact workplaces.
I don't want to take the usual shortcut, but how could I not? Will AI replace jobs? I do think that AI should empower people, but will capitalism see it this way when an API call is able to do the same job as a human? Does capitalism even think? Actually, it's probably human decisions about AI that will lead to AI replacing people.
One field that has been totally impacted by generative AI is Natural Language Processing (NLP). On Reddit, someone asked if others were also witnessing panic in NLP orgs. The general feeling is that GPT made years of NLP research outdated.
A few other news items:
- The LinkedIn team also wrote a blog post about AI principles, stating that AI is like oxygen for the engineering team (I personally would have said that data was oxygen, but who cares) and that with great power comes great responsibility. The same week, Microsoft (which owns LinkedIn) reportedly laid off its AI ethics and society team. Great timing.
- Glaze, protecting artists from style mimicry — A tool developed by researchers at the University of Chicago that will help digital artists by cloaking their art, to prevent mimicry by deep learning models trained on it.
- Google and Microsoft will compete to include AI copilots in their office suites — Microsoft announced Microsoft 365 Copilot, which will work in Word, Excel, PowerPoint and Outlook. On the other side, Google announced the same for Google Docs and Gmail.
Fast News ⚡️
- Migrating from role to attribute-based access control — RBAC is probably one of the most used paradigms when it comes to authorisation, especially because role-based authorisations are faster to put in place. In the article, the Grab team explains how they migrated from role-based to attribute-based authorisation on Kafka.
- Speeding up “Reverse ETL” — Ziqi works at Microsoft and details in this article what they had to consider to improve their Lakehouse exports to downstream databases. In short, they switched SQL Server to columnar storage, disabled indexes and locks when copying, and played with parallelisation and batch size.
- Online gradient descent written in SQL — Max is one of the best when it comes to running great experiments. This time he shows that everything can be done in SQL. With recursive CTEs he implemented an sklearn-like linear model, and the code is not even that long.
- Data with Rust — This is a handbook that will showcase how to work in data engineering with Rust. At the moment only parts 1 and 2 are written, but it looks promising.
- Sharing data between dbt projects, dbt exposures to sources — When you have multiple dbt projects it can be a mess to reference a model from another project. This blog post shows how you can automate it with CI and definitions in exposures.
- Polars vs pandas: A new era for Python DataFrames — This comparison is slowly becoming a great debate in the data world. Will Polars overtake pandas in the coming years? Guillaume wrote yet another great comparison.
- Tracking the fake GitHub star black market with Dagster, dbt and BigQuery — Things are getting spicy here. The Dagster team proposed a way to possibly identify GitHub projects that are buying stars.
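For the curious, here is roughly what "online gradient descent" means in the SQL article above. This is a minimal Python sketch and not Max's SQL: it fits a linear model one sample at a time on a synthetic stream. The function names, the learning rate and the data are all assumptions made up for illustration.

```python
import random


def online_linear_regression(stream, n_features, learning_rate=0.01):
    """Fit y ~ w.x + b by updating the parameters after every single sample."""
    weights = [0.0] * n_features
    bias = 0.0
    for features, target in stream:
        # Predict with the current parameters.
        prediction = sum(w * x for w, x in zip(weights, features)) + bias
        # Gradient of the squared loss 0.5 * (prediction - target)^2.
        error = prediction - target
        weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
        bias -= learning_rate * error
    return weights, bias


def synthetic_stream(n_samples, seed=42):
    """Hypothetical noise-free data source: y = 2*x1 - 3*x2 + 0.5."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        yield (x1, x2), 2.0 * x1 - 3.0 * x2 + 0.5


weights, bias = online_linear_regression(
    synthetic_stream(5000), n_features=2, learning_rate=0.05
)
print(weights, bias)
```

The model never sees the full dataset at once: each row nudges the weights and is then forgotten, which is the kind of row-by-row state update a recursive CTE can carry from one row to the next.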
A few other articles, without comment:
- Introducing multi-modal index for the Lakehouse in Apache Hudi.
- How to be a good data analyst without good data.
- dbt Reimagined.
Data Economy 💰
- The Austrian data protection authority has decided that Meta's tracking tools violate the GDPR. It will set a precedent.
- Seldon raises $20m Series B. Seldon is an MLOps platform that helps you deploy models in production. At its core, Seldon provides a framework that you can configure to serve your models on top of Kubernetes.
- 👀 Adept raises $350m Series B. This is again a testimony to the frenzy around generative AI and, in my opinion, the most impressive one. Adept wants to create a general-purpose AI teammate for everyone. At the moment it takes the form of a browser extension in which you can ask for things while you navigate Salesforce, Google Sheets or Craigslist.
- Cast AI raises $20m in funding. They propose an AI to cut your Kubernetes costs in half. Bold promise.
See you next week ❤️.