Data News — Week 23.01
Data News #23.01 — First edition of the year (late to start the year on the right foot), 2022 throwback, data team role, data science, fast news and lay-offs.
Happy new year 🎆. For those who were already subscribed at the start of last year: I set resolutions and objectives for the year that I did not manage to follow. The year was so different from what I expected. Maybe this is an excuse. Anyway, I did not reach my goals. What if we just don't care this year?
Still, what happened was awesome, and here is a small personal / professional throwback:
- I worked for the French public sector as a freelancer: tax administration and education ministry. It makes sense to me and this is something I really care about.
- I bootstrapped a coaching activity with companies and individuals. This is a new exercise, but I feel it's close to the management work that I can't do as a freelancer.
- I moved to Berlin, gave my first ever meetup talk in English, and met awesome people there, but I'd like to meet more.
- We restarted the Paris Airflow Meetup and people liked it. There are still a few seats left for next Tuesday's meetup.
- I started to pay myself after a year and a half of unemployment benefits. This is maybe my main source of stress. Will I be able to find enough missions next year to pay myself for the whole year? My business plan asks for 100k€ in revenue.
- This year I learned Superset in depth, adding the tool to my list of tool expertise.
- My written content got around 100k views last year. The blog crossed the 2000-member mark (❤️) and I won the best data science newsletter award. On LinkedIn and Twitter I doubled my followers. Everywhere I started from the bottom and now we're here.
- I talked on Robin's podcast about the newsletter and my data engineering journey.
I'm also sorry to start the year late with my newsletter. For the last 3 days I was teaching DataOps at a French school and I did not manage to find the time to write to you. And you know what, this is the first time in 7 years of teaching that more than 80% of the class wants to become data engineers.
As a conclusion of this introduction, I want to thank everyone reading this newsletter and sharing feedback or good words about it. It means so much to me and it fuels me. For sure the Data News will be here for a new year and new stuff is coming.
Time for the news—I have around 30 links to share today so it might be less opinionated than usual. Happy reading.
Data team role
I really like all the thoughts around data team roles, missions, vision and strategy. I still think that we have not reached any form of consensus about data teams. In terms of tooling, the modern data stack proposed something that works, but the modern data team is still behind. Here are the latest ideas I've seen this week:
- Should software teams start learning from analytics engineers? — Petr reverses the common idea that analytics teams should learn from software. In the end, everyone is just a part of engineering, and that helps all of us get better at data and software.
- Data Teams as support teams — Chad from Zendesk thinks that data teams are often misaligned with their customers and that, because of the supportive nature of the relationship between them, something does not work. He then digs into modeling and analytics value to understand their impact on the relationship. It is fun to read that someone from Zendesk does not want to be a support team!
- ❤️ Elbows of data — This is a good follow-up to Chad's post. Katie tosses out the term "elbows of data" for "folks who have insisted on being involved in driving the company forward, whether they were invited to or not". When we do data, we have the skills and understanding to help our company. Once again, our main role should be to empower stakeholders.
Data Science Saturday 🤖
- How to invest better in acquisition channels? — Marianne detailed how data science helped Qonto understand its acquisition channel investments.
- Data science has a tool obsession.
- Selecting the best image for each merchant using exploration and ML.
- Introduction to Graph Machine Learning (related Grab Graph service platform).
We are in the middle of a ChatGPT frenzy. Every new day brings a new question about our future, as developers but also as humans. OpenAI, the company behind ChatGPT, is seeking money at a high valuation. Still, should we trust OpenAI to be as open as its name says?
If you want to better understand what's behind ChatGPT, you can have a look at minGPT, a minimal re-implementation in PyTorch.
At the same time, it seems that following Microsoft's initial investment in OpenAI, Bing will use GPT models to improve its text and image search. Who would have said that Bing would kill Google?
Final note: How China is building a parallel generative AI universe.
Fast News ⚡️
- Why I'm using (Neo)vim as a Data Engineer and Writer in 2023 — If you want to take the beginning of 2023 as a sign to move to vim, Simon wrote a great post for you.
- CircleCI’s unnoticed holiday security breach — CircleCI had a security breach a few days ago.
- What if we rewrite everything? — Navigating technical debt while spending our entire careers doing the same stuff over and over again. What is the right strategy to have? Probably Keep It Simple, Stupid.
- Why It’s So Hard to Become a Staff Engineer — Feedback to help people bridge the gap between senior and staff. I think this is even relevant to the data world.
- Introducing ADBC: Database Access for Apache Arrow — When I see "minimal-overhead alternative to JDBC/ODBC for analytical applications" I'm instantly in. My whole professional life I've heard architects saying JDBC is bad, so if something better can come along and we can stop talking about it, I'm all for it. You can also listen to a related podcast about the Arrow vision.
- Recap: a data catalog for people who hate data catalogs — This one hurts. You may have noticed, if you read me, that I'm not very tender with the current state of data catalogs. This week Chris started a small-footprint data catalog written in Python called Recap. I'll have a look at it soon.
- Observability, Tick — Nigel wrote a small post detailing how a small startup can do observability without spending a lot of money.
Data Economy 💰
The economic situation is obviously not at its best. Previously, data was not always impacted by these difficulties, but they are now reaching the data world too. That's why the data fundraising section will become a data economy wrap-up.
- Astronomer laid off 20% of their staff, which represents 76 folks, and moved from a co-CEO structure to only one CEO. I appreciate the transparency effort made in publishing this note. I still struggle to see Astronomer's value and strategy, and this is hard to say because Astronomer employs a lot of core Airflow contributors and makes important contributions to the data community.
- Salesforce is laying off 10% of their staff, roughly 8000 people, including folks at Tableau. They acquired Tableau in 2019, and analysts are saying that former Tableau employees are more often impacted by the lay-offs.
In search of consolidation and levers to pull, companies are also merging:
- Qlik wants to acquire Talend. Qlik and Talend are two old giants of BI and data integration. The first one was founded in 1993 and the second one in 2005. They have obviously been challenged by the cloud vendors and by the modern data stack vision, which does not include them.
- Confluent signed a deal to acquire Immerok. They are respectively the home companies of Kafka and Flink. This is, to be honest, a natural move because the two technologies work best together and ksqlDB never took the place it should have had in the market. Sadly, they are also challenged right now by real-time tooling that is way easier to set up.
Finally a fundraise:
- Chaos Genius is raising a $3.3m seed round. They offer an optimisation platform for Snowflake to help you save up to 30% of your warehouse costs.
See you next week ❤️.