Wherever you are 🎈 I wish you a happy Friday! Before starting the news I want to thank the people who subscribed to the newsletter. I hope you find the content relevant, and do not hesitate to drop me an email with your feedback if you would like to see something specific.
First, for all Airflow users: I don't know if you saw the news, but Airflow 1.10.x support reaches end-of-life on 17 June 2021. If you need to upgrade, this documentation could help you.
A new category in the newsletter: data fundraisings 💰
Each week I would like to highlight the data fundraising rounds (or acquisitions) that happened recently. Here is this week's news in that space:
- Stemma (Amundsen as a service) raised $4.8M. Stemma is a data catalogue company founded by Mark Grover, one of the creators of Amundsen at Lyft. For the moment the company has five employees.
- Alation raised $110M. Alation is another data catalogue company with a lot of connectors. It was founded in 2012, which contrasts with the newer competitors in the data catalogue landscape. Notably, Snowflake Ventures invested in Alation.
- Cloudera is going private for $5.3B. In the Modern Data Platform, Cloudera's tools have become less essential; we'll watch this to understand how Cloudera will position itself in the market.
- In contrast with Cloudera's move to go private, Confluent (Kafka) filed its S-1 form for an IPO.
Data engineer skills
This week two articles are about data engineer skills. The first one is the One skill every data engineer needs. I found this article very true and, as I don't want to spoil it, I recommend everyone read it.
In addition to the base skill for every data engineer, the second article argues that Data Engineers shouldn't write Airflow dags. This is a strong opinion, but I think it is true: Data Engineers should do something close to software engineering work and build frameworks that generate DAGs and give control to other (data) people.
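To make the "framework that generates DAGs" idea concrete, here is a minimal sketch. It is my own illustration, not the article's code, and it deliberately does not use Airflow: the config layout, `TaskSpec`, `build_pipeline` and the example commands are all hypothetical names. It only relies on the Python standard library (`graphlib`, available from Python 3.9) to turn a declarative config, the part other data people would edit, into an ordered list of tasks.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class TaskSpec:
    """One pipeline step, declared as data rather than code (hypothetical shape)."""
    name: str
    command: str
    depends_on: list[str] = field(default_factory=list)


def build_pipeline(config: dict) -> list[TaskSpec]:
    """Turn a declarative config into tasks ordered so dependencies come first."""
    specs = {
        name: TaskSpec(name, body["command"], body.get("depends_on", []))
        for name, body in config["tasks"].items()
    }
    # Fail early if a task depends on something that was never declared.
    declared = set(specs)
    unknown = {dep for spec in specs.values() for dep in spec.depends_on} - declared
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    # Map each task to its predecessors; static_order() raises CycleError on cycles.
    graph = {name: set(spec.depends_on) for name, spec in specs.items()}
    return [specs[name] for name in TopologicalSorter(graph).static_order()]


# Hypothetical config: this is what an analyst would write instead of DAG code.
EXAMPLE_CONFIG = {
    "dag_id": "daily_orders",
    "tasks": {
        "load": {"command": "python load.py", "depends_on": ["extract"]},
        "extract": {"command": "python extract.py"},
        "report": {"command": "python report.py", "depends_on": ["load"]},
    },
}

if __name__ == "__main__":
    for task in build_pipeline(EXAMPLE_CONFIG):
        print(task.name, "->", task.command)
```

In a real setup the last step would presumably map each `TaskSpec` to an Airflow operator instead of printing it. The point is only that the data engineer owns the generator while the config stays in the hands of its users.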
🐼 Alan data team
Arnaud Buisson from Alan, a French health insurance company, wrote an article about his first 5 months in the data team. He writes about the culture of ownership and the no-meetings rule in place at Alan. Worth reading!
To go further (if you missed it), Alan also wrote about their CNIL control (CNIL = the independent French administrative authority that enforces data privacy laws).
Meltano or Airbyte to extract and load data?
In the data integration space there are more and more tools. The Preset team wrote a blog post comparing Meltano (open source data platform by GitLab) and Airbyte (which raised $26M last week) on their data integration capabilities as a whole platform.
Understand Kafka as if you were the creator
On Towards Data Science, Felipe wrote the second part of the article explaining Kafka as if you designed it. Kafka is becoming widely used across companies and the system is quite complex, so this is a must-read to understand how it works.
💻 ML Friday — How ContentSquare used SageMaker to reduce TensorFlow latency
A branded article on how ContentSquare data scientists used SageMaker to reduce TensorFlow inference latency when analyzing HTML documents. They moved from a Flask-based deployment to SageMaker.
Last but not least
If you made it this far, first THANK YOU. Then do not hesitate to subscribe (bottom right of the page) and/or follow me on Twitch (mainly in French, but if you pass by I'll speak in English!).