Data News — Week 22

Have a glimpse of this week's newsletter: Airflow EOL, data fundraising, data engineer skills, Meltano or Airbyte?

Christophe Blefari
2 min read
Me every week starting fresh data news (credits)

Wherever you are 🎈 I wish you a happy Friday! Before starting the news, I want to thank the people who subscribed to the newsletter. I hope you find the content relevant. Do not hesitate to drop me an email with your feedback if you would like to see something specific.

First, to all Airflow users: I don't know if you saw the news, but support for Airflow 1.10.x reaches end-of-life on 17 June 2021. If you need to upgrade, this documentation could help you.

A new category in the newsletter: data fundraising 💰

Each week I would like to highlight the data fundraising (or acquisition) news that happened recently. This week we had several news items in that space:

  • Stemma (Amundsen as a service) raised $4.8M. Stemma is a data catalogue company founded by Mark Grover, one of the creators of Amundsen at Lyft. For the moment the company has five employees.
  • Alation raised $110M. Alation is another data catalogue company with a lot of connectors. It was founded in 2012, which contrasts with the newer competitors in the data catalogue landscape. Of note, Snowflake Ventures invested in Alation.
  • Cloudera is going private for $5.3B. In the Modern Data Platform, Cloudera tools have become less essential, so we'll watch this to understand how Cloudera will position itself in the market.
  • In contrast with Cloudera's move to private, Confluent (Kafka) filed its S-1 form for an IPO.

Data engineer skills

This week two articles are about data engineer skills. The first one is One skill every data engineer needs. I found this article very true, and as I don't want to spoil it, I recommend everyone read it.

In addition to that base skill, the second article argues that Data Engineers shouldn't write Airflow DAGs. This is a strong opinion, but I think it is true: data engineers should do something close to software engineering work and build frameworks that generate DAGs and give control to other (data) people.
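
To make the "framework that generates DAGs" idea concrete, here is a minimal sketch in plain Python. It is not taken from the linked article and it does not use the Airflow API: the spec format, the task names and the `plan_pipeline` helper are all hypothetical. It assumes Python 3.9+ for the standard-library `graphlib` module. The point is only that other (data) people edit a declarative spec, while the framework validates it and works out the task order.

```python
from graphlib import TopologicalSorter

# Hypothetical declarative spec that an analyst could own:
# task name -> command to run and the tasks it depends on.
PIPELINE_SPEC = {
    "extract_orders": {"command": "python extract.py orders", "depends_on": []},
    "load_orders": {"command": "python load.py orders", "depends_on": ["extract_orders"]},
    "build_report": {"command": "python report.py", "depends_on": ["load_orders"]},
}


def plan_pipeline(spec):
    """Validate the spec and return task names in a runnable order."""
    for name, settings in spec.items():
        for upstream in settings.get("depends_on", []):
            if upstream not in spec:
                raise ValueError(f"{name!r} depends on unknown task {upstream!r}")
    # Map each task to the set of tasks that must run before it.
    graph = {name: set(settings.get("depends_on", [])) for name, settings in spec.items()}
    # static_order() raises graphlib.CycleError if the dependencies form a loop.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for task in plan_pipeline(PIPELINE_SPEC):
        print(task, "->", PIPELINE_SPEC[task]["command"])
```

In a real setup, the ordered plan would be turned into scheduler objects (Airflow operators and their dependencies, for instance) instead of being printed; that wiring is deliberately left out of this sketch.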

🐼 Alan data team

Arnaud Buisson from Alan, a French health insurance company, wrote an article about his first 5 months in the data team. He writes about the culture of ownership and the no-meetings rule in place at Alan. Worth reading!

To go further (if you missed it), Alan also wrote about their CNIL audit (CNIL = the independent French administrative authority that enforces data privacy laws).

Meltano or Airbyte to extract and load data?

In the data integration space there are more and more tools. The Preset team wrote a blog post comparing Meltano (open-source data platform by GitLab) and Airbyte (which raised $26M last week) on their data integration capabilities as whole platforms.

Understand Kafka as if you were the creator

On Towards Data Science, Felipe wrote the second part of the article explaining Kafka as if you designed it. Kafka is becoming widely used across companies, yet the system is quite complex, so this is a must-read to understand how it works.

💻 ML Friday — How ContentSquare used SageMaker to reduce TensorFlow latency

A branded article on how ContentSquare data scientists used SageMaker to reduce TensorFlow inference latency when analyzing HTML documents. They moved from a Flask-based deployment to SageMaker.

Last but not least

Airbnb published part 2 of their series on Minerva, their metrics system. There is also a write-up on streaming 25B daily records into BigQuery.

If you made it this far, first THANK YOU. Then do not hesitate to subscribe (bottom right of the page) and/or follow me on Twitch (mainly in French, but if you pass by I'll speak in English!).

Christophe Blefari

Data engineering coach who enjoys all kinds of data platforms.