
Data News — Week 30

Data News #30 — SAS going public, 👩 Data Engineer, Scale data, dysfunctions of Data Engineering and run Airflow in one command

Christophe Blefari
4 min read
Me since I started the weekly digest (credits)

Hi, it's me again 👋. I hope this new digest finds you well. Maybe you are on holiday, maybe not; either way I'm here to give you your weekly glimpse of the data ecosystem.

I got a lot of feedback saying you enjoyed the Airflow Summit takeaways from last week, so in the future I'll continue to cover this kind of large conference for you! Thanks a lot ❤️.

Data fundraising 💰

  • Yesterday we got news that SAS, the analytical software company started in 1966 (yep, you read that right, 1966), will go public in 2024. The goal of this change is to offer stock options to employees in order to attract more talent.
  • Kili Technology, a Paris-based AI startup, announced its $25m Series A funding. Today the company reports about 40 employees on LinkedIn and has developed an end-to-end AI training platform, with the end goal of getting better AI thanks to better data.

Is it Hard Being a Woman Data Engineer?

I've deliberately chosen to start the Data News with this article because gender equality in data engineering is obviously not here yet. I hope that voices like Sarah Krasnik's will continue to emerge and help reach this diversity. I'll let you read the whole article to get the answer.

Scale data teams and platforms

Everyone knows it: data is coming. In our respective companies we'll all have to manage more and more data every day. This leads to a major scaling issue for everyone. What works for 10 will probably not work perfectly for 100 or 1000.

This means you'll probably need to migrate from one system to another while still maintaining the first one. So, good question: how do you modernize while keeping the lights on? On the other hand, you will also need to scale teams; this article gives you the keys to build your data dream team.

To illustrate data migration, here is an article from LinkedIn engineering about their largest data migration task, for the recruiter and jobs products.

Data teams and platforms you need to water to make them scale or work (credits)

The dysfunctions of Data Engineering

This is my evergreen topic: I think that all data teams today are still dysfunctional. I already covered it in a previous edition (#27).

This week we have an amazing article on the topic. MrTrustworthy shares what the dysfunctions of data engineering are. You'll find a lot of foundational concepts in the article: types of data engineers, inversion of responsibility, driving data rather than being data-driven.

Use Amazon Location from Redshift

Yeah, now come some technical articles. On the AWS blog you can find an example of how to access the Amazon Location Service from Redshift in SQL using UDFs and Lambda. As a side note, this article shows how technical AWS is and how complex all the solutions look.

This example is about Redshift but I think it's easily feasible with other warehouses.
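
To give an idea of the moving parts, here is a minimal sketch of the Lambda side of such a UDF. It is not the code from the AWS article. It assumes the Redshift Lambda UDF contract (rows arrive in `event["arguments"]`, and the function answers with a JSON document containing `success` and `results`) and the boto3 `location` client. The place index name `my-place-index` and the helper names are hypothetical.

```python
import json


def geocode_with_amazon_location(address, index_name="my-place-index"):
    """Geocode one address with Amazon Location.

    The index name is a hypothetical placeholder; use your own place index.
    """
    import boto3  # imported lazily so the handler can be tested without AWS

    client = boto3.client("location")
    response = client.search_place_index_for_text(
        IndexName=index_name, Text=address, MaxResults=1
    )
    results = response["Results"]
    if not results:
        return None
    # Amazon Location returns points as [longitude, latitude].
    longitude, latitude = results[0]["Place"]["Geometry"]["Point"]
    return json.dumps({"longitude": longitude, "latitude": latitude})


def lambda_handler(event, context, geocode=geocode_with_amazon_location):
    """Entry point called by Redshift: one inner list per row, one result per row."""
    try:
        results = [
            geocode(row[0]) if row[0] is not None else None
            for row in event["arguments"]
        ]
        return json.dumps(
            {"success": True, "num_records": len(results), "results": results}
        )
    except Exception as exc:  # report the failure back to Redshift
        return json.dumps({"success": False, "error_msg": str(exc)})
```

On the Redshift side the function would then be registered with a `CREATE EXTERNAL FUNCTION` statement pointing at the Lambda, after which it can be called like any other SQL function. The `geocode` parameter is only there so the handler can be tried locally with a fake geocoder.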

Understand concepts by comparison

Sometimes it's easier to understand tools or concepts by comparing them. So this week I suggest 3 posts:

Quality: how to write good batch pipelines or good code comments?

Zach Wilson (a data engineering LinkedIn voice) has started writing on Medium. The first topic he wrote about is how to evaluate the quality of a batch data pipeline. The third point he mentions, evaluating the maintenance burden, is the one I prefer.

On the Stack Overflow blog they give you 9 rules for writing good comments in your code. My favorite rule is the third one (again).

Run Airflow in one command

Every Airflow user knows that it's hard to get Airflow running in one command. Airflow needs a bit of setup in order to run: env variables, scheduler & webserver, database, etc. This article gives you an example of how you can write a single CLI command to run a production-like instance locally.
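
As a rough illustration of what such a command has to do, here is a minimal Python sketch. It is not the script from the article. It assumes an Airflow 2.x installation whose `airflow` CLI is on the `PATH` and offers `airflow db init`, `airflow scheduler` and `airflow webserver`; the function names `build_plan` and `run` are mine, and it keeps Airflow's default database rather than a production-like one.

```python
import os
import subprocess
from pathlib import Path


def build_plan(airflow_home: Path, port: int = 8080):
    """Return (env, setup_commands, service_commands) for a local instance."""
    env = {
        **os.environ,
        "AIRFLOW_HOME": str(airflow_home),        # where config, logs and db live
        "AIRFLOW__CORE__LOAD_EXAMPLES": "False",  # skip the bundled example DAGs
    }
    # One-off step: create the metadata database.
    setup = [["airflow", "db", "init"]]
    # Long-running services that must stay up together.
    services = [
        ["airflow", "scheduler"],
        ["airflow", "webserver", "--port", str(port)],
    ]
    return env, setup, services


def run(airflow_home: Path, port: int = 8080) -> None:
    """Initialise the database, then start the services and wait for them."""
    env, setup, services = build_plan(airflow_home, port)
    for cmd in setup:
        subprocess.run(cmd, env=env, check=True)
    procs = [subprocess.Popen(cmd, env=env) for cmd in services]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        pass  # Ctrl+C: fall through and stop everything
    finally:
        for proc in procs:
            proc.terminate()
```

Calling `run(Path.home() / "airflow")` from a small script would then be the "one command". Creating the first web UI user is left out here and would be one more setup step.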

If this is already too much for you, you can also check out Meltano, which aims to simplify Airflow startup.

People learning new data concepts on the beach (credits)

ML Friday — MLOps & summer upskill

This week I'm sharing two articles published on KDnuggets: MLOps best practices, and a 4-week program to upskill in machine learning with a lot of resources.

Thanks for reading. Do not hesitate to subscribe. I've also started a YouTube channel (in French); do not hesitate to subscribe there too, it will help me a lot 🤗.


Christophe Blefari

Data engineering coach who enjoys all kinds of data platforms.