
Data News — Week 39

Data News #39 — Speedata, Amplitude and Anaconda money rounds, MAD landscape, Airbyte worth the hype, data people skill set and usual fast news.

Christophe Blefari
5 min read
Me writing and delivering the Data News directly in your home station (credits)

Hi there. Big news this week. The post will be a bit longer than usual as I added some more views about the topics I care about the most. If you find these views interesting, do not hesitate to reach me on LinkedIn to give me your feedback. I'm looking forward to hearing from you.

Have fun with the news 👇

Data fundraising 💰

Data landscape has gone MAD 😅

With all the VC and market money, the data ecosystem has probably gone mad: startups and tools are coming out of the blue every week, salaries are increasing, data is flying and privacy concerns are... what is privacy? Is it a bubble? I don't know.

If you are lost in that lake or if you want to understand a bit of what is happening, the huge 2021 machine learning, AI and data (MAD) landscape map is out. Shout-out to Matt Turck for this quality work. The write-up is long and decrypts all the concepts for you.

On the other hand, the Thoughtworks team released Tech Radar 24. When it comes to the data ecosystem, they "trialed" Snowflake, dbt, Great Expectations, Delta Lake, Materialize, MLflow and Streamlit, and they are starting to consider DataHub, Dagster and the Feature Store concept.

A MAD landscape (credits)

Data Visualization Society survey

The Data Visualization Society is running a State of the Industry survey that closes today. Go fill out the survey there; you can also find the results from 2020.

Airbyte — Worth the hype?

I live in a bubble where I see Airbyte a lot, I mean a lot (LinkedIn and relations working there), but I haven't had the time to test it out yet. This post tests the tool with a lot of screen captures and thinks through use-cases.

gRPC for Data Engineers ⚙️

I really like articles that say directly in the title who they are meant to be read by 🙂. This one explains simple gRPC concepts and how to use it when you already master Python.

For those who did not know, gRPC is a protocol developed by Google 6 years ago that aims to be used in API communication. By default gRPC uses protobuf messages (Google again) over HTTP/2.

Data quality metrics for your data warehouse

The Metaplane team came up with a write-up about the data quality metrics you should look into when you want to build a working warehouse. Or, as they say in the title, KPIs for KPIs. This is a must-read if you are still struggling to define data quality in your data team.
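
To make the idea of "KPIs for KPIs" a bit more concrete, here is a minimal sketch of two commonly used data quality metrics, completeness and freshness. These two metrics, the function names and the sample `orders` rows are assumptions picked for illustration; they are not taken from the Metaplane write-up.

```python
from datetime import datetime, timedelta, timezone


def completeness(rows, column):
    """Share of rows where `column` is present and not None (0.0 to 1.0)."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def freshness(rows, ts_column, now):
    """Age of the most recent record as a timedelta, or None if no timestamp."""
    timestamps = [row[ts_column] for row in rows if row.get(ts_column) is not None]
    if not timestamps:
        return None
    return now - max(timestamps)


# Hypothetical sample of a warehouse table, with a fixed "now" for reproducibility.
now = datetime(2021, 10, 1, 12, 0, tzinfo=timezone.utc)
orders = [
    {"id": 1, "amount": 10.0, "loaded_at": now - timedelta(hours=3)},
    {"id": 2, "amount": None, "loaded_at": now - timedelta(hours=1)},
    {"id": 3, "amount": 7.5, "loaded_at": now - timedelta(hours=2)},
    {"id": 4, "amount": 2.0, "loaded_at": now - timedelta(hours=5)},
]

print(completeness(orders, "amount"))        # 3 of 4 amounts are filled
print(freshness(orders, "loaded_at", now))   # age of the newest row
```

In a real warehouse these would more likely be SQL queries scheduled against each table, with thresholds that trigger an alert when a metric drifts.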

Data people skillset — Analytics Engineer and others

In recent years, 2 new positions came out to fill the void in data teams. The Analytics Engineer may look like a rebrand of the SQL developer or the BI Engineer, but it comes with new skills, tools and profiles tbh. The ML Engineer is here to help data scientists avoid becoming unicorns.

A Reddit user analyzed 44k unique job posts and tried to define which technologies are used per position. If you are trying to hire data people, this raw post can help you find the right words. As the author said, this is a US-centric view.

My takeaways from this are:

  • Python is stronger than ever across the universe. Java shows up sometimes, and Scala... wait, Scala?
  • Old tools like Hadoop and SSIS are still here
  • dbt, the tool that democratized the Analytics Engineer position, appears in only about 20% of the AE job postings. What does that mean for the other AE positions?
  • Tableau is still the reference. That brings back a question I have had for a long time: why is Looker the visualization layer in the Modern Data Stack?
Analytics Engineering — old wine in new bottle, it's all marketing gimmick (from Reddit comment - photo)

Why data scientists shouldn’t need to know Kubernetes

This follows the previous post about technology skills for data scientists. We can notice that Kubernetes isn't mentioned there (and this is a good point IMHO). Still, if you have not read this great post by Chip Huyen, it's a good reminder and worth checking because the industry is shifting away from the unicorn data scientist.

s/Kafka/Pulsar/g 🪛

Geeky title for all the sed users out there. If you have the motivation to move away from Kafka and use Apache Pulsar instead, Jesse Anderson wrote about how Pulsar could have helped companies like Slack and Uber that were struggling with some Kafka internals.
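
For readers who do not use sed, the title is a substitute command: `s/Kafka/Pulsar/g` replaces every occurrence of "Kafka" with "Pulsar" on each line. A rough Python equivalent, where the helper name `sed_substitute` and the sample text are made up for illustration:

```python
import re


def sed_substitute(text, pattern, replacement, global_flag=True):
    """Mimic sed's s/pattern/replacement/ command, applied line by line.

    With global_flag=True (the trailing `g`), every match on a line is
    replaced; without it, only the first match on each line is replaced.
    """
    count = 0 if global_flag else 1  # count=0 means "replace all" in re.sub
    return "\n".join(
        re.sub(pattern, replacement, line, count=count)
        for line in text.split("\n")
    )


line = "Kafka topics, Kafka brokers, Kafka consumers"
print(sed_substitute(line, "Kafka", "Pulsar"))
# Pulsar topics, Pulsar brokers, Pulsar consumers
print(sed_substitute(line, "Kafka", "Pulsar", global_flag=False))
# Pulsar topics, Kafka brokers, Kafka consumers
```

An actual migration obviously takes more than a search and replace, which is the joke.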

Fast News ⚡️

I've already reached the usual 800 words for the newsletter, so I'm keeping it short for the last articles.

Be safe and read the last two articles to take care of your mental health.

Have a good weekend (credits)

Christophe Blefari

Data Engineering Coach who enjoys all kinds of data platforms.