When the Data News lands on Saturday (credits)

In last week's newsletter I also shared what a metrics store is, which led to a longer edition than usual, and I saw that a few people did not like it that way. It was an experiment, and I'll see how I can do it better in the future. Still, what is a metrics store? You can check out the post extracted from the newsletter.

On the same topic, this week Pierre shared how to create a semantic layer in Preset (i.e. managed Apache Superset). The approach first defines metrics within dbt, and then the CI/CD pipeline pushes the metric definitions to Preset. This is a great example of a simple way to push metrics down to visualisation tools.
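To make the idea concrete, here is a minimal sketch of the translation step such a CI/CD job might run. It is not Pierre's actual implementation: the `DbtMetric` class, the `SQL_TEMPLATES` mapping, and the `to_superset_metric` helper are hypothetical names, and the output keys (`metric_name`, `verbose_name`, `expression`, `description`) are assumed to match what your Superset or Preset dataset expects, so check them against your own workspace before pushing anything.

```python
from dataclasses import dataclass


@dataclass
class DbtMetric:
    """Hypothetical, simplified view of a metric declared in a dbt project."""
    name: str
    label: str
    calculation_method: str  # e.g. "sum", "count", "count_distinct", "average"
    expression: str          # column or SQL expression the metric aggregates
    description: str = ""


# Assumed mapping from a calculation method to a SQL aggregate template.
SQL_TEMPLATES = {
    "sum": "SUM({expr})",
    "count": "COUNT({expr})",
    "count_distinct": "COUNT(DISTINCT {expr})",
    "average": "AVG({expr})",
}


def to_superset_metric(metric: DbtMetric) -> dict:
    """Translate one dbt-style metric into a Superset-style metric dict."""
    template = SQL_TEMPLATES.get(metric.calculation_method)
    if template is None:
        raise ValueError(f"Unsupported calculation method: {metric.calculation_method}")
    return {
        "metric_name": metric.name,
        "verbose_name": metric.label,
        "expression": template.format(expr=metric.expression),
        "description": metric.description,
    }


# Example: a revenue metric defined once in dbt, ready to be pushed by CI/CD.
revenue = DbtMetric(
    name="revenue",
    label="Revenue",
    calculation_method="sum",
    expression="amount",
    description="Total order amount",
)
print(to_superset_metric(revenue))
```

The point of the design is that the metric is defined once, next to the models in dbt, and the visualisation tool only receives a generated copy. The actual upload to Preset is left out here on purpose, since it depends on your workspace and credentials.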

Is DataOps really a thing?

Over the last year, DataOps has been used in many different ways to describe many different data-related tasks. When you look closely, some companies put just about any data stuff behind the word DataOps. This is a bit misleading when you read that DataOps is "DevOps for data", because everything wrapped under DevOps is something different from software engineering.

I personally share the "DevOps for data" perspective. Data engineering is mainly software engineering applied to data, or at least we try. If we see it this way, it is logical to say that DataOps is the movement to smooth the operations side, which technically means the infrastructure side (what previous generations called "IT"; I don't like the term, it makes me feel old). Data engineering is also an infrastructure-heavy field, with a lot of technologies to put together to create something that works. This is why DataOps is important. This is why Infrastructure as Code is mandatory.

To me it stops here. Everything derived from it that says "we build data products using a DataOps methodology" is just marketing. In practice you are just writing code applied to data and using Docker containers to deploy it in the cloud. I think we should stick to software engineering vocabulary.

It also means that the data engineer role is constantly evolving, especially with the recent emergence of the analytics engineer role. Analytics engineers are taking tasks away from data engineers, which is for the better tbh. Data engineers will have to focus more on software and on infrastructure, which shifts the expertise: analytics engineers will become the data modeling experts, while data engineers will own the infrastructure side and the software related to the data team, which is already too broad a field, with different ownerships (DS, MLE, etc.).

In the end, when I deploy data apps I end up writing a Dockerfile with CI/CD processes, and I look for cloud services to host my containers. If this is not DevOps, what is it?

I do stuff in prod (credits)

Fast News ⚡️

There is a village called Vim in Indonesia. Originally, Vim stands for Vi IMproved (credits)

Data Economy 💰


See you next week.