Data News — Week 47

Data News #47 — The data links explorer (only for members), Stuart data platform as a product, Agile DE, sed command for noobs, and more.

Christophe Blefari
Oops, I'm late, but with a surprise (credits)

Hi Data News readers, this week's edition is one hour late and shorter than usual, but still interesting.

This week I have a Black Friday offer for you, just below 🤓.

Over the last 2 weeks we developed a dedicated page that lets you search all the articles that have been shared since the start last May (more than 600). In the coming weeks this page will only be accessible to newsletter subscribers, as a reward for your support. Do not hesitate to reach out to me on LinkedIn or by email to give me feedback on the page.

You'll probably need to log in to access the page. If the workflow is too heavy, tell me and I'll try to find a workaround. Subscribe to get instant access to the links page.

Small glimpse of the data links explorer v0.1 (page)

How Stuart is building their data platform as a product

This is a vocabulary question: do we treat data as a product, do we build data products, or do we build a data platform as a product? At Stuart they have chosen the last one. Osian, a Data Product Manager, details his first 3 months, during which he treated the data platform as a product: user personas, user intents, product components. Everything goes in.

I personally think this is a good approach to bring engineering practices closer to the data world. The post has really good illustrations that are self-explanatory and useful for sharing a vision and an understanding of your data stack.

On the other hand, I also want to give another perspective on the data product trend, one that comes from the data mesh and more precisely from DDD principles (Domain-Driven Design). If you want to treat your platform as an aggregation of data products that answer business needs, you'll need a methodology to discover them. This post will help you identify data products. It may be hard to read, but it has some interesting concepts.

Agile Data Engineering at Miro

To continue in the Agile world, Miro shared their 4 pillars (or values) for doing agile data engineering. I agree with this light post that in data engineering it's often better to embrace an iterative delivery pace rather than a waterfall one.

Also, do not forget to always challenge and deeply understand the needs before starting a project. I would say it'll save you a lot of time. Engineers tend to love building complex systems even when they are not needed. I've been there too, mea culpa.

Better do data engineering with glasses (credits) — once again a joke for French people

Data-Centric AI — The rise again of the Data Engineer

Walmart Tech wrote this post following the industry shift from Model-Centric AI to Data-Centric AI. Data engineers and data engineering are probably (again) key pieces of the puzzle. This is a meta post that tries to draw the ownership boundary between DataOps and MLOps. I'd conclude that, in the end, data job definitions are unique to each company, with some invariants such as data infrastructure skills that are always needed.

Data Governance has a serious branding problem

I do agree. In a lot of (biiig) companies the Data Governance problem has been treated as a risk and security topic rather than a collaborative one. Legal teams need to understand where the data flows in order to meet legal requirements and to govern better (yep, I wrote it).

Prukalpa, Atlan co-founder, wrote a nice post on this topic that follows the history of data governance and also shows what can be achieved if you modernize your governance tooling. I find it interesting, even if the post is biased since Atlan operates in that field.

What is BI Engineering?

I have written a lot about Analytics Engineering in the newsletter because more and more companies are switching to this vocabulary, but historically we come from BI and from BI Engineering. Folks at Grofers explain what modern BI Engineering is and which roles are involved.

Landing data on S3: the good, the bad, and the ugly

❤️ My favourite article of the week. This is a technical post about what is at stake when landing data on S3, and also a short journey through what you could encounter when doing so. Nevertheless, I do not agree with the takeaway that Spark is the best solution to use, because the answer is probably: "it depends on your architecture".

Unix sed command tutorial with examples

Who is using sed among the Data News readers? If you run more than one sed command per month, hit reply. I'LL 🤗 YOU FOREVER.

Joking aside, because data is quite a new field for a lot of people, I'd say a sed tutorial is a good read for everyone: sed can save you time when transforming small datasets locally on your computer or elsewhere. Imagine a world where you can avoid starting Excel when you want to remove lines from a CSV. Imagine.

After that you just need to learn multi-cursor operations and you're a true data geek.
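
To make the "remove lines from a CSV without Excel" idea concrete, here is a minimal sketch. It assumes a POSIX-like shell with sed and mktemp available. The sample file, its columns and the variable names are made up for illustration and do not come from the linked tutorial.

```shell
# Create a small sample CSV (hypothetical data, for illustration only).
csv="$(mktemp)"
printf '%s\n' 'id,name,status' '1,alice,active' '2,bob,deleted' '3,carol,active' > "$csv"

# Delete every line that ends with ",deleted" and keep the rest.
# Without -i, sed writes to stdout and leaves the file untouched.
without_deleted="$(sed '/,deleted$/d' "$csv")"
printf '%s\n' "$without_deleted"

# Delete line 1 (the header) to keep only the data rows.
without_header="$(sed '1d' "$csv")"
printf '%s\n' "$without_header"

# Clean up the temporary file.
rm -f "$csv"
```

Both commands use sed's `d` (delete) command, once with a pattern address and once with a line-number address. To save the result, redirect the output to a new file rather than overwriting the original.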

"Now, I see the light" — a testimony from someone that discovered sed (credits)

ACID vs BASE: Comparison of two Design Philosophies

You've probably understood by now that I like explanations of database concepts. This week we got ACID vs. BASE detailed. This is a short post, but it's well written.

Thank you for reading the Data News this far. This week there is no fundraising or fast news section because I did not find any relevant content to include. Please try the link explorer tool and give me feedback on it.

See you next week. In December already.


Christophe Blefari

Data Engineering Coach who enjoys all kinds of data platforms.