
Data News — must-read 2022 articles

A collection of data articles that you should read to remember 2022. Best data articles of 2022.

Christophe Blefari
5 min read
kitsch moment, from me to you (credits)

Hey you, this is the last article of the year, and it's gonna be about the articles and trends that made 2022, according to me. You'll see some articles that I already shared during the year.

💡
You can also read my 2021 must-read list, which I wrote a year and a half ago, or my guide on how to learn data engineering, which contains key articles for understanding the field.

Once again, thank you all for your support this year, and see you next week for the first Data News of 2023. Sorry for the delay, I had blank page syndrome today. Now let's jump into my selection.


ANALYTICS ENGINEERING

To be honest, in 2022 Analytics Engineering shaped the data field and drove a lot of the data discussions. Analytics Engineering can be seen as a rebranding of BI Engineering, but if we look more closely it mainly comes out of the specialisation of data roles. The Analytics Engineer is a specialised role sitting between the Data Engineer and the Data Analyst. Madison looked at job postings to see which skills companies really want in Analytics Engineers.

Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. [...], an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base.[1]

Analytics Engineering put data modeling back in the spotlight. Preset wrote a gentle introduction to data modeling. In a nutshell, data modeling is the set of techniques we use to structure data in data warehouses. Today the main approaches are:

  • Dimensional modeling — Introduced by Ralph Kimball in 1996. We often use the Snowflake Schema or the Star Schema. The Star Schema is a special case of the Snowflake Schema. Here "Snowflake" does not refer to the data warehouse technology but to the shape of the table relationships, which looks like a snowflake when you draw it.
  • Entity modeling — Introduced by Bill Inmon. In this methodology you use third normal form (3NF) to model your business entities and avoid redundancy. This approach is less flexible than the previous one.
  • OBT — One Big Table. I don't really know who introduced OBT, except that Fivetran wrote about it in 2020. It is often the easiest approach to start with: everything sits in one denormalised table.
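To make the difference between a star schema and OBT more concrete, here is a minimal sketch using Python's built-in sqlite3 in memory. The table names (`fct_orders`, `dim_customer`, `dim_product`, `obt_orders`) and the toy rows are hypothetical and only for illustration. The star schema keeps facts and dimensions in separate tables and joins them at query time. The OBT version materialises those same joins once into a single denormalised table, so downstream queries no longer need joins.

```python
import sqlite3

# In-memory database with hypothetical toy data: one fact table and two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE fct_orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Alice', 'FR'), (2, 'Bob', 'DE');
INSERT INTO dim_product VALUES (10, 'Bike', 'Sport'), (20, 'Plant', 'Home');
INSERT INTO fct_orders VALUES (100, 1, 10, 500.0), (101, 2, 20, 30.0), (102, 1, 20, 25.0);
""")

# Star schema: the fact table is joined to its dimensions at read time.
revenue_by_country = conn.execute("""
    SELECT c.country, SUM(f.amount)
    FROM fct_orders f
    JOIN dim_customer c USING (customer_id)
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

# One Big Table: the same joins are materialised once into a denormalised table.
conn.execute("""
    CREATE TABLE obt_orders AS
    SELECT f.order_id, f.amount, c.name, c.country, p.product, p.category
    FROM fct_orders f
    JOIN dim_customer c USING (customer_id)
    JOIN dim_product p USING (product_id)
""")

# Querying the OBT needs no join at all.
obt_revenue = conn.execute(
    "SELECT country, SUM(amount) FROM obt_orders GROUP BY country ORDER BY country"
).fetchall()

print(revenue_by_country)
print(obt_revenue)
```

Both queries return the same revenue per country. The trade-off is that the OBT duplicates dimension attributes on every row, which makes reads simpler at the cost of redundancy. Entity modeling in 3NF is designed to avoid exactly that kind of redundancy.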

As a final note, here is a Reddit thread asking "Is Kimball's Dimensional Modelling dead in 2022?"

To complete the AE list, here are a few more that I recommend as the best analytics engineering articles of 2022:


DATA TEAMS

Three pieces of content that I feel are relevant, even if they are not really trendy. They are more about long-term topics we have to keep in mind:


ENGINEERING

In no particular order, here are a few of the best data engineering articles of 2022:


A GLIMPSE INTO THE FUTURE

This year people talked about a lot of things. Without doing any research, here is what I remember:

  • Data Mesh — The Mesh has been adopted and tried by multiple organisations. What we've seen is that an organisation needs a minimum size before starting, and we have yet to figure out whether the organisational changes are worth it.
  • Data contracts — An interface between data producers and data consumers. The interface can take multiple forms, and we often summarise it as a schema registry. Very useful in a mesh organisation.
  • Semantic Layer / Metric Layer / Headless BI — "Something"[2] between the data warehouse and the BI tool that will probably shape next year's trends.
  • Unbundling of Airflow — This year many Airflow alternatives went public, each with its own vision and big promises. In addition, the one-DAG-to-rule-them-all strategy has been challenged, and execution has been offloaded to other systems, leaving Airflow like an empty shell. But in the end he'll be back.
  • GPT-3 applications — It has the potential to revolutionize industries through automation and augmenting human intelligence, but has also raised concerns about its potential negative impact on employment (this bullet has been generated by ChatGPT).
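A data contract does not require a specific tool to understand. Here is a minimal sketch of the idea in plain Python, assuming a hypothetical contract format that maps each field name to an expected type. The `DataContract` class, the `orders` contract and the sample records are illustrative assumptions, not the API of any real schema registry. The producer checks its records against the agreed interface before publishing, so consumers never receive missing, mistyped or unexpected fields.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical contract format: a name, a version and a field -> expected type mapping.
@dataclass(frozen=True)
class DataContract:
    name: str
    version: int
    fields: dict[str, type]

    def violations(self, record: dict[str, Any]) -> list[str]:
        """Return human-readable reasons why `record` breaks the contract."""
        errors = []
        # Every declared field must be present with the expected type.
        for field, expected in self.fields.items():
            if field not in record:
                errors.append(f"missing field '{field}'")
            elif not isinstance(record[field], expected):
                errors.append(
                    f"field '{field}' should be {expected.__name__}, "
                    f"got {type(record[field]).__name__}"
                )
        # Fields the consumers did not agree on are rejected too (sorted for stable output).
        for field in sorted(record.keys() - self.fields.keys()):
            errors.append(f"unexpected field '{field}'")
        return errors


orders_v1 = DataContract(
    name="orders",
    version=1,
    fields={"order_id": int, "customer_id": int, "amount": float},
)

good = {"order_id": 1, "customer_id": 42, "amount": 19.9}
bad = {"order_id": "1", "amount": 19.9, "coupon": "XMAS"}

print(orders_v1.violations(good))
print(orders_v1.violations(bad))
```

In a real setup the contract would usually live in a shared place, such as a schema registry, and carry a version so that breaking changes can be negotiated between producers and consumers. Note that `isinstance` accepts `bool` values for `int` fields, which a stricter contract might want to reject.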

That said, I think three technologies will shape data engineering next year:

  • Wasm — WebAssembly is a portable compilation target that runs in the browser. In plain words, it means you can run code written in your favourite language in a Firefox tab. One example is PyScript, which lets us run Python in HTML. Thanks to Wasm we can tap into decentralised computing power: our stakeholders' laptops.
  • DuckDB — A single-node, in-process OLAP database. We have not yet seen its full potential. What I think about DuckDB.
  • Dagger — A programmable CI/CD engine that you can run everywhere.


  1. What is an analytics engineer? (Claire Carroll)
  2. The Semantic Layer is more than just "something". To be honest, for the moment I say it sarcastically, because I'm not sure it is really important, at least when I look at my own French market.
Christophe Blefari

Senior Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.