This article is meant to be a resource hub to help you understand dbt basics and get started on your dbt journey.

When I write dbt, I often mean dbt Core. dbt Core is an open-source framework that helps you organise the SQL transformations in your data warehouse. dbt Core is developed by dbt Labs, previously named Fishtown Analytics, a company founded in May 2016. dbt Labs also develops dbt Cloud, a cloud product that hosts and runs dbt Core projects.

In this resource hub I'll mainly focus on dbt Core—i.e. dbt.

First, let's understand why dbt exists. dbt was born out of the observation that more and more companies were switching from on-premise Hadoop data infrastructure to cloud data warehouses. This switch was led by the modern data stack vision. In terms of paradigms, before 2012 we were doing ETL: storage was expensive, so data had to be transformed before being stored (mainly in a data warehouse) in order to keep only the most optimised data for querying.

With the public clouds—e.g. AWS, GCP, Azure—the price of storage dropped and we became data insatiable: we needed all the company's data in one place in order to join and compare everything. Enter ELT. In ELT, the load happens before the transform, without any alteration of the data, leaving the raw data ready to be transformed inside the data warehouse.

dbt's purpose as conceptualised in 2017—which is still the same today (What, exactly, is dbt?)

In simple words, dbt sits on top of your raw data and organises all the SQL queries that define your data assets. And dbt only does the T of ELT, which makes its responsibilities really clear.
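To make this concrete, here is a minimal sketch of what a dbt model looks like. A model is just a SELECT statement in a file; dbt materialises it as a table or view in your warehouse. The model and column names here are hypothetical:

```sql
-- models/orders_enriched.sql (hypothetical model)
-- dbt materialises this SELECT as a table or view in the warehouse.
select
    o.order_id,
    o.ordered_at,
    c.customer_name
from {{ ref('stg_orders') }} as o          -- ref() declares a dependency on another model
join {{ ref('stg_customers') }} as c
    on o.customer_id = c.customer_id
```

The `{{ ref() }}` calls are how dbt builds the dependency graph between your queries: you never hard-code schema-qualified table names for other models.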

dbt is a development framework that combines modular SQL with software engineering best practices to make data transformation reliable, fast, and fun.

This was the tagline dbt Labs previously had on their website. It is important to understand that dbt is a framework. Like every framework, there are multiple hidden pieces to learn before becoming proficient with it. Still, it is very easy to get started.

dbt concepts

There are a few concepts that are super important, and we need to define them before going further:

In a nutshell, the dbt journey starts with defining sources, on top of which you define models that transform those sources into whatever you need for your downstream usage of the data.
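As a sketch of that starting point, sources are declared in a YAML file and then referenced from models. The schema and table names below are hypothetical:

```yaml
# models/staging/sources.yml (hypothetical source definition)
# Declares raw warehouse tables so models can reference them.
version: 2

sources:
  - name: shop          # logical name used in source() calls
    schema: raw_shop    # the schema where the raw data was loaded
    tables:
      - name: orders
```

A first staging model would then select from it with `{{ source('shop', 'orders') }}`, and downstream models build on that model with `{{ ref() }}`.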

ℹ️
I want to mention that the dbt documentation is one of the best tool documentations out there, so do not hesitate to go there to better understand the concepts we cover. Just be aware that there is a reference part, which is the detailed documentation of each function or configuration, and a documentation part, which is more about concepts and tutorials.

dbt entities

I don't want to copy-paste the dbt documentation here because I think they did it well. There are multiple dbt entities—dbt calls them resources, but I use "entities" to avoid a clash with resources as links—that you should be aware of before starting any project. The list below aims to be exhaustive (I hope) and is sorted by priority:

You can read dbt's official definitions.

⚠️
I feel it is important to mention again that dbt Core is a framework to organise SQL files, not a scheduler that can run your transformations on a fixed schedule out of the box.

Also, dbt only does a pass-through to your underlying compute technology; there is no data processing within dbt itself. Actually, dbt can be seen as an orchestrator with no scheduling capabilities.
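Both points can be seen from the command line. You trigger runs yourself, e.g. from cron or an external orchestrator (the project path below is hypothetical), and you can inspect the SQL dbt would send to the warehouse without executing anything:

```shell
# dbt Core has no built-in scheduler: a hypothetical crontab entry
# that runs the project every day at 06:00.
0 6 * * * cd /path/to/my_dbt_project && dbt run

# dbt is a pass-through: `dbt compile` renders the Jinja-SQL into
# plain SQL (under target/compiled/) that the warehouse will execute.
dbt compile
```

All the heavy lifting—scans, joins, aggregations—happens in the warehouse; dbt only compiles and dispatches the queries.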

Analytics engineering

dbt has become a popular framework, partly because it is extremely usable. A lot of companies have already picked dbt or aim to. There are multiple technological reasons for this, but technology is rarely the real reason. I think the reasons dbt is becoming the go-to are mainly organisational:

dbt Labs also popularised the analytics engineer role. We can quickly summarise it as sitting in between the data engineer and the data analyst. But because companies can have very different definitions of roles, I'd say that analytics engineering is the practice of creating a data model that accurately represents the business and is optimised for a variety of downstream consumers. Analytics engineers are the ones doing this.

Given the position and the newness of this role, people often come into analytics engineering from data analytics. They usually don't have much grounding in software engineering good practices, which is understandable, and the dbt framework is meant to bring those practices to the table.

It is also fair to say that dbt as a tool is very easy to use: very often the complexity lies in writing the SQL rather than in using the tool itself. There are also a few questions to answer in terms of project structure.

If you like this article, you should subscribe to my weekly newsletter so you don't miss any other articles of this kind.

Resources

As I only want to help you get started with the concepts, I now want to redirect you to other articles that I find relevant if you'd like to go deeper: