dbt Inside — Setting the data platform standard

Olivier Dupuis
4 min read · Oct 20, 2022

I’m at the New Orleans airport, having spent the last week attending dbt Coalesce, my first conference outing in a few years. To be honest, I was a bit overwhelmed by all the social interaction at first, but as the week went by, meeting new people and discussing our data practice and where it’s heading turned out to be quite refreshing.

I came in mildly curious about the new Python and semantic layer offerings. I guess I just didn’t grasp what the purpose of it all was. I can already use Dagster to merge in Python-based transformations. I can use Cube to define and serve metrics. Those needs are already well served. So why is dbt going in that direction?

As I was attending the Partner session on the first day, I couldn’t help thinking: what is the mission behind all of these new developments? What’s the vision driving dbt Labs towards “version 2”?

I’ve been using dbt since 2018. I saw the journey towards version 1. It made sense. I understood, adopted, and evangelized the core principles of “do one thing and do it well”, openness, modularity, no vendor lock-in, etc.

Now that we’re past version 1, what is the future about? I’ve seen the release roadmap for Core and Cloud: Python, the semantic layer, multi-project support, multi-language support, etc. But what’s the mission here? Can we read anything between the lines of those announcements?

I guess things started to coalesce in my head (you’re welcome) after speaking to new adopters, established users, and attending some of the sessions.

Now let’s be clear, this is just a hot take. I’m no Benn Stancil. I’m probably not picking up all the signals, or even the right ones. But this is a fast-moving market, and I know quite a few people who were asking themselves the same question during Coalesce.

Increase the market

My reading of the keynote and release sessions is that dbt is positioning itself to grow the dbt-enabled market by making Python a first-class citizen of its ecosystem.

I talked with someone from a large organization that is moving to dbt. Their ETL is currently mostly Python-based. With dbt now supporting Python, the transition becomes even easier for them.

Python makes adopting the modern data stack, and dbt, easier for the sizable portion of the market that hasn’t transitioned yet. Data practitioners use Python; organizations rely on Python. It makes sense to build a bridge that welcomes these newcomers.

Be indispensable

As data practitioners, we know that data platforms involve more than building a data warehouse. We have to source data, prep it, enrich it, build the data warehouse, train ML models, run notebooks, feed APIs, serve BI queries, and so on. The list is endless.

dbt makes part of that platform easier to build. But what if it could extend its reach across the whole data platform?

Adopting dbt is easy, and it is now a compelling choice for Python-based teams. So how do you leverage your position at the heart of data platforms to serve every other function of the analytics engineering practice? By establishing a standard.

Python is well suited to all the data platform functions that come “before” the data warehouse (sourcing, preparation, enrichment, loading). And for functions that come “after” the data warehouse, the semantic layer sets the standard: it defines the language and mechanism through which all data apps consume data from your data warehouse.

Serving metrics is a first step. But once dbt provides a standard for defining entities, relationships, hierarchies, etc., it becomes the main driver for every data app that sources from the data warehouse.

And with that, you set the standard. You essentially become the operating system of the whole data platform. You define the standard for moving, transforming, exposing, defining, and serving data.

dbt Inside

Owning the standard for all data platform functions puts you at the heart of that growing market, in a position to lead it and capture a sizeable portion of its value.

That puts competing alternatives, which may have had an edge in execution, in a tough position. They may have had the first-mover advantage, but they are not dominant enough to leverage that position and set a standard.

To me, that is what’s driving dbt Labs towards “version 2”. It’s about growing the market, becoming indispensable and leading that market.

I guess the question is whether this is good for all of us: analytics engineering practitioners and the clients adopting this new standard. Is this becoming a walled garden, with all the risks that entails? Are we moving away from some of the original core principles of “do one thing and do it well”, modularity, replaceability, no vendor lock-in, etc.?

All that said, I’m really excited to see the market evolve. I can only be grateful to the great team (past and present) at dbt Labs for having shaped this market and for continuing to push the envelope. Whatever their strategy turns out to be, I’m forever a fan.

Thanks, dbt, for a great event. And most of all, thanks to all the attendees. The conversations were insightful, and I’m heading home energized for another year of data platform development.
