Data Council 2023
A selection of 10 talks I really enjoyed among Data Council's forward-thinking presentations.
Data Council Austin is a yearly conference that features a great panel of speakers giving talks about the future of the data field. As I often do, I've gone through the 70 presentations, and here is a medley of what I liked.
My personal selection
If you only had time to watch 3 videos, it should be the following 3:
- Malloy an experimental language — This is my favourite talk. Lloyd, founder of Looker, puts 30 years of data warehousing into perspective in 30 minutes, especially the fact that we see "data in rectangles." Since joining Google, he's been working on Malloy, a new way to query data. Malloy compiles to SQL and works on data semantics. The presentation offers another look at the semantic layer. During the demo, Lloyd does some data analysis in the browser and it's just mind-blowing 🤯.
At the same time, someone from Google also gave a Calcite presentation.
- Data contracts, Accountable data quality — Data contracts are a trendy concept that covers a lot of things. Chad Sanderson gave the best recap of it. Data engineering is often constant firefighting, with a lot of (spaghetti) SQL to maintain. Many breaking changes come from upstream producers, either in form or in content.
At scale, everything breaks without data quality. The modern data stack is good because it is self-service and easy to implement, but it lacks what it needs to mature: ownership, data quality, context. It creates a non-consensual API: we pull data but never agreed on a contract (SLA, schema, etc.).
The root cause is mainly miscommunication between producers and consumers. Data contracts aim to fix this with API-based agreements between producers and consumers that capture the schema, semantics, distributions and enforcement policies of the data.
You can also watch Whatnot's data contracts implementation.
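To make the idea of a contract concrete, here is a minimal sketch in Python of an agreement that captures a schema and a few semantic rules, plus a check a producer could run before publishing. It is not the implementation shown in the talk: the `FieldSpec` and `DataContract` classes, the `orders` example and the owner name are all made up for illustration, and distribution checks are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldSpec:
    """One column of the contract (hypothetical structure)."""
    name: str
    dtype: type
    nullable: bool = False
    # Optional semantic rule, e.g. "amounts are never negative".
    check: Optional[Callable[[Any], bool]] = None


@dataclass(frozen=True)
class DataContract:
    """Schema and semantics agreed between a producer and its consumers."""
    name: str
    owner: str
    fields: tuple

    def violations(self, record: dict) -> list:
        """Return a list of human-readable problems found in one record."""
        problems = []
        for spec in self.fields:
            if spec.name not in record:
                # In this sketch a field must be present even when nullable.
                problems.append(f"{spec.name}: missing field")
                continue
            value = record[spec.name]
            if value is None:
                if not spec.nullable:
                    problems.append(f"{spec.name}: null not allowed")
                continue
            if not isinstance(value, spec.dtype):
                problems.append(
                    f"{spec.name}: expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
                continue
            if spec.check is not None and not spec.check(value):
                problems.append(f"{spec.name}: semantic check failed")
        return problems


# Illustrative contract; the names are assumptions, not from the talk.
orders_contract = DataContract(
    name="orders",
    owner="checkout-team",
    fields=(
        FieldSpec("order_id", str),
        FieldSpec("amount_cents", int, check=lambda v: v >= 0),
        FieldSpec("coupon", str, nullable=True),
    ),
)

good = {"order_id": "o-1", "amount_cents": 1999, "coupon": None}
bad = {"order_id": 42, "amount_cents": -5}

print(orders_contract.violations(good))
print(orders_contract.violations(bad))
```

The enforcement policy is the part that would differ most in practice: a producer could run a check like this in CI and block the breaking change, or only alert the consumers when a violation shows up.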
- Metric trees — It reminds me of the KPI frameworks people were building when I started working at a consultancy firm. This is a nice way to represent your company's business. Even today, 90% of the value a data team delivers is in analytics. The goal of analytics is to model the business correctly. You should answer 4 questions: what happened, why did it happen, what's going to happen, what should we do next.
Organisations are systems with inputs, outputs and a formula. Formulas have metrics, relationships and weights. In the end, you can depict all your KPIs with formulas.
The data team's strategy should mainly be to define and operationalise the company growth model, using a metric tree as a logical representation of that growth model. You have 3 types of outputs: customer value, financial and strategic.
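As a rough illustration of the metric tree idea, here is a small Python sketch where each metric is either an input or a formula (sum or product) over its children. The `Metric` class, the revenue decomposition and the numbers are my own assumptions, not taken from the talk, and weights are left out to keep it short.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Metric:
    """A node in a metric tree (hypothetical structure)."""
    name: str
    op: str = "input"  # "input", "sum" or "product"
    children: list = field(default_factory=list)
    value: float = 0.0  # only used when op == "input"

    def evaluate(self) -> float:
        """Compute this metric from its inputs, bottom-up."""
        if self.op == "input":
            return self.value
        child_values = [child.evaluate() for child in self.children]
        if self.op == "sum":
            return sum(child_values)
        if self.op == "product":
            return prod(child_values)
        raise ValueError(f"unknown op: {self.op}")


# Made-up growth model:
# revenue = active customers * average order value * orders per customer
new_customers = Metric("new_customers", value=200.0)
retained_customers = Metric("retained_customers", value=800.0)
active_customers = Metric(
    "active_customers", op="sum",
    children=[new_customers, retained_customers],
)
average_order_value = Metric("average_order_value", value=50.0)
orders_per_customer = Metric("orders_per_customer", value=2.0)
revenue = Metric(
    "revenue", op="product",
    children=[active_customers, average_order_value, orders_per_customer],
)

baseline = revenue.evaluate()

# What-if: retention improves by 10%, everything else stays the same.
retained_customers.value = 880.0
scenario = revenue.evaluate()

print(baseline, scenario)
```

Changing one input and re-evaluating the root is the simplest way to answer "what's going to happen" with such a tree: the formula tells you how much the output moves.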
Other stuff I liked
- Snowflake optimisation guide — This is a pragmatic guide on how you can lower your Snowflake costs. In the current context we have to do more with less. The talk starts with a great introduction to Snowflake's architecture. In a nutshell, the speakers share tips about warehouse sizing and design, and about performance optimisation with pruning, clustering and query design.
- LLMs and Semantic layer — This is something I've had in mind for a while. This is a tool presentation, but it's still relevant. On the same topic of self-service, Whatnot shared how they turned data consumers into data constructors.
- Scaling Uber metrics systems (w/ Pinot) — uMetric's migration from ES to Pinot. They created a unified layer where metrics use the same logic for all downstream consumers. uMetric manages definition, discovery, computation, verification and serving.
- Writing unit test for data science — A pragmatic guide to unit tests.
- Retro on data science by DJ Patil — DJ Patil was the US Chief Data Scientist. He coined the term "data scientist" back in 2008. He gives a great retrospective.
- Dashboards as code — Using code to make BI development better. This is DataOps: we have "X as code" almost everywhere in the data chain, and only dashboards lack it.
- Growing the data Team and data Culture at GitLab — GitLab's data playbook is well known. The talk covers the eng-director gap problem, which is when you have a director who manages an individual contributor.
- A deep-dive into the dbt manifest — How to do a dry run in a cloud data warehouse, load the manifest as dynamic DAGs, enforce policies or build monitoring.
- Augmenting the modern data stack — by merging batch and real-time technologies in one database.
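For the dbt manifest talk, here is a minimal sketch in Python of what "loading the manifest as a DAG" can look like. It relies on the `nodes`, `resource_type` and `depends_on.nodes` keys of dbt's `manifest.json`. The sample manifest below is invented and heavily trimmed, and `model_run_order` is a hypothetical helper. In a real project you would `json.load` the file dbt writes, typically `target/manifest.json`.

```python
from graphlib import TopologicalSorter

# Invented, trimmed-down manifest: a real one has many more keys per node.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.stg_customers": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.customers"]},
        },
        "model.shop.orders": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]
            },
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}


def model_run_order(manifest: dict) -> list:
    """Return model ids in an order that respects model-to-model dependencies."""
    nodes = manifest["nodes"]
    graph = {}
    for node_id, node in nodes.items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, ...
        parents = node.get("depends_on", {}).get("nodes", [])
        # Keep only parents that are models themselves (this drops sources).
        graph[node_id] = {
            parent for parent in parents
            if nodes.get(parent, {}).get("resource_type") == "model"
        }
    return list(TopologicalSorter(graph).static_order())


order = model_run_order(manifest)
for model_id in order:
    print(model_id)
```

From such an ordered list, an orchestrator could create one task per model, which is roughly the "dynamic DAGs" idea. The same loop is also a natural place to enforce policies, for example failing when a model has no description.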
See you soon ❤️.