Data News — Week 41

Data News #41 — Google Cloud partners with Tableau, event tracking system at Udemy, NPS for data teams, BigQuery innovations.

Christophe Blefari
5 min read
A true Tableau partnership: Le Louvre and Mona Lisa (credits)

Hello Data News readers, I hope this edition finds you well. Last week's edition was fun to write and longer than usual. This week's post is shorter but full of awesome articles, like every week. No crunchy fundraising to eat today, though.

Google Cloud partners with Tableau

I did find some interesting news, though: the Google Cloud partnership with Tableau, unveiled at Cloud Next '21, the annual Google Cloud conference. With this move, Google Cloud users can visualize data with Data Studio, Looker and Tableau.

Reading between the lines, I see it as a growth play for Google Cloud: becoming Tableau's preferred partner when it comes to cloud migration. I'd bet that a lot of users are still on-premises with Tableau Server.

On the other hand, Tableau will be able to access the Looker semantic model. Ironically, I bet no one cares about this one. If I am wrong, please reach out to me.

Designing the new event tracking system at Udemy

❤️ My favorite article of the week. The Udemy team wrote a long article about their journey migrating from a legacy event tracking system to a brand-new one. What amazes me is how meticulous they were in the selection phase of the project. They go through it in order: requirements, buy vs. build, serialization.

If you are planning to go event-driven in the coming months, this article is a good starting point for the technical design. Huge shout-out to the Avro customization they made.

Snowflake Streams applied to IoT data

Following the previous article, I suggest reading this demonstration of Snowflake Streams applied to IoT data. The idea is to do real-time processing on top of a potentially huge table (31 billion rows/year). Thanks to Streams, you get only the new rows and compute faster than with a full recompute.

Create a STREAM in APPEND_ONLY (only new rows) — image extracted from the article
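
To make the "only new rows" idea concrete, here is a toy model in Python. It is not the Snowflake API and not the code from the article: `Table` and `AppendOnlyStream` are hypothetical names, and the stream is reduced to an offset into an in-memory list. In Snowflake, as far as I understand, the offset moves when the stream is consumed in a DML statement; in this sketch it simply moves on each call.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a large, append-only table."""
    rows: list = field(default_factory=list)

    def append(self, *new_rows):
        self.rows.extend(new_rows)


@dataclass
class AppendOnlyStream:
    """Toy stream: it only remembers how far into the table it has read."""
    table: Table
    offset: int = 0

    def consume(self):
        """Return the rows added since the last call, then advance the offset."""
        new_rows = self.table.rows[self.offset:]
        self.offset = len(self.table.rows)
        return new_rows


readings = Table()
stream = AppendOnlyStream(readings)

readings.append({"sensor": "a", "temp": 20.5}, {"sensor": "b", "temp": 19.0})
first_batch = stream.consume()   # the two rows inserted so far

readings.append({"sensor": "a", "temp": 21.0})
second_batch = stream.consume()  # only the row inserted since the last read
```

The point is that each batch is proportional to what was inserted since the last read, not to the size of the whole table.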

NPS for data teams

When you work in a data team you obviously like numbers, but you also, just as obviously, struggle to know whether you truly have an impact. Shifting from a support team to a product team may be necessary. To go further, we can run an NPS survey about the data team to get a team KPI to follow.

This is something I've already done in the past, but you need a certain scale to be sure the results are relevant, and also a certain maturity. And don't take good results for granted: a data team that the business would recommend is not necessarily the data team you want to be in.
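
For reference, the usual NPS computation is simple: answers are on a 0 to 10 scale, promoters answer 9 or 10, detractors answer 0 to 6, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch, where the function name and the sample answers are made up for illustration:

```python
def net_promoter_score(scores):
    """Compute NPS from 0-10 survey answers, as a value between -100 and 100."""
    if not scores:
        raise ValueError("need at least one answer")
    promoters = sum(1 for s in scores if s >= 9)   # answers of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # answers of 0 to 6
    return 100 * (promoters - detractors) / len(scores)


# Made-up answers from ten stakeholders of the data team.
answers = [10, 9, 9, 8, 7, 6, 3, 10, 8, 9]
score = net_promoter_score(answers)  # 5 promoters, 2 detractors -> 30.0
```

With only ten answers, a single person changing their mind moves the score by 10 points or more, which is why scale matters before reading too much into the number.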

To boost my own KPIs, I recommend you subscribe to get the news by email each Friday. Obviously no spam, and free forever.

Using Singer to ingest data at Glassdoor

Singer is an open-source standard for composable extract-load. It moves data between a tap (a source) and a target. Because Singer is composable, it is theoretically possible to use any tap with any target, which yields a lot of combinations.
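
What makes this composition possible is that a tap writes JSON messages (SCHEMA, RECORD, STATE), one per line, and a target reads them. Here is a minimal sketch of that contract, with a hypothetical `tap` and `target` wired together in memory rather than through a shell pipe. The `users` stream and its rows are invented for the example, and the bookmark stored in STATE is an arbitrary choice.

```python
import io
import json


def tap(rows, out):
    """Hypothetical tap: writes Singer-style SCHEMA, RECORD and STATE messages."""
    schema = {
        "type": "object",
        "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    }
    messages = [{"type": "SCHEMA", "stream": "users", "schema": schema, "key_properties": ["id"]}]
    messages += [{"type": "RECORD", "stream": "users", "record": row} for row in rows]
    # The STATE value is free-form; here it is just the number of rows sent.
    messages.append({"type": "STATE", "value": {"users": len(rows)}})
    for message in messages:
        out.write(json.dumps(message) + "\n")


def target(lines):
    """Hypothetical target: collects records per stream and keeps the last state."""
    loaded, state = {}, None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


pipe = io.StringIO()
tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}], pipe)
loaded, state = target(pipe.getvalue().splitlines())
```

Since the target only looks at the messages, it does not need to know which tap produced them.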

That being said, Glassdoor explained how they used Singer to ingest data from APIs. I think this is a good introductory post with good ideas. I really like the idea of using Singer's schema discovery feature to check whether a tap's schema has been altered.

A Singer tap (credits)
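
The schema check can be sketched as a diff between two JSON schemas: the one saved from a previous discovery run and the one the tap reports today. This is my own illustration, not Glassdoor's implementation; `schema_changes` and the sample schemas are hypothetical, and only top-level `properties` are compared.

```python
def schema_changes(previous, current):
    """Compare the top-level `properties` of two JSON schemas."""
    old = previous.get("properties", {})
    new = current.get("properties", {})
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Schema saved from an earlier discovery run (made up).
previous = {"properties": {
    "id": {"type": "integer"},
    "email": {"type": "string"},
    "age": {"type": "integer"},
}}
# Schema reported by the tap today (made up).
current = {"properties": {
    "id": {"type": "integer"},
    "email": {"type": ["null", "string"]},
    "country": {"type": "string"},
}}

changes = schema_changes(previous, current)
has_drift = any(changes.values())
```

A pipeline could run a check like this before each load and alert, or stop, when `has_drift` is true.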

Thoughts about Hex and dbt

Claire Carroll, previously at dbt Labs, wrote up her thoughts about using Hex and dbt together. She says that a notebook-based query tool is better than the Snowflake UI, mainly because you can juggle between query results. Finally, on the wish list, something everyone is probably waiting for: can we have query editors that support the dbt ref macro?
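
To picture what ref support in a query editor would mean, here is a rough sketch that replaces `{{ ref('model') }}` with a table name before the query is sent to the warehouse. It is an assumption-heavy toy, not how dbt or Hex work: `resolve_refs` and the model-to-table mapping are hypothetical, only the single-argument form of ref is handled, and real dbt compiles the whole Jinja template using the project's own information.

```python
import re

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"](\w+)['"]\s*\)\s*\}\}""")


def resolve_refs(sql, relations):
    """Replace each ref() call with the table name found in `relations`."""
    def replace(match):
        model = match.group(1)
        if model not in relations:
            raise KeyError(f"unknown model: {model}")
        return relations[model]

    return REF_PATTERN.sub(replace, sql)


# Hypothetical mapping from dbt model names to warehouse tables.
relations = {"orders": "analytics.orders", "customers": "analytics.customers"}

query = (
    "select * from {{ ref('orders') }} o "
    'join {{ ref("customers") }} c on o.customer_id = c.id'
)
resolved = resolve_refs(query, relations)
```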

As cool as the new table formats

Recently Hudi and Iceberg became topics I write about in the newsletter because they could become the next big improvement in our data stacks. This week we have a series of 3 articles about what Hudi is and how it can be used. On the other side, we have a demo article about Iceberg.

What's new with BigQuery

Following Cloud Next '21, the BigQuery team announced what is coming next to BigQuery. At a glance, they keep closing the gap with Snowflake in terms of database features while continuing to add machine learning capabilities. Here is a short overview:

  • BigQuery becomes heavily interoperable → storage accessible from various places and even more query federation
  • BigQuery Omni is becoming generally available (GA) to support multi-cloud based workflows, but no idea about the price (Google if you read this contact me)
  • They preview GRANT / REVOKE commands to support data authorization — row and column level security
  • Run Python external functions (and 6 other languages) from your SQL
  • New Monitoring UI to understand how BigQuery is used
  • And more: Table snapshots and clones (hey Snowflake), cheaper write API for streaming, search indexes for text fields, better ML explainability and Vertex AI integration
Recent BigQuery Innovations (capture from the video)

Fast News

It's time now for the fast news. Cool news, but faster than before.

Have a great weekend and see you next week!


PS: if you have read the newsletter this far, thank you. What do you think of an audio format 🎙️ (podcast) for the newsletter? Would you listen to it? Drop me an email to tell me, I'm curious about it.

Christophe Blefari

Data engineering coach who enjoys all kinds of data platforms.