
Data News — Week 25

Data News #25: Neo4j and Firebolt fundraising, Analytics Engineer hype, understanding async in Python, smart cities AI over-promises

Christophe Blefari
3 min read
An Analytics Engineer discovering the Data Warehouse (credits)

New Friday, new digest. This is the last week before July (don't worry, I'll continue the curation work even in July and August). I hope you'll all enjoy well-deserved holidays over the next two months. This week we got two big fundraising rounds (Neo4j and Firebolt) and some super nice articles (a lot of them around analytics engineering).

Data fundraising 💰

  • I missed it in last week's newsletter, but here it is. Neo4j, the graph database, raised $325m in a Series F led by Eurazeo. The company is now valued at more than $2B. I think graph databases will be used more and more in data platforms, mainly because of data catalogs and the rise of metadata lakes.
  • Firebolt, the new Cloud Data Warehouse on the block, announced $127m in Series B. The Israeli startup, with 100 employees, promises the world's fastest and most efficient DW. With this funding they will expand into many countries in search of new clients. I can't wait to see feedback from companies using it.

What's the hype about Analytics Engineers?

2021 will be the year of the Analytics Engineer. All companies are looking for their own in order to create a well-designed and well-owned data warehouse. But why is the role so hyped? Oliver Molander tries to answer this difficult question.

The rise of the Analytics Engineer probably comes from the shift from ETL to ELT and from the rapid development of dbt. To go further, I suggest this Reddit post, which also discusses the hype around the ELT paradigm.

To finish on the Analytics Engineering topic: did we lose the Art of data modeling? This article reminds all of us that data modeling is still a technique we need to master as data engineers (and analytics engineers too).

Change data capture ✨

We see a lot of articles about batch data processing, so I'm happy whenever I come across one about Change Data Capture. And this week we have 2! The first one explains what CDC is through some examples and mentions Debezium. The second one covers 8 practical use cases where CDC could be useful. So if you want to move to a more event-driven platform, these articles are worth checking.
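To make the idea concrete, here is a minimal sketch of what consuming change events can look like. The events below are simplified and only loosely modeled on the Debezium envelope (`op`, `before`, `after`); real events carry more fields. The `apply_change` helper, the `id` key and the sample rows are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change(table: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":  # delete: only the "before" image is populated
        table.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


# Hypothetical stream of changes captured from a source table.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "a@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "b@example.com"}},
    {
        "op": "u",
        "before": {"id": 1, "email": "a@example.com"},
        "after": {"id": 1, "email": "a@new.example.com"},
    },
    {"op": "d", "before": {"id": 2, "email": "b@example.com"}, "after": None},
]

replica: Dict[int, Row] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Replaying the events in order leaves the replica in the same state as the source table, which is the core promise of CDC: downstream systems follow every insert, update and delete instead of re-reading the whole table in a batch.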

Building a decentralized platform

How can a company build a self-service data platform and decentralize data ownership? Barr Moses, CEO of Monte Carlo, writes about it. Obviously, data observability is key in this kind of platform, because independence requires transparency and observability. Also, Data Mesh is not mentioned in the article!

To go further: in order to build a self-service platform you will need to listen to the voice of the data consumer. That means you'll need to consider data as a product. And if data is a product, then start hiring a Data Product Manager. My friend Pragun wrote a nice piece about applying product-driven methods to data.

Is Data Science like a fake YouTube entrepreneur Coach? 📺

Florian Grüning shares his feedback on the huge gap between the promises of data science and the reality, especially when we speak about "smart cities" (in his case, Leipzig). Public services are still working with CSV and PDF files, and no one talks about data quality, for instance.

🚸 Junior data engineer seeking projects

I often get this question: "How can I, as a junior data engineer, show my skills and get a job?". I think the answer is simple: store your golf data and write a Medium post. Jokes aside, my advice is to find a topic close to your passion and imagine a project around it. You'll be happy to work on it and you'll have endless ideas.

Demystifying Asynchronous Programming in Python

If you are always stuck in a loop without understanding what coroutines or the async/await keywords are, the following post is for you. Starting with simple concepts and code, it tries to explain what's behind asynchronous code. It's well written and worth checking for all Python developers (other languages are also allowed :)).

async/await explained (credits) — I know this is more a queue system representation but this is the only image I found on Unsplash
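As a taste of what the post covers, here is a minimal sketch of coroutines with async/await. It is not taken from the article: the `fetch` coroutine and its delays are made up, and `asyncio.sleep` stands in for a real non-blocking I/O call such as an HTTP request.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a non-blocking I/O call (hypothetical workload).
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather schedules the three coroutines concurrently and returns
    # their results in the order the coroutines were passed in.
    return await asyncio.gather(
        fetch("a", 0.2),
        fetch("b", 0.1),
        fetch("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

While one coroutine waits on its `await`, the event loop runs the others, so the total time should be close to the longest delay rather than the sum of the three.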

ML Friday, train a GAN

If you don't know what a generative adversarial network is, I invite you to read the geekculture blog post. Using the neural network, the author generates a smooth video of Italian landscapes. Maybe in the future GANs will generate tiles for the famous Carcassonne board game.


Christophe Blefari

Data Engineering Coach who enjoys all kinds of data platforms.