Hey, this is a straightforward post about the ideas and takeaways I got from DuckCon. I guess the recording will be posted online in a few days / weeks.
It took place in Amsterdam in a wonderful location. The afternoon agenda was quite small (because it is still a small conference) but interesting. There is something awesome about meeting the DuckDB community at this stage. The tool has not yet reached its peak, so you meet people who are early adopters and fans of it; it's actually a nerds conference (and a male one, diversity might come later I hope).
The Duck creators announced that v0.10.1 is coming soon, and before the end of July we might get v1.0.0. DuckDB adoption numbers demonstrate a real trend behind the "hype": the DuckDB docs website gets 500k unique visitors per month. DuckDB also has a shiny new website.
Soon we will get things like:
- Forward (best-effort) and backward (guaranteed) compatibility between DuckDB file format versions
- The ability to attach a Postgres database and execute Postgres queries from the DuckDB prompt
- A new fixed-length array data type
- A new unified memory manager
- A secret manager whose secrets can persist between sessions
- A new compression algorithm called ALP that brings faster compression / decompression and higher compression ratios
- v1.0.0 will have no new features compared to v0.10; the focus is on stability and bugfixes
Ideas from talks
I'll just throw out in the wild the ideas and things I've seen in the talks.
- HuggingFace is using DuckDB in multiple features to power data exploration in the frontend. In their datasets product, when looking at a dataset you can run full-text search or see distributions (with bars at the top of columns), and this is powered by DuckDB. Lastly, they pre-compute statistics on datasets with DuckDB.
- Fivetran uses DuckDB as the technology to merge files in its data lake offering
- Datacamp uses DuckDB to query dataframes in SQL from notebooks, and is considering it for teaching SQL; I might have something in the making about this on my side.
- A dbt Core developer is using DuckDB with pdb to easily debug what's happening in the database, and can create "debug packages" to send to other people.
- DuckDB feels magical to some people (Liverpool FC, for instance) because it does things faster than other technologies with a smaller technical footprint: you just write SQL and it works.
- The pattern might be:
- Get the data out of db
- Query it with DuckDB
    - Put the data back into the db
There is something between the lines: even if DuckDB is used differently by everyone, it just runs and creates something universal (thanks to SQL). This might actually be the final tool that breaks the wall between tech teams and data teams.
With DuckDB you offload business logic that would otherwise be embedded in a backend app into SQL queries. You can use DuckDB as a library and not as a service, which changes everything: all you need to do is
`import duckdb`, not launch a Docker service, manage connection strings, etc.
One last point: Parquet was the starting point of a lot of use-cases because the Duck works well with columnar files. But across all the questions and the general feeling, people seem to like the idea of a DuckDB file format that would become the de facto data format.
I'm sorry I've written this as enhanced raw notes, I hope you'll like it.