
Data News — Week 23.37

Data News #23.37 — A lot of articles this week, Falcon 180B, HuggingFac(ing) the senate, Snowflake and BigQuery tips, Databricks still burning cash and raising, etc.

Christophe Blefari
4 min read
Facing the News (credits)

Hello Data News readers. I'm still struggling to get back into my usual work rhythm. Add the fact that last week I came up with fewer articles than I expected, and I'm facing another blank page. Anyway, after 2 years of work, I have to accept it and let go when necessary. But don't worry, I haven't forgotten you.

Let's quickly jump to the news, because it's rather busy.

(Gen) AI News 🤖

  • Reinforcement Learning: an easy introduction to value iteration — The title says easy, but the article contains math formulas. RL always feels a bit magical, and this article explains it well through golf concepts.
  • Falcon 180B has been released on HF — It is interesting to note that Falcon was developed at the Technology Innovation Institute (TII) in Abu Dhabi. It brings diversity to foundation models, which usually come from the US. But given the number of parameters (180B), can it run on your computer? Spoiler: according to Benjamin, it needs 100GB of RAM to run and good GPUs to be fine-tuned.
  • If you're late to the party and you need fresh views on LLMs, Daniel wrote an introduction demystifying Large Language Models and Jesse wrote about the impact of LLMs from a Data Engineering perspective.
  • At the same time GitHub Research quantified GitHub Copilot’s impact on developer productivity and happiness — Developer productivity is a difficult measure to compute. Also productivity ≠ speed, but speed is important. The research also showed that 88% of people using GitHub Copilot feel more productive, and that they are more efficient and less frustrated.
  • HuggingFace CEO and co-founder opening statement at AI insight forum — This week US AI giants attended a 6-hour private meeting with 60 US senators to explore AI regulation. Clement Delangue transparently shared his speech on Twitter. He mainly covers openness, risk measurement (such as misinformation, election manipulation or increased carbon emissions) and finally the implementation of safeguards.
  • Meta developed an end-to-end AI system performance simulator called Arcadia. From what I understand, this performance simulator helps find the best parameters for training.
💡
Additional big tech stuff to check: real-time ML training at Etsy and last mile data processing with Ray at Pinterest.

Fast News ⚡️

I can predict a project failure (credits)
Of course, great onboarding isn’t the only thing necessary to build a high-performing team, but it’s almost impossible to build one without great onboarding.

Github gems 💎

  • nike-inc/brickflow — The Nike engineering team released a Python framework to orchestrate jobs in Databricks workflows. It mainly maps to Airflow concepts to provide a declarative interface over Databricks objects like Clusters, Workflows or Notebooks in order to orchestrate them.
  • sourcegraph/cody — Cody is a free, open-source AI coding assistant that can write and fix code, provide AI-generated autocomplete, and answer your coding questions. Under the hood it uses either Anthropic or OpenAI LLMs to work and requires a free cody.dev account.
  • teej/titan — Titan is a Python library to manage data warehouse infrastructure. Titan allows you to create Snowflake Databases, Warehouses, Roles and RoleGrants programmatically.

Data Economy 💰

Databricks atm (credits)
  • SQream raises $45m Series C. SQream is a GPU-based SQL database that can act as a data warehouse, promising peak performance at PB scale thanks to its GPU architecture. It also works well for machine learning use-cases.
  • Gable raises $7m in seed funding. Chad Sanderson launched his data contracts product / platform together with 2 other co-founders. Chad has produced a lot of content around contracts in the last 2 years. It seems Gable is here to fix upstream data quality with contracts. Alerts will be sent in GitHub to notify owners when something breaks the enforced rules.
  • Databricks raises another $500m in Series I. Soon there will be no letters left in the alphabet to associate with Databricks fundraising. Since the beginning they have raised $4b and are today valued at $43b. Nothing to say except that they love to burn cash. Be ready for a downhill slide in 2025 if you have picked Databricks.
  • Treefera raises $2.2m in pre-seed to develop a data platform that monitors forests built for carbon offsetting and reforestation. I really like their "data products" approach and the geo visuals of forest risks.
  • Collibra acquires SQL data notebook Husprey. Husprey is a Notion-like tool that sits directly in the warehouse, for writing stories on top of interesting tables or facts. It will become a nice product in the Collibra data governance ecosystem.

See you next week ❤️.


Christophe Blefari

Staff Data Engineer. I like 🚲, 🪴 and 🎮. I can do everything with data, just ask.
