Vector Databases: A Traditional Database Developer's Perspective

As a traditional database developer with machine learning platform experience from my time at Shopee, I’ve recently been exploring vector databases, particularly Pinecone. Rather than providing a comprehensive technical evaluation, I want to share my thoughts on why vector databases are gaining significant attention and substantial valuations in the funding market.
Demystifying Vector Databases
At its core, a vector database primarily solves similarity search problems.
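To make "similarity search" concrete, here is a minimal brute-force sketch in plain Python. It is only an illustration and not how Pinecone or any particular vector database works internally: the toy vectors, the `VECTORS` table, and the helper names `cosine_similarity` and `top_k` are all assumptions made up for this example, and real systems typically use approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Score every stored vector against the query and return the k best (name, score) pairs."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # The query points mostly along the first axis, so "cat" and "kitten" rank first.
    for name, score in top_k([1.0, 0.0, 0.0], VECTORS):
        print(name, round(score, 3))
```

The full scan costs time proportional to the number of stored vectors per query, which is the cost that dedicated vector indexes are meant to reduce.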

ClickHouse on Pandas DataFrame

ClickHouse on DataFrame
To be the Fastest SQL Engine on Any Format
The story begins with the undeniable fact that ClickHouse is the fastest open-source OLAP engine on the planet. Even when your data outgrows your memory capacity, it can still process it at lightning speed with incredible memory efficiency. Every challenger in this field tries to prove they are faster and easier to use than ClickHouse. However, the unique nature of databases means that five years might just be a warm-up, and it takes a decade to truly master the craft.

chDB is joining ClickHouse

The Start
During the Lunar New Year in February last year, I created chDB to solve an efficiency problem I was facing at the time with sample data for machine learning models. Of course, compared to everything that the creators of ClickHouse have done so far, chDB is just a tiny hack on ClickHouse local.
Running Everywhere
Despite many imperfections, chDB quickly gained a lot of fans in a way that surprised me.

The birth of chDB

Rocket Engine on a Bicycle
Before officially starting the journey of chDB, I think it’s best to give a brief introduction to ClickHouse. In recent years, “vectorized engines” have been particularly popular in the OLAP database community. The main reason is that CPUs keep gaining more SIMD instructions, which greatly accelerate aggregation, sort, and join operations over large amounts of data in OLAP scenarios. ClickHouse has made very detailed optimizations in many areas, including vectorization, as can be seen in its optimizations of lz4 and memcpy.

How I made Apache Superset a macOS App

I’m a heavy user of, and a code contributor to, Apache Superset. Running Superset on my MacBook is the only reason I have Docker installed (still a VM inside?), which I think is too heavy. Since Superset pushes most of the heavy work to the database side, I wondered whether it would be possible to build a Superset.app that makes Superset easier to use on my MacBook. My technical stack is mainly backend, with keywords like: