Pebbling Club 🐧🪨

  • VectorVFS: Your Filesystem as a Vector Database
    Notes
    VectorVFS is a lightweight Python package that transforms your Linux filesystem into a vector database by leveraging the native VFS (Virtual File System) extended attributes. Rather than maintaining a separate index or external database, VectorVFS stores vector embeddings directly alongside each file—turning your existing directory structure into an efficient and semantically searchable embedding store.
    Unfurl
  • Don't use cosine similarity carelessly
    Notes
    Just as Midas discovered that turning everything to gold wasn't always helpful, we'll see that blindly applying cosine similarity to vectors can lead us astray. While embeddings do capture similarities, they often reflect the wrong kind - matching questions to questions rather than questions to answers, or getting distracted by superficial patterns like writing style and typos rather than meaning. This post shows you how to be more intentional about similarity and get better results.
  • Vector Databases Are the Wrong Abstraction
    Notes
    A more effective abstraction is conceptualizing vector embeddings not as independent tables or data types but as a specialized index on the embedded data. This is not to say that vector embeddings are literally indexes in the traditional sense, like those in PostgreSQL or MySQL, which retrieve entire data rows from indexed tables. Instead, vector embeddings function as an indexing mechanism that retrieves the most relevant parts of the data based on its embeddings.
    Feed
    Unfurl
  • Embeddings are underrated
    Notes
    Embeddings aren't exactly new, but they have become much more widely accessible in the last couple years. What embeddings offer to technical writers is the ability to discover connections between texts at previously impossible scales.
    Feed
    Unfurl