Architecture & Parallel Fast-Track

A deep dive into how pg-smart-search executes queries concurrently, cancels zombie queries, and returns results in under 15 ms.

The core philosophy of pg-smart-search is simple: don't wait for slow strategies if a fast one succeeds.

Unlike standard search implementations that waterfall through different algorithms (FTS → Trigram → ILIKE), pg-smart-search executes them in parallel.

How it Works

  1. Concurrent Execution: When a search query is received, the engine triggers Full-Text Search (FTS) and Trigram searches simultaneously.
  2. First Past the Post: As soon as the FTS (which uses indices and is typically faster) returns valid results, the engine prepares the response.
  3. Zombie Query Prevention: The engine immediately aborts the still-running Trigram query. An AbortSignal fires in the application, and the engine calls pg_cancel_backend so that PostgreSQL stops executing the query. This frees up database connections and CPU cycles.
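The three steps above can be sketched without a database. This is a minimal illustration, not the library's implementation: the names Row, Strategy, fakeStrategy, and fastTrackSearch are hypothetical, the strategies are simulated with timers, and the sketch assumes that the engine falls back to the Trigram result when FTS returns no rows.

```typescript
// Hypothetical types for illustration; not pg-smart-search's actual API.
type Row = { id: number; title: string };
type Strategy = (term: string, signal: AbortSignal) => Promise<Row[]>;

// Stand-in for a real database call: resolves with `rows` after `ms`
// milliseconds, or rejects as soon as the signal is aborted.
function fakeStrategy(rows: Row[], ms: number): Strategy {
  return (_term, signal) =>
    new Promise<Row[]>((resolve, reject) => {
      const timer = setTimeout(() => resolve(rows), ms);
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(timer);
          reject(new Error("cancelled"));
        },
        { once: true },
      );
    });
}

async function fastTrackSearch(
  term: string,
  fts: Strategy,
  trigram: Strategy,
): Promise<Row[]> {
  const trigramAbort = new AbortController();
  const neverAborted = new AbortController().signal;

  // Step 1: start both strategies at the same time.
  const ftsPromise = fts(term, neverAborted);
  const trigramPromise = trigram(term, trigramAbort.signal);
  // The rejection caused by our own cancellation is expected; mark it handled.
  trigramPromise.catch(() => undefined);

  // Step 2: if FTS returns rows, it wins.
  const ftsRows = await ftsPromise;
  if (ftsRows.length > 0) {
    // Step 3: cancel the Trigram query that is still running.
    trigramAbort.abort();
    return ftsRows;
  }

  // Assumed fallback: FTS found nothing, so wait for the Trigram result.
  return trigramPromise;
}
```

In a real engine, the abort listener would also cancel the query on the PostgreSQL side rather than only clearing a timer.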

Without Zombie Query Prevention, slow trigram searches would keep consuming database resources even after the client has received its response. The AbortSignal ensures clean termination. Because pg_cancel_backend must be issued from a different connection than the one running the query, cancellation itself needs a free connection. To prevent connection starvation under heavy concurrent lookups, the engine sends cancellation requests through an isolated pool called cancelPool (2-3 connections) reserved exclusively for abort interrupts.
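One way to wire an AbortSignal to a dedicated cancellation pool is sketched below. The Queryable interface, cancelBackend, and cancelOnAbort are hypothetical names introduced for illustration. Queryable mirrors the query(text, values) shape of a node-postgres pool, and the sketch assumes the backend PID of the search connection is already known (for example, from SELECT pg_backend_pid() on that connection).

```typescript
// Minimal interface with the same shape as query(text, values) on a
// node-postgres Pool. Declared locally so the sketch has no dependencies.
interface Queryable {
  query(
    text: string,
    values: unknown[],
  ): Promise<{ rows: Array<Record<string, unknown>> }>;
}

// Hypothetical helper: ask PostgreSQL to cancel whatever query is running
// on backend `pid`. pg_cancel_backend returns true if the signal was sent.
async function cancelBackend(cancelPool: Queryable, pid: number): Promise<boolean> {
  const result = await cancelPool.query(
    "SELECT pg_cancel_backend($1) AS cancelled",
    [pid],
  );
  return result.rows[0]?.cancelled === true;
}

// Hypothetical wiring: when the signal aborts, send the cancel request
// through the reserved pool instead of the main search pool.
function cancelOnAbort(signal: AbortSignal, cancelPool: Queryable, pid: number): void {
  signal.addEventListener(
    "abort",
    () => {
      // A failed cancel is not fatal here; the query simply runs to completion.
      cancelBackend(cancelPool, pid).catch(() => undefined);
    },
    { once: true },
  );
}
```

Keeping the cancel request on its own small pool means an abort never has to queue behind the very searches it is trying to stop.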

BM25-Style Ranking

Relevance is calculated using ts_rank, a ranking function built into PostgreSQL. It considers:

  • Term frequency (how often the word appears)
  • Document length (penalizes overly long documents; in ts_rank this depends on the normalization argument, which ignores length by default)

Using ts_rank instead of the positional cover-density calculation in ts_rank_cd reduces cold-query latency by 20-30% on very large datasets (10M+ rows) with near-identical relevance quality.
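A ranked query along these lines could look like the sketch below. The function name buildRankedQuery, the table documents, and the tsvector column search_vector are assumptions made for illustration, and the SQL the library actually generates may differ. Normalization flag 1 divides the rank by 1 + log(document length), which is one way to get the document-length penalty described above.

```typescript
// Hypothetical query builder. `table` and `vectorColumn` are interpolated
// into the SQL, so they must come from trusted configuration, never from
// user input. The search term and limit are bound as $1 and $2.
function buildRankedQuery(
  table: string,
  vectorColumn: string,
  normalization: number = 1,
): string {
  return [
    `SELECT id, ts_rank(${vectorColumn}, query, ${normalization}) AS rank`,
    `FROM ${table}, plainto_tsquery('english', $1) AS query`,
    `WHERE ${vectorColumn} @@ query`,
    `ORDER BY rank DESC`,
    `LIMIT $2`,
  ].join("\n");
}

// Example with assumed names: a `documents` table and a `search_vector` column.
const rankedSql = buildRankedQuery("documents", "search_vector");
```

Swapping ts_rank for ts_rank_cd in the first line would give the cover-density variant for comparison.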

GiST Indexing & KNN Sorting

For trigram fuzzy similarity matching, the engine uses a GiST index rather than a GIN index. With pg_trgm, GiST indexes natively support the distance operator <-> in ORDER BY, which GIN indexes do not.

This enables KNN (K-Nearest Neighbors) sorting directly inside the index scan. PostgreSQL can return the closest matches without reading or sorting the entire index in memory, so lookup cost scales approximately as $O(\log N)$ with the number of rows.
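The index and query shape that make this possible are sketched below. The table products, the column name, the index name, and the helper buildKnnQuery are assumptions for illustration. The SQL relies on the pg_trgm extension and its gist_trgm_ops operator class.

```typescript
// One-time setup with assumed table and column names: a GiST trigram index.
const createTrigramIndexSql = `
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX IF NOT EXISTS products_name_trgm_idx
  ON products USING gist (name gist_trgm_ops);
`;

// Hypothetical query builder. Ordering by `column <-> $1` together with a
// LIMIT lets the planner walk the GiST index in distance order and stop
// after the requested number of rows. Identifiers must be trusted values.
function buildKnnQuery(table: string, column: string): string {
  return [
    `SELECT id, ${column}, ${column} <-> $1 AS distance`,
    `FROM ${table}`,
    `ORDER BY ${column} <-> $1`,
    `LIMIT $2`,
  ].join("\n");
}

const knnSql = buildKnnQuery("products", "name");
```

The distance returned by <-> is one minus the trigram similarity, so smaller values mean closer matches.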