Full-Text Search Implementation — PostgreSQL, MySQL, and Beyond
Why LIKE queries break down#
SELECT * FROM articles WHERE body LIKE '%kubernetes deployment%';
This scans every row. No index helps. At 100K rows it is slow. At 10M rows it is unusable. It misses "deploy" when searching "deployment." It cannot rank results by relevance.
Full-text search solves all of this.
PostgreSQL full-text search#
PostgreSQL has built-in full-text search that is surprisingly powerful.
tsvector and tsquery#
A tsvector is a sorted list of lexemes (normalized words). A tsquery is a search expression.
-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2
-- Search with tsquery
SELECT to_tsquery('english', 'fox & jump');
-- Result: 'fox' & 'jump'
Notice: "foxes" becomes "fox", "jumped" becomes "jump", "the" is removed. This is stemming and stop-word removal.
Adding a search column#
-- Add a generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;
-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);
Weighting with setweight means title matches rank higher than body matches.
Searching#
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'kubernetes & deployment') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
This uses the GIN index, returns ranked results, and handles stemming automatically.
Phrase search#
-- Find "container orchestration" as a phrase
SELECT * FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'container orchestration');
Prefix search (autocomplete)#
-- Match words starting with "kube"
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'kube:*');
MySQL FULLTEXT search#
MySQL supports FULLTEXT indexes on CHAR, VARCHAR, and TEXT columns (InnoDB and MyISAM).
Creating a FULLTEXT index#
ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);
Natural language mode#
SELECT title, MATCH(title, body) AGAINST('kubernetes deployment') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('kubernetes deployment')
ORDER BY score DESC;
MySQL ranks by TF-IDF relevance automatically.
Boolean mode#
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+kubernetes -docker +deployment' IN BOOLEAN MODE);
+means must include-means must exclude- No prefix means optional (boosts rank)
MySQL limitations#
- No stemming by default (searching "deploy" won't match "deployment")
- Minimum word length is 3 characters (configurable via
ft_min_word_len) - Stop word list is fixed unless you customize it
- No phrase proximity search
- No field weighting (title vs body)
Trigram indexes for fuzzy matching#
When users misspell words, stemming does not help. Trigram indexes do.
PostgreSQL pg_trgm#
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_articles_title_trgm ON articles
USING GIN(title gin_trgm_ops);
-- Fuzzy search
SELECT title, similarity(title, 'kuberntes') AS sim
FROM articles
WHERE title % 'kuberntes'
ORDER BY sim DESC;
The % operator finds similar strings. "kuberntes" matches "kubernetes" because they share enough trigrams.
Combining tsvector + trigram#
Use tsvector for precise full-text search. Use trigram as a fallback for "did you mean?" suggestions.
-- Primary search
SELECT * FROM articles WHERE search_vector @@ to_tsquery('english', 'kuberntes');
-- No results? Fall back to trigram
SELECT title FROM articles WHERE title % 'kuberntes' ORDER BY similarity(title, 'kuberntes') DESC LIMIT 5;
Search result highlighting#
Show users where their query matched.
PostgreSQL ts_headline#
SELECT title,
ts_headline('english', body, to_tsquery('english', 'kubernetes'),
'StartSel=<mark>, StopSel=</mark>, MaxFragments=3, MaxWords=30')
AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'kubernetes');
This returns fragments of the body with matches wrapped in mark tags.
MySQL snippet equivalent#
MySQL does not have built-in highlighting. You must implement it in application code or use a search engine.
Building autocomplete#
Autocomplete needs sub-50ms response times.
Approach 1 — Prefix search with tsvector#
SELECT DISTINCT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'kube:*')
LIMIT 10;
Fast with a GIN index, but returns full documents, not suggestions.
Approach 2 — Separate suggestions table#
CREATE TABLE search_suggestions (
term TEXT PRIMARY KEY,
frequency INT DEFAULT 1
);
CREATE INDEX idx_suggestions_trgm ON search_suggestions USING GIN(term gin_trgm_ops);
-- Query as user types
SELECT term FROM search_suggestions
WHERE term LIKE 'kube%'
ORDER BY frequency DESC
LIMIT 10;
Populate this table from search logs, article titles, and tags.
Approach 3 — Materialized view#
CREATE MATERIALIZED VIEW search_terms AS
SELECT DISTINCT unnest(string_to_array(title, ' ')) AS term
FROM articles;
REFRESH MATERIALIZED VIEW CONCURRENTLY search_terms;
When to use Elasticsearch instead#
PostgreSQL full-text search is excellent up to a point. Consider Elasticsearch or OpenSearch when:
| Factor | PostgreSQL FTS | Elasticsearch |
|---|---|---|
| Dataset size | Up to ~10M documents | Billions of documents |
| Query complexity | Basic ranking, phrase, prefix | Fuzzy, synonym, multi-language, geo |
| Relevance tuning | Limited (weights A-D) | Highly customizable scoring |
| Faceted search | Manual with GROUP BY | Built-in aggregations |
| Real-time indexing | Immediate (same transaction) | Near real-time (~1s refresh) |
| Operational cost | Zero (it is your database) | Separate cluster to manage |
| Consistency | Strongly consistent | Eventually consistent |
Rule of thumb: if you are already on PostgreSQL and have fewer than 5M searchable documents, start with built-in FTS. Add Elasticsearch when you outgrow it.
Performance tips#
- Always use GIN indexes — GiST is slower for text search
- Store tsvector as a generated column — avoid computing it at query time
- Limit result sets — use
LIMITeven with indexes - Use
websearch_to_tsqueryfor user input — it handles quoted phrases and operators safely - Vacuum regularly — GIN indexes need maintenance
- Partition large tables — search within partitions for faster results
Visualize your search architecture#
Map out your search pipeline from ingestion to query — try Codelit to generate an interactive architecture diagram.
Key takeaways#
- PostgreSQL tsvector/tsquery is production-ready full-text search with stemming, ranking, and indexing
- MySQL FULLTEXT works for basic search but lacks stemming and field weighting
- Trigram indexes handle typos and fuzzy matching — combine with tsvector for the best experience
- Highlight results with
ts_headlineto show users where matches occurred - Autocomplete needs a dedicated approach — prefix search, suggestions table, or materialized views
- Elasticsearch is worth the operational cost when you outgrow database-native search
- Start simple — built-in database search covers more use cases than most teams realize
Article #405 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.
Try it on Codelit
GitHub Integration
Paste any repo URL to generate an interactive architecture diagram from real code
Related articles
API Backward Compatibility: Ship Changes Without Breaking Consumers
6 min read
api designBatch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency
8 min read
system designCircuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j
7 min read
Try these templates
Search Engine Architecture
Web-scale search with crawling, indexing, ranking, and sub-second query serving.
8 componentsMultiplayer Game Backend
Real-time multiplayer game server with matchmaking, state sync, leaderboards, and anti-cheat.
8 componentsGoogle Search Engine Architecture
Web-scale search with crawling, indexing, PageRank, query processing, ads, and knowledge graph.
10 components
Comments