databasessearchpostgresqlbackend

Full-Text Search Implementation — PostgreSQL, MySQL, and Beyond

March 29, 2026 6 min readBy Codelit Team Discussion

Why LIKE queries break down#

SELECT * FROM articles WHERE body LIKE '%kubernetes deployment%';

This scans every row. No index helps. At 100K rows it is slow. At 10M rows it is unusable. It misses "deploy" when searching "deployment." It cannot rank results by relevance.

Full-text search solves all of this.

PostgreSQL full-text search#

PostgreSQL has built-in full-text search that is surprisingly powerful.

tsvector and tsquery#

A tsvector is a sorted list of lexemes (normalized words). A tsquery is a search expression.

-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- Search with tsquery
SELECT to_tsquery('english', 'fox & jump');
-- Result: 'fox' & 'jump'

Notice: "foxes" becomes "fox", "jumped" becomes "jump", "the" is removed. This is stemming and stop-word removal.

Adding a search column#

-- Add a generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

Weighting with setweight means title matches rank higher than body matches.

Searching#

SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'kubernetes & deployment') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

This uses the GIN index, returns ranked results, and handles stemming automatically.

Phrase search#

-- Find "container orchestration" as a phrase
SELECT * FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'container orchestration');

Prefix search (autocomplete)#

-- Match words starting with "kube"
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'kube:*');

MySQL FULLTEXT search#

MySQL supports FULLTEXT indexes on CHAR, VARCHAR, and TEXT columns (InnoDB and MyISAM).

Creating a FULLTEXT index#

ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);

Natural language mode#

SELECT title, MATCH(title, body) AGAINST('kubernetes deployment') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('kubernetes deployment')
ORDER BY score DESC;

MySQL ranks by TF-IDF relevance automatically.

Boolean mode#

SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+kubernetes -docker +deployment' IN BOOLEAN MODE);

+ means must include
- means must exclude
No prefix means optional (boosts rank)

MySQL limitations#

No stemming by default (searching "deploy" won't match "deployment")
Minimum word length is 3 characters (configurable via ft_min_word_len)
Stop word list is fixed unless you customize it
No phrase proximity search
No field weighting (title vs body)

Trigram indexes for fuzzy matching#

When users misspell words, stemming does not help. Trigram indexes do.

PostgreSQL pg_trgm#

CREATE EXTENSION pg_trgm;

CREATE INDEX idx_articles_title_trgm ON articles
  USING GIN(title gin_trgm_ops);

-- Fuzzy search
SELECT title, similarity(title, 'kuberntes') AS sim
FROM articles
WHERE title % 'kuberntes'
ORDER BY sim DESC;

The % operator finds similar strings. "kuberntes" matches "kubernetes" because they share enough trigrams.

Combining tsvector + trigram#

Use tsvector for precise full-text search. Use trigram as a fallback for "did you mean?" suggestions.

-- Primary search
SELECT * FROM articles WHERE search_vector @@ to_tsquery('english', 'kuberntes');
-- No results? Fall back to trigram
SELECT title FROM articles WHERE title % 'kuberntes' ORDER BY similarity(title, 'kuberntes') DESC LIMIT 5;

Search result highlighting#

Show users where their query matched.

PostgreSQL ts_headline#

SELECT title,
  ts_headline('english', body, to_tsquery('english', 'kubernetes'),
    'StartSel=<mark>, StopSel=</mark>, MaxFragments=3, MaxWords=30')
  AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'kubernetes');

This returns fragments of the body with matches wrapped in mark tags.

MySQL snippet equivalent#

MySQL does not have built-in highlighting. You must implement it in application code or use a search engine.

Building autocomplete#

Autocomplete needs sub-50ms response times.

Approach 1 — Prefix search with tsvector#

SELECT DISTINCT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'kube:*')
LIMIT 10;

Fast with a GIN index, but returns full documents, not suggestions.

Approach 2 — Separate suggestions table#

CREATE TABLE search_suggestions (
  term TEXT PRIMARY KEY,
  frequency INT DEFAULT 1
);

CREATE INDEX idx_suggestions_trgm ON search_suggestions USING GIN(term gin_trgm_ops);

-- Query as user types
SELECT term FROM search_suggestions
WHERE term LIKE 'kube%'
ORDER BY frequency DESC
LIMIT 10;

Populate this table from search logs, article titles, and tags.

Approach 3 — Materialized view#

CREATE MATERIALIZED VIEW search_terms AS
SELECT DISTINCT unnest(string_to_array(title, ' ')) AS term
FROM articles;

REFRESH MATERIALIZED VIEW CONCURRENTLY search_terms;

When to use Elasticsearch instead#

PostgreSQL full-text search is excellent up to a point. Consider Elasticsearch or OpenSearch when:

Factor	PostgreSQL FTS	Elasticsearch
Dataset size	Up to ~10M documents	Billions of documents
Query complexity	Basic ranking, phrase, prefix	Fuzzy, synonym, multi-language, geo
Relevance tuning	Limited (weights A-D)	Highly customizable scoring
Faceted search	Manual with GROUP BY	Built-in aggregations
Real-time indexing	Immediate (same transaction)	Near real-time (~1s refresh)
Operational cost	Zero (it is your database)	Separate cluster to manage
Consistency	Strongly consistent	Eventually consistent

Rule of thumb: if you are already on PostgreSQL and have fewer than 5M searchable documents, start with built-in FTS. Add Elasticsearch when you outgrow it.

Performance tips#

Always use GIN indexes — GiST is slower for text search
Store tsvector as a generated column — avoid computing it at query time
Limit result sets — use LIMIT even with indexes
Use websearch_to_tsquery for user input — it handles quoted phrases and operators safely
Vacuum regularly — GIN indexes need maintenance
Partition large tables — search within partitions for faster results

Visualize your search architecture#

Map out your search pipeline from ingestion to query — try Codelit to generate an interactive architecture diagram.

Key takeaways#

PostgreSQL tsvector/tsquery is production-ready full-text search with stemming, ranking, and indexing
MySQL FULLTEXT works for basic search but lacks stemming and field weighting
Trigram indexes handle typos and fuzzy matching — combine with tsvector for the best experience
Highlight results with ts_headline to show users where matches occurred
Autocomplete needs a dedicated approach — prefix search, suggestions table, or materialized views
Elasticsearch is worth the operational cost when you outgrow database-native search
Start simple — built-in database search covers more use cases than most teams realize

{ }

Explore the Discord architecture interactively

Try it →

Article #405 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.

Try it on Codelit

GitHub Integration

Paste any repo URL to generate an interactive architecture diagram from real code

Build this architecture →

Comments

API design

API Backward Compatibility: Ship Changes Without Breaking Consumers

6 min read

api design

Batch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency

8 min read

system design

Circuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j

7 min read

Try these templates

Search Engine Architecture

Web-scale search with crawling, indexing, ranking, and sub-second query serving.

8 components

Multiplayer Game Backend

Real-time multiplayer game server with matchmaking, state sync, leaderboards, and anti-cheat.

8 components

Google Search Engine Architecture

Web-scale search with crawling, indexing, PageRank, query processing, ads, and knowledge graph.

10 components

Build this architecture

Generate an interactive architecture for Full in seconds.

Try it in Codelit →

databasessearchpostgresqlbackend

Full-Text Search Implementation — PostgreSQL, MySQL, and Beyond

March 29, 2026 6 min readBy Codelit Team Discussion

Why LIKE queries break down#

SELECT * FROM articles WHERE body LIKE '%kubernetes deployment%';

This scans every row. No index helps. At 100K rows it is slow. At 10M rows it is unusable. It misses "deploy" when searching "deployment." It cannot rank results by relevance.

Full-text search solves all of this.

PostgreSQL full-text search#

PostgreSQL has built-in full-text search that is surprisingly powerful.

tsvector and tsquery#

A tsvector is a sorted list of lexemes (normalized words). A tsquery is a search expression.

-- Convert text to tsvector
SELECT to_tsvector('english', 'The quick brown foxes jumped over the lazy dogs');
-- Result: 'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

-- Search with tsquery
SELECT to_tsquery('english', 'fox & jump');
-- Result: 'fox' & 'jump'

Notice: "foxes" becomes "fox", "jumped" becomes "jump", "the" is removed. This is stemming and stop-word removal.

Adding a search column#

-- Add a generated tsvector column
ALTER TABLE articles ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

-- Create a GIN index
CREATE INDEX idx_articles_search ON articles USING GIN(search_vector);

Weighting with setweight means title matches rank higher than body matches.

Searching#

SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'kubernetes & deployment') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

This uses the GIN index, returns ranked results, and handles stemming automatically.

Phrase search#

-- Find "container orchestration" as a phrase
SELECT * FROM articles
WHERE search_vector @@ phraseto_tsquery('english', 'container orchestration');

Prefix search (autocomplete)#

-- Match words starting with "kube"
SELECT * FROM articles
WHERE search_vector @@ to_tsquery('english', 'kube:*');

MySQL FULLTEXT search#

MySQL supports FULLTEXT indexes on CHAR, VARCHAR, and TEXT columns (InnoDB and MyISAM).

Creating a FULLTEXT index#

ALTER TABLE articles ADD FULLTEXT INDEX ft_articles (title, body);

Natural language mode#

SELECT title, MATCH(title, body) AGAINST('kubernetes deployment') AS score
FROM articles
WHERE MATCH(title, body) AGAINST('kubernetes deployment')
ORDER BY score DESC;

MySQL ranks by TF-IDF relevance automatically.

Boolean mode#

SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('+kubernetes -docker +deployment' IN BOOLEAN MODE);

+ means must include
- means must exclude
No prefix means optional (boosts rank)

MySQL limitations#

No stemming by default (searching "deploy" won't match "deployment")
Minimum word length is 3 characters (configurable via ft_min_word_len)
Stop word list is fixed unless you customize it
No phrase proximity search
No field weighting (title vs body)

Trigram indexes for fuzzy matching#

When users misspell words, stemming does not help. Trigram indexes do.

PostgreSQL pg_trgm#

CREATE EXTENSION pg_trgm;

CREATE INDEX idx_articles_title_trgm ON articles
  USING GIN(title gin_trgm_ops);

-- Fuzzy search
SELECT title, similarity(title, 'kuberntes') AS sim
FROM articles
WHERE title % 'kuberntes'
ORDER BY sim DESC;

The % operator finds similar strings. "kuberntes" matches "kubernetes" because they share enough trigrams.

Combining tsvector + trigram#

Use tsvector for precise full-text search. Use trigram as a fallback for "did you mean?" suggestions.

-- Primary search
SELECT * FROM articles WHERE search_vector @@ to_tsquery('english', 'kuberntes');
-- No results? Fall back to trigram
SELECT title FROM articles WHERE title % 'kuberntes' ORDER BY similarity(title, 'kuberntes') DESC LIMIT 5;

Search result highlighting#

Show users where their query matched.

PostgreSQL ts_headline#

SELECT title,
  ts_headline('english', body, to_tsquery('english', 'kubernetes'),
    'StartSel=<mark>, StopSel=</mark>, MaxFragments=3, MaxWords=30')
  AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'kubernetes');

This returns fragments of the body with matches wrapped in mark tags.

MySQL snippet equivalent#

MySQL does not have built-in highlighting. You must implement it in application code or use a search engine.

Building autocomplete#

Autocomplete needs sub-50ms response times.

Approach 1 — Prefix search with tsvector#

SELECT DISTINCT title FROM articles
WHERE search_vector @@ to_tsquery('english', 'kube:*')
LIMIT 10;

Fast with a GIN index, but returns full documents, not suggestions.

Approach 2 — Separate suggestions table#

CREATE TABLE search_suggestions (
  term TEXT PRIMARY KEY,
  frequency INT DEFAULT 1
);

CREATE INDEX idx_suggestions_trgm ON search_suggestions USING GIN(term gin_trgm_ops);

-- Query as user types
SELECT term FROM search_suggestions
WHERE term LIKE 'kube%'
ORDER BY frequency DESC
LIMIT 10;

Populate this table from search logs, article titles, and tags.

Approach 3 — Materialized view#

CREATE MATERIALIZED VIEW search_terms AS
SELECT DISTINCT unnest(string_to_array(title, ' ')) AS term
FROM articles;

REFRESH MATERIALIZED VIEW CONCURRENTLY search_terms;

When to use Elasticsearch instead#

PostgreSQL full-text search is excellent up to a point. Consider Elasticsearch or OpenSearch when:

Factor	PostgreSQL FTS	Elasticsearch
Dataset size	Up to ~10M documents	Billions of documents
Query complexity	Basic ranking, phrase, prefix	Fuzzy, synonym, multi-language, geo
Relevance tuning	Limited (weights A-D)	Highly customizable scoring
Faceted search	Manual with GROUP BY	Built-in aggregations
Real-time indexing	Immediate (same transaction)	Near real-time (~1s refresh)
Operational cost	Zero (it is your database)	Separate cluster to manage
Consistency	Strongly consistent	Eventually consistent

Rule of thumb: if you are already on PostgreSQL and have fewer than 5M searchable documents, start with built-in FTS. Add Elasticsearch when you outgrow it.

Performance tips#

Always use GIN indexes — GiST is slower for text search
Store tsvector as a generated column — avoid computing it at query time
Limit result sets — use LIMIT even with indexes
Use websearch_to_tsquery for user input — it handles quoted phrases and operators safely
Vacuum regularly — GIN indexes need maintenance
Partition large tables — search within partitions for faster results

Visualize your search architecture#

Map out your search pipeline from ingestion to query — try Codelit to generate an interactive architecture diagram.

Key takeaways#

PostgreSQL tsvector/tsquery is production-ready full-text search with stemming, ranking, and indexing
MySQL FULLTEXT works for basic search but lacks stemming and field weighting
Trigram indexes handle typos and fuzzy matching — combine with tsvector for the best experience
Highlight results with ts_headline to show users where matches occurred
Autocomplete needs a dedicated approach — prefix search, suggestions table, or materialized views
Elasticsearch is worth the operational cost when you outgrow database-native search
Start simple — built-in database search covers more use cases than most teams realize

{ }

Explore the Discord architecture interactively

Try it →

Article #405 in the Codelit engineering series. Explore our full library of system design, infrastructure, and architecture guides at codelit.io.

Try it on Codelit

GitHub Integration

Paste any repo URL to generate an interactive architecture diagram from real code

Build this architecture →

Comments

API design

Build this architecture

Generate an interactive architecture for Full in seconds.

Try it in Codelit →

Full-Text Search Implementation — PostgreSQL, MySQL, and Beyond

Why LIKE queries break down#

PostgreSQL full-text search#

tsvector and tsquery#

Adding a search column#

Searching#

Phrase search#

Prefix search (autocomplete)#

MySQL FULLTEXT search#

Creating a FULLTEXT index#

Natural language mode#

Boolean mode#

MySQL limitations#

Trigram indexes for fuzzy matching#

PostgreSQL pg_trgm#

Combining tsvector + trigram#

Search result highlighting#

PostgreSQL ts_headline#

MySQL snippet equivalent#

Building autocomplete#

Approach 1 — Prefix search with tsvector#

Approach 2 — Separate suggestions table#

Approach 3 — Materialized view#

When to use Elasticsearch instead#

Performance tips#

Visualize your search architecture#

Key takeaways#

Comments

Related articles

API Backward Compatibility: Ship Changes Without Breaking Consumers

Batch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency

Circuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j

Try these templates

Search Engine Architecture

Multiplayer Game Backend

Google Search Engine Architecture

Build this architecture

Full-Text Search Implementation — PostgreSQL, MySQL, and Beyond

Why LIKE queries break down#

PostgreSQL full-text search#

tsvector and tsquery#

Adding a search column#

Searching#

Phrase search#

Prefix search (autocomplete)#

MySQL FULLTEXT search#

Creating a FULLTEXT index#

Natural language mode#

Boolean mode#

MySQL limitations#

Trigram indexes for fuzzy matching#

PostgreSQL pg_trgm#

Combining tsvector + trigram#

Search result highlighting#

PostgreSQL ts_headline#

MySQL snippet equivalent#

Building autocomplete#

Approach 1 — Prefix search with tsvector#

Approach 2 — Separate suggestions table#

Approach 3 — Materialized view#

When to use Elasticsearch instead#

Performance tips#

Visualize your search architecture#

Key takeaways#

Comments

Related articles

API Backward Compatibility: Ship Changes Without Breaking Consumers

Batch API Endpoints — Patterns for Bulk Operations, Partial Success, and Idempotency

Circuit Breaker Implementation — State Machine, Failure Counting, Fallbacks, and Resilience4j

Try these templates

Search Engine Architecture

Multiplayer Game Backend

Google Search Engine Architecture

Build this architecture