How Useful is PostgreSQL Full Text Search and Phrase Search!

Development Sep 4, 2020

In this article, we will learn how to use PostgreSQL Full Text Search and Phrase Search in PostgreSQL.

Let's first discuss about how we are using PostgreSQL LIKE and ILIKE operator to query data by using basic pattern matching technique.

We are pretty much familiar with pattern matching technique, which has been part of the standard SQL since the beginning.


SELECT column_name FROM table_name WHERE column_name LIKE 'pattern';

This will return rows where column_name matches the pattern. This technique is called pattern matching.

We construct a pattern by combining a string with wildcard characters and use the LIKE or NOT LIKE operator to find the matches. PostgreSQL provides two wildcard characters.

Percent (%) for matching any sequence of characters (as in LIKE 'a%' to search for columns that start with "logic").
Underscore (_) for matching any single character (as in LIKE '_o%' to find any values that have an "o" in the second position).
we can use ILIKE to ignore cases.

This is pretty much simple and we all are familiar with. But PostgreSQL is an RDBMS capable of much more than simply storing and retrieving data.

What is PostgreSQL Full Text Search and Phrase Search?

Using Full Text Search , we can build a more powerful text search engine without introducing extra dependencies on more advanced tools. When we have to build a web application, we are often asked to add search. Full Text Search is used by search engines, shops, and many other websites all around the world.

Full Text Search retrieves documents, which are database entities containing textual data, that don’t perfectly match the search criteria. This means that when a user searches for “tom and jerry”, for example, an application backed by Full Text Search is able to return results which contain the words separately (just “tom” or “jerry”), contain the words in a different order (“toms and jerries”), or contain variants of the words (“tom” or “jerry”). This gives applications an advantage in guessing what the user means and returning more relevant results faster.

How to implement Full PostgreSQL Text Search and Phrase Search?

Let's assume that we have a database that stores tables of users for our application. So we are focusing on users table with some data in our database to implement FTS.


/* Select all records from the 'first_name', 'last_name' and 'email' 
columns */

SELECT first_name, last_name, email FROM users;

This will return all the records of first_name, last_name and email from users table.

Now, let's implement full text search query using to_tsvector and to_tsquery .

to_tsvector for creating a list of tokens (the tsvector data type, where ts stands for "text search").
to_tsquery for querying the vector for occurrences of certain words or phrases.


/* Compute the vector value and selects all records from the 'first_name' column THAT MATCH the string 'john' */

SELECT first_name, last_name, email FROM users
WHERE to_tsvector(first_name) @@ to_tsquery('john');

This will return all the records of users table that matches the first_name column with string 'john'.


/* Compute the vector value and selects all records from the 'first_name'  and 'last_name' column THAT MATCH the string 'john' */

SELECT first_name, last_name, email FROM users
WHERE to_tsvector(first_name || '' || last_name) @@ to_tsquery('john');

Above query will return all the records of users table that matches the 'john' string with first_name and last_name columns. We can the to_tsvector function for multiple column as well(like above example).

In all of the above queries of FTS, its calculating a vector value on the fly. But now we are pre-calculating the vector value by creating a document column and adding it to our existing table users. Due to this it will return more faster results set of data.


/* Creates a new 'document' column on the 'users' table and sets 'document' equal to te pre-computed vector value */

ALTER TABLE users
    ADD COLUMN document tsvector;
UPDATE users
   SET document = to_tsvector(first_name || '' || last_name || '' || email);

Now, let's create a query using document column to find matched string.


/* Select all records from the 'document' column THAT MATCH the tsquery for string 'john' */

SELECT first_name, last_name, email FROM users
WHERE document @@ to_tsquery('john');

We can run a speed test to see the difference between the queries.


/* SPEED TEST */
/* QUERY #1: FULL TEXT WITH DYNAMICALLY-CALCULATED VECTOR VALUES */

EXPLAIN ANALYZE SELECT first_name, last_name, email 
FROM users
WHERE to_tsvector(first_name || '' || last_name || '' || email) @@  to_tsquery('john');
    
/* QUERY #2: FULL TEXT QUERY WITH PRED-CALCULATED VECTOR VALUES STORED IN A NEW 'document' COLUMN*/

EXPLAIN ANALYZE SELECT first_name, last_name, email
FROM users
WHERE document @@ to_tsquery('john');

The reason why full-text search works really fast is because of the tsvector data type, which works as an index for the document's context. That being said, the cost of the operation is generating this index, which is something you would normally need to do only once.

And even without tweaking, you can still use tsvector and tsquery out-of-the-box to very decent results for relatively simple applications, with a lot more flexibility than you would have by searching patterns with LIKE, and —in many cases— with simpler, cleaner code.