Why, When and How to start with Elasticsearch

The first question that might come to mind is, “What is Elasticsearch and why do we need it in our application?”

You are not required to use Elasticsearch in every application. But if your application has a specific requirement that Elasticsearch fulfills, then don't give it a second thought, just go with it. The next section explains what Elasticsearch is.

What is Elasticsearch?

As the name suggests, it is related to search. But it would be a mistake to conclude that Elasticsearch should be used for every search requirement.


Elasticsearch is built on top of Apache Lucene, a search engine library designed for high-performance full-text search. So if your requirements do not involve demanding or complex search, AVOID using Elasticsearch, because ordinary lookups on fields or records are not difficult for any modern storage technology with proper indexing.

So Elasticsearch is a level above ordinary searching techniques. It has a distributed architecture that allows horizontal scaling: you add more nodes and take advantage of the extra hardware. It supports thousands of nodes for processing terabytes of data. Its horizontal scaling also gives it high availability, because the data is rebalanced if a node fails. Your next question is probably, “Okay, cool, but how would I know whether I actually need Elasticsearch?”

This is very important: learning what it is and how to use it is easy, but you should also know WHEN to use it.

1. You should have a large amount of data. An application should hold at least 2-3 GB of data before you think of using Elasticsearch.
2. How complex are your search requirements? If your application needs prefix searching, dynamic search suggestions, AND searches on many keywords over large data, or any other kind of complex search, then you can consider going with Elasticsearch.
3. The most important thing is the infrastructure. If you have terabytes of data, then you should plan for at least 32 GB of memory and as many CPU cores as possible. You can still run it on a resource-constrained system and get decent performance (assuming your data set is not huge). For the examples in this article, a system with 4 GB of memory and a single CPU core will suffice.

You can run Elasticsearch on all major operating systems (macOS, Linux, and Windows). Now you know what Elasticsearch is, why to use it, and when to use it. The only question that remains is, “How do I start using Elasticsearch?”

How to integrate Elasticsearch in a Node.js Application

I will not go deep into complex queries or features of Elasticsearch. For that, you can refer to its official documentation and easily understand it. Here I will cover the major steps and basic queries that are required to start and run Elasticsearch. We will see:
• Terminology
• Installing Elasticsearch
• Check if a cluster is live and running
• Create index
• Define the structure of data
• Add data
• Update data
• Query data
• Delete Index

Terminology

Index: In Elasticsearch everything is added to an index, which plays the role of a database. When we add data to an index, the data is tokenized and each token is indexed for searching.
Type: When inserting data into an index, we first specify its type. A type plays the role of a table inside the database (index).
Documents: Documents are the rows of each type in an index.
Properties: Properties are the columns of each document.

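An Elasticsearch cluster can contain multiple Indices (databases), which in turn contain multiple Types (tables). These types hold multiple Documents (rows), and each document has Properties (columns). Note that this index/type model describes older Elasticsearch versions: mapping types were deprecated starting with the 6.x releases and removed in later major versions, where an index holds a single kind of document.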
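
To make the idea of "each token is indexed for searching" more concrete, here is a minimal sketch of a toy inverted index in plain JavaScript. It is only an illustration: the function names (tokenize, buildInvertedIndex, lookup) and the sample documents are made up for this example, and Lucene's real data structures are far more elaborate.

```javascript
// Toy inverted index: maps each token to the set of document IDs containing it.
// Illustration only; this is not how Lucene stores data internally.
function tokenize(text) {
    // Lowercase, then split on anything that is not a letter or digit.
    return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
}

function buildInvertedIndex(documents) {
    const index = new Map()
    for (const [id, text] of Object.entries(documents)) {
        for (const token of tokenize(text)) {
            if (!index.has(token)) index.set(token, new Set())
            index.get(token).add(id)
        }
    }
    return index
}

function lookup(index, word) {
    const ids = index.get(word.toLowerCase())
    return ids ? [...ids].sort() : []
}

const invertedIndex = buildInvertedIndex({
    doc1: "Elasticsearch is built on Lucene",
    doc2: "Lucene is a search library",
})

console.log(lookup(invertedIndex, "lucene")) // [ 'doc1', 'doc2' ]
console.log(lookup(invertedIndex, "search")) // [ 'doc2' ]
```

Searching becomes a lookup by token instead of a scan over every document, which is the basic reason token-based search stays fast as data grows.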

Installing Elasticsearch

Elasticsearch is written in Java and needs a Java runtime to run, so make sure you have a JRE installed on your system.

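Elasticsearch provides an official module for Node.js, called elasticsearch. Install the npm package using the following command. (This package is the legacy client; newer projects generally use the @elastic/elasticsearch package, whose API differs in places from the calls shown in this article.)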


npm install --save elasticsearch

Then, you can import the module in your script as follows:


const elasticsearch = require("elasticsearch")

For further reading about installation and troubleshooting, you can visit the documentation.

Check if cluster is live and running

All operations you can do with Elasticsearch can be done via RESTful APIs. Elasticsearch uses port 9200 by default. To make sure you are running it correctly, head to localhost:9200 in your browser, and it should display some basic information about your running instance.


const esClient = new elasticsearch.Client({
    host: '127.0.0.1:9200', 
    log: 'error'
})

esClient.ping({
    requestTimeout: 30000, // 30 seconds
}, (err) => {
    if (err)
        console.error(`Error connecting to the es client: ${err}`)
    else
        console.log(`Success! ElasticSearch cluster is up!`)
})
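
The same check can also be written with promises, since the client returns a promise when no callback is passed. The sketch below wraps the ping in a small helper that resolves to true or false. The helper name isClusterUp and the two stand-in clients are hypothetical; they exist only so the example runs without a live cluster, and in a real application you would pass the esClient created above.

```javascript
// Returns true if the cluster answers the ping, false otherwise.
// `client` is any object with a promise-returning ping() method
// (assumption: no callback is passed, so a promise comes back).
async function isClusterUp(client, timeoutMs = 3000) {
    try {
        await client.ping({ requestTimeout: timeoutMs })
        return true
    } catch (err) {
        return false
    }
}

// Stand-in clients so the sketch runs without a live cluster (hypothetical).
const reachableClient = { ping: () => Promise.resolve(true) }
const unreachableClient = { ping: () => Promise.reject(new Error("connection refused")) }

isClusterUp(reachableClient).then((up) => console.log(`reachable: ${up}`))
isClusterUp(unreachableClient).then((up) => console.log(`unreachable: ${up}`))
```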

Create index

Now your cluster is ready for Elasticsearch operations. Creating an index is the first basic operation you need to perform.


esClient.indices.create({
    index: "indexName"
}).then((resp) => {
    // resp is an object, so serialize it instead of printing "[object Object]"
    console.log(`Index Created with response: ${JSON.stringify(resp)}`)
}, err => {
    console.log(`Error creating Index: ${err}`)
})

Now you have your index ready. It is good practice to check whether the index actually exists.


esClient.indices.exists({
    index: "indexName"
}).then((resp) => {
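    console.log(`Index exists: ${resp}`) // resp is true or false
}, err => {
    // Reaching this branch means the check itself failed, not that the index is missing
    console.log(`Error checking index: ${err}`)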
})
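
The two calls above are often combined into a "create it only if it is missing" helper. The following is a minimal sketch of that pattern. The helper name ensureIndex and the in-memory makeFakeClient are hypothetical and only there so the example can run without a cluster; it also assumes indices.exists resolves to a plain boolean, as in the calls shown above.

```javascript
// Creates the index only when it does not exist yet.
// Resolves to true if the index was created, false if it was already there.
async function ensureIndex(client, indexName) {
    const exists = await client.indices.exists({ index: indexName })
    if (exists) return false
    await client.indices.create({ index: indexName })
    return true
}

// Minimal in-memory stand-in for the client (hypothetical, for illustration).
function makeFakeClient() {
    const known = new Set()
    return {
        indices: {
            exists: async ({ index }) => known.has(index),
            create: async ({ index }) => { known.add(index) },
        },
    }
}

const fakeClient = makeFakeClient()
ensureIndex(fakeClient, "indexName")
    .then((created) => {
        console.log(`first call created index: ${created}`)
        return ensureIndex(fakeClient, "indexName")
    })
    .then((created) => console.log(`second call created index: ${created}`))
```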

Define the structure of data

This is one of the most important parts. As discussed, we can add multiple types (tables) inside one index. Depending on the requirement, we add a mapping object in the body section. This mapping tells Elasticsearch how to index each property. This step can also be combined with the index creation step in a single call. For string-based properties, Elasticsearch offers two types that differ in how the value is tokenized:
Text
  ◦ The text type specifies that the value is tokenized word by word, so each word inside the string is indexed separately.
Keyword
  ◦ The keyword type specifies that the value is not tokenized word by word. It is treated as a single token and indexed accordingly.

We can also set “index”: false on any property we do not want indexed. You can find more about mapping in the official documentation.


esClient.indices.putMapping({
    index: "indexName",
    type: "typeName",
    body: {
        "properties": {
            "property1": {
                "type": "text",
            },
            "property2": {
                "type": "date"
            },
            "property3": {
                "type": "keyword",
            },
        }
    }
}).then((resp) => {
    console.log(resp)
}, err => {
    console.log(`Error mapping existing index: ${err}`)
})
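
The difference between text and keyword is easier to see with a small simulation. The sketch below is not Elasticsearch code: analyzeAsText, analyzeAsKeyword and termMatches are hypothetical helpers that only mimic the behaviour described above (word-by-word lowercase tokens versus one unmodified token), and real analyzers do considerably more.

```javascript
// Simplified stand-ins for how the two string types are analyzed.
function analyzeAsText(value) {
    // One lowercase token per word.
    return value.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
}

function analyzeAsKeyword(value) {
    // The whole value is kept as a single, unmodified token.
    return [value]
}

// An exact lookup succeeds only if the search term equals a stored token.
function termMatches(tokens, term) {
    return tokens.includes(term)
}

const title = "Quick Brown Fox"

console.log(analyzeAsText(title))    // [ 'quick', 'brown', 'fox' ]
console.log(analyzeAsKeyword(title)) // [ 'Quick Brown Fox' ]

console.log(termMatches(analyzeAsText(title), "brown"))              // true
console.log(termMatches(analyzeAsKeyword(title), "brown"))           // false
console.log(termMatches(analyzeAsKeyword(title), "Quick Brown Fox")) // true
```

As a rule of thumb, text suits free-form content that users search by words, while keyword suits identifiers, tags and status values that are matched, sorted or aggregated as a whole.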

Add data

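Now we are all set to add data to our index. The following example shows how to index a single document, although we can also add multiple documents together using the bulk method. The values shown are placeholders; because property2 was mapped as a date, a real document needs a valid date there.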


esClient.index({
    index: "indexName",
    type: "typeName",
    id: "_id",
    body: {
        "property1": "value1",
        "property2": "value2",
        "property3": "value3",
    }
}).then((resp) => {
    console.log(resp)
}, err => {
    console.log(`Error indexing document: ${err}`)
})
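
For the bulk method mentioned above, the request body is a flat array in which every document is preceded by an action line saying where it goes. Here is a minimal sketch of building such a body. The helper name buildBulkBody and the sample documents are made up for illustration, and the final esClient.bulk call is left as a comment because it needs a running cluster.

```javascript
// Builds the `body` array for a bulk request: each document is preceded by
// an action line describing the target index, type and id.
function buildBulkBody(indexName, typeName, documents) {
    const body = []
    for (const { id, ...source } of documents) {
        body.push({ index: { _index: indexName, _type: typeName, _id: id } })
        body.push(source)
    }
    return body
}

const bulkBody = buildBulkBody("indexName", "typeName", [
    { id: "1", property1: "first document", property3: "tagA" },
    { id: "2", property1: "second document", property3: "tagB" },
])

console.log(bulkBody.length) // 4: one action line plus one source line per document

// With a live cluster and the esClient from earlier, this could be sent as:
// esClient.bulk({ body: bulkBody }).then((resp) => console.log(resp.errors))
```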

Update data

We can modify an existing document using the following method.


esClient.update({
    index: "indexName",
    type: "typeName",
    id: "_id",
    body: {
        // A partial update must be wrapped in "doc"
        "doc": {
            "property2": "newValue2",
            "property3": "newValue3",
        }
    }
}).then((resp) => {
    console.log(resp)
}, err => {
    console.log(`Error updating document: ${err}`)
})

Query data

Now we are ready for the operation we adopted Elasticsearch for: querying data.

There are many ways to query in Elasticsearch, depending on your requirements. You can run a match query, multi-match query, match phrase, or prefix match, aggregate results, get suggestions from a tokenized property, and so on.
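
As a rough sketch of what a few of these query types look like, the helpers below build the corresponding query objects. The helper names and the sample field values are hypothetical; only the object shapes (match, match_phrase, multi_match, prefix) come from the Elasticsearch query DSL. Each object would go under the query key of a search body.

```javascript
// Small helpers that return query DSL objects for the `query` key of a search body.
const matchQuery = (field, text) => ({ match: { [field]: text } })
const matchPhraseQuery = (field, phrase) => ({ match_phrase: { [field]: phrase } })
const multiMatchQuery = (text, fields) => ({ multi_match: { query: text, fields } })
const prefixQuery = (field, prefix) => ({ prefix: { [field]: prefix } })

console.log(JSON.stringify(matchQuery("property1", "quick fox")))
// {"match":{"property1":"quick fox"}}
console.log(JSON.stringify(multiMatchQuery("fox", ["property1", "property3"])))
// {"multi_match":{"query":"fox","fields":["property1","property3"]}}
console.log(JSON.stringify(prefixQuery("property3", "val")))
// {"prefix":{"property3":"val"}}

// With a live cluster, one of these could be passed to the client, e.g.:
// esClient.search({ index: "indexName", body: { query: matchQuery("property1", "quick fox") } })
```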

The search call in this section returns results by exact token matching (constant score), using filters in a “must” array and a “should” array.

Must: Every term/terms condition has to match.

Should: These term/terms conditions do not have to match, but if the query is scored, matching “should” conditions raise a document's score. Inside constant_score every hit receives the same score, so the “should” array in this particular example does not change the ranking.

Note: term matches a single exact value, while terms matches any of the values in a given set.


esClient.search({
    index: "indexName",
    type: "typeName",
    from: 0,
    size: 100,
    body: {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [
                            {
                                "term": {
                                    "property1": "value1"
                                }
                            },
                            {
                                "terms": {
                                    "property2": [
                                        "possibleValue1",
                                        "possibleValue2",
                                        "possibleValue3"
                                    ]
                                }
                            },
                        ],
                        "should": [
                            {
                                "term": {
                                    "property1": "falseValue"
                                }
                            },
                            {
                                "term": {
                                    "property3": "value3"
                                }
                            },
                        ]
                    }
                }
            }
        },
        "sort": [
            { "property2": { "order": "desc" } }
        ],
    }
}).then((resp) => {
    console.log(resp)
}, (err) => {
    console.log(`Error getting query results: ${err}`)
})
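
To illustrate the must/should semantics described in this section without a cluster, here is a small in-memory simulation. It is a deliberate simplification, not Elasticsearch's implementation: clauseMatches and evaluateBool are hypothetical helpers, the sample documents and field names are made up, and real scoring is replaced by a plain count of matching “should” clauses.

```javascript
// Checks a single clause against a plain document object.
// Supports only { term: { field: value } } and { terms: { field: [values] } }.
function clauseMatches(doc, clause) {
    if (clause.term) {
        const [field, value] = Object.entries(clause.term)[0]
        return doc[field] === value
    }
    if (clause.terms) {
        const [field, values] = Object.entries(clause.terms)[0]
        return values.includes(doc[field])
    }
    return false
}

// Keeps documents that satisfy every "must" clause and reports how many
// "should" clauses each one also satisfies.
function evaluateBool(docs, { must = [], should = [] }) {
    return docs
        .filter((doc) => must.every((clause) => clauseMatches(doc, clause)))
        .map((doc) => ({
            doc,
            shouldMatched: should.filter((clause) => clauseMatches(doc, clause)).length,
        }))
}

const docs = [
    { id: 1, color: "red", size: "S" },
    { id: 2, color: "red", size: "M" },
    { id: 3, color: "blue", size: "M" },
]

const results = evaluateBool(docs, {
    must: [{ term: { color: "red" } }, { terms: { size: ["S", "M"] } }],
    should: [{ term: { size: "M" } }],
})

console.log(results.map((r) => `${r.doc.id}:${r.shouldMatched}`)) // [ '1:0', '2:1' ]
```

Document 3 is dropped because it fails a “must” clause, while documents 1 and 2 both pass; the “should” clause only distinguishes between them.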

Delete Index

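To delete an index, use the method below. Deleting all indices is possible by setting the index option to “_all”.


esClient.indices.delete({
    index: "indexName"
}).then((resp) => {
    console.log(resp)
}, err => {
    console.log(`Error deleting index: ${err}`)
})

To delete a single document rather than the whole index, pass its type and id to esClient.delete instead.


esClient.delete({
    index: "indexName",
    type: "typeName",
    id: "_id",
}).then((resp) => {
    console.log(resp)
}, err => {
    console.log(`Error deleting document: ${err}`)
})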

Useful Links

Reference to Elasticsearch and its features

Implementation and use cases guide

Documentation of the Node.js client