How to import data from MS SQL Server into Elasticsearch 1.6

Posted on Updated on

Problem:

Need to provide the analytics and visualization for audit log data which is stored in relational database.

Solution

One of the solution to this problem is to visualize the data in open source tool like kibana . But kibana uses the elasticsearch for search and storage purpose.

So that need to import selected records from relational database into the elasticsearch 1.6. The new index will be created in elasticsearch for this data and it will be used by kibana.

Prior to elasticsearch 1.6 the river plugin was available for this purpose but it is now deprecated.

But to solve the same problem another standalone java utility known as elasticsearch – jdbc is available.

Here I am going to tell you how to use this utility through docker so whenever you need it. it would be only three steps process for you i.e clone it, build image and start the container with parameter.

Prerequisite:

  1. Ubuntu 14.04
  2. Install Docker Host
  3. Install elasticsearch
    docker run -d -p 9200:9200 -p 9300:9300 elasticsearch 
    
  4. Install Kibana

Step 1: Check out the docker file https://github.com/vinayakbhadage/data-importer

git clone https://github.com/vinayakbhadage/data-importer.git

Step 2: Change the required parameter from this file dataimport.sh as mentioned here

Step 3: Build the images from Dockerfile

docker build -t data-importer .

Step 4: Run the data-importer by setting following parameter

1.LAST_EXECUTION_START=”2014-06-06T09:08:00.948Z”

This date time used to import the data from the log table of your database. All records in that table with timestamp column value greater than this will be imported in Elastic search.

2.INDEX_NAME=** Provide the value **

This one is index name for elasticsearch.

3.CLUSTER=** Provide the value **

Provide the elasticsearch cluster name.

4.ES_HOST=** Provide the value **

Provide the elastic search host name or IP address.

5.ES_PORT=”9300″

Provide the elastic search host port number.

6.SCHEDULE=”0 0/10 * * * ?”

Default interval for data-importer is 10 min. this is Quartz cron trigger syntax.

7.SQL_SERVER_HOST=”Provide the value”

It should be sql server database IP or hostname.

8.DB_NAME=”Provide the value”

It should be sql server database name.

9.DB_USER_NAME=”Provide the value”

It should be sql server user name, here server authentication is required.

10.DB_PASSWORD=”Provide the value”

It should be sql server user password, here server authentication is required.

Note: Please change the environment variable as per your requirement

docker run -d --name data-importer -e LAST_EXECUTION_START="2014-06-06T09:08:00.948Z" \
  -e INDEX_NAME="myindex"  -e CLUSTER="elasticsearch" -e ES_HOST="myeshost" \
  -e ES_PORT="9300" -e SCHEDULE="0 0/10 * * * ?" -e SQL_SERVER_HOST="mydb" \
  -e DB_NAME="mydb" -e DB_USER_NAME="myuser" -e DB_PASSWORD="find-out" data-importer

Lastly checkout the status of elasticsearch index then you can find data over there.

Advertisements

3 thoughts on “How to import data from MS SQL Server into Elasticsearch 1.6

    adi said:
    July 17, 2015 at 9:20 am

    Hey. Great Post !
    So i am currently trying to use jdbc-importer (https://github.com/jprante/elasticsearch-jdbc) . i am fairly new to both postgres and elasticsearch. I downloaded the importer and tried running a simple .sh script but for some reason neither data is in my elasticsearch nor displaying an error when i run it. Following is my shell script:

    #!/bin/sh

    DIR=”$( cd “$( dirname “${BASH_SOURCE[0]}” )” && pwd )”
    bin=${DIR}/../bin
    lib=${DIR}/../lib

    echo ‘{
    “type” : “jdbc”,
    “jdbc” : {
    “url” : “jdbc:postgresql://localhost:5432/dbname”,
    “user” : “dbuser”,
    “password” : “dbpass”,
    “sql” : “select id from fb_ad_sets where ad_account_id = 29”,
    “index” : “myindex”,
    “type” : “mytype”
    }
    }
    ‘ | java \
    -cp “${lib}/*” \
    -Dlog4j.configurationFile=${bin}/log4j2.xml \
    org.xbib.tools.Runner \
    org.xbib.tools.JDBCImporter

    After running it i tried querying for it by a simple:
    GET /myindex/_refresh

    Status is 404. 😀

    Any help would be awesome. Sorry about the long post as well.

      vinayakbhadage responded:
      July 21, 2015 at 6:47 am

      check jdbc.log file inside logs folder.
      and try GET /myindex/_stats

      Elasticsearch Refresh API is http post call ..

      Regargds,
      -Vinayak

    Ashish said:
    October 2, 2015 at 5:59 am

    This gave me a right direction.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s