Month: August 2016

How to do horizontally scalable data migration using Talend ETL and Akka Cluster


Key Objectives:

  1. Migrate data from MSSQL to Elasticsearch and Couchbase.
  2. Migrate millions of records from MSSQL.
  3. Support highly scalable parallel processing.
  4. Support horizontal scaling.
  5. Complete the data migration within a predefined time window.

Solution:

The solution consists of two phases. Phase one is the data migration job itself, which can be built with Talend ETL for Big Data. The Talend ETL job is parameterized with the migration details: which record to read from MSSQL, plus the connection details for MSSQL, Elasticsearch, and Couchbase, among other settings.
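As a rough illustration of what "parameterized" can mean in practice, the sketch below builds the command line for one run of an exported Talend job. It assumes the job was exported as a standalone job with a generated launcher script and that context variables are overridden with `--context_param name=value` arguments. The script name `./migrate_record_run.sh`, the context variable names, and the host names are all hypothetical and would need to match the actual job.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Talend: builds the argument list
// for one run of an exported Talend job.
class TalendJobCommand {

    // launcher: path to the job's launcher script (assumed name).
    // contextParams: context variable name -> value for this run.
    static List<String> build(String launcher, Map<String, String> contextParams) {
        List<String> command = new ArrayList<>();
        command.add(launcher);
        for (Map.Entry<String, String> param : contextParams.entrySet()) {
            // One override per context variable.
            command.add("--context_param");
            command.add(param.getKey() + "=" + param.getValue());
        }
        return command;
    }

    public static void main(String[] args) {
        // All names and values below are placeholders for illustration.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("recordId", "1001");
        params.put("mssqlHost", "mssql.example.internal");
        params.put("elasticsearchHost", "es.example.internal");
        params.put("couchbaseHost", "cb.example.internal");

        List<String> command = build("./migrate_record_run.sh", params);
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list can be handed to `java.lang.ProcessBuilder` to start the job as a separate process, which keeps each run isolated from the process that schedules it.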

Phase two is running the Talend ETL job in an environment that scales out and executes many instances of the job in parallel. This also requires orchestration: something has to pass the required parameters, or messages, to each run of the Talend job.

This is achieved with a cluster-enabled Akka actor model. A Master actor knows which records need to be migrated from the MSSQL server and assigns each record id to Worker actors, which in turn run the Talend ETL job.
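The master/worker split can be sketched without Akka to show the shape of the orchestration. The example below is a single-JVM stand-in that uses a plain thread pool from `java.util.concurrent` rather than Akka actors or Akka Cluster, so it illustrates the work distribution only, not the horizontal scaling across nodes. The class name `MigrationMaster`, the `migrate` method, and the `job` callback (a placeholder for launching the Talend job for one record id) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Stand-in for the Master actor: hands each record id to a pool of workers.
class MigrationMaster {

    // job: placeholder for "run the Talend job for this record id";
    //      returns true when the record was migrated successfully.
    // Returns the ids whose job failed, so they can be retried.
    static List<Long> migrate(List<Long> recordIds, int workerCount, Predicate<Long> job) {
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        try {
            // Assign every record id to the worker pool.
            List<Future<Boolean>> results = new ArrayList<>();
            for (Long id : recordIds) {
                results.add(workers.submit(() -> job.test(id)));
            }
            // Collect outcomes in submission order.
            List<Long> failed = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                boolean ok;
                try {
                    ok = results.get(i).get();
                } catch (ExecutionException e) {
                    ok = false; // the job threw; treat the record as failed
                }
                if (!ok) {
                    failed.add(recordIds.get(i));
                }
            }
            return failed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L);
        // Placeholder job: a real worker would start the Talend job here.
        List<Long> failed = migrate(ids, 3, id -> {
            System.out.println(Thread.currentThread().getName() + " migrating record " + id);
            return true;
        });
        System.out.println("failed ids: " + failed);
    }
}
```

In the actor-based design described above, the thread pool would be replaced by Worker actors spread across cluster nodes and the method call by messages carrying the record id, which is what lets the same pattern scale out by adding machines.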

Conclusion:

This approach is useful for migrating large amounts of data from different data sources in parallel on horizontally scalable cloud infrastructure such as AWS, using an open source stack.