Blue-Green deployment using Spinnaker, Packer and Jenkins on AWS


Key Objectives:

  1. Need to create immutable server images for Windows or Linux servers.
  2. Need to achieve the desired state using configuration management tools like Chef, PowerShell, etc.
  3. Need to deploy to a public cloud provider like AWS.
  4. Need to deploy the web application without downtime.
  5. Need to support rollback of the deployed web application.

Solution:

To manage continuous delivery with blue-green deployment using Spinnaker, we need a delivery pipeline consisting of build, bake and deploy phases.

Build Phase

Spinnaker integrates with Jenkins through its Igor microservice. The build phase is simply a Jenkins job consisting of the following stages:

  1. Check out the source code from a tagged branch in Git/SVN.
  2. Build the artifact with the build management tool in use, e.g. MSBuild, Maven, Gradle, SBT, etc.
  3. Publish the artifact to an artifact repository like AWS S3, Artifactory or a NuGet server as per requirement.
  4. Pass the output of the build phase, such as the artifact URL, to Spinnaker so it can be used in the bake phase. The supported format is JSON, which is accessed through the Spinnaker Expression Language (see the sketch below).
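
For illustration, a minimal sketch of the last Jenkins build step, which writes the build output as a JSON file for Spinnaker. The file name, keys and the expression path are assumptions and must match how the property file is configured on the Spinnaker side:

    # Final step of the Jenkins build job: expose the build output to Spinnaker.
    # ARTIFACT_URL is assumed to be set by the earlier publish step.
    echo "{\"artifactUrl\": \"${ARTIFACT_URL}\", \"version\": \"${BUILD_NUMBER}\"}" > build-info.json
    # Archive build-info.json in Jenkins and configure it as the property file of the
    # Jenkins trigger/stage in Spinnaker; later stages can then read values with an
    # expression such as ${trigger.properties['artifactUrl']} (assumed expression path).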

Bake Phase

Baking immutable server images, a.k.a. golden images, for a target platform like AWS, GCP, etc. is done by Packer. Spinnaker ships with Packer by default, but that built-in support bakes Linux server images only; to bake a Windows server image, Packer has to run as a separate component, which can be executed through Jenkins. The following steps are performed in the bake phase or pipeline:

  1. Execute the Packer Jenkins job from Spinnaker with parameters like the artifact URL, AWS credentials, instance type, etc.
  2. Packer supports different types of provisioners, like chef-client, shell, or PowerShell for Windows servers, which bring the immutable server image to the desired state. For example, a web server AMI needs a Windows Server base image, on top of which .NET and IIS are installed and the web application is configured.
  3. Once the desired state is achieved, Packer creates the AMI on AWS and produces the AMI ID as output, which is passed back to Spinnaker so it can be used for deployment (a sketch of the invocation follows this list).
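
A minimal sketch of the Packer invocation the Jenkins bake job could run; the template file and variable names are assumptions and must match the user variables declared in your Packer template:

    # Bake a Windows web server AMI from the Jenkins job.
    packer build \
      -var "artifact_url=${ARTIFACT_URL}" \
      -var "aws_access_key=${AWS_ACCESS_KEY_ID}" \
      -var "aws_secret_key=${AWS_SECRET_ACCESS_KEY}" \
      -var "instance_type=t2.medium" \
      windows-web-server.json
    # The amazon-ebs builder prints the resulting AMI ID at the end of the run;
    # that ID is what gets handed back to Spinnaker.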

Deploy Phase

Spinnaker provides a deploy stage in the delivery pipeline with different deployment strategies. Red/Black, also known as Blue-Green, is one such strategy and allows deployments without downtime.

On AWS, Blue-Green deployment needs an Elastic Load Balancer (ELB) and an Auto Scaling Group (ASG) whose launch configuration points to the immutable server image (AMI ID) produced by the bake phase. Every new deployment creates a new ASG whose launch configuration references the pre-baked AMI ID, and this ASG is attached to the ELB. Spinnaker then waits for the health checks of the instances launched by the new ASG to pass; once they are healthy, the old ASG is scaled down by setting its min/max capacity to zero.

This complicated choreography is handled smoothly by Spinnaker; the DevOps team only needs to tune the ELB health check parameters to the application's needs. Conceptually, the steps Spinnaker automates look like the AWS CLI sketch below.
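
A rough AWS CLI equivalent of what the Red/Black strategy does under the hood; all resource names, subnets and sizes are illustrative assumptions:

    # 1. New launch configuration pointing at the freshly baked AMI.
    aws autoscaling create-launch-configuration \
      --launch-configuration-name web-app-v002-lc \
      --image-id ami-0123456789abcdef0 \
      --instance-type t2.medium

    # 2. New ASG attached to the existing ELB, health-checked through the ELB.
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name web-app-v002 \
      --launch-configuration-name web-app-v002-lc \
      --min-size 2 --max-size 4 --desired-capacity 2 \
      --load-balancer-names web-app-elb \
      --health-check-type ELB --health-check-grace-period 300 \
      --vpc-zone-identifier "subnet-aaaa1111,subnet-bbbb2222"

    # 3. Once the new instances pass the health check, scale the old ASG to zero.
    aws autoscaling update-auto-scaling-group \
      --auto-scaling-group-name web-app-v001 \
      --min-size 0 --max-size 0 --desired-capacity 0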

Conclusion

One can achieve continuous delivery by following Blue-Green deployment for web applications using Spinnaker, Jenkins and Packer on public cloud providers like AWS, GCP, etc. Spinnaker's Jenkins integration can even help manage non-web resources (databases, platform-specific resources) of cloud providers using Terraform.

How to do horizontally scalable data migration using Talend ETL and AKKA Cluster


Key Objectives:

  1. Need to migrate data from MS SQL to Elasticsearch and Couchbase.
  2. Need to migrate millions of records from MS SQL.
  3. Need to support highly scalable parallel processing.
  4. Need to support horizontal scaling.
  5. Need to complete the data migration within a predefined time window.

Solution:

The solution consists of two phases. In phase one we need a data migration job; here you can use Talend ETL for Big Data. This Talend ETL job is parameterized with the migration details, such as which records to pull from MS SQL and the connection details for MS SQL, Elasticsearch and Couchbase.

In phase two, the Talend ETL job needs to run in an environment that can scale and execute it in parallel; in addition, the process has to be orchestrated by passing the required parameters or messages to the Talend job.

This is achieved with Akka's actor model in cluster mode: a master actor knows which records need to be migrated from the MS SQL server and assigns each record id to worker actors, which in turn execute the Talend ETL job (see the sketch below).
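
A worker actor ultimately shells out to the exported Talend job. A minimal sketch of that invocation, assuming the job was built and exported as MigrateRecord and defines the context parameters shown here:

    # Hypothetical command a worker actor runs for one record id.
    # Job name, context parameter names and hosts are assumptions; they must
    # match the context variables defined in the exported Talend job.
    ./MigrateRecord/MigrateRecord_run.sh \
      --context_param recordId=42 \
      --context_param mssqlHost=mssql.internal \
      --context_param esHost=es.internal:9200 \
      --context_param couchbaseHost=cb.internal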

Conclusion:

This approach is definitely useful for migrating large amounts of data in parallel, on horizontally scalable cloud infrastructure like AWS, from different data sources, using some of the best open source tooling.

Why choose GraphQL over REST/OData in Web API application development


Problem

  • Need to have a single HTTP endpoint for managing CRUD operations and performing custom actions.
  • Need to have metadata about the available services and their data contracts.
  • Need to fetch linked resources (domain objects) of the application in a single call, which helps avoid multiple custom endpoints and round trips.
  • Need to have backward compatibility for services and their data contracts.
  • Need to have custom query capability with filtering, sorting, pagination and selection sets for targeted domain objects.
  • Need to move away from SOAP web services.

Solution

One solution to the above problems is the GraphQL specification, open sourced by Facebook. GraphQL is an application-layer query language which helps a client application fetch or mutate (insert/update/delete) the application's domain objects. It also suits a Web API application that needs to support Command Query Responsibility Segregation, which is desirable for many reasons.

To understand GraphQL in simple words, let us compare RDBMS terminology with GraphQL:

  • An RDBMS gives a single endpoint for a database server; GraphQL also advocates a single endpoint, irrespective of the underlying transport protocol (HTTP, UDP, FTP, etc.).
  • An RDBMS has a schema which contains tables and stored procedures. GraphQL also defines a schema, but instead of tables it proposes a Type System which defines the data contracts.
  • Like stored procedures, GraphQL has operations; based on the type of operation performed on the server, there are two kinds: one used to read or fetch data, known as a “query”, and one used to update or change data, known as a “mutation”.
  • Like SQL is the query language for an RDBMS, GraphQL has a graph query language whose syntax looks like JSON without the values.
  • An RDBMS has a normalized data model with relationships through foreign keys; here we need to define the product data model as a graph, irrespective of how we store it.

To know more about GraphQL please go through its specification.

The implementation architecture overview of GraphQL in an application is as shown below:

  • The GraphQL server endpoint is a simple HTTP endpoint which accepts an HTTP POST request with a GraphQL message as the payload (see the curl sketch after this list).
  • It resolves the query depending on its type (either mutation or query) and produces the response in JSON format, using the underlying business logic and data access layers.
  • Any type of client application can communicate with the GraphQL server endpoint over HTTP.
  • GraphQL also publishes its metadata through introspection queries.
  • GraphiQL is an HTML5 and JavaScript based IDE which helps to fetch and explore metadata about the services.
  • It is also used as a test harness for generating and evaluating GraphQL queries and mutations.
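
As a minimal sketch, any HTTP client can call the endpoint; the host name is a placeholder and the query shape follows the student example further below:

    curl -X POST http://your-graphql-hosting/api/graphql \
      -H "Content-Type: application/json" \
      -d '{"query":"{ student(id: 1) { status gender name { firstName lastName } } }"}'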

Why GraphQL over OData and REST

GraphQL was chosen over OData and REST for the following reasons:

  • Need to support linked resources. To give an example, say that in a university we have students which belong to a department, and we need to get student details along with department details. To achieve this with OData or REST we need to make multiple calls to the server.
  • In another case, a client application wants to get a student's course details; to fulfill this demand we need to define a custom REST or OData endpoint. GraphQL avoids the need for such custom endpoints: since the product data model is defined as a graph, data can be fetched along whatever links exist between the nodes of the graph.
  • To support backward compatibility of services and data contracts with OData or REST, we need to maintain URL or data contract versioning and, in some cases, do message routing. With GraphQL we don't need to, as the client states which data it consumes in the GraphQL request, against a single GraphQL endpoint.
  • OData applies conventions for data fetching, as compared to REST; the same thing is done in GraphQL using the Type System, arguments and selection sets.
  • OData publishes metadata in EDM format, while REST is still struggling to settle on a metadata standard (WADL, Swagger and many more). GraphQL publishes metadata in a well-defined format through the introspection API and also provides a document explorer for it.
  • WS-Trust, WS-Federation and WS-Security concerns are already addressed for APIs by OAuth 2.0 and OpenID Connect 1.0, so one can use those with GraphQL as well.

Sample GraphQL queries and mutations (commands)

1. Sample request to get a student by id from a GraphQL endpoint:

Request:

    POST http://your-graphql-hosting/api/graphql HTTP/1.1
    {
        "query": "query($id: Long!){
                student(id: $id) 
                {
                  status
                  gender 
                  name 
                  { 
                    title      
                    firstName      
                    lastName    
                  }  
                }
         }",
        "variables": "{\"id\":1}"
    }

Response:

    HTTP/1.1 200 OK

    {
      "data": {
        "student": {
          "status": "Active",
          "gender": "Mail",
          "name": {
            "title": null,
            "firstName": "Vinayak",
            "lastName": "Bhadage"
          }
        }
      }
    }

2. Get student details along with the department as a linked resource:

Request:

    POST http://your-graphql-hosting/api/graphql HTTP/1.1
    {
        "query":"query($id: Long!){
                studentWithDepartment: student(id: $id) 
                {
                      status
                      gender 
                      name 
                      { 
                        title      
                        firstName      
                        lastName    
                      }  

                      department {
                              name
                              id
                        }
                      }
                }",
        "variables":"{\"id\":1}"
     }

Response:

HTTP/1.1 200 OK 
{
  "data": {
    "studentWithDepartment": {
      "status": "Active",
      "gender": "Mail",
      "name": {
        "title": null,
        "firstName": "Vinayak",
        "lastName": "Bhadage"
      },
      "department": {
        "name": "Computer Engineering",
        "id": 1
      }
    }
  }
}

3. Create a student using a mutation (command) GraphQL query

Request:

POST http://your-graphql-hosting/api/graphql HTTP/1.1
{
    "query":"mutation newStudent($input:StudentInput!){
          createStudent(studentInput:$input)  {   
              id    
              gender    
              status    
              profileImage    
              department{      
                  id      
                  name    
            } 
          }
      }",
      "variables":"{\"input\": {status:\"Active\", gender:\"Male\",departmentId:1,profileImage:\"hello\"}}"
 }

Response:

HTTP/1.1 200 OK 
{
  "data": {
    "createStudnet": {
      "id": 15557393526030336,
      "gender": "Male",
      "status": "Active",
      "profileImage": "hello",
      "organization": {
        "id": 1,
        "name": "Root"
      }
    }
  }
}

Conclusion

GraphQL helps keep a Web API application robust and extensible, with up-to-date documentation of its services, and backward compatible without breaking Web API consumers.

References:

  1. GraphQL: a data query language
  2. graphql-dotnet
  3. GraphiQL IDE and documentation
  4. GraphQL specification

How to do Continuous Integration and Continuous deployment for a target platform


Problem

On demand, an application should be pushed to a production environment. The application must be validated, tested and tagged before release. We need to provide change history and rollback capability. The deployment process should be applicable to any target platform, e.g. a public or private cloud. The application should be released without downtime.

Solution:

The solution consists of two phases: a Continuous Integration pipeline and a Continuous Deployment pipeline.

1. Continuous Integration

The steps involved in the continuous integration pipeline using Jenkins (build server) are shown below:

  1. Checkout the source code from central repository.
  2. Build and compile source code.
  3. Run static code analysis.
  4. Run unit tests.
  5. Build all artifacts.
  6. Deploy release on dev environment.
  7. Run functional test suite.

If all tests pass (or on a manual trigger), promote the build to the QA environment and do the following:

  1. Run smoke and sanity tests.
  2. Run all behavior-driven acceptance tests.
  3. On success, tag the branch for stage promotion based on the convention set (a condensed shell sketch of the CI steps follows).
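
A condensed sketch of these CI steps as shell commands a Jenkins job might run, assuming a Maven-based build (the post equally allows MSBuild or Gradle); commands and tag names are illustrative:

    git checkout "${GIT_COMMIT}"       # commit checked out by the Jenkins Git plugin
    mvn clean verify                   # compile, run static analysis plugins and unit tests
    mvn package                        # build the deployable artifacts
    # ...deploy to the dev environment and run the functional/acceptance suites...
    git tag -a "qa-${BUILD_NUMBER}" -m "Promoted to QA"
    git push origin "qa-${BUILD_NUMBER}"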

2. Continuous deployment

The steps involved in the continuous deployment pipeline using Jenkins are shown below:

  1. Check out the source code of the tagged release branch (which was tested on stage earlier) from Git.
  2. Build the source code using tools like MSBuild, Maven or Gradle (whatever is used to build the application).
  3. Publish the artifact to an AWS S3 bucket and update its URL in the Chef server's data bag.
  4. The AWS S3 bucket is used as artifact storage.
  5. Pre-bake the machine image for the target platform using Packer (it builds machine images for various target platforms, e.g. AWS, GCE, VirtualBox, etc.) and bring it to the desired state with chef-client.
  6. Deploy the pre-baked machine image to the target platform using Terraform (it works across multiple public or private cloud providers for infrastructure provisioning) and update the infrastructure information in Consul (used for service discovery and as a key/value configuration store); see the sketch after this list.
  7. The Chef server is loaded with the required environment's roles, cookbooks and data bags for deployment.
  8. AWS is used as the target platform for deployment.
  9. A Consul server is deployed on AWS, which helps with service discovery.
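
A condensed sketch of steps 5 and 6 as shell commands; the template name, variable names and Consul keys are illustrative assumptions:

    # Bake the image with Packer (chef-client provisioner) and capture the AMI ID
    # from Packer's machine-readable output.
    packer build -machine-readable -var "artifact_url=${ARTIFACT_URL}" web-app.json | tee packer.log
    AMI_ID=$(grep 'artifact,0,id' packer.log | cut -d, -f6 | cut -d: -f2)

    # Record the AMI ID in Consul's key/value store for later deploys and rollbacks.
    curl -X PUT -d "${AMI_ID}" http://consul.internal:8500/v1/kv/web-app/ami/current

    # Provision or update the infrastructure with Terraform, pointing at the new AMI.
    terraform apply -var "ami_id=${AMI_ID}"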

For reference

An AWS deployment architecture of a two-tier web application with service discovery is shown below:

  1. A VPC is used to set up a private data center in an AWS region, consisting of public and private subnets across two or more availability zones.
  2. An Internet Gateway to allow public access to services.
  3. Route 53 to manage the DNS entry for the ELB (Elastic Load Balancing).
  4. Highly available AWS Elastic Load Balancing.
  5. An Auto Scaling group to manage the web application's deployment, using a launch configuration with the pre-baked AMI and instance type.
  6. A NAT instance to control public access for the private subnets.
  7. A database cluster, e.g. a MongoDB sharded cluster, as storage.
  8. An Auto Scaling group to manage Consul for service discovery and deployment configuration.
  9. CloudWatch alarms for autoscaling the web app.
  10. An AWS IAM user for managing the AWS S3 bucket used for Elasticsearch snapshots.
  11. An AWS S3 bucket for storing Elasticsearch snapshots.

Blue-green Deployments and Roll-backs

There are multiple strategies for managing blue-green or canary deployments with AWS as the target platform.
Here a blue-green deployment is achieved by managing the auto scaling groups with Terraform, executing the steps given below.

  1. Always create a new auto-scale group with a launch configuration pointing to the latest pre-baked AMI (read from Consul), then attach the existing elastic load balancer to it.

  2. Update the existing auto-scale group by detaching it from the load balancer, allow a cool-down period, and then remove it.

    For a rollback, look up the previously deployed pre-baked AMI in Consul and repeat steps 1 and 2 above, as in the sketch below.
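
A minimal rollback sketch; the Consul key and Terraform variable name are assumptions:

    # Fetch the previously deployed AMI ID from Consul and re-apply with it.
    PREVIOUS_AMI=$(curl -s "http://consul.internal:8500/v1/kv/web-app/ami/previous?raw")
    terraform apply -var "ami_id=${PREVIOUS_AMI}"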

Conclusion

This continuous integration and deployment strategy is based on an open-source stack. It is useful for on-premise private clouds like OpenStack, VMware, etc. as well as public cloud providers like AWS, GCE, DigitalOcean, etc. Tools like Packer, Terraform, Consul, Jenkins and Chef truly help to achieve infrastructure as code and make the DevOps life simple :).

How to setup a MongoDB Sharded cluster using docker with skydns


Problem: To set up a MongoDB sharded cluster in a development environment, we need a replica set, a config server and a Mongo router on a single machine.

Solution: One way to set up a MongoDB sharded cluster on a single machine is Docker; the MongoDB containers are hosted on a single host with isolation. The MongoDB sharded cluster using Docker containers is as shown below:

Prerequisite

  1. Ubuntu server 14.04
  2. Install Docker host

Install Skydock and SkyDNS for Docker service discovery

Step 1: Check the IP address of docker0; it is the Docker networking gateway.

 ifconfig docker0

If the docker0 bridge IP address is 172.17.0.1 then go ahead; otherwise use whichever address is assigned.

Step 2: Edit /etc/default/docker and set the Docker daemon options:

DOCKER_OPTS="--bip=172.17.0.1/16 --dns=172.17.0.1 --dns 8.8.8.8 --dns 8.8.4.4"

Step 3: Restart the Docker service:

    sudo service docker restart

Step 4: Start the SkyDNS container, then Skydock, to manage Docker service discovery:

docker run  -d -p 172.17.0.1:53:53/udp --name skydns crosbymichael/skydns -nameserver 8.8.8.8:53 -domain docker

docker run  -d -v /var/run/docker.sock:/docker.sock --name skydock crosbymichael/skydock -ttl 30 -environment dev -s /docker.sock -domain docker -name skydns

MongoDB sharded cluster on docker

Set up the replica set:

Start the primary member of replica set rs1:

docker run  --name rs1-srv1 -d mongo mongod --storageEngine wiredTiger --replSet rs1

Start the secondary member of replica set rs1:

docker run  --name rs1-srv2 -d mongo mongod --storageEngine wiredTiger --replSet rs1

Start the arbiter for replica set rs1:

docker run  --name rs1-arb -d mongo mongod --storageEngine wiredTiger --replSet rs1

The Docker container domain name registered with SkyDNS follows the pattern:

container-name.image-name.environment.domain-name

The default value for environment is dev and the domain name is docker.

For example, the domain name for the primary shard is rs1-srv1.mongo.dev.docker, where the container name is rs1-srv1 and the image name is mongo.

Connect to the primary shard to initiate the replica set, using the MongoDB shell client:

docker run -i -t mongo mongo --host rs1-srv1.mongo.dev.docker

Then initiate the replica set and add the secondary node and arbiter to it:

config = { _id: "rs1", members:[{ _id : 0, host : "rs1-srv1.mongo.dev.docker:27017" }]};

rs.initiate(config);

rs.add("rs1-srv2.mongo.dev.docker:27017");
rs.addArb("rs1-arb.mongo.dev.docker:27017");
rs.status();
exit

Start the MongoDB config server:

docker run   --name cfg1 -d mongo mongod --configsvr --port 27017 --dbpath /data/db

Start the MongoDB router (mongos):

docker run  -p 27017:27017 --name mongo-router -d mongo mongos --configdb cfg1.mongo.dev.docker:27017

Connect to the MongoDB router to enable sharding, using the MongoDB shell client:

docker run -i -t mongo mongo --host mongo-router.mongo.dev.docker

Add the shard (replica set rs1) to the cluster from the MongoDB router:

sh.addShard("rs1/rs1-srv1.mongo.dev.docker:27017");
sh.status();

Now the MongoDB sharded cluster is ready, and sharding needs to be enabled for a database on this cluster.
As an example, sharding is enabled here on my_test_db; to balance the data distribution across the cluster, a hashed shard key is used, which requires a hashed index.

use my_test_db
sh.enableSharding("my_test_db")

db.my_collection.ensureIndex( { _id : "hashed" } )

sh.shardCollection("my_test_db.my_collection", { "_id": "hashed" } )
exit

Start the Docker containers on reboot using Upstart

sudo vi /etc/init/docker-mongo-cluster.conf

then paste the following content:

    description "Docker container"
    author "Vinayak Bhadage"
    start on filesystem and started docker
    stop on runlevel [!2345]
    respawn
    script

                /usr/bin/docker start skydns
                /bin/sleep 10s
                /usr/bin/docker start skydock
                /bin/sleep 10s
                /usr/bin/docker start rs1-srv1
                /bin/sleep 10s
                /usr/bin/docker start rs1-srv2
                /bin/sleep 10s
                /usr/bin/docker start rs1-arb
                /bin/sleep 10s
                /usr/bin/docker start cfg1
                /bin/sleep 10s
                /usr/bin/docker start -a mongo-router

end script

If your container names differ from those used above (e.g. "rs1-srv1"), change them in the script accordingly, then start the service:

sudo service docker-mongo-cluster start

Conclusion

The development environment with a MongoDB sharded cluster is now ready. You can use the MongoChef GUI client to connect to the Mongo router.

References

  1. https://github.com/crosbymichael/skydock
  2. https://hub.docker.com/_/mongo/
  3. https://medium.com/@gargar454/deploy-a-mongodb-cluster-in-steps-9-using-docker-49205e231319

How to manage RDBMS SQL script changes with version tracking in continuous integration process


Problem: In agile development, we need to manage SQL schema or script (DDL or DML) changes in the continuous integration process. We need to track the change log so that at any point in time we can restore the seed database state in any environment, such as Dev, QA, Stage or Production.

Solution: ORM tools like Entity Framework, Hibernate, etc. provide native support for this, but that support is restricted to your development environment only.

We need something which can be used across all environments and with any development stack, like .NET, Java EE or RoR, driven by Jenkins or Chef. Liquibase does that magic for us: we can track the change log using change sets written in XML, JSON, YAML or SQL, and it also has a Groovy based DSL.

Let's see how a Gradle based sample application uses the liquibase-gradle-plugin to do this job. Here Git is used to track the history of the script changes. The following diagram shows the flow.

The Liquibase change log set lives as a Gradle project in a source tracking system like GitHub. It contains the database scripts in a Liquibase supported format like SQL, XML, JSON, etc. Jenkins is used to check out the GitHub source code and execute a Gradle task, using the liquibase-gradle-plugin, to apply the database changes against a relational database like MySQL.

The sample project is available at liquibase-gradle-sample. The project structure with the change log set looks like the one shown below:

Here changelogs.xml contains the change sets; each change set requires a unique author name and id to track its version. A change set can reference any SQL file. New change sets need to be added to changelogs.xml in incremental order.

In build.gradle you need to provide the target database credentials; if you are using a database other than MySQL, you need to update the JDBC connector dependency accordingly.


buildscript {
    repositories {
        jcenter()
    }
    dependencies {
        classpath "org.liquibase:liquibase-gradle-plugin:1.1.1"
        classpath 'mysql:mysql-connector-java:5.1.36'
    }
}
....
....
liquibase {
    activities {
        main {
            changeLogFile 'src/main/db/changelogs.xml'
            url 'jdbc:mysql://localhost:3306/applianceDB'
            username 'admin1'
            password 'admin123'
        }
    }
}

To execute the database changes use the following command:

gradle update
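
Beyond update, the plugin exposes the other Liquibase commands as Gradle tasks as well; a small hedged sketch (the task names are assumed to mirror the Liquibase command names, so list what is actually registered first):

    # List the tasks the Liquibase plugin registered in this build.
    gradle tasks --all

    # Preview the SQL that would run, and report which change sets are still pending
    # (assumed task names, mirroring the Liquibase commands updateSQL and status).
    gradle updateSQL
    gradle status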

Conclusion
Liquibase can be used alongside any ORM framework, like C# Entity Framework, JPA (Hibernate), Spring Data or Active Record in Ruby on Rails, to track database changes smoothly. It can be integrated with Jenkins for continuous integration or with a Chef recipe for continuous delivery.

How to import data from MS SQL Server into Elasticsearch 1.6


Problem:

Need to provide analytics and visualization for audit log data which is stored in a relational database.

Solution

One solution to this problem is to visualize the data in an open source tool like Kibana. Kibana, however, uses Elasticsearch for search and storage.

So we need to import the selected records from the relational database into Elasticsearch 1.6. A new index is created in Elasticsearch for this data, and it is then used by Kibana.

Prior to Elasticsearch 1.6, the river plugin was available for this purpose, but it is now deprecated.

To solve the same problem, a standalone Java utility known as elasticsearch-jdbc is available.

Here I am going to show you how to use this utility through Docker, so that whenever you need it, it is only a three-step process: clone it, build the image and start the container with parameters.

Prerequisite:

  1. Ubuntu 14.04
  2. Install Docker Host
  3. Install elasticsearch
    docker run -d -p 9200:9200 -p 9300:9300 elasticsearch 
    
  4. Install Kibana

Step 1: Check out the Dockerfile from https://github.com/vinayakbhadage/data-importer

git clone https://github.com/vinayakbhadage/data-importer.git

Step 2: Change the required parameters in dataimport.sh, as described in step 4 below.

Step 3: Build the image from the Dockerfile:

docker build -t data-importer .

Step 4: Run the data-importer, setting the following parameters:

  1. LAST_EXECUTION_START="2014-06-06T09:08:00.948Z": this timestamp is used to import data from the log table of your database; all records in that table with a timestamp column value greater than this will be imported into Elasticsearch.
  2. INDEX_NAME="provide the value": the index name for Elasticsearch.
  3. CLUSTER="provide the value": the Elasticsearch cluster name.
  4. ES_HOST="provide the value": the Elasticsearch host name or IP address.
  5. ES_PORT="9300": the Elasticsearch port number.
  6. SCHEDULE="0 0/10 * * * ?": the default interval for the data-importer is 10 minutes; this uses Quartz cron trigger syntax.
  7. SQL_SERVER_HOST="provide the value": the SQL Server database IP or hostname.
  8. DB_NAME="provide the value": the SQL Server database name.
  9. DB_USER_NAME="provide the value": the SQL Server user name (SQL Server authentication is required).
  10. DB_PASSWORD="provide the value": the SQL Server user password (SQL Server authentication is required).

Note: Please change the environment variables as per your requirements:

docker run -d --name data-importer -e LAST_EXECUTION_START="2014-06-06T09:08:00.948Z" \
  -e INDEX_NAME="myindex"  -e CLUSTER="elasticsearch" -e ES_HOST="myeshost" \
  -e ES_PORT="9300" -e SCHEDULE="0 0/10 * * * ?" -e SQL_SERVER_HOST="mydb" \
  -e DB_NAME="mydb" -e DB_USER_NAME="myuser" -e DB_PASSWORD="find-out" data-importer

Lastly, check the status of the Elasticsearch index; you should find the data there (see the sketch below).
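
A quick way to verify that documents are arriving, assuming the index name myindex and the host from the example above:

    curl -XGET "http://myeshost:9200/myindex/_count?pretty"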