Migrate DynamoDB Using TotalCloud

There are a few documented ways to migrate DynamoDB if you go looking for them on the web. One of the best sources is the AWS guide.

This guide can be used to:

a. Migrate DynamoDB from one region to another.

b. Import external data into DynamoDB from sources like S3.

c. Export DynamoDB data to S3.

All of these use cases, including migrating DynamoDB, are driven by two actions:

  • Exporting DynamoDB data to AWS S3 using Data Pipeline.
  • Importing data from AWS S3 into DynamoDB, again using Data Pipeline.

The AWS documentation describes in detail how to leverage AWS Data Pipeline for these tasks. We decided to put TotalCloud to the test: recreate the Data Pipeline, activate it, and perform any of the above actions used to migrate DynamoDB.
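Under the hood, both directions boil down to a handful of AWS Data Pipeline API calls that the TotalCloud workflows wrap for you. Here is a rough boto3 sketch of that flow, with placeholder names and region; it is an assumed outline, not TotalCloud's actual implementation:

import boto3

# Assumed flow (placeholder names/region), not TotalCloud's internal implementation.
dp = boto3.client("datapipeline", region_name="us-west-2")

# 1. Create an empty pipeline shell.
created = dp.create_pipeline(name="ddb-export-pipeline", uniqueId="ddb-export-001")
pipeline_id = created["pipelineId"]

# 2. Attach a definition (the pipelineObjects/parameterObjects/parameterValues
#    JSON shown later in this post).
# dp.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=[...])

# 3. Activate it; this is what spins up the EMR cluster and runs the export/import.
# dp.activate_pipeline(pipelineId=pipeline_id)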

Here are the templates you can view to understand how TotalCloud performs the same tasks you would otherwise have done manually.

First, we tried creating a Data Pipeline using the interface. Here’s how we did it.

As usual, it starts with a trigger node. Urgh! Scratch that. We created a template for you to get started.

Before we go there, there are some prerequisites to take care of.

Prerequisites

Roles

Two IAM roles are required to make this work.

Role Name 1: DataPipelineDefaultRole

Policy: AWSDataPipelineRole

Role Name 2: DataPipelineDefaultResourceRole

Policy: AmazonEC2RoleforDataPipelineRole

Amazon’s documentation provides the steps required to create these roles and attach the policies.
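If you prefer scripting the role setup, here is a rough boto3 sketch; the trust policies and managed-policy ARNs below are our assumptions based on the standard Data Pipeline defaults, so double-check them against the AWS steps:

import json
import boto3

iam = boto3.client("iam")

# DataPipelineDefaultRole: assumed by the Data Pipeline service itself.
iam.create_role(
    RoleName="DataPipelineDefaultRole",
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": ["datapipeline.amazonaws.com",
                                      "elasticmapreduce.amazonaws.com"]},
            "Action": "sts:AssumeRole",
        }],
    }),
)
iam.attach_role_policy(
    RoleName="DataPipelineDefaultRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSDataPipelineRole",
)

# DataPipelineDefaultResourceRole: assumed by the EC2/EMR resources the pipeline
# launches, so it also needs an instance profile with the same name.
iam.create_role(
    RoleName="DataPipelineDefaultResourceRole",
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }),
)
iam.attach_role_policy(
    RoleName="DataPipelineDefaultResourceRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforDataPipelineRole",
)
iam.create_instance_profile(InstanceProfileName="DataPipelineDefaultResourceRole")
iam.add_role_to_instance_profile(
    InstanceProfileName="DataPipelineDefaultResourceRole",
    RoleName="DataPipelineDefaultResourceRole",
)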

S3 Buckets

Create two buckets: one for the data to be exported/imported and one for the Data Pipeline logs. We’ll be using these buckets while working with the Data Pipeline.
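A quick boto3 sketch if you’d rather create them from code than from the console (the bucket names below are placeholders and must be globally unique):

import boto3

region = "us-west-2"                       # use the region you will run the pipeline in
s3 = boto3.client("s3", region_name=region)

for bucket in ("my-ddb-migration-data", "my-ddb-migration-logs"):
    s3.create_bucket(
        Bucket=bucket,
        # For us-east-1, omit CreateBucketConfiguration entirely.
        CreateBucketConfiguration={"LocationConstraint": region},
    )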

Once you are set up, let’s create the workflow.

Select a template

When you create a new workflow, you will see an option to “Select a workflow template” or “Create workflow from scratch”.

Select the template option and click on “Next”. You should see a list of available templates. Go ahead and select “AWS DynamoDB to S3 exporter”.

This should give you a preset that you can use to quickly configure your Data Pipeline.

You can double-click any node to configure it. Start with:

Trigger

Double-click the trigger node to see its configuration and edit it as needed.

Action 1

Select the AWS account you wish to use and the region in which you want to create the pipeline.

If you haven’t synced your AWS account, you can use the “Sync AWS Account” option and follow these instructions to do so.

Once done, click on “Save Node”.

Action 2

Again, select the AWS account and region. Then click on “Additional Parameters” to view the changes needed for this node. The node’s parameters look like this:

{

/*---------- required params ----------*/

   "pipelineId": "VALUE",


/*---------- optional params ----------*/

/*
*   (Use the keyword MAP in place of a value if you want to
*   autofill any value from previous data)
*/


   "pipelineObjects": [
       {
           "id": "Default",
           "name": "Default", /* name can be autogenerated with timestamp */
           "fields": [
               {
                   "key": "failureAndRerunMode",
                   "stringValue": "CASCADE"
               },
               {
                   "key": "resourceRole",
                   "stringValue": "DataPipelineDefaultResourceRole"
               },
               {
                   "key": "role",
                   "stringValue": "DataPipelineDefaultRole"
               },
               {
                   "key": "pipelineLogUri",
                   "stringValue": ""     //S3 bucket for logs
               },
               {
                   "key": "scheduleType",
                   "stringValue": "ONDEMAND"
               }
           ]
       },
       {
           "id": "EmrClusterForBackup",
           "name": "EmrClusterForBackup",
           "fields": [
               {
                   "key": "role",
                   "stringValue": "DataPipelineDefaultRole"
               },
               {
                   "key": "coreInstanceCount",
                   "stringValue": "1"
               },
               {
                   "key": "coreInstanceType",
                   "stringValue": "m3.xlarge"
               },
               {
                   "key": "releaseLabel",
                   "stringValue": "emr-5.13.0"
               },
               {
                   "key": "masterInstanceType",
                   "stringValue": "m3.xlarge"
               },
               {
                   "key": "region",
                   "stringValue": ""    //pipeline region
               },
               {
                   "key": "type",
                   "stringValue": "EmrCluster"
               },
               {
                   "key": "terminateAfter",
                   "stringValue": "15 Minutes" // can be changed based on the size of the table. It takes 7 minutes to setup the EMR account that information as well while setting this time.
               }
           ]
       },
       {
           "id": "TableBackupActivity",
           "name": "TableBackupActivity",
           "fields": [
               {
                   "key": "output",
                   "refValue": "S3BackupLocation"
               },
               {
                   "key": "input",
                   "refValue": "DDBSourceTable"
               },
               {
                   "key": "maximumRetries",
                   "stringValue": "2"
               },
               {
                   "key": "step",
                   "stringValue": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
               },
               {
                   "key": "runsOn",
                   "refValue": "EmrClusterForBackup"
               },
               {
                   "key": "type",
                   "stringValue": "EmrActivity"
               },
               {
                   "key": "resizeClusterBeforeRunning",
                   "stringValue": "true"
               }
           ]
       },
       {
           "id": "S3BackupLocation",
           "name": "S3BackupLocation",
           "fields": [
               {
                   "key": "directoryPath",
                   "stringValue": "/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"                       //S3 bucket for backup(before the /)
               },
               {
                   "key": "type",
                   "stringValue": "S3DataNode"
               }
           ]
       },
       {
           "id": "DDBSourceTable",
           "name": "DDBSourceTable",
           "fields": [
               {
                   "key": "readThroughputPercent",
                   "stringValue": "0.25"
               },
               {
                   "key": "type",
                   "stringValue": "DynamoDBDataNode"
               },
               {
                   "key": "tableName",
                   "stringValue": ""      //dynamoDB table name
               }
           ]
       }
   ],
   "parameterObjects": [
       {
           "id": "myDDBTableName",
           "attributes": [
               {
                   "key": "description",
                   "stringValue": "Source DynamoDB table name"
               },
               {
                   "key": "type",
                   "stringValue": "String"
               }
           ]
       },
       {
           "id": "myOutputS3Loc",
           "attributes": [
               {
                   "key": "description",
                   "stringValue": "Output S3 folder"
               },
               {
                   "key": "type",
                   "stringValue": "AWS::S3::ObjectKey"
               }
           ]
       },
       {
           "id": "myDDBReadThroughputRatio",
           "attributes": [
               {
                   "key": "default",
                   "stringValue": "0.25"
               },
               {
                   "key": "watermark",
                   "stringValue": "Enter value between 0.1-1.0"
               },
               {
                   "key": "description",
                   "stringValue": "DynamoDB read throughput ratio"
               },
               {
                   "key": "type",
                   "stringValue": "Double"
               }
           ]
       },
       {
           "id": "myDDBRegion",
           "attributes": [
               {
                   "key": "default",
                   "stringValue": ""  //pipeline region
               },
               {
                   "key": "watermark",
                   "stringValue": ""   //pipeline region
               },
               {
                   "key": "description",
                   "stringValue": "Region of the DynamoDB table"
               },
               {
                   "key": "type",
                   "stringValue": "String"
               }
           ]
       }
   ],
   "parameterValues": [
       {
           "id": "myDDBRegion",
           "stringValue": ""                                                           //pipeline region
       },
       {
           "id": "myDDBTableName",
           "stringValue": ""                                                          //dynamoDB table name
       },
       {
           "id": "myDDBReadThroughputRatio",
           "stringValue": "0.25"
       },
       {
           "id": "myOutputS3Loc",
           "stringValue": ""                                                  //S3 bucket for backup
       }
   ]
}

Add the appropriate values as mentioned in the prerequisites.
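For context, this node is essentially making Data Pipeline’s PutPipelineDefinition call with the objects above. A stripped-down boto3 sketch of the same idea, assuming the pipelineId comes from Action 1 and the /* */ and // comments have been removed:

import boto3

dp = boto3.client("datapipeline", region_name="us-west-2")  # pipeline region

# Only the "Default" object is spelled out here; the remaining objects
# (EmrClusterForBackup, TableBackupActivity, S3BackupLocation, DDBSourceTable)
# and the parameter objects/values come from the template above.
pipeline_objects = [
    {"id": "Default", "name": "Default", "fields": [
        {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
        {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
        {"key": "role", "stringValue": "DataPipelineDefaultRole"},
        {"key": "pipelineLogUri", "stringValue": "s3://my-ddb-migration-logs/"},  # placeholder
        {"key": "scheduleType", "stringValue": "ONDEMAND"},
    ]},
    # ... remaining pipelineObjects ...
]

resp = dp.put_pipeline_definition(
    pipelineId="df-EXAMPLE1234",   # placeholder: the id returned when the pipeline was created
    pipelineObjects=pipeline_objects,
    parameterObjects=[],           # parameterObjects from the template
    parameterValues=[],            # parameterValues from the template
)
if resp.get("errored"):
    print(resp["validationErrors"])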

Click on “Apply Query” and then “Save Node”.

Notification

Just add the email address or Slack channel you want to be notified on once the workflow is complete.

Run the workflow

Click “Save Node” and then “Run Now”. This should show you the result of the workflow: how it ran and what it did.

Next, save the workflow to check whether you have the required permissions. Then enable the workflow using the toggle switch.

Click “Run Now” to run the workflow and create the pipeline.

Then deactivate the workflow so that it doesn’t run again.

Activate the data pipeline

Click on “Pick the template” from the editor menu.

Pick “Activate DynamoDB export pipeline” from the list and you will see the following template set up for you.

Trigger

You know the drill. Set it up as done earlier.

Resource

Select the same AWS account and region as done earlier.

Filter

This node picks out the data pipeline we created earlier so that only that pipeline gets activated. Leave it as it is.
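Conceptually, the filter does something like the following (a hypothetical boto3 sketch with a placeholder pipeline name; the real node logic is TotalCloud’s own):

import boto3

dp = boto3.client("datapipeline", region_name="us-west-2")  # pipeline region

# Hypothetical: list pipelines and keep the one created by the first workflow.
# (Pagination via the 'marker' field is omitted for brevity.)
target_name = "ddb-export-pipeline"   # placeholder name
pipelines = dp.list_pipelines()["pipelineIdList"]
matches = [p for p in pipelines if p["name"] == target_name]
print(matches)   # e.g. [{'id': 'df-EXAMPLE1234', 'name': 'ddb-export-pipeline'}]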

Action

Select the same AWS account and region as done earlier.

Notification

Set up an email or Slack notification as done earlier: just add the email address or Slack channel you want to be notified on once the workflow is complete.

Run the workflow

Click “Save Node” and then “Run Now”. This should show you the result of the workflow: how it ran and what it did.

Next, save the workflow to check whether you have the required permissions. Then enable the workflow using the toggle switch.

Click “Run Now” to run the workflow and activate the pipeline we created earlier.

Then deactivate the workflow so that it doesn’t run again.

Now go to the AWS console to see the progress. It takes around 7 minutes for the EMR cluster to be set up, and then the transfer begins. AWS provides more information on this.

AWS Console > Data Pipeline
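If you would rather poll the status from code than keep the console open, here is a small boto3 sketch (the pipeline id is a placeholder):

import boto3

dp = boto3.client("datapipeline", region_name="us-west-2")  # pipeline region

desc = dp.describe_pipelines(pipelineIds=["df-EXAMPLE1234"])  # placeholder id
fields = desc["pipelineDescriptionList"][0]["fields"]
state = next(f["stringValue"] for f in fields if f["key"] == "@pipelineState")
print(state)   # e.g. PENDING, SCHEDULED, or FINISHED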

Use cases for Import and Export DynamoDB

Templates are available for the following use cases.

Export DynamoDB to AWS S3

  1. Create, configure and save the Data Pipeline (AWS DynamoDB to S3 exporter).
  2. Activate the Pipeline (Activate DynamoDB export pipeline).

Import data from AWS S3 to DynamoDB

  1. Create, configure and save the Data Pipeline (S3 to DynamoDB importer).
  2. Activate the Pipeline (Activate DynamoDB Importer Pipeline).

TotalCloud is workflow-based cloud management for AWS. We are modularizing cloud management to make it accessible to everyone.
