Migrate DynamoDB Using TotalCloud

There are a few guides on the web for migrating DynamoDB if you go looking. One of the best sources is the official AWS guide.

This guide can be used to:

a. Migrate DynamoDB from one region to another.

b. Import external data into DynamoDB from sources like S3.

c. Export DynamoDB data to S3.

All these use cases, including migrating DynamoDB, are mainly driven by two actions:

  • Exporting DynamoDB data to AWS S3 using Data Pipeline.
  • Importing data from AWS S3 into DynamoDB, again using Data Pipeline.

The AWS documentation provides a detailed description of how to leverage AWS Data Pipeline for these tasks. We decided to put TotalCloud to the test: recreate the Data Pipeline, activate it, and perform any of the above actions used to migrate DynamoDB.

Here are the templates you can use to see how TotalCloud performs the same tasks you would otherwise do manually.

First, we tried creating a Data Pipeline using the interface. Here’s how we did it.

As usual, it starts with a trigger node. Urgh! Scratch that. We created a template for you to get started.

Before we go there, there are some prerequisites to take care of.

Prerequisites

Roles

Two IAM roles are required to make this work.

Role Name 1: DataPipelineDefaultRole

Policy: AWSDataPipelineRole

Role Name 2: DataPipelineDefaultResourceRole

Policy: AmazonEC2RoleforDataPipelineRole

Amazon has provided the required steps.
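
If you would rather script the role setup than click through the IAM console, here is a minimal boto3 sketch. It assumes the AWS-managed policies named above and the usual trust relationships for Data Pipeline; verify both against Amazon’s steps before running it.

# Minimal sketch: create the two Data Pipeline roles with boto3.
# Role names match the prerequisites above; the policy ARNs are the AWS-managed
# policies as of writing -- verify them in the IAM console.
import json
import boto3

iam = boto3.client("iam")

def trust_policy(*services):
    # Trust policy allowing the given AWS services to assume the role.
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": list(services)},
            "Action": "sts:AssumeRole",
        }],
    })

# Role assumed by the Data Pipeline service itself.
iam.create_role(
    RoleName="DataPipelineDefaultRole",
    AssumeRolePolicyDocument=trust_policy("datapipeline.amazonaws.com",
                                          "elasticmapreduce.amazonaws.com"),
)
iam.attach_role_policy(
    RoleName="DataPipelineDefaultRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSDataPipelineRole",
)

# Role assumed by the EC2/EMR resources the pipeline launches.
iam.create_role(
    RoleName="DataPipelineDefaultResourceRole",
    AssumeRolePolicyDocument=trust_policy("ec2.amazonaws.com"),
)
iam.attach_role_policy(
    RoleName="DataPipelineDefaultResourceRole",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforDataPipelineRole",
)

# The resource role also needs an instance profile of the same name.
iam.create_instance_profile(InstanceProfileName="DataPipelineDefaultResourceRole")
iam.add_role_to_instance_profile(
    InstanceProfileName="DataPipelineDefaultResourceRole",
    RoleName="DataPipelineDefaultResourceRole",
)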

S3 Buckets

Create two buckets: one for the data to be exported/imported and one for the Data Pipeline logs. We’ll be using these buckets to work with the data pipeline.
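
If you prefer to create them from code, a minimal boto3 sketch follows; the bucket names and region are placeholders you should replace with your own.

# Minimal sketch: create the data bucket and the log bucket with boto3.
# Bucket names and region below are placeholders.
import boto3

region = "us-east-1"
s3 = boto3.client("s3", region_name=region)

for bucket in ("my-ddb-export-data", "my-ddb-pipeline-logs"):
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )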

Once you are set up, let’s create the workflow.

Select a template

When you create a new workflow, you will see an option to “Select a workflow template” or “Create workflow from scratch”.

Select the template option and click on “Next”. You should see a list of available templates. Go ahead and select “AWS DynamoDB to S3 exporter”.

This should give you a preset that you can use to quickly configure your Data Pipeline.

You can just double-click any node to configure it, starting from:

Trigger

Double-click the trigger node to view its configuration and edit it as needed.

Action 1

Select the AWS account that you wish to use and the region in which you want to create the pipeline.

If you haven’t synced your AWS account, you can use the “Sync AWS Account” option and follow these instructions to do so.

Once done, click on “Save Node”.

Action 2

Again, select the AWS account and region. Then just click on “Additional Parameters” to view the changes that are needed for this node. The node’s parameters look like this:

{

/*---------- required params ----------*/

   "pipelineId": "VALUE",


/*---------- optional params ----------*/

/*
*   (Use the keyword MAP in place of a value if you want to
*   autofill any value from previous data)
*/


   "pipelineObjects": [
       {
           "id": "Default",
           "name": "Default", /* name can be autogenerated with timestamp */
           "fields": [
               {
                   "key": "failureAndRerunMode",
                   "stringValue": "CASCADE"
               },
               {
                   "key": "resourceRole",
                   "stringValue": "DataPipelineDefaultResourceRole"
               },
               {
                   "key": "role",
                   "stringValue": "DataPipelineDefaultRole"
               },
               {
                   "key": "pipelineLogUri",
                   "stringValue": ""     //S3 bucket for logs
               },
               {
                   "key": "scheduleType",
                   "stringValue": "ONDEMAND"
               }
           ]
       },
       {
           "id": "EmrClusterForBackup",
           "name": "EmrClusterForBackup",
           "fields": [
               {
                   "key": "role",
                   "stringValue": "DataPipelineDefaultRole"
               },
               {
                   "key": "coreInstanceCount",
                   "stringValue": "1"
               },
               {
                   "key": "coreInstanceType",
                   "stringValue": "m3.xlarge"
               },
               {
                   "key": "releaseLabel",
                   "stringValue": "emr-5.13.0"
               },
               {
                   "key": "masterInstanceType",
                   "stringValue": "m3.xlarge"
               },
               {
                   "key": "region",
                   "stringValue": ""    //pipeline region
               },
               {
                   "key": "type",
                   "stringValue": "EmrCluster"
               },
               {
                   "key": "terminateAfter",
                   "stringValue": "15 Minutes" // can be changed based on the size of the table. It takes 7 minutes to setup the EMR account that information as well while setting this time.
               }
           ]
       },
       {
           "id": "TableBackupActivity",
           "name": "TableBackupActivity",
           "fields": [
               {
                   "key": "output",
                   "refValue": "S3BackupLocation"
               },
               {
                   "key": "input",
                   "refValue": "DDBSourceTable"
               },
               {
                   "key": "maximumRetries",
                   "stringValue": "2"
               },
               {
                   "key": "step",
                   "stringValue": "s3://dynamodb-emr-#{myDDBRegion}/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,org.apache.hadoop.dynamodb.tools.DynamoDbExport,#{output.directoryPath},#{input.tableName},#{input.readThroughputPercent}"
               },
               {
                   "key": "runsOn",
                   "refValue": "EmrClusterForBackup"
               },
               {
                   "key": "type",
                   "stringValue": "EmrActivity"
               },
               {
                   "key": "resizeClusterBeforeRunning",
                   "stringValue": "true"
               }
           ]
       },
       {
           "id": "S3BackupLocation",
           "name": "S3BackupLocation",
           "fields": [
               {
                   "key": "directoryPath",
                   "stringValue": "/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}"                       //S3 bucket for backup(before the /)
               },
               {
                   "key": "type",
                   "stringValue": "S3DataNode"
               }
           ]
       },
       {
           "id": "DDBSourceTable",
           "name": "DDBSourceTable",
           "fields": [
               {
                   "key": "readThroughputPercent",
                   "stringValue": "0.25"
               },
               {
                   "key": "type",
                   "stringValue": "DynamoDBDataNode"
               },
               {
                   "key": "tableName",
                   "stringValue": ""      //dynamoDB table name
               }
           ]
       }
   ],
   "parameterObjects": [
       {
           "id": "myDDBTableName",
           "attributes": [
               {
                   "key": "description",
                   "stringValue": "Source DynamoDB table name"
               },
               {
                   "key": "type",
                   "stringValue": "String"
               }
           ]
       },
       {
           "id": "myOutputS3Loc",
           "attributes": [
               {
                   "key": "description",
                   "stringValue": "Output S3 folder"
               },
               {
                   "key": "type",
                   "stringValue": "AWS::S3::ObjectKey"
               }
           ]
       },
       {
           "id": "myDDBReadThroughputRatio",
           "attributes": [
               {
                   "key": "default",
                   "stringValue": "0.25"
               },
               {
                   "key": "watermark",
                   "stringValue": "Enter value between 0.1-1.0"
               },
               {
                   "key": "description",
                   "stringValue": "DynamoDB read throughput ratio"
               },
               {
                   "key": "type",
                   "stringValue": "Double"
               }
           ]
       },
       {
           "id": "myDDBRegion",
           "attributes": [
               {
                   "key": "default",
                   "stringValue": ""  //pipeline region
               },
               {
                   "key": "watermark",
                   "stringValue": ""   //pipeline region
               },
               {
                   "key": "description",
                   "stringValue": "Region of the DynamoDB table"
               },
               {
                   "key": "type",
                   "stringValue": "String"
               }
           ]
       }
   ],
   "parameterValues": [
       {
           "id": "myDDBRegion",
           "stringValue": ""                                                           //pipeline region
       },
       {
           "id": "myDDBTableName",
           "stringValue": ""                                                          //dynamoDB table name
       },
       {
           "id": "myDDBReadThroughputRatio",
           "stringValue": "0.25"
       },
       {
           "id": "myOutputS3Loc",
           "stringValue": ""                                                  //S3 bucket for backup
       }
   ]
}

Add the appropriate values as mentioned in the prerequisites.

Click on “Apply Query” and then “Save Node”.
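
For reference, the parameters in this node mirror the request shape of the AWS Data Pipeline PutPipelineDefinition API, so a rough boto3 equivalent is sketched below. The pipeline name, region, table name and bucket URIs are placeholders, and the pipelineId is assumed to come from an earlier pipeline-creation step (shown inline here with create_pipeline).

# Sketch of the equivalent Data Pipeline API calls with boto3.
# Object and parameter lists are truncated; fill them in exactly as in
# the template above. All names, URIs and the region are placeholders.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")  # pipeline region

pipeline_id = dp.create_pipeline(
    name="DynamoDBExportPipeline",   # placeholder name
    uniqueId="ddb-export-demo",      # idempotency token
)["pipelineId"]

dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "failureAndRerunMode", "stringValue": "CASCADE"},
                {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
                {"key": "role", "stringValue": "DataPipelineDefaultRole"},
                {"key": "pipelineLogUri", "stringValue": "s3://my-ddb-pipeline-logs/"},
                {"key": "scheduleType", "stringValue": "ONDEMAND"},
            ],
        },
        # ... add EmrClusterForBackup, TableBackupActivity, S3BackupLocation
        #     and DDBSourceTable here, exactly as in the template above ...
    ],
    parameterObjects=[
        {"id": "myDDBRegion", "attributes": [{"key": "type", "stringValue": "String"}]},
        # ... remaining parameter objects as in the template above ...
    ],
    parameterValues=[
        {"id": "myDDBRegion", "stringValue": "us-east-1"},
        {"id": "myDDBTableName", "stringValue": "my-table"},
        {"id": "myDDBReadThroughputRatio", "stringValue": "0.25"},
        {"id": "myOutputS3Loc", "stringValue": "s3://my-ddb-export-data/"},
    ],
)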

Notification

Just add the email address or Slack channel you want to be notified on once the workflow is complete.

Run the workflow

Click “Save Node” and then “Run Now”. This should show you the result of the workflow: how it ran and what it did.

Next, save the workflow to check whether you have the required permissions. Then enable the workflow using the toggle switch.

Just click on “Run Now” to run the workflow and create the pipeline.

Then deactivate the workflow so that it doesn’t run again.

Activate the data pipeline

Click on “Pick the template” from the editor menu.

Pick “Activate DynamoDB export pipeline” from the list and you will see the following template set up for you.

Trigger

You know the drill. Set it up as done earlier.

Resource

Select the same AWS account and region as done earlier.

Filter

This node filters for the data pipeline we created earlier. Leave it as it is.
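
For context, this is roughly equivalent to looking the pipeline up by name through the Data Pipeline API. A minimal boto3 sketch, with the pipeline name as an assumed placeholder:

# Sketch: find the pipeline created earlier by name.
# "DynamoDBExportPipeline" is a placeholder -- use the name the first
# workflow actually created.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

# Page through all pipelines and pick out the one we created.
pipelines = []
resp = dp.list_pipelines()
pipelines.extend(resp["pipelineIdList"])
while resp.get("hasMoreResults"):
    resp = dp.list_pipelines(marker=resp["marker"])
    pipelines.extend(resp["pipelineIdList"])

pipeline_id = next(p["id"] for p in pipelines if p["name"] == "DynamoDBExportPipeline")
print(pipeline_id)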

Action

Select the same AWS account and region as done earlier.

Notification

Set up an email or Slack notification as done earlier: add the email address or Slack channel you want to be notified on once the workflow is complete.

Run the workflow

Click “Save Node” and then “Run Now”. This should show you the result of the workflow: how it ran and what it did.

Next, save the workflow to check whether you have the required permissions. Then enable the workflow using the toggle switch.

Just click on “Run Now” to run the workflow and activate the pipeline we defined earlier.

Then deactivate the workflow so that it doesn’t run again.

Now go to the AWS console to see the progress. It takes around 7 minutes for the EMR cluster to spin up, and then the transfer begins. AWS provides more information on this.
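
If you prefer to watch progress from code rather than the console, here is a hedged boto3 sketch that activates a pipeline and polls its state. The activation itself is what the workflow above already does for you; the pipeline id is a placeholder.

# Sketch: activate the pipeline and poll its state via the Data Pipeline API.
import time
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

pipeline_id = "df-EXAMPLE1234567"   # placeholder: id of the pipeline created earlier

dp.activate_pipeline(pipelineId=pipeline_id)

while True:
    desc = dp.describe_pipelines(pipelineIds=[pipeline_id])
    fields = desc["pipelineDescriptionList"][0]["fields"]
    state = next(f["stringValue"] for f in fields if f["key"] == "@pipelineState")
    print("pipeline state:", state)
    if state == "FINISHED":
        break
    # add handling for error states as needed
    time.sleep(60)   # EMR provisioning alone takes around 7 minutes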

AWS Console > Data Pipeline

Use cases for Import and Export DynamoDB

Templates are available for the use cases mentioned below.

Export DynamoDB to AWS S3

  1. Create, configure and save the Data Pipeline (AWS DynamoDB to S3 exporter).
  2. Activate the Pipeline (Activate DynamoDB export pipeline).

Import data from AWS S3 to DynamoDB

  1. Create, configure and save the Data Pipeline (S3 to DynamoDB importer).
  2. Activate the Pipeline (Activate DynamoDB Importer Pipeline).

TotalCloud is workflow-based cloud management for AWS. We are modularizing cloud management to make it accessible to everyone.
