7 Best Practices for Amazon Redshift

What is Redshift? and the benefits of optimized usage

Amazon Redshift is a cloud data warehouse service that allows for fast and cost-effective analysis of petabytes worth of data stored across the data warehouse. Using the combination of machine learning, massively parallel processing, and columnar storage on SSD disks, Redshift can deliver more than 10x the performance rate compared to other data warehouses.

However, this high-performance rate can end up costing unnecessarily or you could have trouble making the most out of your cluster performance in the form of inconsistent query performance or issues with workload scaling. These Amazon Redshift Best Practices aim to improve your planning, monitoring, and configuring to make the most out of your data.

Improve Query performance with Custom Workload Manager queue

You can use the Workload Manager to manage query performance. Redshift runs queries in a queuing model. You will likely have to configure the default WLM setting which offers one queue with five slots.

You can have two benefits with configuring the WLM

  • Scaling workloads by giving them enough resources 
  • Isolating and protecting your predictable workloads from your unpredictable workloads 

You can have up to 8 queues with a total of up to 50 slots. A query will run in a single slot, by default. Queries can be routed into queues using certain rules. Setting up your WLM the right way will eliminate queue wait times and disk-based queries.

  • Match the queue slot count to the limit of concurrent queries so you can reduce or avoid wait times.
  • Assign enough memory to your queues to reduce disk-based queries  

Using distribution keys to reduces transfer time

When creating a table, you have the power to decide how the data is spread to the slices, evenly or specifically according to the columns. By choosing joined columns for distribution, you can minimize the data transfer over the network.

Selecting a good distribution key helps with the data transfer over to the slices and how well the slices perform during query execution. These are the primary attributes to pay attention to when choosing a good distribution key.

  • High cardinality: As the number of slices in the cluster increases, a relative number of unique data values must exist.
  • Low skew: The number of times each unique value occurs in the distribution key column of the table should be ideally equal.
  • Commonly joined: Choose a distribution key that joins to the other tables, if you have a few of these then choose the one that joins to the most number of rows,

A skewed distribution key results in some slices working harder than others during query execution, unbalancing CPU or memory, and ultimately only running as fast as the slowest slice.

Boost Query throughput with Redshift Spectrum

Redshift Spectrum improves performance and query throughput by enabling you to query data directly from files on Amazon S3 through a compute layer that is both independent and elastically sized. 

  • Amazon Redshift Spectrum’s compute layer allows more processing power to execute specific queries by dispersing the workload from the main cluster.
  • Skip the loading in the ELT process and instead directly run the transform on the S3 data.
  • Avoid loading the data onto the main cluster by running queries right when data arrives at Amazon S3.

Storage Optimization using Analyze and Vacuum

The Amazon Redshift Advisor automatically analyzes the current workload management (WLM) usage and makes recommendations for better performance and throughput. It also lets you know unused tables by tracking your activity. You can take advantage of this automatic analysis provided by the advisor to optimize your tables. ANALYZE is a specific command that provides queries that help address any missing or statistic for the table. These statistics are imperative for the advisor to help come up with optimal recommendations. 

When rows are DELETED or UPDATED, the computer flags them for deletion but doesn’t actually remove them from the disk yet. Updates result in writing a new block with new data appended. So the system still scans the previous rows and continues allocating disk space for them. This increases the overall storage space of the table and thereby affects the performance. You can use the VACUUM command manually to circumvent this issue by deleting the rows and restoring the sort order.

Auto Vacuum eliminates the need for you to run the VACUUM DELETE command manually. However, the automated process gets the best results when you have sorted your data via the sort key columns. 

Compress size with Copy command

The COPY command is the ideal method used to transfer data into Amazon Redshift. Copy uses parallel architecture for faster data transfer and it allows multiple sources to be transferred from such as EMR, DunamoDB, and SSH remote hosts.

You get:

  • Files uploaded faster to Amazon S3
  • Reduction in consumption of S3 Storage
  • The loading process of files becomes faster as compression and decompression happen as files are being read.

Improving export performance with the UNLOAD command

UNLOAD lets you export SQL statements made in Redshift to S3 faster through parallel processing. The data is exported in parquet format which can also be done at faster processing speeds than text formats. The data can be compressed before being exported to S3. The larger the queries being passed through, the easier it gets. 

Using CloudWatch metrics/Totalcloud custom templates to improve performance

CloudWatch Metrics lets you set up monitoring events that can target clusters, perform health checks, automate read/write process, etc. You can achieve similar monitoring features by creating a workflow in Totalcloud’s editor. You could create start and stop workflows for Redshift to save bills or manipulate queries within tables or automate data transfer. 

A combination of automated processes with manual checkups is the healthiest maintenance strategy for your clusters.

The above steps can make querying with Amazon Redshift faster, easier and more storage efficient. Efficiently handling the tables and the data within it is the best way to make manipulating clusters worthwhile.

  7 Best Practices for Amazon Redshift

Smart Scheduling at your fingertips

Go from simple to smart, real-time AWS resource scheduling to save cost and increase team productivity.

Learn More
More Posts

You Might Also Like

AWS Tips & Tricks
Componentized Cloud Management: The way ahead for Cloud Automation
When something gets complex, our primary approach is to break it down — even cloud management. If you’re a part of a growing company that uses the cloud, you can see your infrastructure becoming more…
May 29, 2020
AWS Tips & Tricks
Cost Optimization with AWS Serverless Resource Scheduling
You must be aware of EC2 scheduling and its benefits on cost optimization. However, scheduling doesn't need to stop at just EC2 or RDS. There are plenty of other AWS serverless resources that can be scheduled to save costs. While the traditional way might be tedious, Totalcloud provides a safe alternative.
May 28, 2020
AWS Use Case Files
Deploying an EKS Cluster With TotalCloud's Code-Free Workflows
Totalcoud workflows can be used for many creative applications. One such application was developed as part of a customer request. With simply 2 workflows, we removed the hassle of provisioning your EKS clusters on AWS. What normally takes grueling efforts of scripting or configuring have now been reduced to just a few clicks.
May 28, 2020
AWS Tips & Tricks
5 Best Practices for Tagging AWS Resources
Tagging AWS resources is a simple concept that can come with a bunch of different benefits when used appropriately. Here are the 5 best practices on how you can make the most out of your AWS EC2 tags. Also, learn the common mistakes you could make and how to avoid it.
May 12, 2020
AWS Tips & Tricks
Helpful Tips for EC2 Rightsizing
Optimize your cloud costs and boost performance with these tips for rightsizing. Here we go through all the different methods for rightsizing and the approach you need to follow to make sure you are constantly aware of the changing demands in your environment.
May 6, 2020
Cost Analyzer
Instance Comparison Chart
Use this new Instance Comparison chart provided by Totalcloud to find the right AWS EC2 Instance that matches your requirements and budget. With several options of sorting and filtering, you can narrow down your optimal machines easily.
April 30, 2020