AWS Redshift – Best Practices for Loading Data



• Use the COPY command to load data

• Use a single COPY command per table

• Split your data into multiple files

• Compress your data files with GZIP or LZOP

• Use multi-row inserts whenever possible

• Bulk insert operations (INSERT INTO…SELECT and CREATE TABLE AS) provide high-performance data insertion

• Use Amazon Kinesis Data Firehose to load streaming data directly into S3 and/or Redshift

• Load your data in sort key order to avoid needing to vacuum

• Organize your data as a sequence of time-series tables, where each table is identical but contains data for different time ranges

• Use staging tables to perform an upsert

• Run the VACUUM command whenever you add, delete, or modify a large number of rows, unless you load your data in sort key order

• Increase the memory available to a COPY or VACUUM operation by increasing wlm_query_slot_count

• Run the ANALYZE command whenever you’ve made a non-trivial number of changes to your data to ensure your table statistics are current
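To illustrate the first few tips, here is a minimal sketch of a single COPY loading a set of split, gzipped files in parallel. The table name, S3 prefix, and IAM role ARN are all hypothetical placeholders:

```sql
-- One COPY per table: the key prefix 'sales/part-' matches all split,
-- gzipped file parts, and Redshift loads them in parallel across slices.
COPY sales
FROM 's3://my-bucket/sales/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
DELIMITER '|';
```

Splitting the input into roughly as many files as there are slices in the cluster (or a multiple of that) lets every slice participate in the load.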
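When COPY is not an option and you must use INSERT, the multi-row form batches many rows into one statement instead of issuing one INSERT per row. A minimal sketch, assuming a hypothetical `sales (id, amount)` table:

```sql
-- One statement, many rows: far cheaper than three separate INSERTs.
INSERT INTO sales (id, amount) VALUES
  (1, 10.00),
  (2, 25.50),
  (3, 7.75);
```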
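The staging-table upsert tip can be sketched as a delete-then-insert inside one transaction, since Redshift has no native MERGE-style upsert in older versions. Table, key column, S3 prefix, and role names below are hypothetical:

```sql
BEGIN;

-- Stage the incoming batch in a temp table shaped like the target.
CREATE TEMP TABLE stage (LIKE sales);

COPY stage
FROM 's3://my-bucket/sales-updates/part-'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
GZIP
DELIMITER '|';

-- Remove rows that are about to be replaced, then insert the new versions.
DELETE FROM sales USING stage WHERE sales.id = stage.id;
INSERT INTO sales SELECT * FROM stage;

DROP TABLE stage;
COMMIT;
```

Wrapping both steps in one transaction keeps readers from ever seeing the target table with the old rows deleted but the new rows not yet inserted.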
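Finally, the maintenance tips combine naturally: raise wlm_query_slot_count for the session so VACUUM gets more memory, run VACUUM and ANALYZE, then restore the default. The table name is again a placeholder:

```sql
-- Claim extra WLM query slots (and their memory) for this session only.
SET wlm_query_slot_count TO 4;

VACUUM sales;   -- reclaim space and re-sort rows after heavy churn
ANALYZE sales;  -- refresh table statistics for the query planner

-- Return to the default single slot.
SET wlm_query_slot_count TO 1;
```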

