AWS Redshift – Best Practices for Loading Data

Updated: Aug 16, 2019

Best Practices for Loading Data

• Use a COPY Command to load data

• Use a single COPY command per table

• Split your data into multiple files

• Compress your data files with GZIP or LZOP

• Use multi-row inserts whenever possible

• Bulk insert operations (INSERT INTO…SELECT and CREATE TABLE AS) provide high performance data insertion

• Use Amazon Kinesis Firehose for Streaming Data direct load to S3 and/or Redshift

• Load your data in sort key order to avoid needing to vacuum

• Organize your data as a sequence of time-series tables, where each table is identical but contains data for different time ranges

• Use staging tables to perform an upsert

• Run the VACUUM command whenever you add, delete, or modify a large number of rows, unless you load your data in sort key order

• Increase the memory available to a COPY or VACUUM by increasing wlm_query_slot_count

• Run the ANALYZE command whenever you’ve made a non-trivial number of changes to your data to ensure your table statistics are current

©2019 by Raghavendra Kambhampati