AWS Redshift – Best Practices for Loading Data

• Use the COPY command to load data (see the COPY example after this list)

• Use a single COPY command per table; multiple concurrent COPY commands into the same table force a serialized load instead of a parallel one

• Split your data into multiple files so a single COPY can load them in parallel; ideally the number of files is a multiple of the number of slices in your cluster

• Compress your data files with GZIP or LZOP

• When COPY is not an option, use multi-row inserts whenever possible (example below)

• Bulk insert operations (INSERT INTO…SELECT and CREATE TABLE AS) provide high-performance data insertion (examples below)

• Use Amazon Kinesis Firehose to load streaming data directly into S3 and/or Redshift

• Load your data in sort key order to avoid needing to vacuum

• Organize your data as a sequence of time-series tables, where each table is identical but contains data for a different time range (see the time-series sketch below)

• Use staging tables to perform a merge (upsert), as in the pattern below

• Run the VACUUM command whenever you add, delete, or modify a large number of rows, unless you load your data in sort key order (example below)

• Increase the memory available to a COPY or VACUUM by increasing wlm_query_slot_count for the session (example below)

• Run the ANALYZE command whenever you’ve made a non-trivial number of changes to your data to ensure your table statistics are current
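
A minimal COPY sketch, assuming the input has already been split into equal parts, compressed, and uploaded under a common key prefix (for example with the split and gzip utilities); the table name, bucket, and IAM role are placeholders:

    copy sales
    from 's3://my-bucket/sales/sales.part_'
    iam_role 'arn:aws:iam::123456789012:role/MyRedshiftLoadRole'
    gzip
    delimiter '|';

The key prefix matches all of the part files, so this single COPY loads them in parallel across the cluster's slices, and the GZIP option tells Redshift the files are compressed.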
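When COPY is not practical, a multi-row INSERT batches several rows into one statement instead of paying per-statement overhead for each row. The table and columns here are hypothetical:

    insert into sales (sale_id, sale_date, amount) values
      (101, '2017-03-01', 19.99),
      (102, '2017-03-01', 5.49),
      (103, '2017-03-02', 12.00);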
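Both bulk-insert forms mentioned above, sketched against the same hypothetical table:

    -- Append query results to an existing table
    insert into sales_history
    select * from sales
    where sale_date < '2017-01-01';

    -- Create a new table directly from a query (CTAS)
    create table sales_2016 as
    select * from sales
    where sale_date between '2016-01-01' and '2016-12-31';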
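One way to lay out time-series tables is one table per period behind a UNION ALL view, so queries still see a single table while old periods can be retired with a cheap DROP TABLE instead of a large DELETE followed by VACUUM. The table names (including sales_template) are illustrative:

    -- One identical table per month
    create table sales_201701 (like sales_template);
    create table sales_201702 (like sales_template);

    -- Present the periods as a single logical table
    create view sales as
      select * from sales_201701
      union all
      select * from sales_201702;

To retire January, recreate the view without sales_201701 and then drop that table.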
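A typical staging-table merge (upsert) follows a delete-then-insert pattern; the key column, bucket, and role below are placeholders:

    begin transaction;

    -- Stage the incoming batch next to the target table
    create temp table sales_stage (like sales);
    copy sales_stage
    from 's3://my-bucket/sales_updates/'
    iam_role 'arn:aws:iam::123456789012:role/MyRedshiftLoadRole'
    gzip;

    -- Remove target rows that are being replaced
    delete from sales
    using sales_stage
    where sales.sale_id = sales_stage.sale_id;

    -- Insert the full staged batch (new and replacement rows)
    insert into sales
    select * from sales_stage;

    drop table sales_stage;
    end transaction;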
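After a large load, delete, or update, VACUUM re-sorts rows and reclaims space from deleted rows, and ANALYZE refreshes the statistics the query planner depends on:

    vacuum sales;   -- FULL is the default: reclaims space and re-sorts
    analyze sales;  -- updates planner statistics for the table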
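wlm_query_slot_count is a session-level setting; claiming more slots gives a single COPY or VACUUM a larger share of the queue's memory at the cost of concurrency, so reset it afterwards. The count of 3 is an arbitrary example:

    set wlm_query_slot_count to 3;  -- claim 3 slots in the current WLM queue
    vacuum sales;
    set wlm_query_slot_count to 1;  -- return to the default single slot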
