AWS Redshift – Best Practices for Loading Data


• Use the COPY command to load data

• Use a single COPY command per table

• Split your data into multiple files

• Compress your data files with GZIP or LZOP

• Use multi-row inserts whenever possible

• Bulk insert operations (INSERT INTO…SELECT and CREATE TABLE AS) provide high-performance data insertion

• Use Amazon Kinesis Firehose to load streaming data directly into S3 and/or Redshift

• Load your data in sort key order to avoid needing to vacuum

• Organize your data as a sequence of time-series tables, where each table has an identical schema but holds data for a different time range

• Use staging tables to perform an upsert

• Run the VACUUM command whenever you add, delete, or modify a large number of rows, unless you load your data in sort key order

• Increase the memory available to a COPY or VACUUM by increasing wlm_query_slot_count

• Run the ANALYZE command whenever you’ve made a non-trivial number of changes to your data to ensure your table statistics are current
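The first few practices — a single COPY per table, split files, and GZIP compression — come together in one statement. A minimal sketch of building such a COPY command; the bucket, file prefix, and IAM role ARN are illustrative placeholders, not real resources:

```python
# Sketch: one COPY command that loads all gzip-compressed split files
# sharing an S3 prefix. Names below are hypothetical examples.

def build_copy_command(table, s3_prefix, iam_role):
    """Return a Redshift COPY statement loading every file under s3_prefix."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "       # prefix matches part-000.gz, part-001.gz, ...
        f"IAM_ROLE '{iam_role}' "
        "GZIP "                      # input files are gzip-compressed
        "DELIMITER '|';"             # pipe-delimited input
    )

sql = build_copy_command(
    "sales",
    "s3://my-bucket/sales/part-",
    "arn:aws:iam::123456789012:role/RedshiftLoadRole",
)
print(sql)
```

Because every file matching the prefix is loaded by the one command, Redshift can split the work across slices instead of serializing many small COPYs.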
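The staging-table upsert mentioned above is typically a short transaction: load new rows into a temporary stage, delete the rows they replace from the target, then insert. A sketch of that sequence, with hypothetical table and key names:

```python
# Sketch of the staging-table upsert pattern. Table name "sales", key
# column "id", and the S3/IAM values are illustrative assumptions.

UPSERT_STEPS = [
    "BEGIN;",
    "CREATE TEMP TABLE stage (LIKE sales);",
    "COPY stage FROM 's3://my-bucket/sales/part-' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftLoadRole' GZIP;",
    # Remove target rows that the incoming load will replace
    "DELETE FROM sales USING stage WHERE sales.id = stage.id;",
    "INSERT INTO sales SELECT * FROM stage;",
    "DROP TABLE stage;",
    "COMMIT;",
]

for step in UPSERT_STEPS:
    print(step)
```

Running the delete and insert inside one transaction keeps readers from ever seeing the table with the old rows removed but the new rows not yet inserted.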
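The last three bullets — extra memory via wlm_query_slot_count, VACUUM, and ANALYZE — form a natural post-load maintenance sequence. A sketch of that session; the slot count of 4 and the table name are illustrative choices, not recommendations:

```python
# Sketch: post-load maintenance. Raising wlm_query_slot_count gives the
# VACUUM in this session more memory; values here are hypothetical.

MAINTENANCE = [
    "SET wlm_query_slot_count TO 4;",  # claim extra slots for this session
    "VACUUM sales;",                   # re-sort rows after an unsorted load
    "ANALYZE sales;",                  # refresh planner statistics
    "SET wlm_query_slot_count TO 1;",  # return to the default slot count
]

for stmt in MAINTENANCE:
    print(stmt)
```

Note the slot count is reset at the end so later queries in the same session do not keep hold of the extra concurrency slots.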
