Redshift Table Distribution Styles

Choosing a Distribution Style

• Choose a distribution style of KEY for

– Large data tables, like a FACT table in a star schema

– Large or rapidly changing tables used in joins or aggregations

– Improved performance even if the key is not used in join column

• Choose a distribution style of ALL for tables that

– Have slowly changing data – Reasonable size (i.e., few millions but not 100’s of millions of rows)

– No common distribution key for frequent joins

– Typical use case – joined dimension table without a common distribution key

• Choose a distribution style of EVEN for tables that are not joined and have no aggregate queries

Choosing a Good Distribution Key

• Goal

– Distribute data evenly across nodes

– Minimize data movement: Co-located Joins & Aggregates

• Best Practice

– Use the joined columns for largest commonly joined tables as key (example: fact table and large dimension table) – Consider using Group By column as a key (GROUP BY clause)

– Never use a distribution key that causes severe data skew – Choose a key with high cardinality; large number of discrete values

• Avoid

– Keys used as equality filter as your distribution key (Concentrates processing on one node)

1 view0 comments

Recent Posts

See All

Users using macOS can follow the below command and steps to install the latest version of AWS CLI.The user should have sudo access to execute the below command. ​​For the latest version of the AWS CLI