Redshift Table Distribution Styles

Updated: Aug 16, 2019

Choosing a Distribution Style

• Choose a distribution style of KEY for

– Large data tables, like a FACT table in a star schema

– Large or rapidly changing tables used in joins or aggregations

– Improved performance even if the key is not used in join column

• Choose a distribution style of ALL for tables that

– Have slowly changing data – Reasonable size (i.e., few millions but not 100’s of millions of rows)

– No common distribution key for frequent joins

– Typical use case – joined dimension table without a common distribution key

• Choose a distribution style of EVEN for tables that are not joined and have no aggregate queries

Choosing a Good Distribution Key

• Goal

– Distribute data evenly across nodes

– Minimize data movement: Co-located Joins & Aggregates

• Best Practice

– Use the joined columns for largest commonly joined tables as key (example: fact table and large dimension table) – Consider using Group By column as a key (GROUP BY clause)

– Never use a distribution key that causes severe data skew – Choose a key with high cardinality; large number of discrete values

• Avoid

– Keys used as equality filter as your distribution key (Concentrates processing on one node)

©2019 by Raghavendra Kambhampati