AWS Redshift – Sorting Data

Sorting Data

The sort key helps Redshift minimize I/O – For example, a table sorted on timestamp and queried on date range will skip all blocks not in the query range • In the slices (on disk), the data is sorted by a sort key – If no sort key exists Redshift uses the data insertion order • Choose a sort key that is frequently used in your queries – Primarily as a query predicate (date, identifier, …) – Optionally choose a column frequently used for aggregates – Optionally choose same as distribution key column for most efficient joins (merge join) • Don’t use too many columns per table as sort keys

Sort in multiple flavors!

Compound Sort:

– Nested collection of columns

– Data is sorted by the first, and the next column is used as a ‘tie breaker’ among columns that share values in the leftmost columns. As a general rule, try to use no more than six columns

– Useful when a sort column is in a query predicate and to make group by and order by more efficient •

Interleaved Sort:

– A recently introduced feature that allows for up to 8 columns, treated as equally important rather than used as a ‘tie breaker’ as they are in compound sorts

– Good for Ad Hoc queries: when you do not know which combination of columns you will use to query your data.

– Shines with larger workloads (100m rows +)

Example –Compound Sort vs Interleaved Sort

• Product (type, color, size) • Compound sort on type, color, size

– Where type= ‘shirt’ and color=‘red’ and size= ‘xl’: full advantage of sort

– Where type= ‘shirt’ and color=‘red’: some efficiency

– Where type= ‘shirt’ and size= ‘xl’:

• Interleaved sort on type, color, size

– Using any of type, color and size in a where clause will improve performance

Example –Distribution and Sort Keys

SELECT SUM( S.Price * S.Quantity ) FROM SALES S JOIN CATEGORY C ON C.ProductId = S.ProductId JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId Where C.CategoryId = ‘Produce’ And F.State = ‘WA’ AND S.Date Between ‘1/1/2015’ AND ‘1/31/2015’

Example –Distribution and Sort Keys

Dist key (S) = ProductID ,Dist key (C) = ProductID,Dist key (F) = FranchiseID,Sort key (S) = Date –Total Products sold in Washington in January 2015

0 views0 comments

Recent Posts

See All

How to Install & Configure AWSCLI on macOS ?

Users using macOS can follow the below command and steps to install the latest version of AWS CLI.The user should have sudo access to execute the below command. ​​For the latest version of the AWS CLI