AWS Redshift – Sorting Data

Sorting Data

The sort key helps Redshift minimize I/O – For example, a table sorted on timestamp and queried on date range will skip all blocks not in the query range • In the slices (on disk), the data is sorted by a sort key – If no sort key exists Redshift uses the data insertion order • Choose a sort key that is frequently used in your queries – Primarily as a query predicate (date, identifier, …) – Optionally choose a column frequently used for aggregates – Optionally choose same as distribution key column for most efficient joins (merge join) • Don’t use too many columns per table as sort keys

Sort in multiple flavors!

Compound Sort:

– Nested collection of columns

– Data is sorted by the first, and the next column is used as a ‘tie breaker’ among columns that share values in the leftmost columns. As a general rule, try to use no more than six columns

– Useful when a sort column is in a query predicate and to make group by and order by more efficient •

Interleaved Sort:

– A recently introduced feature that allows for up to 8 columns, treated as equally important rather than used as a ‘tie breaker’ as they are in compound sorts

– Good for Ad Hoc queries: when you do not know which combination of columns you will use to query your data.

– Shines with larger workloads (100m rows +)

Example –Compound Sort vs Interleaved Sort

• Product (type, color, size) • Compound sort on type, color, size

– Where type= ‘shirt’ and color=‘red’ and size= ‘xl’: full advantage of sort

– Where type= ‘shirt’ and color=‘red’: some efficiency

– Where type= ‘shirt’ and size= ‘xl’:

• Interleaved sort on type, color, size

– Using any of type, color and size in a where clause will improve performance

Example –Distribution and Sort Keys

SELECT SUM( S.Price * S.Quantity ) FROM SALES S JOIN CATEGORY C ON C.ProductId = S.ProductId JOIN FRANCHISE F ON F.FranchiseId = S.FranchiseId Where C.CategoryId = ‘Produce’ And F.State = ‘WA’ AND S.Date Between ‘1/1/2015’ AND ‘1/31/2015’

Example –Distribution and Sort Keys

Dist key (S) = ProductID ,Dist key (C) = ProductID,Dist key (F) = FranchiseID,Sort key (S) = Date –Total Products sold in Washington in January 2015

0 views0 comments

Recent Posts

See All

Users using macOS can follow the below command and steps to install the latest version of AWS CLI.The user should have sudo access to execute the below command. ​​For the latest version of the AWS CLI