SQL tips: get exclusive data set

In data science & data engineering, it is often required to extracting exclusive data set from a large-scale table in a business date comparing with a history data, e.g. daily new product incremental, weekly new users, ….. Here two ways are discussed using an example of how to calculate daily new products.

Method 1: use LEFT JOIN operator, e.g.

-- daily_biz_tb is table to store daily product with schema like 
-- CREATE TABLE IF NOT EXISTS daily_biz_tb (product_id STRING, title STRING) PARTITION BY (ds STRING)

DROP TABLE IF EXISTS daily_inc_tb;
CREATE TABLE daily_inc_tb LIFECYCLE 7 AS 
SELECT t1.* FROM (
  SELECT product_id, title 
  FROM daily_biz_tb 
  WHERE ds='${bizdate}'
)t1
LEFT JOIN (
  SELECT product_id, title 
  FROM daily_biz_tb 
  WHERE ds='${yesterday}'
)t2
ON t1.product_id=t2_product_id
WHERE t2.product_id IS NULL
;

The logic is that if product_id in today but not in yesterday, after left join, the column t2.product_id will be NULL in today.

Method 2: left anti join. The LEFT ANTI JOIN will be much elegant. It will exclude all item based on key product_id.

-- daily_biz_tb is table to store daily product with schema like 
-- CREATE TABLE IF NOT EXISTS daily_biz_tb (product_id STRING, title STRING) PARTITION BY (ds STRING)

DROP TABLE IF EXISTS daily_inc_tb;
CREATE TABLE daily_inc_tb LIFECYCLE 7 AS 
SELECT t1.* FROM (
  SELECT product_id, title 
  FROM daily_biz_tb 
  WHERE ds='${bizdate}'
)t1
LEFT ANTI JOIN (
  SELECT product_id, title 
  FROM daily_biz_tb 
  WHERE ds='${yesterday}'
)t2
ON t1.product_id=t2_product_id
;

Which one is recommended? Of course, left anti join. It is not only elegant but also performance better when from a big table excluding data in a small table. e.g. 1T table excluding 1K data in small table. In practice, method 1 sometimes has performance issue while method 2 fast.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s