使用 PostgreSQL 的 EXPLAIN 時,控制節點的出現,對查詢復雜度的影響不可忽視。
Limit PostgreSQL 用戶都會使用 LIMIT,因為它非常簡單,但讓我們來全面描述一下它。LIMIT 操作會執行其下屬操作,并僅返回該操作所返回結果中的前 N 行。通常情況下,在返回結果后它會停止下屬操作,但在某些情況下(例如在 PL/PgSQL 函數中),當下屬操作返回第一行時,該操作可能已經完成了。
簡單示例:
explain analyze select * from pg_class; QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan on pg_class (cost = 0.00 . .10 .92 rows = 292 width = 203 ) (actual time = 0.008 . .0 .047 rows = 295 loops = 1 ) Total runtime: 0.096 ms ( 2 rows ) explain analyze select * from pg_class limit 2 ; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Limit (cost = 0.00 . .0 .07 rows = 2 width = 203 ) (actual time = 0.009 . .0 .010 rows = 2 loops = 1 ) - > Seq Scan on pg_class (cost = 0.00 . .10 .92 rows = 292 width = 203 ) (actual time = 0.008 . .0 .009 rows = 2 loops = 1 ) Total runtime: 0.045 ms ( 3 rows )
正如你所看到的,在第二個示例中使用 LIMIT 使得底層的順序掃描(Seq Scan)在找到兩行數據后立即停止了工作。
Append 該計劃只是執行多個下屬操作,并將所有返回的行合并為一個結果集返回。
它用于 UNION/UNION ALL 查詢:
explain select oid from pg_class union all select oid from pg_proc union all select oid from pg_database; QUERY PLAN ----------------------------------------------------------------- Append (cost = 0.00 . .104 .43 rows = 2943 width = 4 ) - > Seq Scan on pg_class (cost = 0.00 . .10 .92 rows = 292 width = 4 ) - > Seq Scan on pg_proc (cost = 0.00 . .92 .49 rows = 2649 width = 4 ) - > Seq Scan on pg_database (cost = 0.00 . .1 .02 rows = 2 width = 4 ) ( 4 rows )
在這里你可以看到,Append 操作對三個表執行了三次掃描,并將所有行合并后返回。
請注意,上面使用的是 UNION ALL。如果我們使用 UNION,結果會是這樣:
explain select oid from pg_class union select oid from pg_proc union select oid from pg_database; QUERY PLAN ----------------------------------------------------------------------- HashAggregate (cost = 141.22 . .170 .65 rows = 2943 width = 4 ) - > Append (cost = 0.00 . .133 .86 rows = 2943 width = 4 ) - > Seq Scan on pg_class (cost = 0.00 . .10 .92 rows = 292 width = 4 ) - > Seq Scan on pg_proc (cost = 0.00 . .92 .49 rows = 2649 width = 4 ) - > Seq Scan on pg_database (cost = 0.00 . .1 .02 rows = 2 width = 4 ) ( 5 rows )
這是因為 UNION 會去除重復的行:在這種情況下,這是通過哈希聚合(HashAggregate)操作來實現的。
InitPlan 當查詢中的某部分可以(或必須)在其他所有操作之前計算,并且它不依賴于查詢其余部分的任何內容時,就會出現這種計劃。
例如,假設你有這樣一個查詢:
explain select * from pg_class where relkind = ( select relkind from pg_class order by random() limit 1 ); QUERY PLAN ------------------------------------------------------------------------------------------ Seq Scan on pg_class (cost = 13.11 . .24 .76 rows = 73 width = 203 ) Filter : (relkind = $ 0 ) InitPlan 1 ( returns $ 0 ) - > Limit (cost = 13.11 . .13 .11 rows = 1 width = 1 ) - > Sort (cost = 13.11 . .13 .84 rows = 292 width = 1 ) Sort Key: (random()) - > Seq Scan on pg_class pg_class_1 (cost = 0.00 . .11 .65 rows = 292 width = 1 ) ( 7 rows )
在這種情況下,需要先執行 LIMIT / 排序 / 順序掃描操作,然后再對 pg_class 執行常規的順序掃描,因為 PostgreSQL 必須將 relkind 值與子查詢返回的值進行比較。
另一方面,如果我這樣寫:
explain select * , ( select length( 'redrock' )) from pg_class; QUERY PLAN ------------------------------------------------------------- Seq Scan on pg_class (cost = 0.01 . .10 .93 rows = 292 width = 203 ) InitPlan 1 ( returns $ 0 ) - > Result (cost = 0.00 . .0 .01 rows = 1 width = 0 ) ( 3 rows )
PostgreSQL 會正確地發現子查詢的列不依賴于 pg_class 表中的任何數據,因此它只需運行一次,不必為每一行重新計算長度。
當然,你可以有多個初始計劃,如下所示:
explain select * , ( select length( 'redrock' )) from pg_class where relkind = ( select relkind from pg_class order by random() limit 1 ); QUERY PLAN ------------------------------------------------------------------------------------------ Seq Scan on pg_class (cost = 13.12 . .24 .77 rows = 73 width = 203 ) Filter : (relkind = $ 1 ) InitPlan 1 ( returns $ 0 ) - > Result (cost = 0.00 . .0 .01 rows = 1 width = 0 ) InitPlan 2 ( returns $ 1 ) - > Limit (cost = 13.11 . .13 .11 rows = 1 width = 1 ) - > Sort (cost = 13.11 . .13 .84 rows = 292 width = 1 ) Sort Key: (random()) - > Seq Scan on pg_class pg_class_1 (cost = 0.00 . .11 .65 rows = 292 width = 1 ) ( 9 rows )
不過,有一個重要的點是:單個查詢中初始計劃的編號是 “全局的”,而不是 “按操作的”。
SubPlan 子計劃(SubPlan)與嵌套循環(NestedLoop)有點相似,因為它們都可能被多次調用。
調用子計劃是為了計算來自子查詢的數據,而該子查詢實際上依賴于當前行。
例如:
explain analyze select c.relname, c.relkind, ( select count ( * ) from pg_class x where c.relkind = x.relkind) from pg_class c; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Seq Scan on pg_class c (cost = 0.00 . .3468 .93 rows = 292 width = 65 ) (actual time = 0.135 . .26 .717 rows = 295 loops = 1 ) SubPlan 1 - > Aggregate (cost = 11.83 . .11 .84 rows = 1 width = 0 ) (actual time = 0.090 . .0 .090 rows = 1 loops = 295 ) - > Seq Scan on pg_class x (cost = 0.00 . .11 .65 rows = 73 width = 0 ) (actual time = 0.010 . .0 .081 rows = 93 loops = 295 ) Filter : (c.relkind = relkind) Rows Removed by Filter : 202 Total runtime: 26.783 ms ( 7 rows )
對于 “pg_class as c” 的掃描所返回的每一行,PostgreSQL 都必須運行子計劃,該子計劃會檢查 pg_class 中有多少行在 relkind 列上具有與當前處理行相同的值。
請注意,“Seq Scan on pg_class x” 行中的 “loops=295”,與前面 “Seq Scan on pg_class c” 節點中的 “rows=295” 是相匹配的。
轉載 ?
該文章在 2025/8/27 15:19:19 編輯過