- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Vectorized executor in TiDB
介绍最近TiDB向量化执行引擎近期的工作,包括表达式计算的向量化优化,以及Join、Aggregation等算子的向量化优化的方法和带来的成果。
展开查看详情
1 . Recent work on Vectorized exectuor Presented by Feng Liyuan
2 .About me ● Feng Liyuan (冯立元) ● Was: Experienced Engineer, Cloud Storage, Qiniu ● Now: Engineer, TiDB SQL Engine Team ● Focus on: TiDB Runtime PingCAP.com
3 .Agenda ● Backgroud ○ What is executor? ○ What is vectorized executor on TiDB ● Recent work ○ Hash Join ○ Stream Aggregation ○ Vectorized expression evaluation ● Future work & Call for participation
4 .Part I - Backgroud
5 .Row-based executor Volcano Iterator Model next() tuple Aggregation: SELECT SUM(price) sum(price) next() tuple FROM goods Selection: shop_id = 10 WHERE shop_id = 10 next() tuple Scan: goods PingCAP.com
6 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
7 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
8 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
9 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
10 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 1 5 10.5
11 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
12 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
13 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2
14 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 2 10 1.2
15 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
16 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
17 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
18 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7
19 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) id shop_id price 3 10 13.7
20 .Volcano Iterator Model SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 next() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) sum(price) 14.9
21 .Volcano Iterator Model ● A.k.a Tuple at a time next() tuple ● Elegant, flexible, extensible and Hash powerful[2] GroupBy ● Efficient at that time next() tuple ○ disk I/O is the main overhead Hash Join ● Low memory cost next() tuple next() ● Very natural for row stores Selection Selection [2] Graefe, Goetz. "Volcano/spl minus/an extensible and parallel query evaluation system." IEEE Transactions on Knowledge and Data Engineering 6.1 (1994): 120-135.
22 .Vectorized executor Vectorized We call it We call it chunk column Row-based We call it id shop_id price row 1 5 10.5 id shop_id price 2 10 1.2 1 5 10.5 3 10 13.7 2 10 1.2 ... ... ... 3 10 13.7 ... ... ... 1024 10 15.3 1025 ... 2048 PingCAP.com
23 .Vectorized executor nextChunk() chunk Aggregation: SELECT SUM(price) sum(price) nextChunk() chunk FROM goods Selection: shop_id = 10 WHERE shop_id = 10 nextChunk() chunk Scan: goods PingCAP.com
24 . SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
25 . SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
26 . SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
27 . SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price)
28 . SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) A chunk which has 1024 rows
29 . SELECT SUM(price) FROM goods WHERE shop_id = 10 id shop_id price 1 5 10.5 2 10 1.2 3 10 13.7 ... ... ... ... ... ... 1024 10 15.3 nextChunk() Scan: goods Selection: shop_id = 10 Aggregation: sum(price) A chunk which has 3 rows