Vectorized Query Execution in Apache Spark at Facebook

Published by the Spark Open Source Community
A standard query execution system processes one row at a time. Vectorized query execution batches multiple rows together in a columnar format, and each operator uses simple loops to iterate over the data within a batch. This greatly reduces CPU usage for reading, writing, and query operations such as scanning and filtering. In this talk, we will take a deep dive into Facebook's ORC-based vectorized reader and writer implementation, discuss how vectorization affects the performance of various data types in Hive/Spark, and quantify the improvements vectorization brings to the Facebook Warehouse.
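To make the row-at-a-time versus batch distinction concrete, here is a minimal Python sketch, assuming an in-memory list of integer rows. `ColumnBatch`, `to_batches`, `filter_gt`, `sum_column`, and the batch size of 1024 are hypothetical illustrations, not the ORC, Hive, or Spark API (ORC's Java reader works with its own `VectorizedRowBatch` class). The sketch shows only the columnar data layout and the per-batch loop structure, using a selection vector (a list of surviving row indices) to record filter results; pure Python will not reproduce the CPU savings, which come from tight, cache-friendly loops in compiled code.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List

BATCH_SIZE = 1024  # hypothetical batch size chosen for illustration


@dataclass
class ColumnBatch:
    """Hypothetical batch of rows stored column by column."""
    columns: Dict[str, List[int]]
    size: int
    # Selection vector: indices of rows that are still "alive" after filtering.
    selected: List[int] = field(default_factory=list)


def to_batches(rows: List[Dict[str, int]], names: List[str],
               batch_size: int = BATCH_SIZE) -> Iterator[ColumnBatch]:
    """Pivot row-oriented input into columnar batches of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        cols = {n: [r[n] for r in chunk] for n in names}
        yield ColumnBatch(cols, len(chunk), list(range(len(chunk))))


def filter_gt(batch: ColumnBatch, name: str, threshold: int) -> ColumnBatch:
    """Vectorized filter: one simple loop over a single column of the batch."""
    col = batch.columns[name]
    batch.selected = [i for i in batch.selected if col[i] > threshold]
    return batch


def sum_column(batch: ColumnBatch, name: str) -> int:
    """Aggregate only the rows that survived earlier filters."""
    col = batch.columns[name]
    return sum(col[i] for i in batch.selected)


def row_at_a_time(rows: List[Dict[str, int]], name: str, threshold: int) -> int:
    """Baseline: every row passes through the filter and sum individually."""
    total = 0
    for r in rows:
        if r[name] > threshold:
            total += r[name]
    return total


# Usage: SELECT SUM(x) WHERE x > 6, computed both ways.
rows = [{"id": i, "x": i % 10} for i in range(5000)]
batches = list(to_batches(rows, ["id", "x"]))
vectorized_total = sum(sum_column(filter_gt(b, "x", 6), "x") for b in batches)
baseline_total = row_at_a_time(rows, "x", 6)
```

Both paths return the same answer; the difference is that the vectorized path runs each operator once per batch over a contiguous column rather than once per row, which is what lets compiled engines amortize per-row overhead.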