Apache Arrow-Based Unified Data Sharing and Transferring Format Among CPU and Ac

Spark Open Source Community
CPU technologies have scaled well in past years through more complex architecture designs, wider execution pipelines, more cores per processor, and higher clock frequencies. However, accelerators offer more computational power and higher throughput at lower cost in their dedicated domains, which has led to growing use of accelerators in Spark. Yet when accelerators are integrated into Spark, a common outcome is that micro-benchmarks promise huge performance gains while the speedup actually achieved is small. One reason is the cost of data transfer between the JVM and the accelerator. The other is that the accelerator lacks information about how it is used in Spark. In this research, we investigate using Apache Arrow-based DataFrames as the unified way to share and transfer data between the CPU and accelerators, and we make the accelerator DataFrame-aware when designing the hardware and software stack. In this way, we seamlessly integrate Spark and accelerator design and get close to the promised performance.