Modular Apache Spark: Transform Your Code in Pieces


Published by: Spark Open Source Community (Spark开源社区)
Divide and you will conquer Apache Spark. It is quite common to see a single, scroll-like "papyrus" script that initializes Spark, reads the input paths, executes all the logic, and writes the result. We have even found scripts where every Spark transformation lives in one method with tons of lines. Code like that is difficult to test, to maintain, and to read. In short, it is bad code.

We built a set of tools and libraries that let developers build their pipelines by joining Pieces together. These pieces comprise Readers, Writers, Transformers, Aliases, etc. The tooling also comes with enriched SparkSuites built on spark-testing-base by Holden Karau. We recently started using junit4git (github.com/rpau/junit4git) in our tests, which lets us execute only the Spark tests that matter by skipping tests that are not affected by the latest code changes. This translates into faster builds and fewer coffees.

Because developers define each piece on its own, small pieces can be tested before the full set is assembled. The same pieces can also be reused across multiple pipelines, which speeds up development and improves the quality of the code. The "transform" method combined with currying creates a powerful tool for splitting all the Spark logic into fragments. This talk is aimed at developers who are new to the Spark world, and shows how developing iteration by iteration, in small steps, can help them produce great code with less effort.
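The idea of combining a "transform" method with currying can be sketched without a Spark cluster. The example below is only an illustration under stated assumptions: `Frame` is a hypothetical stand-in for a Spark DataFrame (its `transform` mirrors the contract of Spark's `Dataset.transform`, which takes a function from dataset to dataset), and `keep_where`, `with_constant`, and `run_pipeline` are invented names, not part of the speakers' libraries.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class Frame:
    """Hypothetical stand-in for a Spark DataFrame: a list of row dicts."""

    def __init__(self, rows: List[Row]) -> None:
        # Copy the rows so pieces never mutate each other's input.
        self.rows = [dict(row) for row in rows]

    def transform(self, piece: "Callable[[Frame], Frame]") -> "Frame":
        # Same contract as Spark's Dataset.transform: apply a
        # frame-to-frame function so that pieces can be chained.
        return piece(self)


Piece = Callable[[Frame], Frame]


def keep_where(column: str, minimum: float) -> Piece:
    # Curried piece (assumed name): configuration first, frame second.
    def apply(frame: Frame) -> Frame:
        return Frame([row for row in frame.rows if row[column] >= minimum])

    return apply


def with_constant(column: str, value: object) -> Piece:
    # Curried piece (assumed name): adds a column holding a fixed value.
    def apply(frame: Frame) -> Frame:
        return Frame([{**row, column: value} for row in frame.rows])

    return apply


def run_pipeline(frame: Frame) -> Frame:
    # The pipeline is just the pieces joined together, in order.
    return (
        frame
        .transform(keep_where("amount", 10))
        .transform(with_constant("source", "web"))
    )


orders = Frame([{"id": 1, "amount": 5}, {"id": 2, "amount": 25}])
result = run_pipeline(orders)
```

Because each piece is an ordinary function from frame to frame, it can be unit-tested on a tiny input before the whole pipeline exists, and the same piece can be reused in another pipeline with different arguments.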