Data Migration with Spark to Hive

Spark Open Source Community
In this presentation, Vineet walks through a case study of one of his customers, who used Spark to migrate terabytes of data from GPFS into Hive tables. The ETL pipeline was built purely with Spark. For each target Hive table, the pipeline extracted the table's properties: which columns were Hive Date/Timestamp columns, whether the table was partitioned or non-partitioned, its storage format (Parquet or Avro), and the source-to-target column mappings. The target tables ranged from a few columns to hundreds, and dates stored in non-standard formats had to be converted into Hive's standard timestamp format.
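The abstract describes a metadata-driven pipeline but shows no code. The snippet below is a minimal sketch, not the presenter's implementation. It assumes the target tables already exist in the Hive metastore and the GPFS data has been loaded into a Spark temporary view. From a target table's properties (its DATE/TIMESTAMP columns, whether it is partitioned, and a source-to-target column mapping), it builds a single Spark SQL `INSERT OVERWRITE` statement. Non-standard source date strings are parsed with Spark SQL's `to_timestamp`/`to_date` using a per-column pattern. `TargetTable`, `build_insert`, and every table, view, and column name are hypothetical. In a real job the column list could come from `spark.catalog.listColumns(table)`, whose entries expose `name`, `dataType`, and `isPartition`. Because the insert goes through the Hive table, Spark writes in the table's declared storage format (Parquet or Avro).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

TEMPORAL_TYPES = ("date", "timestamp")


@dataclass
class TargetTable:
    """Properties of a target Hive table (hypothetical container).

    columns: every column as (name, hive_type), in table order,
             including partition columns.
    partition_columns: partition column names; empty if non-partitioned.
    """
    name: str
    columns: List[Tuple[str, str]]
    partition_columns: List[str] = field(default_factory=list)

    @property
    def is_partitioned(self) -> bool:
        return bool(self.partition_columns)

    def temporal_columns(self) -> List[str]:
        """Names of the columns whose Hive type is DATE or TIMESTAMP."""
        return [n for n, t in self.columns if t.lower() in TEMPORAL_TYPES]


def column_expr(src: str, hive_type: str, fmt: Optional[str]) -> str:
    """Spark SQL expression converting one source column to its target type."""
    t = hive_type.lower()
    if t == "timestamp":
        # Parse a non-standard string with an explicit pattern, else plain cast.
        return f"to_timestamp({src}, '{fmt}')" if fmt else f"CAST({src} AS TIMESTAMP)"
    if t == "date":
        return f"to_date({src}, '{fmt}')" if fmt else f"CAST({src} AS DATE)"
    return src  # non-temporal columns pass through unchanged


def build_insert(target: TargetTable, source_view: str,
                 mapping: Dict[str, str], formats: Dict[str, str]) -> str:
    """Build an INSERT OVERWRITE statement that loads `target` from `source_view`.

    mapping: target column -> source column (defaults to the same name).
    formats: target DATE/TIMESTAMP column -> Spark datetime pattern of the
             source string.
    """
    unknown = set(formats) - set(target.temporal_columns())
    if unknown:
        raise ValueError(f"formats given for non-date/timestamp columns: {sorted(unknown)}")
    types = dict(target.columns)
    # Dynamic-partition inserts expect the partition columns last in the SELECT.
    data_cols = [n for n, _ in target.columns if n not in target.partition_columns]
    ordered = data_cols + list(target.partition_columns)
    select = ",\n  ".join(
        f"{column_expr(mapping.get(c, c), types[c], formats.get(c))} AS {c}"
        for c in ordered
    )
    partition = (f" PARTITION ({', '.join(target.partition_columns)})"
                 if target.is_partitioned else "")
    return (f"INSERT OVERWRITE TABLE {target.name}{partition}\n"
            f"SELECT\n  {select}\nFROM {source_view}")


if __name__ == "__main__":
    # Hypothetical partitioned target table with two temporal columns.
    orders = TargetTable(
        name="sales.orders",
        columns=[("order_id", "bigint"), ("order_ts", "timestamp"),
                 ("amount", "decimal(10,2)"), ("load_date", "date")],
        partition_columns=["load_date"],
    )
    print(build_insert(
        orders, "gpfs_orders_raw",
        mapping={"order_id": "ORD_NO", "order_ts": "ORD_TIME",
                 "amount": "AMT", "load_date": "LOAD_DT"},
        formats={"order_ts": "dd/MM/yyyy HH:mm", "load_date": "yyyyMMdd"},
    ))
```

To execute a generated statement, one option is to pass it to `spark.sql(...)` on a `SparkSession` built with `enableHiveSupport()`. Dynamic-partition inserts into Hive tables typically also require `SET hive.exec.dynamic.partition.mode=nonstrict`. Generating the SQL from table metadata keeps a single code path whether a table has a few columns or hundreds.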