Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
1. Using S3 Select to Deliver 100X Performance Improvements Versus the Public Cloud
Frank Wessels, CTO, MinIO
2. S3 Select
▪ Recent addition to S3 API
  ○ Offload filtering to storage
  ○ Formats: CSV, JSON, Parquet
▪ Advantages
  ○ Faster
  ○ Less network traffic
  ○ Smaller compute nodes
▪ S3 Select for Spark
  ○ https://github.com/minio/spark-select
[Diagram: applications reading from storage before and after S3 Select; callouts: "Up to 400% faster", "Up to 80% cheaper"]
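To make "offload filtering to storage" concrete, here is a minimal sketch of how a client might issue an S3 Select query through boto3's `select_object_content` call. The bucket, key, column names, and endpoint URL are hypothetical placeholders, not taken from the talk.

```python
def build_select_request(bucket, key, sql):
    """Build keyword arguments for boto3's S3 select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Treat the first CSV line as a header so columns can be named in SQL.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def run_select(client, request):
    """Send the request and join the filtered bytes from the event stream."""
    response = client.select_object_content(**request)
    chunks = []
    for event in response["Payload"]:
        # Only "Records" events carry result data; others report stats or end.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks)


# Hypothetical bucket, key, and column names, for illustration only.
request = build_select_request(
    "my-bucket",
    "data.csv",
    "SELECT s.name FROM S3Object s WHERE s.state = 'CA'",
)
# With boto3 installed and credentials configured, something like:
#   import boto3
#   client = boto3.client("s3", endpoint_url="http://localhost:9000")
#   data = run_select(client, request)
```

Only the rows and columns that match the SQL expression travel back over the network, which is where the reduced traffic and smaller compute nodes come from.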
3. Introduction to MinIO
MinIO is a high-performance, distributed object storage server designed for petascale data infrastructure.
▪ S3-compatible
▪ Scalable
▪ Simple
▪ Performant
▪ Optimized for Intel/ARM/Power9 CPUs
4. Global Scale
5. Focus on Performance
6. S3 Select Performance on AWS
Format    Time (s)   Records/s   Throughput
csv       5.46       733K        94 MB/s
json      14.28      280K        98 MB/s
parquet   32.25      124K        4.3 MB/s
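As a quick sanity check on the AWS figures above, the short sketch below multiplies time by record rate and divides throughput by record rate. It assumes 1 MB = 10^6 bytes and that each run scanned the whole object, neither of which the slide states. All three formats work out to roughly 4 million records, which suggests the same dataset was used for each.

```python
# Figures copied from the slide: (time in s, records/s, bytes/s).
results = {
    "csv": (5.46, 733_000, 94e6),
    "json": (14.28, 280_000, 98e6),
    "parquet": (32.25, 124_000, 4.3e6),
}


def derived(time_s, records_per_s, bytes_per_s):
    """Return (total records scanned, average bytes per record)."""
    total_records = time_s * records_per_s
    bytes_per_record = bytes_per_s / records_per_s
    return total_records, bytes_per_record


for fmt, row in results.items():
    total, size = derived(*row)
    print(f"{fmt:8s} ~{total / 1e6:.1f}M records, ~{size:.0f} bytes/record")
```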
7. Accelerating S3 Select on MinIO
▪ Input stage, per format
  ○ CSV: parsing
  ○ JSON: parsing
  ○ Parquet: loading
▪ Evaluation (“where”)
▪ Processing (“select”)
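The three stages on this slide can be sketched for the CSV case with Python's standard `csv` module. This is only an illustration of the stage boundaries, not MinIO's implementation; the sample data and column names are made up.

```python
import csv
import io


def select_csv(data, columns, predicate):
    """Parse CSV text, keep rows matching predicate, return chosen columns."""
    reader = csv.DictReader(io.StringIO(data))   # stage 1: parsing
    for row in reader:
        if predicate(row):                       # stage 2: evaluation ("where")
            yield [row[c] for c in columns]      # stage 3: processing ("select")


# Made-up sample data, for illustration only.
sample = "plate,state,fine\nABC123,CA,50\nXYZ789,NV,25\nJKL456,CA,73\n"
rows = list(select_csv(sample, ["plate", "fine"], lambda r: r["state"] == "CA"))
print(rows)  # [['ABC123', '50'], ['JKL456', '73']]
```

Every row passes through parsing even when the predicate rejects it, which is why the parsing stage is the natural first target for acceleration.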
8. First 10X Acceleration: Zero Copy
▪ Manage memory allocations: garbage collected vs. non-garbage collected
Source: https://bitbucket.org/ewanhiggs/csv-game
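The slide does not show how zero copy is implemented. As a rough sketch of the general idea, the parser below records field boundaries as offsets into the input buffer and exposes fields as `memoryview` slices, so no per-field byte strings are allocated during parsing. The function name and sample line are illustrative assumptions.

```python
def field_spans(line, delimiter=ord(",")):
    """Return (start, end) offsets of each field in one line, without slicing."""
    spans, start = [], 0
    for i, byte in enumerate(line):
        if byte == delimiter:
            spans.append((start, i))
            start = i + 1
    spans.append((start, len(line)))
    return spans


buf = memoryview(b"ABC123,CA,50")
spans = field_spans(buf)
# Each field is a view into the original buffer; bytes are copied only
# when a field is explicitly converted, as in bytes(fields[1]) below.
fields = [buf[s:e] for s, e in spans]
print(spans)             # [(0, 6), (7, 9), (10, 12)]
print(bytes(fields[1]))  # b'CA'
```

Fewer short-lived allocations means less work for the garbage collector, which is the contrast the slide draws between garbage-collected and non-garbage-collected memory.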
9. Second 10X Acceleration: SIMD
▪ SIMD = Single Instruction Multiple Data
  ○ Intel: AVX2
▪ Process 32 bytes in parallel
  ○ delimiter / separator detection
  ○ bitmap handling & parsing
  ○ string compares
▪ Performance (single core)
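Python cannot issue AVX2 instructions, so the sketch below only emulates the bitmap approach in scalar code: each 32-byte chunk is turned into a mask with one bit per byte, set where the byte is a delimiter, and the set bits are then walked to recover field boundaries. In real AVX2 code the mask would typically come from a byte-wise compare followed by a movemask. The function names and sample data are illustrative assumptions, not the select-simd code.

```python
CHUNK = 32  # an AVX2 register holds 32 bytes


def delimiter_bitmap(chunk, delimiter=ord(",")):
    """Return an int whose bit i is set when chunk[i] is the delimiter.

    Emulated byte by byte here; SIMD hardware compares all 32 bytes at once.
    """
    mask = 0
    for i, byte in enumerate(chunk):
        if byte == delimiter:
            mask |= 1 << i
    return mask


def positions(mask):
    """Yield indexes of set bits, lowest first, clearing one bit per step."""
    while mask:
        low = mask & -mask          # isolate the lowest set bit
        yield low.bit_length() - 1
        mask ^= low


data = b"ABC123,CA,50,RD\n"
mask = delimiter_bitmap(data[:CHUNK])
print(list(positions(mask)))  # [6, 9, 12]
```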
10. Results using select-simd
▪ Same queries as before
  ○ MinIO with select-simd vs. AWS S3
11. Demo
■ Source data
  ○ parking-citations.csv (25M rows / 3.5 GB)
■ AWS region
  ○ us-east-1
■ MinIO with select-simd-integration branch running on a single instance: c5.2xlarge (8 vCPUs)
■ mc client running in the same region on a c5.large instance
12. Status and what’s next
▪ Work in progress
  ○ Initial focus on CSV
▪ Next: add support for
  ○ Parquet
  ○ JSON: https://github.com/lemire/simdjson
▪ Investigate AVX-512
  ○ erasure coding
    ▫ AVX-512 4x speedup over AVX2
  ○ k-registers are great / 2 KB on-core register space
▪ Dynamic code generation (think LLVM)
13. High performance object storage
▪ Power9 CPUs
▪ PCIe Gen4
▪ 24x NVMe
▪ Dual Mellanox CX5 (4x100 GbE)
14. S3 Select benefits for Spark
▪ Benefits
  ○ Faster queries
  ○ Less network traffic
  ○ Smaller compute needs
▪ Stay tuned for overall impact
  ○ S3 “plain” vs. S3 Select
  ○ minio/simd-select vs. AWS S3 Select
15. Questions?
Visit our booth #509
@minio
https://github.com/minio/minio
https://slack.minio.io
https://minio.io