Building Resilient and Scalable Data Pipelines by Decoupling Compute and Storage


Spark Open Source Community


At Pure Storage, our strong belief in aggressive automated testing means our continuous integration (CI) systems generate massive amounts of messy log data. Spark's flexible computing platform lets us write a single application that tracks the state of our CI pipeline in both streaming mode (over a million events per second) and batch mode (40 TB/hour). Decoupling data storage from compute enabled us to orchestrate and independently scale the stateless pipeline components (Spark, Kafka, rsyslog, and custom code) using Nomad. In this talk, we discuss how we architected our data pipeline to rely on simple orchestration and stay resilient with ephemeral compute components.
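The idea of one application serving both streaming and batch can be sketched without Spark at all. The following is a minimal illustration in plain Python, not the code from the talk: the log format (`<job> <test> <PASS|FAIL>`) and every function name are assumptions made for the example. The point is only that the parsing and counting logic is written once and then driven either by a finite input (batch) or by an open-ended iterator (streaming).

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical log format, assumed for illustration: "<job> <test> <PASS|FAIL>"
def parse(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (job, test, status), or None for lines that do not match."""
    parts = line.split()
    if len(parts) != 3 or parts[2] not in ("PASS", "FAIL"):
        return None  # messy lines are skipped instead of failing the job
    return parts[0], parts[1], parts[2]


def record_failure(counts: Counter, line: str) -> Optional[str]:
    """Shared core: update counts in place, return the job name on a failure."""
    rec = parse(line)
    if rec is None or rec[2] != "FAIL":
        return None
    counts[rec[0]] += 1
    return rec[0]


def failures_per_job(lines: Iterable[str]) -> Counter:
    """Batch mode: consume a finite input and return the final counts."""
    counts: Counter = Counter()
    for line in lines:
        record_failure(counts, line)
    return counts


def running_failures(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming mode: emit (job, running count) after each failure."""
    counts: Counter = Counter()
    for line in lines:
        job = record_failure(counts, line)
        if job is not None:
            yield job, counts[job]


SAMPLE = [
    "build-1 test_a PASS",
    "build-1 test_b FAIL",
    "not a valid line at all",
    "build-2 test_a FAIL",
    "build-1 test_c FAIL",
]
```

Calling `failures_per_job(SAMPLE)` gives the totals in one pass, while `running_failures(iter(SAMPLE))` yields an update per failure as lines arrive. In Spark the same split would be expressed with its batch and streaming readers, which this sketch does not attempt to reproduce.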


3. Data Pipeline – Early Stages
[Diagram. Labels: 1,000+ VMs, 20,000+ tests, 100+ FBs, 400+ clients, rsyslog, 10+ Jenkins, custom code. Capacity labels on the diagram (6T, 18T, 18T, 6G) cannot be matched to components from the extracted text.]
© 2018 PURE STORAGE INC. PURE PROPRIETARY

4. Data Pipeline - Now
[Diagram. Labels: 120,000+ tests / day, 2,500+ VMs, 350+ FBs, 1,000+ clients, rsyslog, 20+ Jenkins. Capacity labels on the diagram (72T, 24T, 72T, 800G, 200T, 90G, 189T, 50G) cannot be matched to components from the extracted text.]
✓ Duplicate bug
✓ Infrastructure failure
✓ Performance regression
✓ Low level details
✓ Easy to read graphs

8. Efficiency and Flexibility
1. There is an application stack for every kind of problem, and each is easy to set up.
2. Application silos are inefficient and increase operational cost.
3. Scale may require re-architecting a given stage.
Decouple compute and storage.
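One way to picture what decoupling compute from storage buys: a worker that keeps no state of its own can be killed and replaced at any time, because both the data and its progress marker live in shared storage. The sketch below is a hypothetical illustration, not the pipeline from the talk. `SharedStore`, `run_worker`, and the file names are invented for the example, and a temporary local directory stands in for the shared storage.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


class SharedStore:
    """Stand-in for shared storage (assumption: a local directory for the demo)."""

    def __init__(self, root: Path) -> None:
        self.log = root / "events.log"
        self.checkpoint = root / "checkpoint.json"

    def append(self, line: str) -> None:
        with self.log.open("a") as f:
            f.write(line + "\n")

    def read_from(self, offset: int) -> List[str]:
        if not self.log.exists():
            return []
        return self.log.read_text().splitlines()[offset:]

    def load_offset(self) -> int:
        if self.checkpoint.exists():
            return json.loads(self.checkpoint.read_text())["offset"]
        return 0  # no checkpoint yet: start from the beginning

    def save_offset(self, offset: int) -> None:
        # Write to a temp file, then rename, so a crash never leaves a
        # half-written checkpoint behind.
        tmp = self.checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps({"offset": offset}))
        tmp.replace(self.checkpoint)


def run_worker(store: SharedStore, max_events: int) -> List[str]:
    """Process up to max_events lines, then exit (simulating a killed container).

    The offset is saved after each line is processed, so a crash in between
    causes that line to be processed again on restart (at-least-once).
    """
    offset = store.load_offset()
    processed = []
    for line in store.read_from(offset)[:max_events]:
        processed.append(line.upper())  # placeholder for the real work
        offset += 1
        store.save_offset(offset)
    return processed


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as d:
        store = SharedStore(Path(d))
        for i in range(5):
            store.append(f"event-{i}")
        first = run_worker(store, max_events=2)    # worker "dies" after 2 events
        second = run_worker(store, max_events=10)  # replacement resumes at offset 2
        return first, second
```

The second call to `run_worker` shares nothing with the first except the store, which is what lets an orchestrator reschedule or scale workers freely. Because the offset is written after the work, this sketch gives at-least-once processing; stronger guarantees would need more than is shown here.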

9. Technologies we use
• Docker: Containers
• Nomad: Orchestration
• Prometheus: Monitoring
• Grafana: Dashboards
• Consul: Service discovery
• Chef: Container build
• Jenkins: Continuous Integration
• Kafka Manager: Kafka Interface
• Artifactory: Image repository
• Ansible: Configuring servers
