Apache Spark on K8S Best Practice and Performance in the Cloud
1. Jimmy Chen, Junping Du, Tencent Cloud
2. Agenda
• Spark service in the cloud: our practice
• Spark on Kubernetes: architecture, best practices, and performance
3. About us
• Jimmy Chen: Sr. Software Engineer at Tencent Cloud; Apache Hadoop, Spark, and Parquet contributor; previously worked at Intel
• Junping Du: Principal Engineer & Director at Tencent Cloud; Apache Hadoop PMC & ASF Member; previously worked at Hortonworks and VMware
4. About Tencent and Tencent Cloud
• 331st largest company in the Fortune Global 500 in 2018
• Top 100 Most Valuable Global Brands 2018: ranked 1st in China, 5th globally
• World's 50 Most Innovative Companies 2015: ranked 1st in China
5. Case study for Spark in the Cloud: Sparkling
6. Sparkling Architecture (diagram): a data warehouse manager and workspace on Tencent Cloud infrastructure. Components include JDBC and Livy-Thrift endpoints; a query engine with SQL optimizer, a Spark-based execution/analysis engine, and columnar query storage; workspace tools (SQL IDE, notebook, workflow, monitor, data sync, BI, ML); a metadata store with Atlas catalog; and a data lake with data catalog and management over sources such as RDBMS, COS, NoSQL, ES, and Kafka.
7. Motivations towards dockerized Spark
• Serverless: better isolation
• On-premises deployment: bare-metal infrastructure where cloud control protocols do not work; DevOps scope; cost
8. Our Practice: Sparkling on Kubernetes
• Cloud API server -> Kubernetes API server
• CVM resource -> container resource
• Cloud load balancer -> Kubernetes load balancer
• Cloud DNS -> Kubernetes ingress/coredns
• CVM image -> docker image
• etc.
9. Kubernetes
• An open-source cluster manager (github.com/kubernetes/kubernetes)
• Runs programs in Linux containers
• 2,000+ contributors and 75,000+ commits
10. Kubernetes Architecture
Pod, the unit of scheduling and isolation:
• runs a user program in a primary container
• holds isolation layers, such as a virtual IP, in an infra container
(diagram: pods 1 and 2 spread across nodes A, B, and C at 172.17.0.5-7)
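For reference, a minimal Pod manifest might look like the sketch below; the pod name and image are hypothetical placeholders, not values from this deck.

# Minimal Pod sketch: one primary container runs the user program;
# Kubernetes itself adds the infra container that holds the pod's
# isolation layers (network namespace, virtual IP).
cat <<'EOF' | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - name: main
    image: busybox
    command: ["sh", "-c", "echo hello from the primary container; sleep 3600"]
EOF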
11. Kubernetes Advantages
12. Spark on Kubernetes Workflow (diagram): applications reach the cluster either directly via spark-submit or through the Kubernetes Spark operator
13. Spark on Kubernetes Status
• Spark side
  – Cluster/client mode
  – Volume mounts
  – Customized Spark pod, e.g. mounting configMap and volume, setting affinity/anti-affinity
  – Static executor allocation
• Spark side, on-going work
  – Dynamic executor allocation
  – Job queues and resource management
  – Spark application management
  – Priority queues
• Kubernetes side (Spark operator)
  – sparkctl CLI
  – Native cron support for scheduled applications
  – Multiple-language support, such as Python and R
  – Monitoring through Prometheus
• Kubernetes side, on-going work
  – GPU resource scheduling
  – Spark operator HA
  – Multiple Spark version support
14. Sparkling on Kubernetes Architecture (diagram): a Kubernetes cluster fronted by an Ingress, with persistent volumes and a DaemonSet. Pods include Notebook, Zeppelin, Airflow, Grafana, and History services, plus Metastore, Livy server, Thrift server, and Datasync pods; application pods are scheduled across the nodes.
15. Submit via spark-submit
Step 1: check the cluster master
  # kubectl cluster-info
  Kubernetes master is running at https://169.254.128.15:60002
Step 2: submit the application via spark-submit
  # bin/spark-submit \
      --master k8s://https://169.254.128.7:60002 \
      --deploy-mode cluster \
      --name spark-pi \
      --class org.apache.spark.examples.SparkPi \
      --conf spark.kubernetes.container.image=sherrytima/spark2.4.1:k8s \
      local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar
16. Submit via kubectl/sparkctl
• Install the Spark operator on your Kubernetes cluster
  # helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
  # helm install incubator/sparkoperator --namespace spark-operator
• Declare the application YAML (see the sketch below)
• Run the application
  # kubectl create -f spark-pi.yaml
  # sparkctl create spark-pi.yaml
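A minimal spark-pi.yaml for the operator might look like the following sketch. The image, main class, and jar path reuse the values from the spark-submit slide; the remaining fields follow the operator's v1beta2 SparkApplication API and are illustrative, not the exact manifest used here.

# Sketch: declare and create a SparkApplication for the operator.
# Assumes a "spark" service account with RBAC permission to create pods.
cat <<'EOF' > spark-pi.yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: sherrytima/spark2.4.1:k8s
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar
  sparkVersion: "2.4.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: 1g
    serviceAccount: spark
  executor:
    instances: 2
    cores: 1
    memory: 1g
EOF
kubectl create -f spark-pi.yaml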
17. Benchmark Environment
• Software: Kubernetes 1.10.1, Hadoop/YARN 2.7.3, Spark 2.4.1, CentOS 7.2
• Hardware
  – Master (1 node): 8 cores, 64 GB memory, 200 GB SSD
  – Slaves (6 nodes): 32 cores, 128 GB memory, 4 HDDs each
18. Terasort
• Data size 100G; configuration:
  – spark.executor.instances = 5
  – spark.executor.cores = 2
  – spark.executor.memory = 10g
  – spark.executor.memoryOverhead = 2g
  – spark.driver.memory = 8g
  (chart: Terasort 100G, time in seconds, Spark on YARN vs. Spark on K8s; lower is better)
• Data size 500G:
  – Spark on Kubernetes failed
  – Spark on YARN finished in ~35 minutes
19. Root Cause Analysis
• Pods were constantly evicted due to disk pressure; spark.local.dir is mounted as an emptyDir
• An emptyDir is ephemeral storage on the node
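One way to confirm this failure mode (a sketch, assuming the standard spark-role labels Spark puts on driver and executor pods):

# Evicted executors show up with STATUS "Evicted" ...
kubectl get pods -l spark-role=executor
# ... and the affected node reports a DiskPressure condition.
kubectl describe nodes | grep -A 2 DiskPressure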
20. Possible Solutions
• Mount the emptyDir with RAM-backed volumes: spark.kubernetes.local.dirs.tmpfs=true (see the sketch below)
• Extend the disk capacity of the kubelet workspace
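Applied to the spark-pi job from the spark-submit slide, the tmpfs workaround is a single extra conf; a sketch follows. Note that RAM-backed scratch space counts against pod memory, so executor memory overhead may need to grow.

# Sketch: rerun spark-pi with spark.local.dir backed by tmpfs (RAM)
# instead of the node's ephemeral emptyDir storage.
bin/spark-submit \
  --master k8s://https://169.254.128.7:60002 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=sherrytima/spark2.4.1:k8s \
  --conf spark.kubernetes.local.dirs.tmpfs=true \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar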
21. Result (chart): Terasort 100G, time in seconds, comparing Spark on YARN, Spark on K8s (default), and Spark on K8s with tmpfs; lower is better.
22. spark-sql-perf
• Use the spark-sql-perf repo from Databricks
  – 10 shuffle-bound queries
  – a Spark application based on spark-sql-perf
• Run in cluster mode on Kubernetes:
  bin/spark-submit \
    --name tpcds-1000 \
    --jars hdfs://172.17.0.8:32001/run-lib/* \
    --class perf.tpcds.TpcDsPerf \
    --conf spark.kubernetes.container.image=sherrytima/spark-2.4.1:k8sv8 \
    --conf spark.driver.args="hdfs://172.17.0.8:32001/run-tpcds 1000 orc true" \
    hdfs://172.17.0.8:32001/tpcds/tpcds_2.11-1.1.jar
23. Benchmark Result
• Configuration:
  – spark.executor.instances = 24
  – spark.executor.cores = 6
  – spark.executor.memory = 24g
  – spark.executor.memoryOverhead = 4g
  – spark.driver.memory = 8g
• Result: Spark on Kubernetes is much slower than Spark on YARN!
  (chart: tpcds@1000, time in ms, for q4, q11, q17, q25, q29, q64, q74, q78, q80, q93, Spark on YARN vs. Spark on K8s; lower is better)
24. Root Cause
• The kubelet work directory can be mounted on only one disk, so the Spark scratch space uses ONE disk.
• When running on YARN, the Spark scratch space uses as many disks as yarn.nodemanager.local-dirs configures.
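For contrast, a sketch of the YARN side: the property below (full name per the Hadoop docs) lists one directory per physical disk, and containers stripe scratch space across all of them. The four paths are hypothetical mount points for the four HDDs on each slave.

# yarn-site.xml fragment (sketch); printed here for illustration
cat <<'EOF'
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local,/data4/yarn/local</value>
</property>
EOF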
25. Possible Workaround
• Use RAM-backed media as the Spark scratch space
26. Our Solution
• Mount specified volumes as the Spark local directories (SPARK-27499); a sketch follows
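In its upstream form in later Spark releases, this works by mounting hostPath volumes whose names begin with spark-local-dir-, which Spark then uses as scratch space. A sketch against the spark-pi example, with /data1 and /data2 as hypothetical per-disk mount points:

# Sketch: give each executor two disk-backed local dirs so shuffle
# data is spread across both disks instead of the single kubelet disk.
bin/spark-submit \
  --master k8s://https://169.254.128.7:60002 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=sherrytima/spark2.4.1:k8s \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.mount.path=/data1 \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-1.options.path=/data1 \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/data2 \
  --conf spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/data2 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.1.jar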
27. Result (chart): TPCDS@1000, time in ms, for q4, q11, q17, q25, q29, q64, q74, q78, q80, q93, comparing Spark on K8s original vs. Spark on K8s optimized; lower is better.
28. Summary
• Spark on Kubernetes practice: serverless and cloud-native scenarios
• We hit problems in production, e.g. local storage issues
• We work with the community toward better solutions and best practices
29. Future Work
• Close the performance gap with YARN mode
• Data-locality awareness for the Kubernetes scheduler
• Prometheus/Grafana integration