Introduction to Apache Spark Development
1 .Intro to Spark Development. June 2015: Spark Summit West / San Francisco. http://training.databricks.com/intro.pdf https://www.linkedin.com/in/bclapper
2 .Making big data simple. Databricks Cloud: “A unified platform for building Big Data pipelines – from ETL to Exploration and Dashboards, to Advanced Analytics and Data Products.” Founded in late 2013 by the creators of Apache Spark. Original team from UC Berkeley AMPLab. Raised $47 million in 2 rounds. ~55 employees. We’re hiring! (http://databricks.workable.com) Level 2/3 support partnerships with Hortonworks, MapR, DataStax.
3 .The Databricks team contributed more than 75% of the code added to Spark in the past year
4 .Agenda. Before lunch: History of Big Data & Spark; RDD fundamentals; Databricks UI demo; Lab: DevOps 101; Transformations & Actions. After lunch: Transformations & Actions (continued); Lab: Transformations & Actions; DataFrames; Lab: DataFrames; Spark UIs; Resource Managers: Local & Standalone; Memory and Persistence; Spark Streaming; Lab: MISC labs.
5 .Some slides will be skipped. Please keep Q&A low during class (5pm – 5:30pm for Q&A with instructor). 2 anonymous surveys: pre- and post-class. Lunch: noon – 1pm. 2 breaks (sometime before lunch and after lunch).
6 .Instructor: Brian Clapper. Homepage: http://www.ardentex.com/ LinkedIn: https://www.linkedin.com/in/bclapper @brianclapper. 30 years of experience building & maintaining software systems. Scala, Python, Ruby, Java, C, C#. Founder of Philadelphia area Scala user group (PHASE). Spark instructor for Databricks.
7 .Survey completed by 58 out of 115 students. Your job?
8 .Traveled from?
9 .Which industry?
10 .Prior Spark training?
11 .Hands-on experience with Spark?
12 .Spark usage lifecycle?
13 .Programming experience
14 .Programming experience
15 .Programming experience
16 .Big Data experience
17 .Focus of class?
18 .Storage vs Processing wars. NoSQL battles (then): Relational vs NoSQL; HBase vs Cassandra; Redis vs Memcached vs Riak; MongoDB vs CouchDB vs Couchbase; Neo4j vs Titan vs Giraph vs OrientDB; Solr vs Elasticsearch. Compute battles (now): MapReduce vs Spark; Spark Streaming vs Storm; Hive vs Spark SQL vs Impala; Mahout vs MLlib vs H2O.
20 .NoSQL Popularity Winners. Key -> Value: Redis - 95, Memcached - 33, DynamoDB - 16, Riak - 13. Key -> Doc: MongoDB - 279, CouchDB - 28, Couchbase - 24, DynamoDB - 15, MarkLogic - 11. Column Family: Cassandra - 109, HBase - 62. Graph: Neo4j - 30, OrientDB - 4, Titan - 3, Giraph - 1. Search: Solr - 81, Elasticsearch - 70, Splunk - 41.
21 .General Batch Processing (2004 – 2013). Specialized Systems (iterative, interactive, ML, streaming, graph, SQL, etc.) (2007 – 2015?): Pregel, Dremel, Impala, GraphLab, Giraph, Drill, Tez, S4, Storm, Mahout. General Unified Engine (2014 – ?).
22 .Scheduling Monitoring Distributing
23 .RDBMS Streaming SQL GraphX Hadoop Input Format Apps Distributions: CDH HDP MapR DSE Tachyon MLlib DataFrames API
25 .Developers from 50+ companies. 400+ developers. Apache committers from 16+ organizations.
26 .vs YARN SQL MLlib Streaming Mesos Tachyon
27 .10x – 100x
28 .Aug 2009 Source: openhub.net ...in June 2013
29 .Distributors Applications