16/11 Introduction to Apache Cassandra by DataStax
1 .Getting started with Apache Cassandra™ • DuyHai DOAN, Apache Cassandra™ Evangelist
2 .1. Apache Cassandra™ use-cases 2. Why do I need Apache Cassandra™? 3. Distribution, replication & consistency model 4. Features summary © DataStax, All Rights Reserved.
3 .Apache Cassandra™ use-cases
4 .Before (< 2016) • Collections/Playlists • Recommendation/Personalization • Fraud detection • Internet of things/Sensor data • Messaging © 2016 DataStax, All Rights Reserved.
7 .Today (≥ 2016)
12 .Why do I need Apache Cassandra™ ?
13 .Linear Scalability • NetcoSports: 3 nodes, ≈3GB • YOU • 1k+ nodes, PB+ [diagram: C* clusters of increasing size]
14 .Continuous availability • thanks to the Dynamo architecture
15 .Multi data-centers/cloud native • out-of-the-box (config only) • AWS/GCE/Azure/CloudStack support • Cloud/Bare-metal
16 .Multi-DC usages • Data locality, disaster recovery [diagram: two C* rings, New York (DC1) and London (DC2), linked by async replication]
17 .Multi-DC usages • Virtual DC for workload segregation [diagram: two C* rings in the same room, Production (LIVE) and Analytics (Spark), linked by async replication]
18 .Multi-DC usages • Prod data copy for back-up/benchmark • Use LOCAL_XXX consistency levels [diagram: a production C* ring replicating asynchronously to "My tiny test DC" (READ-ONLY!!!)]
19 .Operational simplicity • 1 node = 1 process + 2 config files (cassandra.yaml + cassandra-rackdc.properties) • deployment automation (Ansible …) • no roles among nodes, perfect symmetry
20 .Ecosystem • Apache Spark – Apache Cassandra integration: analytics; joins, aggregation; SparkSQL/DataFrame integration with CQL (predicate pushdown) • Apache Zeppelin – Apache Cassandra integration: web-based notebook; tabular/graph display
21 .Q&A
22 .Apache Cassandra™ Architecture
23 .The Tokens • Random hash of #partition → token = hash(#p) • Hash: ]−x, x] • hash range: 2^64 values • x = 2^64/2 [diagram: ring of 8 C* nodes]
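The token computation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `token` is hypothetical, and `hashlib.blake2b` is a stand-in hash chosen because it ships with Python; it is not the hash function Cassandra itself uses. The only properties taken from the slide are that the hash is deterministic and that tokens fall in ]−x, x] with x = 2^64/2.

```python
import hashlib

# x = 2^64 / 2, as on the slide: tokens live in ]-X, X]
X = 2**64 // 2


def token(partition_key: str) -> int:
    """Map a partition key to a token in ]-X, X].

    Assumption: blake2b truncated to 64 bits is a stand-in hash,
    not the partitioner hash Cassandra actually uses.
    """
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    signed = int.from_bytes(digest, "big", signed=True)  # in [-X, X - 1]
    return signed + 1  # shift by one so the result lies in ]-X, X]


if __name__ == "__main__":
    for key in ("user_id1", "user_id2", "user_id3"):
        print(key, "->", token(key))
```

The same key always yields the same token, which is what lets any node work out where a partition lives without a central directory.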
24 .Token Ranges • A: ]−x, −3x/4] • B: ]−3x/4, −2x/4] • C: ]−2x/4, −x/4] • D: ]−x/4, 0] • E: ]0, x/4] • F: ]x/4, 2x/4] • G: ]2x/4, 3x/4] • H: ]3x/4, x] [diagram: ring of 8 nodes, A to H]
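The eight ranges on this slide follow from cutting ]−x, x] into equal slices, one per node. A minimal sketch, assuming one contiguous range per node exactly as drawn (the helper name `token_ranges` is hypothetical):

```python
# x = 2^64 / 2, the bound of the token space ]-X, X]
X = 2**64 // 2


def token_ranges(nodes, x=X):
    """Split ]-x, x] into len(nodes) equal ranges ]lo, hi], one per node."""
    n = len(nodes)
    # n + 1 boundaries from -x to x, computed with exact integer arithmetic
    bounds = [-x + (2 * x * i) // n for i in range(n + 1)]
    return {node: (bounds[i], bounds[i + 1]) for i, node in enumerate(nodes)}


if __name__ == "__main__":
    for node, (lo, hi) in token_ranges("ABCDEFGH").items():
        # Print bounds as multiples of x/4 to compare with the slide
        print(f"{node}: ]{lo // (X // 4)}x/4, {hi // (X // 4)}x/4]")
```

With eight nodes each slice is x/4 wide, so A starts at −x and H ends at x, matching the slide.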
25 .Distributed Tables • CREATE TABLE users(user_id int, …, PRIMARY KEY(user_id)); [diagram: rows user_id1 to user_id5 waiting next to the ring of nodes A to H]
26 .Distributed Tables [diagram: rows user_id1 to user_id5 spread across different nodes of the ring A to H, each placed by its token]
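How a row finds its node can be sketched by combining the token hash with the token ranges: hash the partition key, then look up which range ]lo, hi] contains the token. This is a simplified illustration, not Cassandra's implementation. It assumes one range per node on an 8-node ring as in the slides, ignores replication, and uses `hashlib.blake2b` as a stand-in hash; the names `token` and `owner` are hypothetical.

```python
import hashlib
from bisect import bisect_left

X = 2**64 // 2          # tokens live in ]-X, X]
NODES = "ABCDEFGH"      # the 8-node ring from the slides
# Range boundaries: node i owns ]BOUNDS[i], BOUNDS[i + 1]]
BOUNDS = [-X + (2 * X * i) // len(NODES) for i in range(len(NODES) + 1)]


def token(partition_key: str) -> int:
    """Stand-in hash (assumption: not Cassandra's own) mapping a key into ]-X, X]."""
    digest = hashlib.blake2b(partition_key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True) + 1


def owner(tok: int) -> str:
    """Return the node whose range ]lo, hi] contains the token."""
    # bisect_left finds the first boundary >= tok, i.e. the range's upper bound
    return NODES[bisect_left(BOUNDS, tok) - 1]


if __name__ == "__main__":
    for key in ("user_id1", "user_id2", "user_id3", "user_id4", "user_id5"):
        print(key, "->", owner(token(key)))
```

Because the hash scatters keys pseudo-randomly, consecutive ids usually land on unrelated nodes, which is the effect the diagram shows.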
27 .Linear Scalability • Today = high load, production in danger [diagram: ring of 8 nodes, A to H]
28 .Scaling Out • +2 nodes to lower the pressure [diagram: ring of 10 nodes, A to J]
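The relief from adding two nodes can be worked out with simple arithmetic. The sketch below assumes that data and load are spread evenly over the token space and that the ranges are rebalanced evenly across the 10 nodes; both are idealizations, and the helper name `share_per_node` is hypothetical.

```python
from fractions import Fraction


def share_per_node(node_count: int) -> Fraction:
    """Fraction of the token space (and, by assumption, of the load) per node."""
    return Fraction(1, node_count)


before = share_per_node(8)    # ring A to H
after = share_per_node(10)    # ring A to J, after adding 2 nodes
relief = 1 - after / before   # relative drop in per-node load

if __name__ == "__main__":
    print(f"{float(before):.1%} -> {float(after):.1%}, "
          f"load per node down by {float(relief):.0%}")
    # prints: 12.5% -> 10.0%, load per node down by 20%
```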
29 .Q&A