- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
ClickHouse 2018 How to stop waiting for your queries to complete
ClickHouse 2018 How to stop waiting for your queries to complete and start having fun.
展开查看详情
1 . ClickHouse 2018 How to stop waiting for your queries to complete and start having fun Alexander Zaitsev Altinity
2 .Who am I M.Sc. In mathematics from Moscow State University Software engineer since 1997 Developed distributed systems since 2002 Focused on high performance analytics since 2007 Director of Engineering in LifeStreet Co-founder of Altinity – ClickHouse Service Provider 2
3 ... and I am not Peter’s brother :) 3
4 .What Is ClickHouse? 4
5 . 5 © http://mattturck.com/
6 .ClickHouse DBMS is Column Store MPP Realtime SQL Open Source 6
7 . http://clickhouse.yandex • Developed by Yandex for Yandex.Metrica - Yandex (NASDAQ: YNDX) – “Russian Google” (50% market share in search, 50+ b2b and b2c products) - Yandex.Metrica – world 2nd largest web analytics platform • Open Source since June 2016 (Apache 2.0 license) • 200+ companies using in production today • Several hundred experimenting, doing POC etc. • Dozens of contributors to the source code 7
8 .Why Yet Another DBMS? 8
9 .Flexible SQL 9
10 .Commercial OpenSource Analytical Analytical DBMS DBMS 11
11 .ClickHouse Fast! Flexible! Free! Fun! 12
12 .How Fast? 13
13 .1+ trillion rows table :) select count(*) from dw.ad8_fact_event; SELECT count(*) FROM dw.ad8_fact_event ┌───────count()─┐ │ 1261705085657 │ └───────────────┘ 1 rows in set. Elapsed: 3.552 sec. Processed 1.26 trillion rows, 1.26 TB (355.22 billion rows/s., 355.22 GB/s.) 14 Altinity Ltd. www.altinity.com
14 .1+ trillion rows table :) select sum(price_cpm) from dw.ad8_fact_event where access_day=today()-1 and event_key=-2; SELECT sum(price_cpm) FROM dw.ad8_fact_event WHERE (access_day = (today() - 1)) AND (event_key = -2) ┌────sum(price_cpm)─┐ │ 87579.09035192338 │ └───────────────────┘ 1 rows in set. Elapsed: 0.168 sec. Processed 161.89 million rows, 2.91 GB (961.83 million rows/s., 17.31 GB/s.) 15 Altinity Ltd. www.altinity.com
15 .WikiStat data, 28B rows. https://www.percona.com/blog/2017/03/17/column-store-database-benchmarks-mariadb-columnstore-vs-clickhouse-vs-apache-spark/ 16
16 .“1.1 Billion Taxi Rides Benchmarks” http://tech.marksblogg.com/benchmarks.html Query 1 Query 2 Query 3 Query 4 Setup 0.034 0.061 0.178 0.498 MapD & 2-node p2.8xlarge cluster 0.051 0.146 0.047 0.794 kdb+/q & 4 Intel Xeon Phi 7210 CPUs - 2.415 3.599 4.962 ClickHouse at Altinity demo server 0.762 2.472 4.131 6.041 BrytlytDB 1.0 & 2-node p2.16xlarge cluster 1.034 3.058 5.354 12.748 ClickHouse, Intel Core i5 4670K 1.56 1.25 2.25 2.97 Redshift, 6-node ds2.8xlarge cluster 2 2 1 3 BigQuery 6.41 6.19 6.09 6.63 Amazon Athena 8.1 18.18 n/a n/a Elasticsearch (heavily tuned) 14.389 32.148 33.448 67.312 Vertica, Intel Core i5 4670K 22 25 27 65 Spark 2.3.0 & single i3.8xlarge w/ HDFS 35 39 64 81 Presto, 5-node m3.xlarge cluster w/ HDFS 152 175 235 368 PostgreSQL 9.5 & cstore_fdw 17
17 . 2016 LifeStreet benchmark (unpublished) • 19 queries, 1200M rows table, 3-node clusters 18
18 .Time Series benchmarks (first time today!) https://github.com/timescale/tsbs Benchmark suite to automate testing Loads 103M rows, 10 metrics per row Runs 15 queries, 1000 runs each in 8 parallel threads Supports TimescaleDB, InfluxDB, Cassandra, MongoDB and ClickHouse (Altinity PR is submitted) 19
19 . Load time (s) 900 800 700 600 500 400 300 200 100 0 ClickHouse TimescaleDB InfluxDB 20
20 .80 “Light” queries, time in ms 70 60 50 40 30 ClickHouse TimescaleDB 20 InfluxDB 10 0 21
21 .90 “Heavy” queries, time in sec 80 70 60 50 40 ClickHouse 30 TimescaleDB 20 InfluxDB 10 0 22
22 .How flexible? 23
23 .ClickHouse runs at Bare metal (any Linux) Amazon Azure VMware, VirtualBox Docker, K8s 24
24 .ClickHouse solves business problems at: Mobile App and Web analytics AdTech bidding analytics Operational Logs analytics DNS queries analysis Stock correlation analytics Telecom Security audit Fintech SaaS Manufactoring process control BlockChain transactions analysis 25
25 . Worldwide * www.altinity.com visits in 2018 26
26 . Size does not matter Yandex: 500+ servers, 25B rec/day LifeStreet: 60 servers, 75B rec/day CloudFlare: 36 servers, 200B rec/day Bloomberg: 102 servers, 1000B rec/day Toutiao: 400 servers, moving to 1000 this month 27
27 .How fun ☺ life←{↑1 ω∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂ω} 28
28 .with (select groupArray(C) from C) as Ca select id, groupArray(S) Sa, groupArray(V) Va, groupArray(D) Da, groupArray(P) Pa, arrayMap(c -> arrayFirstIndex(s -> s > c, Sa)-1, Ca) Ka, arrayMap((c,k) -> Va[k] + (Va[k+1] - Va[k])/(Sa[k+1] - Sa[k])*(c-Sa[k]),Ca,Ka) Ta, arrayMap(s -> arrayFirstIndex(c -> c>s, Ca)>0 ? arrayFirstIndex(c -> c>s, Ca)-1 : toInt32(length(Ca)), Sa) Ja, arrayMap(i -> Ta[i], Ja) Ra, arrayMap((v,r) -> v - r, Va, Ra) ARa, arraySum((x,y,z) -> x*y*z, ARa, Da, Pa) result from T group by id 29
29 .What’s new in 2018 • Table functions mysql/odbc/file/http • clickhouse-copier • Predicate pushdown for views/subselects • LowCardinality datatype • Decimal datatype • JOIN enhancements • ALTER TABLE UPDATE/DELETE • WITH ROLLUP … and tons of performance improvements and small features 30