Apache Kylin: A Powerful OLAP Tool for Big Data, Part 2
1. Kylin: high performance, high concurrency
Tested on the standard SSB data set, Kylin is 200x faster than Apache Hive.
2. How to calculate the cube
3. How to persist the cube in HBase
4. How to query the cube

SQL query:

    SELECT
        test_cal_dt.week_beg_dt,
        test_category.category_name,
        test_category.lvl2_name,
        test_category.lvl3_name,
        test_kylin_fact.lstg_format_name,
        test_sites.site_name,
        SUM(test_kylin_fact.price) AS GMV,
        COUNT(*) AS TRANS_CNT
    FROM test_kylin_fact
    LEFT JOIN test_cal_dt
        ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt
    LEFT JOIN test_category
        ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id
        AND test_kylin_fact.lstg_site_id = test_category.site_id
    LEFT JOIN test_sites
        ON test_kylin_fact.lstg_site_id = test_sites.site_id
    WHERE test_kylin_fact.seller_id = 123456
        OR test_kylin_fact.lstg_format_name = 'New'
    GROUP BY
        test_cal_dt.week_beg_dt,
        test_category.category_name,
        test_category.lvl2_name,
        test_category.lvl3_name,
        test_kylin_fact.lstg_format_name,
        test_sites.site_name

Query plan:

    OLAPToEnumerableConverter
      OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8])
        OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()])
          OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0])
            OLAPFilterRel(condition=[OR(=($3, 123456), =($5, 'New'))])
              OLAPJoinRel(condition=[=($2, $25)], joinType=[left])
                OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left])
                  OLAPJoinRel(condition=[=($4, $12)], joinType=[left])
                    OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
                    OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]])
                  OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]])
                OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
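In the plan above, SUM(price) does not appear as a single aggregate. The aggregate node computes $SUM0 (agg#0) and COUNT (agg#1) over the price column, and the top projection emits GMV=[CASE(=($7, 0), null, $6)], so a group with no non-null prices yields NULL rather than 0. A minimal Python sketch of that decomposition follows; the function names are illustrative only and are not Kylin or Calcite APIs.

```python
from typing import Optional, Sequence


def sum0(values: Sequence[Optional[float]]) -> float:
    """$SUM0: sum of the non-null values, 0 when there are none."""
    return sum(v for v in values if v is not None)


def count(values: Sequence[Optional[float]]) -> int:
    """COUNT(col): number of non-null values."""
    return sum(1 for v in values if v is not None)


def sql_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """SUM(col) rebuilt as CASE WHEN COUNT(col) = 0 THEN NULL ELSE $SUM0(col)."""
    total, n = sum0(values), count(values)
    return None if n == 0 else total


if __name__ == "__main__":
    print(sql_sum([10.0, None, 5.0]))
    print(sql_sum([None, None]))
```

Keeping the two partial aggregates separate is convenient for pre-aggregated storage, since both partial sums and partial counts can be added together across rows before the final CASE is applied.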
5. How to query the cube
• Translate the cube query into an HBase table scan
    Columns, Group by -> Cuboid ID
    Filters -> Scan Range (Row Key)
    Aggregations -> Measure Columns (Row Values)
• Scan the HBase table and translate the HBase result into the cube result
    HBase Result (key + value) -> Cube Result (dimensions + measures)
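The translation steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Kylin's implementation: the rowkey order ROWKEY_DIMS, the tuple-shaped row keys, the in-memory TABLE standing in for HBase, and every function name are hypothetical. Real row keys are encoded byte arrays, and only equality filters are handled here.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical rowkey column order; the first column maps to the highest bit.
ROWKEY_DIMS: List[str] = ["cal_dt", "site", "category"]

# Toy stand-in for an HBase table: (cuboid_id, dim values...) -> measure columns.
TABLE: Dict[tuple, Dict[str, float]] = {
    (0b110, "2013-01-01", "US"): {"gmv": 100.0, "trans_cnt": 3},
    (0b110, "2013-01-01", "UK"): {"gmv": 40.0, "trans_cnt": 1},
    (0b110, "2013-01-08", "US"): {"gmv": 70.0, "trans_cnt": 2},
    (0b100, "2013-01-01"): {"gmv": 140.0, "trans_cnt": 4},
}


def cuboid_id(columns: List[str]) -> int:
    """Columns / GROUP BY -> cuboid ID: one bit per dimension used."""
    n = len(ROWKEY_DIMS)
    cid = 0
    for col in columns:
        cid |= 1 << (n - 1 - ROWKEY_DIMS.index(col))
    return cid


def cuboid_dims(cid: int) -> List[str]:
    """Dimensions present in a cuboid, in rowkey order."""
    n = len(ROWKEY_DIMS)
    return [d for i, d in enumerate(ROWKEY_DIMS) if cid & (1 << (n - 1 - i))]


def key_prefix(cid: int, filters: Dict[str, str]) -> tuple:
    """Filters -> scan range: only filters on leading dimensions narrow the scan."""
    prefix = [cid]
    for dim in cuboid_dims(cid):
        if dim not in filters:
            break
        prefix.append(filters[dim])
    return tuple(prefix)


def scan(table: Dict[tuple, Dict[str, float]], prefix: tuple) -> Iterator[Tuple[tuple, Dict[str, float]]]:
    """Simulate a range scan: walk the sorted row keys that share the prefix."""
    for key in sorted(table):
        if key[: len(prefix)] == prefix:
            yield key, table[key]


def run_query(table, group_by: List[str], filters: Dict[str, str]) -> List[dict]:
    """HBase result (key + value) -> cube result (dimensions + measures)."""
    cid = cuboid_id(sorted(set(group_by) | set(filters)))
    dims = cuboid_dims(cid)
    result = []
    for key, measures in scan(table, key_prefix(cid, filters)):
        row = dict(zip(dims, key[1:]))
        # Filters that could not narrow the scan range are checked after the scan.
        if all(row[d] == v for d, v in filters.items()):
            row.update(measures)
            result.append(row)
    return result


if __name__ == "__main__":
    for r in run_query(TABLE, ["cal_dt", "site"], {"cal_dt": "2013-01-01"}):
        print(r)
```

Because the cuboid ID leads the row key, all rows of one cuboid are contiguous, and a filter on the leading dimension narrows the scan to a key prefix. A filter on a later dimension still works in this sketch, but only as a check applied after the scan.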
6. Apache Kylin global users
Internet: eBay, Yahoo! Japan, Meituan, Expedia, Qihoo 360, UC
FSI: China CITIC Bank, China UnionPay, JPMorgan
Telecom: China Mobile, China Telecom, China Unicom, AT&T
Manufacturing: Xiaomi, Huawei, Lenovo, OPPO, VIVO
Others: MachineZone, Glispa, Inovex, Adobe
[other names on this slide are illegible in the source]
7. Meituan & Dianping
• Top O2O service provider in China
• Apache Kylin is the main OLAP platform, serving all business lines
• As of August 2018: 8.9 trillion rows of data, 971 TB of cube storage, 3.8 million SQL queries per day
• 50% of queries < 200 ms; 90% of queries < 1.2 s
• The team has grown 3 Apache Kylin committers and PMC members
8. Xiaomi (mi.com)
• Apache Kylin acts as the engine for Xiaomi's "data factory", serving 18 business lines
• Daily increment of 17 billion; 95% of queries < 500 ms
9. Yahoo! Japan
• Leading search engine and portal in Japan
• Uses Kylin to replace Impala, to meet business analysts' low-latency requirements
• Most queries return in less than 1 s
• Kylin supports cross-region deployment: only the cube, not the raw data, is pushed to the data center near the analysts
https://techblog.yahoo.co.jp/oss/apache-kylin/
10. Apache Kylin development history
v2.6 (in progress)
• Distributed cache
• SDK for RDBMS
v2.5
• Hadoop 3/HBase 2
• MySQL as metastore
v2.3
• Cube planner
• Dashboard
v2.0
• Snowflake
v1.6
• NRT Streaming
v1.5
• Plug-in architecture
11. Apache Kylin Roadmap
• New storage: Parquet, Druid
• Real-time support
• Flexible model
• Containerization
12. Thanks
Apache Kylin, Kyligence