08_AstroServer: A Real-time Analysis System for GWAC
1. AstroServer: A Real-time Analysis System for GWAC
Wei Ren, RUC, 10/11/2017
2. Scientific Big Data System: Accelerating Scientific Discovery
• Background: The Scientific Big Data System is funded by the 'National Key R&D Plan: Cloud Computing and Big Data'. It is led by the Chinese Academy of Sciences together with 16 universities and institutions.
• Goals:
  ◦ Astronomy: efficient storage and analysis of astronomical catalogs with 100 billion rows
  ◦ High-energy physics: high-efficiency storage and retrieval of trillion-event data
  ◦ Bioscience: retrieval of multi-level correlations in RDF knowledge graphs with 10 billion edges
3. AstroServer: Big Astronomy Data Analytics
• GWAC (the Ground-based Wide-Angle Camera array)
• Covers a large field at a high sampling frequency:
  ◦ Sky survey field: 5000 square degrees
  ◦ Sampling cadence: one frame every 15 s
  ◦ Observed stars: 1.58 million
  ◦ Generated data: 2.5 TB/day
  ◦ Service life: 10 years
  ◦ Total data: 8 PB
4. Real-time Analysis
[Figure: overview of the real-time analysis system with three components: Data Filter, Data Organization, Online Analysis]
5. Data Modeling
[Figure: camera array, cameras, and CCDs (CCD1 to CCD4); records for ids 1 to n at time steps t1 to tn; data format: key-value pairs (Key1: Value, Key2: Value, Key3: Value, Key4: Value)]
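The slide only shows that the camera array's CCDs emit key-value records per star id at every time step. Below is a minimal in-memory sketch of such a layout. It assumes a composite key of (CCD id, time step, star id) and a few illustrative attributes (x, y, mag). The key design, the attribute names, and the `KVStore` class are hypothetical, not AstroServer's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical composite key: (CCD id, time step, star id). Not AstroServer's schema.
Key = Tuple[str, int, int]


@dataclass
class Observation:
    """Illustrative per-star value; the attribute names are assumptions."""
    x: float    # pixel x coordinate on the CCD
    y: float    # pixel y coordinate on the CCD
    mag: float  # measured magnitude


@dataclass
class KVStore:
    """In-memory stand-in for a key-value store holding GWAC-style records."""
    data: Dict[Key, Observation] = field(default_factory=dict)

    def put(self, ccd_id: str, t: int, star_id: int, obs: Observation) -> None:
        self.data[(ccd_id, t, star_id)] = obs

    def frame(self, ccd_id: str, t: int) -> Dict[int, Observation]:
        """Return every star record that one CCD produced at time step t."""
        return {
            sid: obs
            for (ccd, step, sid), obs in self.data.items()
            if ccd == ccd_id and step == t
        }
```

For example, after `store.put("CCD1", 1, 1, Observation(10.0, 20.0, 12.5))`, calling `store.frame("CCD1", 1)` returns `{1: Observation(x=10.0, y=20.0, mag=12.5)}`. A real store would keep keys sorted or partitioned by CCD and time step, so a frame lookup becomes a range scan instead of the full scan used here.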
7. Filter
• Compression: high resource consumption
• Filter-1: filtered tuples
• Filter-2: store transient sources (filtered original data)
[Figure: at each time step t1 to tn, tuples (id, x, y, t, d1, d2, d3, ..., dm) for ids i1 to in are reduced to tuples (id, x, y, t, d1, ..., dc), where c ≤ m]
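One possible reading of the two filters: Filter-1 shrinks every tuple from m attributes to c ≤ m attributes, and Filter-2 keeps the full original tuples, but only for transient sources. The sketch below follows that reading. The slides do not say how a transient source is detected, so the `brightness_jump` predicate, the `REFERENCE_MAG` table, and the 1.0 threshold are purely hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A raw tuple as drawn on the slide: (id, x, y, t, d1, ..., dm).
Row = Tuple


def filter1_project(rows: Iterable[Row], c: int) -> List[Row]:
    """Filter-1 (sketch): keep id, x, y, t and only the first c of the m attributes."""
    return [row[: 4 + c] for row in rows]


def filter2_transients(rows: Iterable[Row], is_transient: Callable[[Row], bool]) -> List[Row]:
    """Filter-2 (sketch): keep the original, unprojected rows of transient sources only."""
    return [row for row in rows if is_transient(row)]


# Hypothetical transient test: d1 (taken to be a magnitude) moves more than
# `threshold` away from a per-star reference value.
REFERENCE_MAG: Dict[int, float] = {1: 12.0, 2: 14.0}


def brightness_jump(row: Row, threshold: float = 1.0) -> bool:
    star_id, d1 = row[0], row[4]
    return abs(d1 - REFERENCE_MAG.get(star_id, d1)) > threshold
```

With `raw = [(1, 10.0, 20.0, 100, 12.1, 0.3, 7), (2, 11.0, 21.0, 100, 16.5, 0.2, 8)]`, `filter1_project(raw, 2)` drops the last attribute of each tuple. `filter2_transients(raw, brightness_jump)` keeps only star 2, whose d1 is 2.5 away from its reference value.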
9. Data Organization
• Question: How can we find all transient sources within a given time period?
• SEPI index (Single Endpoint Index):
  ◦ Inverted index <oid | stime, etime>
  ◦ High update throughput and distributed scalability
[Figure: timeline t1 to t10]
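The slide names the SEPI index but does not give its internals. The sketch below shows one way an interval index keyed on a single endpoint could work. Entries are sorted by start time only, and the end time is a payload filled in later, when a source fades. This update does not move the entry. A time-range query then only has to inspect sources that started before the window closes. The class `SEPISketch` and its methods are hypothetical, not the authors' implementation.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class SEPISketch:
    """Hypothetical single-endpoint interval index (not the authors' implementation).

    Entries are ordered by start time only; the end time is stored as a
    payload that can be filled in later, when the transient source fades.
    """

    def __init__(self) -> None:
        self._starts: List[Tuple[int, int]] = []                   # sorted (stime, oid)
        self._postings: Dict[int, Tuple[int, Optional[int]]] = {}  # oid -> (stime, etime)

    def open(self, oid: int, stime: int) -> None:
        bisect.insort(self._starts, (stime, oid))
        self._postings[oid] = (stime, None)   # still active: end time unknown

    def close(self, oid: int, etime: int) -> None:
        stime, _ = self._postings[oid]
        self._postings[oid] = (stime, etime)  # in-place update, no re-sorting

    def query(self, qs: int, qe: int) -> List[int]:
        """Object ids whose lifetime [stime, etime] overlaps [qs, qe]."""
        # Only sources that started no later than qe can overlap the window.
        cut = bisect.bisect_right(self._starts, (qe, float("inf")))
        result = []
        for _, oid in self._starts[:cut]:
            _, etime = self._postings[oid]
            if etime is None or etime >= qs:
                result.append(oid)
        return result
```

For example, suppose source 1 lives over [1, 3], source 2 opens at 2 and is still active, source 3 lives over [5, 6], and source 4 lives over [8, 9]. Then `query(4, 7)` returns `[2, 3]`. Keeping the end time out of the sort key is what makes closing a source cheap. The trade-off is that a query must check end times for every candidate that started before the window closes.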
11. Online Analysis
• Question: How many transient sources exist within a given time period?
[Figure: timeline t1 to t10; 2 events at the start of the window, followed by increments of +1, +2, +0, +1]
• Count: 2 + 1 + 2 + 0 + 1 = 6
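The count on this slide splits into two parts: the sources already active when the window opens (2) and the new sources that begin at each later step (+1, +2, +0, +1). Below is a minimal sketch of that decomposition. It assumes each transient source is stored as an interval (stime, etime) over integer time steps, with `None` marking a source that is still active. The function name and the representation are ours, not the system's.

```python
from collections import Counter
from typing import List, Optional, Tuple

Interval = Tuple[int, Optional[int]]  # (stime, etime); etime None = still active


def count_transients(intervals: List[Interval], qs: int, qe: int) -> int:
    """Count sources alive at some point in [qs, qe] (sketch).

    Mirrors the slide's decomposition: sources already active at qs,
    plus the per-step increments of sources that start after qs.
    """
    active_at_start = sum(
        1 for s, e in intervals if s <= qs and (e is None or e >= qs)
    )
    starts_per_step = Counter(s for s, _ in intervals if qs < s <= qe)
    increments = [starts_per_step[t] for t in range(qs + 1, qe + 1)]
    return active_at_start + sum(increments)
```

With `intervals = [(0, 3), (1, None), (2, 2), (3, 4), (3, 7), (5, 6)]` and the window [1, 5], two sources are active at step 1. The increments at steps 2 to 5 are 1, 2, 0, 1, which reproduces 2 + 1 + 2 + 0 + 1 = 6. The answer needs only the starting state and per-step start counts, not the full list of ids, so it can be cheaper than an interval query.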
12. Experiment
• Storage time: 0.81 s (working duration: 4.16 h)
• Analysis time:
  ◦ Interval query: 2.5 s
  ◦ Count analysis: 0.112 s
[Figure: running times, in seconds]
14. Next Steps
• GWAC now vs. future:
  ◦ Sampling cadence: 15 s now, 1 s in the future
  ◦ Observed stars: 1.6 million now, 24 million in the future
  ◦ Generated data: 2.5 TB/day now, 37.5 TB/day in the future
  ◦ Service life: 10 years (both)
  ◦ Total data: 8 PB now, 120 PB in the future
• Future requirement: time scale < 1 s
• Future work:
  ◦ Using new hardware to accelerate storage and processing
  ◦ Query rewriting
  ◦ Adaptive compression with high performance
  ◦ GPU processing
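The volume figures on this slide can be sanity-checked with simple arithmetic. The slides do not say how many observing days per year the totals assume. The value of 320 days below is our assumption, chosen because it reproduces both the 8 PB and the 120 PB totals exactly. The check also shows that the 15-fold growth in daily volume matches the 15-fold shorter sampling interval.

```python
# Back-of-the-envelope check of the GWAC data-volume figures.
OBSERVING_DAYS_PER_YEAR = 320  # assumption: not stated on the slides
SERVICE_LIFE_YEARS = 10


def total_pb(tb_per_day: float) -> float:
    """Total volume in PB (decimal units) over the service life."""
    return tb_per_day * OBSERVING_DAYS_PER_YEAR * SERVICE_LIFE_YEARS / 1000


now_total = total_pb(2.5)      # 8.0 PB, matching "Total data 8PB"
future_total = total_pb(37.5)  # 120.0 PB, matching the future column

# The daily volume grows by the same factor as the sampling rate (15 s -> 1 s).
volume_growth = 37.5 / 2.5     # 15.0
cadence_speedup = 15 / 1       # 15.0
```

If GWAC instead observed every day of the year, the current rate would give about 9.1 PB over 10 years. The slide's 8 PB figure therefore implies some downtime, for example daytime or bad weather.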
15. Thank You!
weiren@ruc.edu.cn
http://idke.ruc.edu.cn