Geospatial Analytics at Scale with Deep Learning and Apache Spark
1. Geospatial Analytics at Scale with Deep Learning and Apache Spark
Raela Wang, Databricks
#UnifiedAnalytics #SparkAISummit
2. About Me
• Raela Wang
• Solutions Architect @ Databricks
• Specialist in Machine Learning solutions
3. What this talk covers
- Image Processing with Apache Spark
- Object Detection with Transfer Learning
- Deep Learning Pipelines
- Run Geolocalized queries to analyze results
- Magellan
- Demo
4. Mapping the world
• One of the most ancient big data activities in the world
• Critical for navigation, warfare, commercial exploitation
5. Mapping the world with images
- A lot of tools and companies now provide geospatial solutions
- Increasingly done with a combination of satellites, airplanes and drones
(Image © Wired / Planet Labs)
6. Mapping the world with images
Large range of new applications:
- Disaster Recovery: flood survey, fallen trees
- Infrastructure management: road damage
- Economic Intelligence: roof inclination for solar panels
7. New Challenges
- Increasing amounts of Rich Data
  - Cost-effective solutions for acquiring data at scale (Drones, CubeSats)
- Difficulty Scaling
  - Traditional tools not designed for scalability: how to work at the scale of a country or a continent?
- Pipelining Challenges
  - Geospatial combines a lot of tools and problems: alignment, image corrections, object detection, …
  - All these technologies need to communicate data in a timely fashion
9. vehicle_classes = {18: ('car', 'red'), 23: ('truck', 'orange'), 19: ('bus', 'white', 0.0)}
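A minimal sketch of how a mapping like this might be used to keep only vehicle detections and attach a label and display color. The detection record layout (`class_id`, `score`) and the helper `label_vehicles` are hypothetical, not taken from the talk. The meaning of the extra `0.0` in the bus entry is not stated on the slide, so the sketch copies it as-is and ignores it.

```python
# Mapping copied from the slide: class ID -> (label, color[, extra]).
vehicle_classes = {18: ('car', 'red'), 23: ('truck', 'orange'), 19: ('bus', 'white', 0.0)}


def label_vehicles(detections, classes=vehicle_classes):
    """Keep detections whose class ID is in `classes`; add label and color.

    `detections` is assumed (hypothetically) to be a list of dicts with at
    least a "class_id" key.
    """
    labelled = []
    for det in detections:
        entry = classes.get(det["class_id"])
        if entry is None:
            continue  # not one of the tracked vehicle classes
        # Only the first two fields are used; any extra field is ignored.
        name, color = entry[0], entry[1]
        labelled.append({**det, "label": name, "color": color})
    return labelled


detections = [
    {"class_id": 18, "score": 0.9},
    {"class_id": 73, "score": 0.8},  # some non-vehicle class
    {"class_id": 19, "score": 0.7},
]
vehicles = label_vehicles(detections)
```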
10. Apache Spark: the glue of big data
- Technologies exist in isolation
  - OpenCV - Image manipulation
  - TensorFlow, Keras, PyTorch - Deep Learning
  - PostGIS, GeoMesa, Magellan - Geospatial Analytics
  - Leaflet.js/OpenStreetMap - Visualization
- Apache Spark - ties all these libraries together
  - At scale
  - Allows pipelining
  - Easily move data from one technology to another without having to think about data representation
11. High-level View of the Pipeline
[Pipeline diagram. Labels: map data; XML metadata; UDFs; Transfer Learning with Deep Learning Pipelines; Geospatial Analytics with Magellan; Analyze and Visualize]
12. Parsing Image Data
13. Ingesting Images
Spark 2.3: ImageSchema to Read/Write Image data
- Use the same schema across packages
  - Scikit-image, MMLSpark, OpenCV, PIL, Deep Learning Pipelines, ...

images = spark.readImages(img_dir, recursive = True, sampleRatio = 0.1)
14. Image Transformations with Spark
- Spark Joins
  - Combine images with XML metadata
- Spark UDFs
  - Eastings and Northings → Latitudes and Longitudes
  - Creating Image chips and respective coordinates
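The chip-and-coordinates step can be sketched in plain Python. This is an illustration under stated assumptions, not the code from the talk: the image is assumed to be north-up, its corner coordinates are assumed to be already converted to (lat, long), and coordinates are interpolated linearly across pixels. The names `Chip` and `chip_windows` are hypothetical.

```python
from typing import Iterator, NamedTuple, Tuple


class Chip(NamedTuple):
    row: int                            # pixel row of the chip's top-left corner
    col: int                            # pixel column of the chip's top-left corner
    height: int                         # chip height in pixels
    width: int                          # chip width in pixels
    top_left: Tuple[float, float]       # (lat, long)
    bottom_right: Tuple[float, float]   # (lat, long)


def chip_windows(img_h: int, img_w: int, chip_size: int,
                 top_left: Tuple[float, float],
                 bottom_right: Tuple[float, float]) -> Iterator[Chip]:
    """Yield chip windows covering the image, with interpolated corners."""
    lat0, lon0 = top_left
    lat1, lon1 = bottom_right
    lat_per_px = (lat1 - lat0) / img_h  # negative for a north-up image
    lon_per_px = (lon1 - lon0) / img_w
    for row in range(0, img_h, chip_size):
        for col in range(0, img_w, chip_size):
            # Chips on the right and bottom edges may be smaller.
            h = min(chip_size, img_h - row)
            w = min(chip_size, img_w - col)
            yield Chip(
                row, col, h, w,
                (lat0 + row * lat_per_px, lon0 + col * lon_per_px),
                (lat0 + (row + h) * lat_per_px, lon0 + (col + w) * lon_per_px),
            )


chips = list(chip_windows(100, 200, 100, (10.0, 20.0), (9.0, 22.0)))
```

Inside a UDF, each window could then be used to slice the pixel array, for example `pixels[row:row + height, col:col + width]` on a NumPy array.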
15. [Figure: image annotated with (lat, long) coordinate labels]
16. Deep Learning
17. Success of Deep Learning
• Tremendous success of image-based applications
• Increased availability of pre-trained models
• Quickly building domain-specific models using transfer learning
18. Existing frameworks
● Mostly Python
● Google's TensorFlow is the most popular (easy to install/use)
● PyTorch popular in research
● Others: MXNet, Theano, Caffe, Keras, Deeplearning4j (Java)
19. Spark Deep Learning Pipelines: Deep Learning with Simplicity
• Open-source Databricks library
• Focuses on ease of use and integration without sacrificing performance
• Primary language: Python
• Includes APIs to transform images
20. Geospatial Analytics with Magellan
21. Common geospatial tasks
● Find all objects within an area
● Build geometries
● Cluster and aggregate similar objects
● Infer geometries (roads, buildings, etc.)
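The first task, finding all objects within an area, reduces to a point-in-polygon test. Below is a minimal sketch using the standard ray-casting method on a single machine. It is not Magellan's API, and the record layout (`id`, `location`) and function names are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: cast a ray to the right of the point and count how many
    polygon edges it crosses; an odd count means the point is inside.

    `polygon` is a list of (x, y) vertices in order, not repeated at the end.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def objects_within(objects, polygon):
    """Return the objects whose "location" falls inside the polygon."""
    return [o for o in objects if point_in_polygon(o["location"], polygon)]


area = [(0, 0), (4, 0), (4, 4), (0, 4)]
objects = [
    {"id": "a", "location": (2, 2)},
    {"id": "b", "location": (5, 2)},
    {"id": "c", "location": (1, 3)},
]
found = objects_within(objects, area)
```

Testing every point against every polygon this way scales poorly, which is why a distributed engine with spatial indices is attractive for large datasets.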
22. Magellan
• Open-source library for geospatial analytics with Spark
• Understands various formats (GeoJSON, …)
• Performs basic geometric operations at scale (polygon intersection, joining, …)
• Integrates into Spark SQL engine and builds indices for high performance
23. Demo
24. Recap
1) Read images with Spark
2) Parse image data with OpenCV and Spark UDFs
   a) Slice images into smaller image chips
   b) Generate respective coordinates for image chips
3) Pass data into a pre-trained TensorFlow model and extract predictions with Spark Deep Learning Pipelines
   a) Model was trained on the xView dataset
   b) Model classifies objects identified in images
4) Visualize identified vehicles on a heatmap
5) Cross-check with Magellan
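The aggregation behind a heatmap (step 4) can be sketched as binning detection coordinates into grid cells and counting per cell. This is only an illustration of the idea, not the visualization code from the demo. The function name `heatmap_counts` and the cell size are assumptions.

```python
import math
from collections import Counter


def heatmap_counts(points, cell_deg=0.5):
    """Count (lat, long) points per grid cell of `cell_deg` degrees.

    Returns a Counter keyed by (lat_index, long_index), where each index is
    the floor of the coordinate divided by the cell size.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


vehicle_points = [(10.1, 20.1), (10.2, 20.3), (11.7, 20.1)]
counts = heatmap_counts(vehicle_points, cell_deg=0.5)
```

The per-cell counts could then be handed to a mapping library as heatmap intensities.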