SparkML: Easy ML Productization for Real-Time Bidding
1. WiFi SSID: SparkAISummit | Password: UnifiedAnalytics
2. SparkML: Easy ML Productization for Real-Time Bidding | Maximo Gurmendez, Javier Buquet | #UnifiedAnalytics #SparkAISummit
3. A Boston company that builds software for marketers to run effective programmatic marketing campaigns, with automated decisioning at the core.
4. Real-Time Ad Bidding. Diagram: an ad auction in which bidders (bidder X, bidder Y, ...) submit competing bids ($1, $2, $3) for each impression.
5. dataxu: make marketing smarter through data science! Diagram: event data (bids, wins, losses, attributions) feeds the ML system, which produces the bidding models.
6. Scale? 2 petabytes processed daily; 3 million bid decisions per second; runs 24x7 on 5 continents; thousands of ML models trained per day.
7. Goals of dataxu's ML System: highly predictive; fast to bid (< 1 millisecond); optimal use of training resources; no downtime; always fresh models; unattended operation; easy to deploy new algorithms; self-tuning; transparent.
8. 9 years ago: custom single-pass Hadoop jobs turned each campaign's event training data into a model f(x) used at bid time for that campaign.
9. 4 years ago: Can we use Spark? Is it fast enough? Does it use too much memory? Is it thread safe? Do Spark models work well with our data? Can we use its out-of-the-box ML algorithms? Is it expensive to train?
10. Problem #1: Data Partitioning. Beware of the fat reducers! One sample pass + one write pass (a sketch of this approach follows).
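Below is a minimal sketch of the "one sample pass + one write pass" idea, assuming event data with a campaign_id column; the object name, column names, and the rows-per-partition target are illustrative assumptions, not dataxu's actual code. The sample pass estimates each campaign's size so a large campaign is spread over several partitions instead of landing on one fat reducer; the write pass salts the key accordingly and writes the output partitioned by campaign.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Sketch: balance per-campaign training data so no single "fat reducer"
// receives an entire large campaign. Names and thresholds are illustrative.
object CampaignPartitioner {

  def writeBalanced(events: DataFrame, outputPath: String,
                    rowsPerPartition: Long = 1000000L): Unit = {
    // Sample pass: estimate each campaign's row count from a 1% sample
    // and decide how many partitions ("buckets") it should be spread over.
    val buckets = events.sample(withReplacement = false, fraction = 0.01)
      .groupBy("campaign_id").count()
      .withColumn("estimated", col("count") * 100)
      .withColumn("buckets",
        greatest(lit(1L), ceil(col("estimated") / rowsPerPartition)))
      .select("campaign_id", "buckets")

    // Write pass: salt the campaign key so each campaign spans `buckets`
    // partitions, then write the output partitioned by campaign.
    events.join(broadcast(buckets), Seq("campaign_id"), "left")
      .withColumn("salt",
        (rand() * coalesce(col("buckets"), lit(1L))).cast("int"))
      .repartition(col("campaign_id"), col("salt"))
      .drop("salt", "buckets")
      .write.partitionBy("campaign_id").parquet(outputPath)
  }
}

Broadcasting the small per-campaign bucket table keeps the join cheap, so only the final repartition shuffles the full event set.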
11. Problem #2: Spark models are not ready for a low-latency bidding setting. In training and batch scoring, a Spark model takes a whole DataFrame of feature rows (Feature 1, Feature 2, ...) and returns a column of predictions (0.3, 0.7, 0.4, ...). At bid time things are different: a single feature row arrives and a prediction is needed for it immediately. Solution: extended Spark with RowModels.
12. Problem #2 (continued): Solution: extended Spark with RowModels (a sketch follows).
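The sketch below illustrates what a RowModel wrapper might look like: the parameters learned by a Spark model are extracted once after training and packaged into a plain, thread-safe scorer that evaluates a single feature array on the bid path, with no DataFrame overhead. The trait name and the logistic-regression conversion are illustrative assumptions, not dataxu's actual API.

import org.apache.spark.ml.classification.LogisticRegressionModel

// Sketch of a "RowModel": a lightweight, thread-safe scorer built from a
// fitted Spark model, callable on a single feature vector at bid time.
trait RowModel extends Serializable {
  def predict(features: Array[Double]): Double
}

object RowModel {
  def fromLogisticRegression(m: LogisticRegressionModel): RowModel = {
    // Copy the learned parameters out of the Spark model once, at load time.
    val coefficients = m.coefficients.toArray
    val intercept = m.intercept
    new RowModel {
      def predict(features: Array[Double]): Double = {
        var margin = intercept
        var i = 0
        while (i < coefficients.length) {
          margin += coefficients(i) * features(i)
          i += 1
        }
        1.0 / (1.0 + math.exp(-margin)) // probability of the positive class
      }
    }
  }
}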
13. Problem #3: categorical feature encoding in Spark is slow. The typical approach runs one StringIndexer pass per categorical column (F1, F2, ...), each producing its own index column (IX 1, IX 2, ...). Instead: a single MultiTopK pass indexes all columns at once, following Metwally, Agrawal, and El Abbadi, "Efficient Computation of Frequent and Top-k Elements in Data Streams" (a sketch follows).
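A minimal sketch of the single-pass idea: one treeAggregate over the data maintains a bounded counter map per categorical column and keeps the top-k values of every column at once. The bounded-counter pruning here is a simplified stand-in for the Space-Saving algorithm of Metwally et al., and the object and method names are assumptions.

import org.apache.spark.sql.{DataFrame, Row}
import scala.collection.mutable

// Sketch of a single-pass "MultiTopK": one aggregation builds the
// value-to-index mapping for every categorical column at once.
object MultiTopK {

  def topKIndices(df: DataFrame, columns: Seq[String], k: Int): Map[String, Map[String, Int]] = {
    val zero = Array.fill(columns.size)(mutable.Map.empty[String, Long])

    val merged = df.select(columns.map(df.col(_)): _*).rdd.treeAggregate(zero)(
      (acc, row: Row) => {
        var i = 0
        while (i < columns.size) {
          val v = if (row.isNullAt(i)) "null" else row.get(i).toString
          val m = acc(i)
          m(v) = m.getOrElse(v, 0L) + 1L
          // Keep memory bounded: prune the rarest values when the map grows.
          if (m.size > 4 * k)
            m.toSeq.sortBy(-_._2).drop(2 * k).foreach { case (key, _) => m.remove(key) }
          i += 1
        }
        acc
      },
      (a, b) => {
        a.zip(b).foreach { case (ma, mb) =>
          mb.foreach { case (v, c) => ma(v) = ma.getOrElse(v, 0L) + c }
        }
        a
      }
    )

    // Value -> index maps, one per column, ready to encode features.
    columns.zipWithIndex.map { case (c, i) =>
      c -> merged(i).toSeq.sortBy(-_._2).take(k).map(_._1).zipWithIndex.toMap
    }.toMap
  }
}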
14. Problem #4: Expensive to train. We were running one campaign at a time.
Observations:
• Some campaigns took hours, some a few minutes
• Some parts of training were IO-bound, some CPU-bound
• We observed cluster idleness between jobs
Solutions:
• Launch smart batches of jobs in parallel (see the sketch below)
• Carefully overbook the cluster resources, and do not use EMR's "maximizeResourceAllocation"
Result: 60% cheaper than the legacy single-pass Hadoop method!
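A minimal sketch of a parallel "smart batch" launcher, assuming one Spark application per batch with spark.scheduler.mode=FAIR: per-campaign training jobs are submitted from a fixed-size thread pool so IO-bound and CPU-bound campaigns overlap and the cluster does not sit idle between jobs. The trainCampaign function and the parallelism default are assumptions.

import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}
import org.apache.spark.sql.SparkSession

// Sketch: launch per-campaign training jobs concurrently inside one Spark
// application. Assumes spark.scheduler.mode=FAIR.
object BatchTrainer {

  def trainAll(spark: SparkSession, campaigns: Seq[String], parallelism: Int = 8)
              (trainCampaign: String => Unit): Unit = {
    val pool = Executors.newFixedThreadPool(parallelism)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val jobs = campaigns.map { campaign =>
        Future {
          // Give each campaign its own FAIR scheduler pool so one long job
          // cannot starve the short ones.
          spark.sparkContext.setLocalProperty("spark.scheduler.pool", campaign)
          trainCampaign(campaign)
        }
      }
      Await.result(Future.sequence(jobs), Duration.Inf)
    } finally pool.shutdown()
  }
}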
15. Problem #5: How to switch systems? Stage 1: decorated model. The active bidding model is wrapped in a decorated Spark bidding model that also scores the Spark model pulsed on that day, enabling A/B tests (a decorator sketch follows).
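A minimal sketch of the Stage 1 decorator, with interfaces that are illustrative assumptions rather than dataxu's actual code: the legacy model remains the source of truth for the live bid, while the Spark model pulsed that day scores the same request so the two can be compared in A/B tests, and a failure in the candidate can never break the bid path.

import scala.util.Try

// Sketch of the Stage 1 "decorated model".
trait BiddingModel {
  def bid(features: Array[Double]): Double
}

class DecoratedBiddingModel(
    active: BiddingModel,             // legacy model, source of truth
    candidate: BiddingModel,          // Spark RowModel pulsed today
    record: (Double, Double) => Unit  // log both bids for later comparison
) extends BiddingModel {

  def bid(features: Array[Double]): Double = {
    val activeBid = active.bid(features)
    // A failure in the candidate must never break the live bid path.
    Try(candidate.bid(features)).foreach(candidateBid => record(activeBid, candidateBid))
    activeBid
  }
}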
16. Problem #5: How to switch systems? Stage 2: selected bidding machines. Stage 3: full switch.
17. Problem #5: How to switch systems? Did everything go smoothly? Not exactly!
• Reached S3 request limits upon deploy: rolled back, then implemented retries with random waits, back-offs, and jitter (see the retry sketch below)
• Latencies not exposed in simulations: rolled back, then did deeper profiling with YourKit
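A minimal sketch of the retry policy named above (capped exponential back-off with full jitter); the attempt limits and delays are illustrative, not dataxu's production values.

import scala.util.{Failure, Random, Success, Try}

// Sketch of the retry policy adopted after the S3 request-limit rollback:
// capped exponential back-off with full jitter.
object Retry {

  def withBackoff[A](maxAttempts: Int = 5, baseDelayMs: Long = 100, capMs: Long = 10000)(op: => A): A = {
    def attempt(n: Int): A = Try(op) match {
      case Success(result) => result
      case Failure(_) if n < maxAttempts =>
        val ceiling = math.min(capMs, baseDelayMs * (1L << n))                // exponential growth, capped
        Thread.sleep(math.max(1L, (Random.nextDouble() * ceiling).toLong))    // full jitter
        attempt(n + 1)
      case Failure(e) => throw e
    }
    attempt(1)
  }
}

A caller would wrap each blackboard read or write, for example Retry.withBackoff() { s3Client.getObject(request) }, where s3Client and request stand for whatever S3 client the service already holds.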
18. What about self-tuning, unattended operation? A blackboard architecture: driven by event data, the model trainer, selector & calibrator, insights builder, and manifest builder exchange their artifacts (bidding models, calibrations, insights, manifests) through a shared blackboard on S3, from which the bidding machines consume.
19. What about transparency? Example manifest:
{
  "model": {
    "partition": "Xm9ZgQEjav",
    "pipeline": "prospecting_random_forest",
    "uri": "s3://ml-bucket/../20180923.204250/"
  },
  "bid_modifiers": [
    {
      "name": "prospecting_random_forest",
      "parameters": { "profile": "quality_calibration" },
      "type": "calibration",
      "uri": "s3://.../calibration.cjson"
    },
    {
      "name": "insights-aware-bidding",
      "type": "insights-aware-bidding",
      "uri": "s3://insights/../261716353"
    }
  ]
}
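As a sketch of how a bidding machine might consume such a manifest, the following parses it with json4s (which ships with Spark) into case classes mirroring the JSON above and lists the artifact URIs to fetch from the S3 blackboard; the class and method names are assumptions, not dataxu's actual code.

import org.json4s.{DefaultFormats, Formats}
import org.json4s.jackson.JsonMethods.parse

// Sketch: parse the manifest into case classes that mirror the JSON above.
case class ModelRef(partition: String, pipeline: String, uri: String)
case class BidModifier(name: String, `type`: String, uri: String,
                       parameters: Option[Map[String, String]] = None)
case class ModelManifest(model: ModelRef, bid_modifiers: List[BidModifier])

object ManifestReader {
  implicit val formats: Formats = DefaultFormats

  def read(json: String): ModelManifest = parse(json).extract[ModelManifest]

  // Everything a bidding machine would need to fetch from the S3 blackboard.
  def artifactUris(m: ModelManifest): Seq[String] =
    m.model.uri +: m.bid_modifiers.map(_.uri)
}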
20. Easy to add new algorithms? It took 2 days to port a standard Spark ML pipeline for a customer into production, thanks to the blackboard design.
21. DEMO #UnifiedAnalytics #SparkAISummit
22. Outcomes.
Benefits: greater flexibility to adapt to new use cases; better overall performance; better reliability and upgrade path; 50% less code; 60% savings.
Lessons: Spark can be used for serious production systems; some tweaks are needed, but you still get the benefits of third-party ML libraries; there is no test like a full live test; gradual switchover, pulsing, and vigilance protected our business from harm.
23. Thank You! mgurmendez@dataxu.com | jbuquet@dataxu.com. Don't forget to rate and review the sessions: search Spark + AI Summit.