- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
构建大型可伸缩的机器学习算法框架
展开查看详情
1 .Building parallel machine learning algorithms: scaling out and up William Benton willb@redhat.com • @willb
2 .Motivation
3 .Motivation
4 .Motivation
5 .Motivation
6 .Forecast Introducing our case study: self-organizing maps Parallel implementations for partitioned collections (in particular, RDDs) Beyond the RDD: data frames and ML pipelines Practical considerations and key takeaways
7 .Introducing our case study
8 .
9 .
10 .Training self-organizing maps
11 .Training self-organizing maps
12 .Training self-organizing maps
13 .Training self-organizing maps
14 .Training self-organizing maps while t < maxupdates: random.shuffle(examples) for ex in examples: t = t + 1 if t == maxupdates: break bestMatch = closest(somt, ex) for (unit, wt) in neighborhood(bestMatch, sigma(t)): somt+1[unit] = somt[unit] + (ex - somt[unit]) * alpha(t) * wt
15 .Training self-organizing maps process the training while t < maxupdates: set in random order random.shuffle(examples) for ex in examples: t = t + 1 if t == maxupdates: break bestMatch = closest(somt, ex) for (unit, wt) in neighborhood(bestMatch, sigma(t)): somt+1[unit] = somt[unit] + (ex - somt[unit]) * alpha(t) * wt
16 .Training self-organizing maps process the training while t < maxupdates: set in random order random.shuffle(examples) for ex in examples: the neighborhood size controls t = t + 1 how much of the map around if t == maxupdates: the BMU is affected break bestMatch = closest(somt, ex) for (unit, wt) in neighborhood(bestMatch, sigma(t)): somt+1[unit] = somt[unit] + (ex - somt[unit]) * alpha(t) * wt
17 .Training self-organizing maps process the training while t < maxupdates: set in random order the learning rate controls random.shuffle(examples) how much closer to the for ex in examples: example each unit gets the neighborhood size controls t = t + 1 how much of the map around if t == maxupdates: the BMU is affected break bestMatch = closest(somt, ex) for (unit, wt) in neighborhood(bestMatch, sigma(t)): somt+1[unit] = somt[unit] + (ex - somt[unit]) * alpha(t) * wt
18 .Parallel implementations for partitioned collections
19 .Historical aside: Amdahl’s Law 1 lim So = sp —> ∞ 1 - p
20 .What forces serial execution?
21 .What forces serial execution?
22 .What forces serial execution? state[t+1] = combine(state[t], x)
23 .What forces serial execution? state[t+1] = combine(state[t], x)
24 .What forces serial execution? f1: (T, T) => T f2: (T, U) => T
25 .What forces serial execution? f1: (T, T) => T f2: (T, U) => T
26 .What forces serial execution? f1: (T, T) => T f2: (T, U) => T
27 .How can we fix these? a⊕b=b⊕a (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)
28 .How can we fix these? a⊕b=b⊕a (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)
29 .How can we fix these? a⊕b=b⊕a (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c)