Large Scale Matrix Analysis and Inference
1. Large Scale Matrix Analysis and Inference
Wouter M. Koolen, Manfred Warmuth, Reza Bosagh Zadeh, Gunnar Carlsson, Michael Mahoney
Dec 9, NIPS 2013
2. Introductory musing: What is a matrix? ($a_{i,j}$)
1. A vector of $n^2$ parameters
2. A covariance
3. A generalized probability distribution
4. ...
3-4. A vector of $n^2$ parameters
When you regularize with the squared Frobenius norm:
$\min_W \|W\|_F^2 + \sum_n \mathrm{loss}\big(\mathrm{tr}(W X_n)\big)$
Equivalent to
$\min_{\mathrm{vec}(W)} \|\mathrm{vec}(W)\|_2^2 + \sum_n \mathrm{loss}\big(\mathrm{vec}(W) \cdot \mathrm{vec}(X_n)\big)$
No structure: $n^2$ independent variables.
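A quick numpy sanity check of this equivalence (not from the slides); the matrices W and X below are arbitrary stand-ins for the parameter matrix and one data matrix $X_n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))   # parameter matrix (arbitrary example values)
X = rng.standard_normal((n, n))   # one data matrix X_n (arbitrary example values)

# Squared Frobenius norm of W equals the squared 2-norm of its vectorization.
assert np.isclose(np.sum(W**2), np.linalg.norm(W.ravel())**2)

# The linear term tr(W X) equals a dot product of vectorizations
# (with a matching vectorization convention: tr(W X) = vec(W^T) . vec(X)).
assert np.isclose(np.trace(W @ X), W.T.ravel() @ X.ravel())
```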
5. A covariance
View the symmetric positive definite matrix C as the covariance matrix of some random feature vector $c \in \mathbb{R}^n$, i.e. $C = \mathbb{E}\big[(c - \mathbb{E}(c))(c - \mathbb{E}(c))^\top\big]$.
n features plus their pairwise interactions.
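As an illustration (not from the slides), such a covariance matrix can be estimated from samples by averaging outer products of centered feature vectors; the particular 3-feature covariance below is made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 10000                       # n features, m samples (illustrative sizes)
true_cov = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 0.5]])  # made-up covariance of the feature vector c
c = rng.multivariate_normal(mean=np.zeros(n), cov=true_cov, size=m)  # rows are draws of c

# C = E[(c - E c)(c - E c)^T], estimated by averaging outer products of centered samples.
centered = c - c.mean(axis=0)
C = centered.T @ centered / m
# Diagonal entries are per-feature variances; off-diagonals are pairwise interactions.
print(np.round(C, 2))
```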
6. Symmetric matrices as ellipses
Ellipse $= \{Cu : \|u\|_2 = 1\}$
Dotted lines connect a point u on the unit ball with the point Cu on the ellipse.
7. Symmetric matrices as ellipses
Eigenvectors form the axes; eigenvalues are the lengths.
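A small numpy sketch of this picture, using a made-up 2x2 symmetric matrix: mapping the unit circle through C traces out the ellipse, and each unit eigenvector lands at distance equal to its eigenvalue:

```python
import numpy as np

C = np.array([[2.0, 0.8],
              [0.8, 1.0]])            # example symmetric positive definite matrix (made up)

# The ellipse {C u : ||u||_2 = 1}: image of the unit circle under C.
theta = np.linspace(0, 2 * np.pi, 200)
U = np.stack([np.cos(theta), np.sin(theta)])    # points u on the unit circle
ellipse = C @ U                                 # corresponding points C u on the ellipse

# Eigenvectors give the axes of the ellipse, eigenvalues the semi-axis lengths:
# for a unit eigenvector v, C v = lambda v lies at distance |lambda| from the origin.
eigvals, eigvecs = np.linalg.eigh(C)
for lam, v in zip(eigvals, eigvecs.T):
    assert np.isclose(np.linalg.norm(C @ v), abs(lam))
```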
8. Dyads $uu^\top$, where u is a unit vector
One eigenvalue is one, all others are zero.
Rank-one projection matrix.
9. Directional variance
The variance along direction u: $\mathrm{Var}(c^\top u) = u^\top C u = \mathrm{tr}(C\, uu^\top) \ge 0$.
The outer figure eight is the direction u scaled by the variance $u^\top C u$.
PCA: find the direction of largest variance.
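A minimal numpy illustration of directional variance and of the PCA direction; the toy covariance matrix is built from scaled random samples and is not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.standard_normal((3, 500)) * np.array([[3.0], [1.0], [0.5]])  # made-up data
C = np.cov(samples)                               # toy 3x3 covariance matrix

u = np.array([1.0, 0.0, 0.0])                     # some unit direction
var_u = u @ C @ u                                 # directional variance u^T C u
assert np.isclose(var_u, np.trace(C @ np.outer(u, u)))   # equals tr(C u u^T) >= 0

# PCA: the direction of largest variance is the top eigenvector of C.
eigvals, eigvecs = np.linalg.eigh(C)              # eigh returns eigenvalues in ascending order
top = eigvecs[:, -1]
print("largest directional variance:", eigvals[-1], "attained along", top)
```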
10. 3-dimensional variance plots
$\mathrm{tr}(C\, uu^\top)$ is a generalized probability when $\mathrm{tr}(C) = 1$.
11-13. Generalized probability distributions
Probability vector $\omega = (.2, .1, .6, .1) = \sum_i \omega_i e_i$: mixture coefficients $\omega_i$, pure events $e_i$.
Density matrix $W = \sum_i \omega_i\, w_i w_i^\top$: mixture coefficients $\omega_i$, pure density matrices (dyads) $w_i w_i^\top$.
Matrices as generalized distributions.
Many mixtures lead to the same density matrix; there always exists a decomposition into n eigendyads.
Density matrix: symmetric positive matrix of trace one.
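A short numpy sketch of these two decompositions (not from the slides); the mixture weights and unit vectors below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4, 6
# A density matrix as a mixture of dyads w_i w_i^T with mixture weights omega_i.
omega = rng.dirichlet(np.ones(k))                        # mixture coefficients, sum to 1
ws = rng.standard_normal((k, n))
ws /= np.linalg.norm(ws, axis=1, keepdims=True)          # unit vectors w_i
W = sum(o * np.outer(w, w) for o, w in zip(omega, ws))   # symmetric PSD, trace one

assert np.isclose(np.trace(W), 1.0)

# A decomposition into n eigendyads always exists: eigenvalues act as the mixture weights.
eigvals, eigvecs = np.linalg.eigh(W)
W_eig = sum(l * np.outer(v, v) for l, v in zip(eigvals, eigvecs.T))
assert np.allclose(W, W_eig)
```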
14. It's like a probability!
The total variance along an orthogonal set of directions is 1:
$u_1^\top W u_1 + u_2^\top W u_2 = 1$, $\quad a + b + c = 1$
15. Uniform density?
The uniform density $\tfrac{1}{n} I$: all dyads have generalized probability $\tfrac{1}{n}$:
$\mathrm{tr}\big(\tfrac{1}{n} I\, uu^\top\big) = \tfrac{1}{n} \mathrm{tr}(uu^\top) = \tfrac{1}{n}$
The generalized probabilities of n orthogonal dyads sum to 1.
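A small numpy check of both claims (not from the slides); the orthonormal basis is drawn at random via a QR factorization, and the diagonal density matrix is just an example:

```python
import numpy as np

n = 4
W = np.diag([0.4, 0.3, 0.2, 0.1])        # an example density matrix (any one works)

# Generalized probability of the dyad u u^T under W is tr(W u u^T) = u^T W u.
# Over any orthonormal basis these probabilities sum to tr(W) = 1.
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((n, n)))  # random orthonormal basis
probs = np.array([q @ W @ q for q in Q.T])
assert np.isclose(probs.sum(), 1.0)

# The uniform density I/n assigns every dyad the same generalized probability 1/n.
uniform = np.eye(n) / n
assert np.allclose([q @ uniform @ q for q in Q.T], 1.0 / n)
```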
16-19. Conventional Bayes rule
$P(M_i \mid y) = \dfrac{P(M_i)\, P(y \mid M_i)}{P(y)}$
4 updates with the same data likelihood.
The update maintains uncertainty information about the maximum likelihood: a soft max.
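A minimal sketch of repeated conventional Bayes updates with the same likelihood vector (the prior and likelihood values below are made up); the posterior concentrates on the maximum-likelihood model, but softly:

```python
import numpy as np

def bayes_update(prior, likelihood):
    """One conventional Bayes update: posterior proportional to prior * likelihood."""
    post = prior * likelihood
    return post / post.sum()

prior = np.full(4, 0.25)                          # uniform prior over 4 models
likelihood = np.array([0.2, 0.5, 0.9, 0.4])       # made-up P(y | M_i), reused each round

posterior = prior
for _ in range(4):                                # 4 updates with the same data likelihood
    posterior = bayes_update(posterior, likelihood)
print(np.round(posterior, 3))   # mass concentrates on the maximum-likelihood model
```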
20. Bayes rule for density matrices
$D(M \mid y) = \dfrac{\exp\big(\log D(M) + \log D(y \mid M)\big)}{\mathrm{tr}\big(\exp(\log D(M) + \log D(y \mid M))\big)}$
1 update with the data likelihood matrix $D(y \mid M)$.
The update maintains uncertainty information about the maximum eigenvalue: a soft max eigenvalue calculation.
21-25. Bayes rule for density matrices (continued)
The same update repeated for 2, 3, 4, 10, and 20 updates with the same data likelihood matrix $D(y \mid M)$; each update maintains uncertainty information about the maximum eigenvalue (a soft max eigenvalue calculation).
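A minimal numpy sketch of this density-matrix update, assuming strictly positive definite prior and likelihood matrices so the matrix logarithm is well defined; the matrix exponential and logarithm are computed through an eigendecomposition, and the particular likelihood matrix below is made up:

```python
import numpy as np

def logm_spd(A):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def expm_sym(A):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def matrix_bayes(prior, likelihood):
    """D(M|y) = exp(log D(M) + log D(y|M)) / tr(of that same matrix)."""
    post = expm_sym(logm_spd(prior) + logm_spd(likelihood))
    return post / np.trace(post)

n = 3
prior = np.eye(n) / n                              # uniform prior density matrix I/n
rng = np.random.default_rng(5)
A = rng.standard_normal((n, n))
likelihood = A @ A.T + 0.1 * np.eye(n)             # made-up SPD data likelihood matrix

posterior = prior
for _ in range(20):                                # repeated updates with the same likelihood
    posterior = matrix_bayes(posterior, likelihood)
# The posterior concentrates on the dyad of the likelihood matrix's top eigenvector:
print(np.round(np.linalg.eigvalsh(posterior), 3))
```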
26-27. Bayes' rules: vector vs. matrix
Vector: $P(M_i \mid y) = \dfrac{P(M_i)\, P(y \mid M_i)}{\sum_j P(M_j)\, P(y \mid M_j)}$; regularizer: entropy.
Matrix: $D(M \mid y) = \dfrac{D(M) \odot D(y \mid M)}{\mathrm{tr}\big(D(M) \odot D(y \mid M)\big)}$, where $A \odot B := \exp(\log A + \log B)$; regularizer: quantum entropy.
28-29. Vector case as a special case of the matrix case
Vectors as diagonal matrices; all matrices share the same eigensystem.
The fancy product $\odot$ becomes the ordinary product $\cdot$.
Often the vector case is the hardest problem, i.e. bounds for the vector case "lift" to the matrix case.
This phenomenon has been dubbed the "free matrix lunch".
Size of matrix = size of vector = n.
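A small numpy check of this reduction for diagonal (hence commuting) matrices; the prior and likelihood vectors below are made up:

```python
import numpy as np

# When prior and likelihood share an eigensystem (e.g. both diagonal), the fancy
# product A ⊙ B = exp(log A + log B) reduces to the ordinary elementwise product,
# and the matrix Bayes rule reduces to the conventional vector Bayes rule.
prior_vec = np.array([0.2, 0.1, 0.6, 0.1])        # probability vector from the slides
like_vec = np.array([0.3, 0.9, 0.4, 0.2])         # made-up likelihood values

vector_posterior = prior_vec * like_vec
vector_posterior /= vector_posterior.sum()

# Same computation with diagonal density matrices and the ⊙ product.
fancy = np.diag(np.exp(np.log(prior_vec) + np.log(like_vec)))   # exp(log A + log B), diagonal case
matrix_posterior = fancy / np.trace(fancy)

assert np.allclose(np.diag(matrix_posterior), vector_posterior)
```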