Sketching as a Tool for Numerical Linear Algebra

下载 1

有只羊

发布于

2034

人观看

#信息技术

本章主要介绍精确回归算法(Exact Regression Algorithms)、草绘加速最小二乘回归(Sketching to speed up Least Squares Regression)、草绘以加速最小绝对偏差（l1）回归(Sketching to speed up Least Absolute Deviation (l1) Regression)、草绘加速低秩近似(Sketching to speed up Low Rank Approximation)几种处理方法。

展开查看详情

1 .Sketching as a Tool for Numerical Linear Algebra David Woodruff IBM Almaden

2 .Talk Outline  Exact Regression Algorithms  Sketching to speed up Least Squares Regression  Sketching to speed up Least Absolute Deviation (l1) Regression  Sketching to speed up Low Rank Approximation 2

3 .Regression Linear Regression  Statistical method to study linear dependencies between variables in the presence of noise. Example  Ohm's law V = R ∙ I  Find linear function that best fits the data 3

4 .Regression Standard Setting  One measured variable b  A set of predictor variables a1 ,…, a d  Assumption: b = x0 + a1 x1 + … + ad xd +   is assumed to be noise and the xi are model parameters we want to learn  Can assume x0 = 0  Now consider n observations of b 4

5 .Regression analysis Matrix form Input: nd-matrix A and a vector b=(b1,…, bn) n is the number of observations; d is the number of predictor variables Output: x* so that Ax* and b are close  Consider the over-constrained case, when n À d  Can assume that A has full column rank 5

6 .Regression analysis Least Squares Method  Find x* that minimizes |Ax-b|22 =  (bi – <Ai*, x>)²)²  Ai* is i-th row of A  Certain desirable statistical properties  Closed form solution: x = (ATA)-1 AT b Method of least absolute deviation (l1 -regression)  Find x* that minimizes |Ax-b|1 =  |bi – <Ai*, x>)²|  Cost is less sensitive to outliers than least squares  Can solve via linear programming Time complexities are at least n*d2, we want better! 6

7 .Talk Outline  Exact Regression Algorithms  Sketching to speed up Least Squares Regression  Sketching to speed up Least Absolute Deviation (l1) Regression  Sketching to speed up Low Rank Approximation 7

8 .Sketching to solve least squares regression  How to find an approximate solution x to minx |Ax-b|2 ?  Goal: output x‘ for which |Ax‘-b|2 · (1+ε) minx |Ax-b|2 with high probability  Draw S from a k x n random family of matrices, for a value k << n  Compute S*A and S*b  Output the solution x‘ to minx‘ |(SA)x-(Sb)|2 8

9 .How to choose the right sketching matrix S?  Recall: output the solution x‘ to minx‘ |(SA)x-(Sb)|2  Lots of matrices work  S is d/ε2 x n matrix of i.i.d. Normal random variables  Computing S*A may be slow… 9

10 .How to choose the right sketching matrix S? [S]  S is a Johnson Lindenstrauss Transform  S = P*H*D  D is a diagonal matrix with +1, -1 on diagonals  H is the Hadamard transform  P just chooses a random (small) subset of rows of H*D  S*A can be computed much faster 10

11 .Even faster sketching matrices [CW,MM,NN]  CountSketch matrix  Define k x n matrix S, for k = d2/ε2  S is really sparse: single randomly chosen non-zero entry per column [ 0010 01 00 1000 00 00 0 0 0 -1 1 0 -1 0 [ Surprisingly, this works! 0-1 0 0 0 0 0 1 11

12 .Talk Outline  Exact Regression Algorithms  Sketching to speed up Least Squares Regression  Sketching to speed up Least Absolute Deviation (l1) Regression  Sketching to speed up Low Rank Approximation 12

15 .Cauchy random variables  Cauchy random variables not as nice as Normal (Gaussian) random variables  They don’t have a mean and have infinite variance  Ratio of two independent Normal random variables is Cauchy 15

17 .Sketching to solve l1-regression  Main Idea: Compute a QR-factorization of S*A  Q has orthonormal columns and Q*R = S*A  A*R-1 turns out to be a “well-conditioning” of original matrix A  Compute A*R-1 and sample d3.5/ε2 rows of [A*R-1 , b’] where the i-th row is sampled proportional to its 1-norm  Solve regression problem on the (reweighted) samples 17

18 .Sketching to solve l1-regression [MM]  Most expensive operation is computing S*A where S is the matrix of i.i.d. Cauchy random variables  All other operations are in the “smaller space”  Can speed this up by choosing S as follows: [ [ 0010 01 00 1000 00 00 0 0 0 -1 1 0 -1 0 [ ¢ C1 C2 C3 [ 0-1 0 0 0 0 0 1 … Cn 18

19 .Further sketching improvements [WZ]  Can show you need a fewer number of sampled rows in later steps if instead choose S as follows  Instead of diagonal of Cauchy random variables, choose diagonal of reciprocals of exponential random variables [ [ 0010 01 00 1000 00 00 0 0 0 -1 1 0 -1 0 [ ¢ 1/E1 1/E2 1/E3 [ 0-1 0 0 0 0 0 1 … 1/En 19

20 .Talk Outline  Exact regression algorithms  Sketching to speed up Least Squares Regression  Sketching to speed up Least Absolute Deviation (l1) Regression  Sketching to speed up Low Rank Approximation 20

21 .Low rank approximation  A is an n x n matrix  Typically well-approximated by low rank matrix  E.g., only high rank because of noise  Want to output a rank k matrix A’, so that |A-A’|F · (1+ε) |A-Ak|F, w.h.p., where Ak = argminrank k matrices B |A-B|F  For matrix C, |C|F = (Σi,j Ci,j2)1/2 21

22 . Solution to low-rank approximation [S]  Given n x n input matrix A Most time- consuming  Compute S*A using a sketching matrix S with k << n rows. S*A takes random linear combinations of rows of A step is computing S*A A SScan canbe bematrix matrixof ofi.i.d. i.i.d. Normals Normals SScan canbe beaaFast FastJohnson Johnson SA Lindenstrauss LindenstraussMatrix Matrix  Project rows of A onto SA, then find best rank-k SScan can be be aaCountSketch CountSketch approximation to points inside of SA. matrix matrix 22

23 .Conclusion  Gave fast sketching-based algorithms for  Least Squares Regression  Least Absolute Deviation (l1) Regression  Low Rank Approximation  Sketching also provides “dimensionality reduction”  Communication-efficient solutions for these problems 23

0点赞

0收藏

1下载