乔禹 - Track 1: In-memory indices with FAISS as the baseline, winning solution presentation
1. NeurIPS 2021 Billion-Scale ANN Search Challenge: Track 1. 乔禹, multimodal vector retrieval engineer. January 19, 2022
2. Billion-Scale ANN Search Challenge
■ Datasets: 6 billion-scale datasets
■ Search types: KNN and RNN (range search)
■ 3 tracks
  • T1: 64 GB memory, IVFPQ baseline
  • T2: 64 GB memory + 1 TB SSD, DiskANN baseline
  • T3: custom hardware
■ Metrics
  • The sum of improvements in recall over the baseline at the target QPS over all datasets
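A back-of-envelope check of why the T1 memory limit forces compression. This is a sketch, not part of the slides: the vector dimensions and component types are taken from the public challenge description for four of the datasets, and "64G" is assumed to mean 64 GiB.

```python
# Assumed dataset shapes: (dimension, bytes per component).
DATASETS = {
    "bigann-1B": (128, 1),    # uint8 components
    "deep-1B": (96, 4),       # float32 components
    "msspacev-1B": (100, 1),  # int8 components
    "msturing-1B": (100, 4),  # float32 components
}
NUM_VECTORS = 10**9
RAM_BUDGET_BYTES = 64 * 2**30  # assumption: 64G = 64 GiB


def raw_size_bytes(dim, bytes_per_component, n=NUM_VECTORS):
    """Size of the uncompressed vectors, ignoring any index overhead."""
    return dim * bytes_per_component * n


def budget_per_vector(budget=RAM_BUDGET_BYTES, n=NUM_VECTORS):
    """Bytes available per vector if the whole budget went to vector storage."""
    return budget / n


for name, (dim, width) in DATASETS.items():
    size = raw_size_bytes(dim, width)
    verdict = "fits" if size <= RAM_BUDGET_BYTES else "does not fit"
    print(f"{name}: {size / 2**30:.0f} GiB raw, {verdict} in 64 GiB")

print(f"budget: about {budget_per_vector():.1f} bytes per vector")
```

None of the four raw datasets fits, and the per-vector budget is below 70 bytes before counting centroids, graph links, or ids, which is why a compressed code per vector is needed.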
3. T1 solution: IVFPQ-based index
■ 64 GB memory on CPU: 64 GB → PQ family, CPU → IVF family ➔ IVFPQ family
■ Baseline: IVF with 1M centroids + 64-code PQ/OPQ
■ IVF optimization / PQ optimization / overall optimization
■ IVF optimization: IVF_HNSW
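To make the IVF + PQ combination concrete, here is a toy pure-Python sketch. It is not the FAISS baseline or the submitted solution. `ToyIVFPQ` and its parameters are hypothetical names, centroids and codebooks are sampled from the data instead of trained with k-means, and the coarse search is a linear scan where IVF_HNSW would use a graph over the centroids.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: sqdist(vec, centers[i]))


class ToyIVFPQ:
    def __init__(self, data, nlist, m, ksub, seed=0):
        rng = random.Random(seed)
        dim = len(data[0])
        assert dim % m == 0, "dimension must split evenly into m sub-vectors"
        self.m, self.dsub = m, dim // m
        # Assumption: sampled data points stand in for k-means centroids.
        self.centroids = rng.sample(data, nlist)
        # Assumption: one codebook per sub-space, sampled instead of trained.
        self.codebooks = [
            [self.split(v)[j] for v in rng.sample(data, ksub)] for j in range(m)
        ]
        # Inverted lists hold (vector id, PQ code); raw vectors are not stored.
        self.lists = [[] for _ in range(nlist)]
        for idx, v in enumerate(data):
            self.lists[nearest(v, self.centroids)].append((idx, self.encode(v)))

    def split(self, v):
        """Cut a vector into m contiguous sub-vectors."""
        return [v[j * self.dsub:(j + 1) * self.dsub] for j in range(self.m)]

    def encode(self, v):
        """PQ code: the nearest codeword index in each sub-space."""
        return [nearest(s, cb) for s, cb in zip(self.split(v), self.codebooks)]

    def search(self, q, k, nprobe):
        # table[j][c] = distance from q's j-th sub-vector to codeword c.
        table = [[sqdist(s, c) for c in cb]
                 for s, cb in zip(self.split(q), self.codebooks)]
        # Visit only the nprobe lists whose centroids are closest to q.
        order = sorted(range(len(self.centroids)),
                       key=lambda i: sqdist(q, self.centroids[i]))
        scored = []
        for li in order[:nprobe]:
            for idx, code in self.lists[li]:
                scored.append((sum(table[j][c] for j, c in enumerate(code)), idx))
        return [idx for _, idx in sorted(scored)[:k]]


rng = random.Random(1)
data = [[rng.random() for _ in range(8)] for _ in range(200)]
index = ToyIVFPQ(data, nlist=8, m=4, ksub=16)
print(index.search(data[7], k=5, nprobe=2))
```

Raising `nprobe` scans more inverted lists, which costs time and usually improves recall. Raising `ksub` or `m` makes the codes more precise at the cost of memory.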
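4. T1 solution: PQ-based index
■ IVF optimization: the more efficient the IVF-layer computation, the larger nprobe can be set
■ HNSW → INT8-HNSW: accuracy drops 1% to 5%, performance improves 10% to 20%
■ PQ optimization: learn the data distribution with a semi-end-to-end training method https://arxiv.org/pdf/2108.00644.pdf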
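The idea behind an int8 graph search can be sketched with scalar quantization. The slides do not say how the centroids were quantized, so the symmetric shared scale below is an assumption, and `int8_quantize` is a hypothetical helper. The sketch only shows that distances computed on 1-byte integers stay close to the float distances.

```python
def int8_quantize(vec, scale):
    """Map floats to integers in [-127, 127] using one shared scale."""
    return [max(-127, min(127, round(x / scale))) for x in vec]


def l2_sq(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [0.5, -1.0, 0.25, 0.0]
b = [-0.5, 1.0, 0.75, 0.1]

# Assumption: the scale maps the largest magnitude seen to 127.
scale = max(abs(x) for x in a + b) / 127
qa, qb = int8_quantize(a, scale), int8_quantize(b, scale)

exact = l2_sq(a, b)
# Integer distance, rescaled back to the original units.
approx = l2_sq(qa, qb) * scale ** 2
print(f"float distance {exact:.4f}, int8 distance {approx:.4f}")
```

Graph search only needs to rank candidates, so a small distance error costs some recall while each vector shrinks to a quarter of its float32 size and the arithmetic becomes integer-only.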
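5. T1 solution: PQ-based index
■ Global optimization
■ AVX-512 (Advanced Vector Extensions) instruction set for compute acceleration
■ Trade-off between query batch size and PQ bit width: the larger the PQ quantization bit width, the higher the memory cost and the accuracy, so the query batch is reduced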
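The bit-width trade-off can be put in numbers. This is a rough sketch under stated assumptions: 64 sub-quantizers, float32 lookup tables, one table set per query (residual-based IVFPQ variants may need one per probed list), and 10^9 vectors. The function names are hypothetical.

```python
def pq_code_bytes(m, nbits):
    """Bytes per stored vector for m sub-quantizers of nbits each."""
    return m * nbits / 8


def lookup_table_bytes(m, nbits, batch_size, float_bytes=4):
    """Memory for the distance tables of one query batch."""
    return batch_size * m * (2 ** nbits) * float_bytes


M = 64               # assumption: 64 sub-quantizers
NUM_VECTORS = 10**9
BATCH = 10_000       # hypothetical query batch size

for nbits in (4, 8):
    codes_gb = pq_code_bytes(M, nbits) * NUM_VECTORS / 1e9
    tables_mb = lookup_table_bytes(M, nbits, BATCH) / 1e6
    print(f"{nbits}-bit PQ: {pq_code_bytes(M, nbits):.0f} B/vector, "
          f"{codes_gb:.0f} GB of codes, {tables_mb:.0f} MB of tables per batch")
```

Under these assumptions, moving from 4 to 8 bits doubles the code storage and multiplies the per-batch tables by 16, so a wider code leaves less room for large query batches.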
6. Final results
Name         bigann-1B   deep-1B   msspacev-1B   msturing-1B
kst_ann_t1   0.71219     0.71219   0.764542      0.756419
baseline     0.63451     0.65028   0.728861      0.703611
● tcmalloc
● Intel MKL
● int8-based graph search
● Pay more attention to data distribution
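The challenge metric from slide 2 can be applied to the four datasets in this table. This is a sketch; the official score sums over all datasets in the track, so the number below is only a partial sum, and `improvement_sum` is a hypothetical helper name.

```python
# Recall values copied from the results table.
KST_ANN_T1 = {
    "bigann-1B": 0.71219,
    "deep-1B": 0.71219,
    "msspacev-1B": 0.764542,
    "msturing-1B": 0.756419,
}
BASELINE = {
    "bigann-1B": 0.63451,
    "deep-1B": 0.65028,
    "msspacev-1B": 0.728861,
    "msturing-1B": 0.703611,
}


def improvement_sum(system, baseline):
    """Sum of per-dataset recall gains over the baseline."""
    return sum(system[name] - baseline[name] for name in baseline)


for name in BASELINE:
    print(f"{name}: {KST_ANN_T1[name] - BASELINE[name]:+.6f}")
print(f"sum over these four datasets: {improvement_sum(KST_ANN_T1, BASELINE):.6f}")
```

Over these four datasets the gains add up to about 0.228, with the largest gain on bigann-1B.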
7. Thanks