Introduction to Milvus
- What is unstructured data? And why does it matter?
- The architecture of UDS (unstructured data service) for AI.
- How Milvus works.
- The application scenarios of Milvus.
- Community status.
- Join us!
1. Milvus: Unstructured Data Search Engine in AI Era
11.2020
© 2020 Zilliz. All rights reserved.
2. Unlock the treasure of unstructured data
AI algorithms transform images, video, voice, and natural language into vectors, enabling the understanding and use of unstructured data at scale.
[Diagram: Unstructured data → Deep learning models → Embedding vectors → Knowledge, insight, $]
3. The flow-based AI applications
The most popular way
• Flexible
• Easy to compose, web-based UI
• Sample pipelines
The challenge
• Data fragmentation
[Figure: The sample pipelines for video processing. Recoverable labels: Video, Extract frames, Image, Visual model (e.g. VGG), Voice model, Extract tags, Embeddings (visual), Embeddings (voice), Attributes]
4. The unstructured data service (UDS) for AI
Input: unstructured data (image, video, voice, natural language), submitted through Search and Insert
Inference layer: model store and inference runtime (TensorRT, ONNX RT, TFRT, etc.)
Data service layer: Milvus (vectors: high dense, plus sparse as experimental), attributes, object storage (Object URI, Object, Vector ID), multimodal scoring; two of these components are labeled "on roadmap"
Output: result set (image, video, voice, natural language)
5. Why Milvus: Vectors are different
Numbers
• Operation: arithmetic operations, number comparison
• Organization: ordered ranges, e.g. 1–10 splits into 1–5 and 6–10, then into 1, 2, ..., 10
Vectors
• Operation: similarity (e.g. Euclidean distance), similarity comparison
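To make the contrast concrete, here is a minimal sketch of similarity comparison on vectors: Euclidean distance plus a brute-force top-k scan. The function names and the toy data are assumptions made for illustration; this is not how Milvus computes results internally.

```python
import math

def euclidean(a, b):
    """L2 (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query, vectors, k):
    """Brute-force search: return (index, distance) pairs, closest first."""
    scored = [(i, euclidean(query, v)) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1])[:k]

# Toy 2-dimensional vectors (made up for this example).
vectors = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
nearest = top_k([0.0, 0.0], vectors, k=2)
print(nearest)
```

Unlike numbers, vectors have no natural sort order, so a scan like this touches every vector; that cost is what an approximate nearest neighbor index is meant to avoid.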
6. Milvus: The big picture
[Diagram labels: SDK / Web API; insert obj; query obj; top-K result; Query; Scheduler; Processing Engine; Buffer Pool; ANNS (Mi-FAISS, Mi-Annoy); Collaborative Query (tag/structured data); Reducer; Multi-modal Scoring (app specific); Result; Metadata; Selection; Segment; Index Files; New Index File; Storage Tier; Various Processors (x86, ARM, GPU, Kunpeng, Loongson, RISC-V)]
Processor support
• x86: supports SSE4.2, AVX2, AVX512
• GPU: Pascal microarchitecture or later, CUDA 10.0 or later
• Arm: requires aarch64
• Kunpeng: tested on Kunpeng 920 with CentOS 7.x
• Loongson: tested on Loongson with a Docker container
• RISC-V: in early development
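The diagram names a Reducer that turns per-segment results into a single top-K result. The slide does not say how Milvus implements this step, so the following is only an illustrative sketch of the general idea: each segment returns its own best candidates, and the reducer keeps the K smallest distances overall. The function name and the (distance, id) tuple layout are assumptions.

```python
import heapq
from itertools import chain

def reduce_top_k(segment_results, k):
    """Merge per-segment lists of (distance, id) into the k globally closest hits."""
    return heapq.nsmallest(k, chain.from_iterable(segment_results))

# Hypothetical partial results from two segments.
segment_a = [(0.12, 101), (0.40, 102), (0.95, 103)]
segment_b = [(0.05, 201), (0.50, 202)]

global_top3 = reduce_top_k([segment_a, segment_b], k=3)
print(global_top3)
```

Because each segment only has to return K candidates, the merge stays cheap regardless of how large the segments are.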
7. The ANN benchmark
Milvus: 0.10.3
OS: Ubuntu 18.04
ECS: Azure D16s v3 (16c, 64GB), Intel Xeon Platinum 8171M
Data set: sift-128-euclidean (1 million vectors)
More info: https://milvus.io/docs/benchmarks_azure
Special thanks to ANN-Benchmarks (developed by Martin Aumueller, Erik Bernhardsson and Alexander Faithfull)
8. Real-world Cases
• Comprehensive Similarity Metrics
• Leading-Edge Performance
• Dynamic Data Management
• Near Real Time Search
• Rich Data Type & Advanced Search
• Cost Efficient
• Highly Scalable and Robust
• Cloud Native
• Ease of Use
9. Use case: Intelligent writing assistant
[Diagram labels: Corpus Data (natural language); Data Cleansing; Feature engineering; Encoder (TextCNN); Encoder (InferSent); Writing Intention; Milvus (Vector ID); Object Storage (Object URI, Object); Extract paragraph, summary; Result: an auto-generated essay]
10. Use case: News recommendation on mobile
[Diagram labels: Daily batch feeding; News title; Encoder (SimBert); Milvus (Vector ID); Object Storage (Object URI, Object); Reading Preference; Recommended News]
11. Use case: Image search for company trademark
• 55 million images
• Search elapsed time: 20 ms on cloud GPU server
[Diagram labels: Company Trademark Images; Encoder: VGG (fine tuned); Milvus (Vector ID); Object Storage (Object URI, Object); Search: Trademark Image → Company Info]
12. Use case: Pharmaceutical molecule analysis
• 800 million molecules
• Search elapsed time: 500 ms on single server
Molecular formula: CC(=O)Nc1ccc(S(=O)(=O)NCC(=O)N2CCS(=O)CC2)cc1
Encoder: RDKit
Molecular fingerprint: 1024 bits (00001100...10000000)
Milvus: Tanimoto similarity
Search types: Similarity, Substructure, Superstructure
Output: Molecular Candidate List
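Tanimoto similarity on bit fingerprints is the number of bits set in both fingerprints divided by the number of bits set in either. A minimal sketch, assuming fingerprints are held as Python integers (a representation chosen here for brevity; the slide uses 1024-bit RDKit fingerprints, and the 8-bit values below are made up):

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto similarity of two bit fingerprints stored as Python ints."""
    common = bin(fp_a & fp_b).count("1")  # bits set in both
    total = bin(fp_a | fp_b).count("1")   # bits set in at least one
    # Convention chosen for this sketch: two all-zero fingerprints count as identical.
    return common / total if total else 1.0

a = 0b00001100
b = 0b00001010
similarity = tanimoto(a, b)
print(similarity)
```

Here one bit is shared and three bits are set overall, so the similarity is one third; identical fingerprints score 1.0 and disjoint ones score 0.0.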
13. How to Start
14. The sample data
import random

The_Lord_of_the_Rings = [
    {
        "title": "The_Fellowship_of_the_Ring",
        "id": 1,
        "duration": 208,
        "release_year": 2001,
        "embedding": [random.random() for _ in range(8)]
    },
    ...
]
15. Install the Milvus server
$ docker pull milvusdb/milvus:0.11.0-cpu-d101620-4c44c0
$ mkdir -p /home/$USER/milvus/db
$ mkdir -p /home/$USER/milvus/logs
$ mkdir -p /home/$USER/milvus/wal
$ mkdir -p /home/$USER/milvus/conf
$ cd /home/$USER/milvus/conf
$ wget https://raw.githubusercontent.com/milvus-io/milvus/0.11.0/core/conf/demo/milvus.yaml
$ sudo docker run -d --name milvus_cpu_0.11.0 \
    -p 19530:19530 \
    -p 19121:19121 \
    -v /home/$USER/milvus/db:/var/lib/milvus/db \
    -v /home/$USER/milvus/conf:/var/lib/milvus/conf \
    -v /home/$USER/milvus/logs:/var/lib/milvus/logs \
    -v /home/$USER/milvus/wal:/var/lib/milvus/wal \
    milvusdb/milvus:0.11.0-cpu-d101620-4c44c0
16. Create a collection in Milvus
$ pip3 install pymilvus==0.3.0
$ python
>>> from milvus import Milvus, DataType
>>> # _HOST and _PORT must be set to the Milvus server address and port
>>> client = Milvus(_HOST, _PORT)
>>> collection_name = 'demo_films'
>>> # Drop any leftover collection with the same name
>>> if collection_name in client.list_collections():
...     client.drop_collection(collection_name)
>>> collection_param = {
...     "fields": [
...         {"name": "duration", "type": DataType.INT32, "params": {"unit": "minute"}},
...         {"name": "release_year", "type": DataType.INT32},
...         {"name": "embedding", "type": DataType.FLOAT_VECTOR, "params": {"dim": 8}},
...     ],
...     "segment_row_limit": 4096,
...     "auto_id": False
... }
>>> client.create_collection(collection_name, collection_param)
>>> client.create_partition(collection_name, "American")
>>> idx_param = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 4096}}
>>> client.create_index(collection_name, "embedding", idx_param)
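The index above is an IVF_FLAT index with an nlist parameter. The general inverted-file idea is to assign every vector to the bucket of its nearest centroid (nlist buckets in total) and, at query time, scan only the few buckets closest to the query. The sketch below illustrates that idea in plain Python under stated assumptions: the centroids are fixed by hand rather than trained, the helper names are hypothetical, and nprobe (the number of buckets scanned) does not appear on the slide. It is not Milvus's implementation.

```python
import math
from collections import defaultdict

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector id to the bucket of its nearest centroid."""
    buckets = defaultdict(list)
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(vid)
    return buckets

def search_ivf(query, vectors, centroids, buckets, nprobe, topk):
    """Scan only the nprobe buckets closest to the query, then rank exactly."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probe for vid in buckets.get(c, [])]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:topk]

# Toy data: two obvious clusters, so nlist = 2 hand-picked centroids.
vectors = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
centroids = [[0.0, 0.0], [5.0, 5.0]]
buckets = build_ivf(vectors, centroids)
hits = search_ivf([0.1, 0.0], vectors, centroids, buckets, nprobe=1, topk=2)
print(hits)
```

A larger nlist means smaller buckets and faster scans, at the risk of missing neighbors that fall into a bucket that was not probed.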
17. Inject data
>>> ids = [k.get("id") for k in The_Lord_of_the_Rings]
>>> durations = [k.get("duration") for k in The_Lord_of_the_Rings]
>>> release_years = [k.get("release_year") for k in The_Lord_of_the_Rings]
>>> embeddings = [k.get("embedding") for k in The_Lord_of_the_Rings]
>>> hybrid_entities = [
...     {"name": "duration", "values": durations, "type": DataType.INT32},
...     {"name": "release_year", "values": release_years, "type": DataType.INT32},
...     {"name": "embedding", "values": embeddings, "type": DataType.FLOAT_VECTOR},
... ]
>>> ids = client.insert(collection_name, hybrid_entities, ids, partition_tag="American")
18. Run a search
>>> # query_embedding must be defined first: a list of 8 floats, matching the collection's dim
>>> query_hybrid = {
...     "bool": {
...         "must": [
...             {
...                 "term": {"release_year": [2002, 2003]}
...             },
...             {
...                 "range": {"duration": {"GT": 250}}
...             },
...             {
...                 "vector": {
...                     "embedding": {"topk": 3, "query": [query_embedding], "metric_type": "L2"}
...                 }
...             }
...         ]
...     }
... }
>>> results = client.search(collection_name, query_hybrid, fields=["duration", "release_year", "embedding"])
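Read as plain logic, the query asks for entities whose release_year is 2002 or 2003 and whose duration is greater than 250, ranked by L2 distance to the query vector, keeping the top 3. The following sketch reproduces that meaning on an in-memory list without a Milvus server. Only the first film comes from the sample data slide; the other rows, the 2-dimensional embeddings, and the helper names are made up for illustration, and the order in which Milvus itself applies filters is not stated in the slides.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_search(entities, query_embedding, years, min_duration, topk):
    """Filter on the scalar fields, then rank the survivors by L2 distance."""
    matches = [
        e for e in entities
        if e["release_year"] in years and e["duration"] > min_duration
    ]
    ranked = sorted(matches, key=lambda e: l2(query_embedding, e["embedding"]))
    return [(e["id"], l2(query_embedding, e["embedding"])) for e in ranked[:topk]]

films = [
    {"id": 1, "duration": 208, "release_year": 2001, "embedding": [0.1, 0.1]},  # from the slide
    {"id": 2, "duration": 226, "release_year": 2002, "embedding": [0.2, 0.1]},  # hypothetical
    {"id": 3, "duration": 252, "release_year": 2003, "embedding": [0.3, 0.4]},  # hypothetical
    {"id": 4, "duration": 260, "release_year": 2003, "embedding": [1.0, 1.0]},  # hypothetical
]

hits = hybrid_search(films, [0.0, 0.0], years=[2002, 2003], min_duration=250, topk=3)
print(hits)
```

Films 1 and 2 are excluded by the scalar conditions even though their embeddings are closest to the query, which is the point of combining attribute filters with vector search.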
19. The OSS Community
20. Milvus: The journey
The most active AI projects in Linux foundation
• 2018.10: The idea
• 2019.04: Milvus 0.1
• 2019.06: 1st seed user
• 2019.10: Open Source
• 2020.03: Joined LF AI
• 2020.10: 1st Community Conference
21. Progress (as of Nov. 2020)
Unstoppable momentum since its debut.
• 6.0K commits
• 4.5K GitHub stars
• 121 contributors
• 16 releases
• 400+ users
• 19 patents filed
22. Zilliz: Who we are
• Open-source software company based in Shanghai
• Mission: Reinvent Data Science
• Main contributor of Milvus project
23. We are hiring
Find our positions in China
• C++ backend developer
• AI algorithm engineer
• Frontend developer
• Product manager
• Project manager
• Cloud infrastructure engineer / developer
Find our positions in US
• Open-source evangelist (US)
• Developer advocate (US)
• Community manager (US)
And a lot more…
You may also contact hr@zilliz.com
24. Resources
Performance benchmark: https://milvus.io/docs/benchmarks_azure
Live demo: https://milvus.io/scenarios
• Content-based image retrieval system
• Q&A chatbot powered by NLP
• Molecular analysis
Links
• https://milvus.io
• https://github.com/milvus-io/milvus
• https://twitter.com/milvusio
• https://medium.com/unstructured-data-service
• https://zhuanlan.zhihu.com/ai-search
Follow us on WeChat
25. Thanks!