Intel® Scalable I/O Virtualization
1. Intel® Scalable I/O Virtualization
Kevin Tian, Principal Engineer, Intel
2. Legal Disclaimer
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © Intel Corporation.
3. Hardware-Assisted I/O Virtualization
• Pursued for two classes of devices
  – High-performance devices where a software-only method imposes large overhead
    • E.g. NICs, RDMA devices, NVMe, etc.
  – Complex devices where virtualizing the device entirely in software is not practical
    • E.g. GPU, FPGA, etc.
• Today SR-IOV is the standard framework for PCI Express® devices
4. PCI Express® SR-IOV
■ PCIe® Single Root I/O Virtualization (SR-IOV)
  Physical Function (PF)
  Virtual Function (VF)
■ VF directly assignable to
  Traditional Virtual Machine (VM)
  Bare metal container/process
  VM container
[Figure: a device exposing a PF and VF1 … VFn, each with its own BAR and config space and its own queues (Q) on top of shared backend resources; VFs are assigned to VMs and containers.]
5. New Requirements
• Hyper-scale environment
  – Scale to 1000+ VMs/containers
• Dynamic resource management
  – User-defined sharing granularity, over-provisioning, etc.
• Composability
  – VM live migration, snapshot, generational compatibility, etc.
SR-IOV shows major limitations against these requirements!
6. Intel® Scalable I/O Virtualization (Intel® Scalable IOV)
• A hardware-assisted mediated pass-through architecture
  – Slow-path operations emulated by software
  – Fast-path resources dynamically provisioned for direct access
  – Hardware-enforced DMA isolation between fast-path resources
• Finer-grained device sharing than SR-IOV
  – E.g. each TX/RX queue pair can now be assigned individually
• Utilizes existing PCIe® capabilities
  – E.g. Process Address Space ID (PASID)
• Supports any type of device
  – E.g. NIC, storage, GPU, accelerators, … (integrated or discrete)
• Supports both VM and bare-metal usages
7. Intel® Scalable IOV Concept
■ Device: Assignable Device Interfaces (ADI)
  Queues, queue pairs, contexts
  Meet isolation criteria to be ‘assignable’
  Tagged with unique PASID
■ Platform: PASID-granular DMA isolation
  Through Intel® VT-d extensions
■ Software: Compose ADIs into Virtual Device (VDEV)
  Software managed resource remapping between VDEV and ADI
  Slow-path emulation & fast-path pass-through
[Figure: VMs see VDEVs; software resource remapping logic maps each VDEV onto ADIs (groups of queues) that sit behind the PF BAR and PF config space; DMA from each ADI is tagged BDF:PASID and passes through the IOMMU.]
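The PASID-granular DMA isolation on this slide can be pictured with a toy model. This is only a sketch: `ToyIommu`, `map_page`, `translate`, and the single-level page map are hypothetical names and simplifications, not Intel® VT-d structures. The point it illustrates is that translations are selected by the (BDF, PASID) pair, so two ADIs sharing one BDF still get separate address spaces.

```python
from dataclasses import dataclass, field

PAGE = 4096  # assumed page size for this toy model


class DmaFault(Exception):
    """Raised when a DMA request has no translation for its (BDF, PASID)."""


@dataclass
class ToyIommu:
    # (bdf, pasid) -> {I/O virtual page number -> host physical page number}
    tables: dict = field(default_factory=dict)

    def map_page(self, bdf, pasid, iova, hpa):
        """Install one page translation for a single (BDF, PASID) pair."""
        self.tables.setdefault((bdf, pasid), {})[iova // PAGE] = hpa // PAGE

    def translate(self, bdf, pasid, iova):
        """Translate a tagged DMA address, faulting if nothing is mapped."""
        table = self.tables.get((bdf, pasid))
        if table is None or iova // PAGE not in table:
            raise DmaFault(f"bdf={bdf} pasid={pasid} iova={iova:#x}")
        return table[iova // PAGE] * PAGE + iova % PAGE
```

With two ADIs on the same (made-up) BDF `0000:3b:00.0`, mapping IOVA `0x1000` under PASID 1 and PASID 2 yields two independent translations, and a request tagged with an unmapped PASID faults instead of reaching memory.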
8. Benefits
[Figure with four panels. Scalability: VM1 … VMn, each with its own VDEV (VDEV1 … VDEVn) backed by queues of one device. Flexibility: processes (via syscall), a VM and a container (via VDEVs) sharing the queues of one device. Over-provisioning: VDEV1 and VDEV2 backed by the queues of one device. Composability: VM live migration, with a VDEV moving between two devices.]
9. Assignable Device Interfaces (ADIs)
• Smallest granularity of sharing a device
  – No PCI config space register, share common BDF
  – Identified by PASID
• For ADI to be ‘assignable’
  – Functional isolation between ADIs
  – ADI MMIO registers in separate system page size regions
  – All DMAs tagged with PASID
  – Independently resettable
  – Scalable Interrupt Message Storage (IMS)
  – …
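One of the criteria above, ADI MMIO registers living in separate system-page-size regions, is easy to check mechanically. The sketch below is illustrative only: `adis_page_isolated` and the `{name: [(offset, length), ...]}` input format are assumptions made for this example, not part of any specification. It returns `False` as soon as two ADIs have registers in the same page, since a page is the smallest unit that can be mapped into one guest.

```python
def adis_page_isolated(adi_ranges, page_size=4096):
    """Check that no system page holds MMIO registers of two different ADIs.

    adi_ranges maps a (hypothetical) ADI name to a list of
    (bar_offset, length) register ranges, with length > 0.
    """
    owner = {}  # page number -> ADI that owns registers in that page
    for adi, ranges in adi_ranges.items():
        for offset, length in ranges:
            first = offset // page_size
            last = (offset + length - 1) // page_size
            for page in range(first, last + 1):
                # setdefault records the first owner; a different owner
                # on the same page breaks the isolation criterion.
                if owner.setdefault(page, adi) != adi:
                    return False
    return True
```

For example, registers at offsets `0x0000` and `0x1000` pass with 4 KiB pages, while registers at `0x0000` and `0x0800` share page 0 and fail.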
10. Enumeration of Intel® Scalable IOV Capability
• Designated Vendor Specific Extended Capability (DVSEC) to discover Intel® Scalable IOV capability
  – A simplified subset of SR-IOV capability
Register layout (byte offset, then fields from bit 31 down to bit 0):
  00h  [31:20] Next Capability Offset  [19:16] Cap Version  [15:0] PCI Express Extended Capability ID = 0x23
  04h  [31:20] DVSEC Length = 0x18  [19:16] DVSEC rev = 0  [15:0] DVSEC Vendor ID = 8086
  08h  [31:24] Flags (RO)  [23:16] Function Dependency Link (RO)  [15:0] DVSEC ID for Scalable IOV = XXX
  0Ch  [31:0] Supported Page Sizes (RO)
  10h  [31:0] System Page Size (RW)
  14h  [31:0] Capabilities (RO)
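A minimal sketch of decoding this capability from raw config-space bytes, assuming the field positions of the standard PCIe DVSEC header and the register order shown on the slide. The function names are hypothetical, and because the slide leaves the Scalable IOV DVSEC ID as XXX, the caller has to supply it rather than the code hard-coding a value.

```python
import struct

PCI_EXT_CAP_ID_DVSEC = 0x23   # extended capability ID shown on the slide
INTEL_VENDOR_ID = 0x8086      # DVSEC vendor ID shown on the slide


def parse_siov_dvsec(cap: bytes) -> dict:
    """Decode six little-endian dwords (0x18 bytes) of a DVSEC structure."""
    hdr, dvsec1, dvsec2, supported, system, caps = struct.unpack_from("<6I", cap)
    return {
        "cap_id": hdr & 0xFFFF,
        "cap_version": (hdr >> 16) & 0xF,
        "next_cap_offset": (hdr >> 20) & 0xFFF,
        "dvsec_vendor_id": dvsec1 & 0xFFFF,
        "dvsec_rev": (dvsec1 >> 16) & 0xF,
        "dvsec_length": (dvsec1 >> 20) & 0xFFF,
        "dvsec_id": dvsec2 & 0xFFFF,
        "function_dependency_link": (dvsec2 >> 16) & 0xFF,
        "flags": (dvsec2 >> 24) & 0xFF,
        "supported_page_sizes": supported,
        "system_page_size": system,
        "capabilities": caps,
    }


def looks_like_siov(fields: dict, siov_dvsec_id: int) -> bool:
    """Match the header values from the slide; the DVSEC ID is caller-supplied."""
    return (fields["cap_id"] == PCI_EXT_CAP_ID_DVSEC
            and fields["dvsec_vendor_id"] == INTEL_VENDOR_ID
            and fields["dvsec_length"] == 0x18
            and fields["dvsec_id"] == siov_dvsec_id)
```

In practice the bytes would come from walking the extended capability list that starts at config-space offset 0x100 and following each Next Capability Offset until a DVSEC with the right vendor and DVSEC ID is found.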
11. Intel® VT-d Enhancement
• Scalable mode DMA remapping
  – PASID-granular first-level, second-level, nested and pass-through translation
  – PASID table is now a two-level structure
  – Covers both Scalable IOV and SVM usages
• Extended Context (ECS) is deprecated
• Access/Dirty (A/D) bits in second-level translation
  – Assist dirty memory tracking in live migration
12. Extended Context Mode (Deprecated)
13. Scalable Mode (New)
Key difference:
• ALL page-table pointers moved to the PASID-granular table
• PASID is a global ID space shared by all VMs
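The two-level PASID table and the "all page-table pointers live in the PASID-granular entry" idea can be sketched as follows. The index split used here (upper 14 bits of a 20-bit PASID select a directory entry, lower 6 bits select an entry in a 64-entry table) is an assumption based on the VT-d scalable-mode layout and should be checked against the specification; the class and field names are hypothetical.

```python
PASID_BITS = 20        # PASID width defined by PCIe
TABLE_INDEX_BITS = 6   # assumed: 64 entries per second-level PASID table


def split_pasid(pasid):
    """Return (directory index, table index) for a PASID."""
    if not 0 <= pasid < (1 << PASID_BITS):
        raise ValueError("PASID out of range")
    return pasid >> TABLE_INDEX_BITS, pasid & ((1 << TABLE_INDEX_BITS) - 1)


class ToyPasidDirectory:
    """Sparse two-level table: directory index -> table index -> entry."""

    def __init__(self):
        self.directory = {}

    def set_entry(self, pasid, first_level_ptr=None, second_level_ptr=None):
        d, t = split_pasid(pasid)
        # Both page-table pointers sit in the per-PASID entry.
        self.directory.setdefault(d, {})[t] = {
            "first_level_ptr": first_level_ptr,
            "second_level_ptr": second_level_ptr,
        }

    def lookup(self, pasid):
        d, t = split_pasid(pasid)
        return self.directory.get(d, {}).get(t)
```

A sparse directory means second-level tables only need to exist for PASID ranges that are actually in use, which matters when the PASID space is shared system-wide.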
14. Software Composition
• Virtual Device Composition Module (VDCM)
  – Compose ADIs into Virtual Device (VDEV)
  – Emulate slow-path operations
• Need a framework to connect VDCM for
  – Managing VDEV life-cycle
  – Setting up access policy on VDEV resources
  – Serving slow-path operations from guest
• In Linux it’s the VFIO mediated device framework!
  – “mdev” == “VDEV” in concept
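The split between slow-path emulation and fast-path pass-through can be illustrated with a toy VDEV. Everything here is hypothetical (`ToyVdev`, its methods, the register offsets); in a real system fast-path pages are mapped straight into the guest and never reach the VDCM, so the "fast" branch below only reports where such an access would land.

```python
PAGE = 4096  # assumed system page size


class ToyVdev:
    """Toy virtual device composed from ADI pages plus emulated registers."""

    def __init__(self):
        self.fast_path = {}  # VDEV page number -> (ADI id, ADI page number)
        self.regs = {}       # emulated slow-path register state

    def attach_adi_page(self, vdev_offset, adi, adi_offset):
        """Back one page of the virtual BAR with a page of an ADI."""
        self.fast_path[vdev_offset // PAGE] = (adi, adi_offset // PAGE)

    def mmio_write(self, offset, value):
        """Route a guest write: pass-through if ADI-backed, else emulate."""
        backing = self.fast_path.get(offset // PAGE)
        if backing is not None:
            adi, adi_page = backing
            return ("fast", adi, adi_page * PAGE + offset % PAGE)
        # Slow path: the access traps and is emulated in software.
        self.regs[offset] = value
        return ("slow", offset)
```

Because the mapping is per page, software can recompose a VDEV from different ADIs without the guest noticing, as long as the slow-path state is carried over.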
15. VFIO Mediated Device Framework
■ Mdev core
  Connect VFIO and VDCM
■ User interfaces
  Used by libvirt, qemu, etc.
■ IOMMU map/unmap
■ DMA isolation for mdev
  Purely in software, or
  In vendor specific way
[Figure: VFIO user interfaces (life-cycle mgmt., resource enumeration, run-time emulation) sit above the platform, pci and mdev bus drivers; Mdev core links the bus driver interface on the mdev bus to the device driver interface of the host driver, which contains the VDCM; IOMMU interfaces deliver map/unmap callbacks, alongside the IOMMU driver.]
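As an illustration of the life-cycle user interface, the sketch below creates a mediated device by writing a UUID to the `create` attribute that the Linux mdev framework exposes in sysfs under `mdev_supported_types`. The parent device name and type ID are placeholders that depend on the host driver, and the `sysfs_root` parameter is an addition made here so the function can be exercised against a scratch directory.

```python
import uuid
from pathlib import Path


def create_mdev(parent, type_id, sysfs_root="/sys", dev_uuid=None):
    """Ask the mdev core to create a mediated device of the given type.

    parent and type_id are placeholders: real values come from the host
    driver that registered with the mdev framework.
    """
    dev_uuid = dev_uuid or str(uuid.uuid4())
    create_attr = (Path(sysfs_root) / "class" / "mdev_bus" / parent
                   / "mdev_supported_types" / type_id / "create")
    # Writing the UUID triggers creation in the parent driver.
    create_attr.write_text(dev_uuid)
    return dev_uuid
```

On a real host this needs root privileges and a parent device whose driver has registered at least one supported type; tools such as libvirt drive the same sysfs attribute.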
16. Extensions for Intel® Scalable IOV
■ IOMMU-capable mdev
  Link to iommu_domain (tagged by PASID)
  Allow PASID-granular iommu map/unmap
  Opt-in by VDCM
■ Finer-grained resource management
  Specify any number of ADIs to compose a mdev
■ Unified framework for VM and bare metal usages
  Mdev composition can be usage specific, e.g. no PCI emulation in bare metal usage
[Figure: the VFIO mediated device stack with finer-grained life-cycle mgmt., resource enumeration and run-time emulation; an IOMMU-capable mdev bus; IOMMU interfaces issuing map/unmap per PASID to a scalable mode IOMMU driver; the VDCM in the host driver.]
17. Main Linux Enabling Tasks
• To enable basic ADI assignment
  – Support new scalable mode
  – Need system-wide PASID space
  – Introduce iommu-capable mdev
  – Device specific VDCM in host driver
• To support vIOMMU/vSVM with ADI
  – Emulate new scalable mode on vIOMMU
  – Enlightened PASID management scheme
  – Maintain compatible APIs between PF/VF and ADI
18. Summary of Architecture Changes
Device Support
• Support Assignable Device Interfaces (ADIs)
• Support direct fast-path access from VMs
Platform Support
• Extend Intel® VT-d to use PASID/BDF to identify upstream DMA accesses
Software Support
• Move infrequent (slow-path) accesses from the device to software without affecting performance
19. Documentation
• Intel® VT-d specification update (Rev 3.0)
  – Documents Intel® VT-d (IOMMU) support for PASID granular address translation
• Intel® Scalable I/O Virtualization Technical Specification (Rev 1.0)
  – Documents the Scalable IOV architecture blueprint and operation, including DVSEC
  – Addresses architecture requirements for devices and drivers
  – Agnostic of type of device or specific implementation
  – Openly published to enable broad device and software ecosystem
• https://software.intel.com/en-us/articles/intel-sdm
20. Q/A