03_Going Beyond The Browser
1. Ensembl Data: Going Beyond The Browser
Dan Staines & Andy Yates, Genomics Technology Infrastructure, EMBL-EBI
@ensembl @ensemblgenomes @drdanstaines
2. Ensembl has lots of data types...
3. ...and complex data in many dimensions...
• ~45k genomes
• ~450 Gbp of sequence
• ~175 million genes
• ~170 million proteins
• ~1 billion protein features
• ~1.5 billion cross-references
• ~500 million homologous pairs
• ~800 million variants
• ~200 billion genotypes
4. ...and complex data in many dimensions... (same figures as slide 3, with the overlay annotations "75k Bills" and "90k Potters")
5. ...and can grow rapidly...
[Chart: number of genes over time, 2009 to 2017]
6. Accessing Ensembl data
7. What we're building
[Diagram labels: {REST}, query manager, expression, genes, variation]
8. The Cambrian Explosion of Databases
9. The Cambrian Explosion of Databases
10. Why Elastic?
• Handles our complex, nested data structures
• Scales horizontally
• Meets our performance needs:
  • Complex query on >100 million genes: ~500 ms
  • Retrieval of all the genes from the human genome: <2 minutes
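To make "complex, nested data structures" concrete, here is a minimal sketch of what a nested gene document and a matching Elasticsearch search body could look like. The document shape and every field name (`transcripts`, `biotype`, `translations`, the IDs) are hypothetical illustrations, not the Ensembl schema; only the `bool`, `term` and `nested` constructs come from the Elasticsearch query DSL. It also assumes `transcripts` is mapped as type `nested` and `genome` as a keyword field.

```python
import json

# Hypothetical gene document: field names are illustrative only,
# not the schema used by the Ensembl indices.
gene_doc = {
    "id": "gene_1",
    "name": "BRCA2",
    "genome": "homo_sapiens",
    "transcripts": [
        {
            "id": "transcript_1",
            "biotype": "protein_coding",
            "translations": [{"id": "protein_1"}],
        },
    ],
}

# Search body in the Elasticsearch query DSL. The `nested` clause matches
# within individual objects of the (assumed) nested field `transcripts`.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"genome": "homo_sapiens"}},
                {
                    "nested": {
                        "path": "transcripts",
                        "query": {
                            "term": {"transcripts.biotype": "protein_coding"}
                        },
                    }
                },
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```

Keeping transcripts and translations inside the gene document means one query can filter on gene-level and transcript-level properties together, which is the kind of structure the slide refers to.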
11. Gene search
• 30k genomes
• 110m genes
• 3.2bn documents
• 782 GB of indices
• 571 GB of JSON dumps
• 8 data nodes (2 cores/32G/200G each)
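A few ratios fall straight out of the gene search figures. The sketch below only does the arithmetic on the numbers quoted on the slide; it assumes sizes are in gigabytes and that index data is spread evenly across the data nodes, neither of which the slide states.

```python
# Figures quoted on the slide.
genes = 110_000_000
documents = 3_200_000_000
index_size_gb = 782
json_dump_size_gb = 571
data_nodes = 8

# Average number of indexed documents per gene.
docs_per_gene = documents / genes

# Index size per node, assuming an even spread across the data nodes.
index_gb_per_node = index_size_gb / data_nodes

# How much larger the indices are than the source JSON dumps.
index_to_json_ratio = index_size_gb / json_dump_size_gb

print(f"documents per gene: {docs_per_gene:.1f}")
print(f"index GB per node:  {index_gb_per_node:.2f}")
print(f"index / JSON size:  {index_to_json_ratio:.2f}")
```

So each gene corresponds to roughly 29 documents on average, and the indices take up about 1.4 times the size of the JSON they were built from.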
12. REST API
query={"genome":"homo_sapiens", "name":"BRCA2"}
fields=["name","description"]
Endpoints: /query, /fetch
[Diagram labels: query manager, expression, genes, variation]
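The `query` and `fields` values above can be assembled into a request along the following lines. This is a sketch under stated assumptions: the slides do not give the service address or say how the two values are transmitted, so `BASE_URL` is a placeholder and the JSON-in-URL-parameters encoding is a guess. `build_query_url` is a hypothetical helper, not part of any Ensembl client.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder address: the beta service URL is not given in the slides.
BASE_URL = "https://example.org/api"


def build_query_url(query, fields, base_url=BASE_URL):
    """Build a URL for the /query endpoint.

    Assumes `query` and `fields` are sent as JSON-encoded URL parameters;
    the slides show the values but not the transport.
    """
    params = {
        "query": json.dumps(query, separators=(",", ":")),
        "fields": json.dumps(fields, separators=(",", ":")),
    }
    return f"{base_url}/query?{urlencode(params)}"


url = build_query_url(
    {"genome": "homo_sapiens", "name": "BRCA2"},
    ["name", "description"],
)
print(url)
```

The function only builds the URL and makes no network call, so it can be inspected or adapted before being pointed at a real endpoint.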
13. Where are we now?
• REST interface
  • Beta by invitation (helpdesk@ensembl.org)
• Web interface
  • Initial prototyping to support endpoint development
  • Full development now underway
14. Acknowledgements
• Genomics Technology Infrastructure
  • Andy Yates
• Ensembl Genomes
  • Mike Smith
  • Paul Kersey
  • Wolfgang Huber
• Molecular Archives
  • Laura Clarke
  • Peter Harrison
• Genome Analysis
  • Magali Ruffier
  • Jim Proctor
• European Variation Archive
  • Mungo Carstairs
  • Cristina Yenyxe Gonzalez
• Gene Expression Atlas
  • Irene Papatheodorou
  • Alfonso Munoz-Pomer Fuentes