- 快召唤伙伴们来围观吧
- 微博 QQ QQ空间 贴吧
- 文档嵌入链接
- 复制
- 微信扫一扫分享
- 已成功复制到剪贴板
Magic Quadrant for Data Science and Machine Learning Platforms
展开查看详情
1 . Licensed for Distribution Magic Quadrant for Data Science and Machine Learning Platforms Published 28 January 2019 - ID G00354456 - 79 min read By Analysts Carlie Idoine, Peter Krensky, Erick Brethenoux, Alexander Linden Expert data scientists, citizen data scientists and application developers require professional capabilities for building, deploying and managing analytical models. New vendors to this Magic Quadrant, along with changes in the positions of others, reflect a dynamic market that is evolving rapidly. Market Definition/Description This Magic Quadrant evaluates vendors of data science and machine learning (ML) platforms. These are software products that enable expert data scientists, citizen data scientists and application developers to create, deploy and manage their own advanced analytic models (see “Maximize the Value of Your Data Science Efforts by Empowering Citizen Data Scientists”). We define a data science platform as: A cohesive software application that offers a mixture of basic building blocks essential for creating all kinds of data science solution, and for incorporating those solutions into business processes, surrounding infrastructure and products. “Cohesive” means that the application’s basic building blocks are well-integrated into a single platform, that they provide a consistent “look and feel,” and that the modules are reasonably interoperable in support of an analytics pipeline. An application that is not cohesive — that mostly uses or bundles various packages and libraries — is not considered a data science and ML platform, according to our definition.
2 .A data science and ML platform supports various skilled data scientists in multiple tasks across the data and analytics pipeline. These range from data ingestion, data preparation, interactive exploration and visualization and feature engineering to advanced modeling, testing and deployment. Within the field of data science, ML is the most popular area. It is one that warrants specific attention by those evaluating these platforms. Not all organizations build all of their data science and ML models from scratch. Some may need assistance with getting started with or extending their data science and ML initiatives. Although this Magic Quadrant does assess the availability of prepackaged content, such as templates and samples, it does not assess service providers who can help jump-start or extend data science and ML application throughout an organization (as outlined in “Market Guide for Data Science and Machine Learning Service Providers”). Nor does this Magic Quadrant assess specialized vendors of industry-, domain- or function-specific solutions. Readers of this Magic Quadrant should understand that: ■ This market features a diverse range of vendors: Gartner invited a wide range of data science and ML platform vendors to participate in the evaluation process for potential inclusion in this Magic Quadrant. Users of these platforms, who include data scientists, citizen data scientists and application developers, have different requirements and preferences for user interfaces (UIs) and tools. Expert data scientists prefer to code data science models in Python or R, or to build and run data models in notebooks. Other users are most comfortable building models by using a point-and-click UI to create visual pipelines. Many members of emerging citizen data science communities favor a much more augmented approach that uses ML techniques “behind the scenes” to guide these less expert data scientists through the model building and operationalization process (see “Build a Comprehensive Ecosystem for Citizen Data Science to Drive Impactful Analytics”). Over time, expert data scientists may also come to prefer an augmented approach, which would enable them to navigate the model-building and operationalization process more efficiently. Tool and use case diversity is more important than ever. ■ A Leader may not be the best choice: The wide range of products available offers a breadth and depth of capability, and varied approaches to developing, operationalizing and managing models. It is therefore important to evaluate your specific needs when assessing vendors. A vendor in the Leaders quadrant, for example, might not be the best choice for you. Equally, a Niche Player might be the perfect choice. For an extensive review of the functional capabilities of each platform, see “Critical Capabilities for Data Science and Machine Learning Platforms.” Bear in mind that this Magic Quadrant includes only a small selection of the hundreds of vendors in this market.
3 .■ Only vendors with commercially licensable products are included: Pure open-source platforms are excluded from this Magic Quadrant. Only commercially licensed open-source platforms are included. We do, however, recognize the growing trend for commercial platforms to use open-source libraries and content. Vendors take different approaches to including and supporting open source. Open-source solutions represent an opportunity for both users and vendors to get started with data science and ML with little upfront investment (see Note 1). In addition, many users of data science and ML platforms are either already proficient in or can easily learn and apply open-source technologies. Leveraging open source through collaborative or orchestrated integration with commercial offerings also eliminates the need for a vendor to re-create specific capabilities within its own platform, as innovation is fast-paced within the open-source community. This approach enables a vendor to keep up with fast-changing algorithms and approaches, while focusing on capabilities that differentiate it from their competitors. However, a platform’s ease of use may suffer if its vendor does not account for the needs of all types of user. ■ Platforms must support not only model building but also model operationalization: The full benefit — including business value — of data science and ML will not be achieved unless models are both: 1. ■ Embedded in business processes 2. ■ Maintained, monitored and managed over time The Gartner Data Science Team Survey of January 2018 found that over 60% of models developed with the intention of operationalizing them were never actually operationalized (see “How to Operationalize Machine Learning and Data Science Projects”). There are many reasons for this, but a crucial one is a lack of tools to enable and facilitate operationalization, which is not just about deployment. Operationalization extends to ongoing review and adjustment of models to ensure their relevancy over time as the business and its objectives change. It also requires ongoing management of models across the organization. ■ Artificial intelligence (AI) is hyped: Hype about AI is at its peak, but AI must be distinguished from data science and ML. Of course, data science is a core discipline for the development of AI, and ML is a core enabler of AI, but this is not the whole story. ML is about creating and training models; AI is about using those models to infer conclusions under certain conditions. AI is on a different level of aggregation to data science and ML. AI is at the application level. Data science and ML models must be combined to work together with other capabilities, such as a UI and workflow management, to constitute an AI application. A self-driving car, for example, has ML capability, but its AI requires much more than that.
4 .The diversity of data science and ML platforms largely reflects the wide range of people that use them. This Magic Quadrant is therefore aimed at a variety of audiences: ■ Citizen data scientists: Increasingly, these are accessing data and building data science and ML models. They are people who need access to data science and ML capabilities, but who do not have the advanced skills of traditional expert data scientists. Citizen data scientists can come from roles such as business analyst, line of business (LOB) analyst, data engineer and application developer. They need to understand the nature of the data science and ML market, and how it differs from, but complements, the analytics and business intelligence (BI) market (see “Magic Quadrant for Analytics and Business Intelligence Platforms”). Citizen data scientists do not replace expert data scientists but, instead, work in collaboration with them. ■ Line of business (LOB) data science teams: Typically, these are sponsored by their LOB’s executive and charged with addressing LOB-led initiatives in areas such as marketing, risk management and CRM. They focus on their own and their department’s priorities. Levels of collaboration with other LOB data science teams vary. LOB data science teams can include both expert and citizen data scientists. ■ Corporate data science teams: These have strong and broad executive sponsorship, and can take a cross-functional perspective from a position of enterprisewide visibility. In addition to supporting model building, they are often charged with defining and supporting an end-to- end process for building and deploying data science and ML models. They often work in partnership with LOB data science teams in multitier organizations. In addition, they might provide LOB assistance for LOB teams that do not have their own data scientists. Corporate data science teams typically include expert data scientists. ■ “Maverick” data scientists: These are typically one-off scientists in various LOBs. They tend to work independently on “point” solutions and usually strongly favor open-source tools, such as Python, R and Apache Spark. They rarely collaborate much with other data scientists or departments within their organization. Magic Quadrant
5 . Figure 1. Magic Quadrant for Data Science and Machine Learning Platforms Source: Gartner (January 2019) Vendor Strengths and Cautions Alteryx Alteryx (https://www.alteryx.com/) is based in Irvine, California, U.S. It provides four software products, which comprise its data science platform. The Alteryx Analytics platform includes Alteryx Connect, Alteryx Designer, Alteryx Server and Alteryx Promote. Alteryx has turned from a Leader into a Challenger by maintaining its position for Ability to
6 .Execute but demonstrating less vision relative to many other vendors in this Magic Quadrant. Nevertheless, Alteryx’s emphasis on making data science accessible to citizen data scientists and others across the end-to-end analytic pipeline is resonating in the market. Its approach provides a natural extension for a client base focused on data preparation but ready to take the next step into data science. A lack of innovation, relative to others, also contributes to Alteryx’s new position as a Challenger. Strengths ■ Collaborative enablement of broad user base: Alteryx’s no-code approach is attractive to a broad spectrum of users, from business and data analysts to citizen data scientists. A focus on the ease of use and cohesiveness of its platform enables collaboration between users. ■ End-to-end pipeline: Alteryx has focused on offering a complete, end-to-end data science platform. It has added two new products to its platform. Alteryx Connect focuses on data connections, data discovery and social connections. Alteryx Promote incorporates Alteryx’s Yhat acquisition and focuses on operationalizing analytic content. ■ Marketing execution: Alteryx’s focus on addressing the end-to-end analytic process easily and clearly positions it as a vendor of a comprehensive platform. Alteryx’s value proposition is clear and resonating. ■ Strong customer experience: Alteryx scored in the top quartile for customer experience in our survey of reference customers. Scores were consistently high for overall customer experience, plans to make additional investments, inclusion of product enhancements and requested features into subsequent releases, and overall product capabilities. Cautions ■ Data preparation legacy and market perception: Although Alteryx has built a strong brand by providing self-service data preparation, many potential customers are unfamiliar with the model-building and operationalization capabilities provided. Despite progress, the company’s “legacy” reputation continues to obscure its full value proposition. ■ Innovation: Alteryx’s innovation scores were low, relative to other vendors in this Magic Quadrant. Alteryx is not a standout vendor in terms of automation and augmentation, deep learning or the Internet of Things (IoT). ■ Market understanding: Alteryx scored below the average for market understanding. It appeals primarily to citizen data scientists and entry-level data scientists, and does not cater significantly for cutting-edge or code-focused expert data scientists. Anaconda
7 .Anaconda (https://www.anaconda.com/) is based in Austin, Texas, U.S. It offers Anaconda Enterprise 5.2, a data science development environment based on the interactive notebook concept (this analysis excludes the Conda Distribution Packages) that sees users exploiting open-source Python and R-based packages. Anaconda continues to provide a loosely coupled distribution environment, which offers access to a wide range of open-source development environments and open-source libraries, primarily Python-based. Anaconda benefits from the growing popularity of Python, the newly preeminent language for data scientists. Anaconda remains a Niche Player. It still suffers from a disparity between its power to federate a very large number of Python developers, who are continuously building additional capabilities, and its lack of control over these developers’ efforts in terms of quality, dependability and predictability. Anaconda is well-suited to seasoned data scientists who are fluent in Python or R and eager to explore a continuous stream of capabilities in Anaconda Cloud, while still benefiting from an environment more structured than a pure notebook environment. Strengths ■ Python and open-source support: The dominance of Python among data scientists gives Anaconda great visibility to developers. Anaconda is the only data science vendor not just supporting but also indemnifying and securing the Python open-source community. In the past year, the company has revamped its user interface by providing enhanced collaboration and model reproducibility features, giving data scientists better productivity and model management capabilities. ■ Active ecosystem: Reference customers praised Anaconda’s extensive and active community engagement. The community fosters cutting-edge Python code libraries and integration with other open-source data science projects. Anaconda Cloud also provides wide means of collaboration and code library exchanges, for data scientists and developers to explore and accelerate model development production, whether in the cloud or on- premises. ■ Scalable development for open-source libraries: Anaconda’s scalability takes two main forms: capabilities relating to automatic GPU code production and the ability to embed its platform seamlessly within any of the large cloud providers. Cautions ■ Designed for experts: Anaconda targets experienced data scientists familiar with Python and notebooks. Many data scientists’ favorites, including the widely popular Jupyter notebooks, are readily able for use through Anaconda’s environment. But, however flexible they are, those environments are not conducive to fruitful discussions with business users — a capability to support such exchanges is increasingly valued by large organizations lacking
8 . data science talent. ■ Open-source shortcomings: Like many open-source promoters, Anaconda suffers from the usual drawbacks associated with large and flexible developer communities: backward compatibility issues between versions; lack of visibility into important upcoming capabilities (model operationalization, for example); lack of code optimization for models’ integration with existing applications; and, despite marked progress in terms of workbench homogeneity, a lack of overall coherence. ■ Automation and augmentation: Novice Anaconda users will have difficulty finding their way through the Python “jungle.” Citizen data scientists will find themselves in uncharted territory within Anaconda’s environment. Also, the do-it-yourself skills and attitude exhibited by typical Anaconda users are not suited to ML automation practices (such as AutoML’s automation of only part of the model development process), which are increasingly popular with data scientists. Databricks Databricks (https://databricks.com/) is based in San Francisco, U.S. Its Apache Spark-based Unified Analytics Platform combines data engineering and data science capabilities that use a variety of open-source languages. In addition to Spark, the platform provides proprietary features for security, reliability, operationalization, performance and real-time enablement on Amazon Web Services (AWS). Azure Databricks, which became generally available in March 2018, is an integrated service within Microsoft Azure that provides a high-performance Apache Spark-based platform optimized for Azure. Databricks remains a Visionary by providing support for the end-to-end analytic life cycle, hybrid cloud environments and accessibility for a wide variety of users. A focus on innovation and a consistently strong and comprehensive product offering have enabled Databricks to improve its position for both Ability to Execute and Completeness of Vision. Strengths ■ Innovation: Breadth and ease of open-source integration, streaming IoT capabilities and operationalization capabilities are key differentiators for Databricks. Its platform extends on open-source capabilities by providing the framework needed for end-to-end enterprise scalability, performance and operationalization. Databricks Delta, launched in October 2017, provides a managed cloud service for unified data management with support for streaming analytics and ML. MLflow, launched in June 2018, includes support for experimentation, reproducibility and deployment. The Databricks Runtime for Machine Learning provides preconfigured clusters for deep learning.
9 .■ Partnership with Microsoft: Azure Databricks has quickly gained traction within the Azure community. Azure Databricks adds global scale to Databricks’ effective marketing and sales strategy. It includes an interactive, collaborative workspace for collaboration between data scientists, data engineers and business analysts, single click-to-launch Spark environment capability, and integration with Azure components. ■ Customer appreciation: Surveyed reference customers scored Databricks in the top quartile for both customer experience and operations. Databricks received the highest overall scores for both overall vendor experience and quality of documentation. Cautions ■ Pricing and contract negotiation: Although Databricks reduces the Spark total cost of ownership (TCO) for comparable loads, Databricks’ reference customers scored it in the bottom quartile for evaluation and contract negotiation experience and for predictable, controllable pricing. Customers raised concerns about pricing and value received, relative to investment. ■ Troubleshooting and debugging: Databricks’ reference customers indicated difficulty with troubleshooting errors and debugging applications. They would like improved debugging aids, as well as assistance with navigating Spark errors. ■ Cost monitoring and management: Reference customers indicated a need for improved capabilities for monitoring and managing costs between different user groups and managing accounts. Databricks received the second-lowest overall score for predictable and controllable ongoing software cost. Dataiku Dataiku (https://www.dataiku.com/) is headquartered in New York City, U.S., and has a main office in Paris, France. It offers Data Science Studio (DSS) with a focus on cross-discipline collaboration and ease of use. Dataiku’s appearance in the Challengers quadrant is principally due to its strong execution and strengthening capabilities in relation to scalability. A focus on real-time analytics capabilities and expansion of the breadth of its use cases could move Dataiku into the Leaders quadrant. Ease of use and collaboration across data science roles and between data science teams remain two of its platform’s major assets. In 2018, Dataiku has executed very well against its vision and delivered capabilities that make DSS a more mature platform. Strengths ■ Collaboration across data science roles: From its inception, teamwork has been at the core
10 . of Dataiku’s DSS platform. From data engineers to data scientists, subject matter experts and citizen data scientists, all of the principal roles involved in developing ML models have their place on the platform. It offers various user interface endpoints, all pointing to the same execution core, which makes it one of the most coherent offerings in this Magic Quadrant. ■ Ease of use: The qualities of Dataiku most often highlighted by clients are that its platform is relatively easy to learn and that it provides a rapid path to productivity. Automated ML capabilities and the explicit possibility of quickly delivering interpretable models also contribute to the platform’s intuitive feel and short learning curve. This approachable quality does not prevent experts and experienced data scientists from using notebook-style development capabilities that can complement overall team productivity. ■ Operationalization capabilities: Operationalization is the area in which DSS has made the most progress in the past year. Dataiku now includes model management and monitoring as flexible deployment options. Its logical data science process pipeline view, and coherence around the various roles throughout that process, form a strong foundation for more comprehensive operationalization capabilities. Cautions ■ Scalability of models in production: Despite Dataiku’s newly improved model operationalization capabilities, some clients are still looking for a more seamless, less manual experience with better visualization functions for getting models into production. Reference customers reported that even the new model deployment capabilities do not always deliver as advertised. ■ Price and licensing: Dataiku’s customers often report a suboptimum sales experience, with some considering the company pricey, some regretting a lack of pricing visibility, and some frustrated by cumbersome pricing structures. Confusion about Dataiku’s pricing and licensing has been known to delay the sales cycle. ■ Streaming and the IoT: Being focused on ease of use and collaboration across roles, Dataiku’s DSS seems favored by service-centric organizations. This could also be explained by the fact that Dataiku does not offer strong IoT capabilities and functions, even if it has shown that its platform can operate in demanding and computationally intensive environments. DataRobot DataRobot (https://www.datarobot.com/) is based in Boston, Massachusetts, U.S. It provides an augmented data science and ML platform. The platform automates key tasks, enabling data scientists to work efficiently and citizen data scientists to build models easily.
11 .DataRobot is a new entrant to this Magic Quadrant. It debuts as a Visionary. Its Completeness of Vision is supported by strong marketing and sales strategies and innovation. Its Ability to Execute is limited by being one of the newer entrants to this market (it launched in 2013), but supported by positive customer feedback. Strengths ■ Thought leader in augmented data science: DataRobot sets the standard for augmented data science and ML. Significant funding has enabled expansion via acquisitions to address time series modeling (Nutonian in May 2017) and an augmented approach for developers to incorporate models into applications (Nexosis in July 2018). These acquisitions give DataRobot the opportunity to extend its capabilities to new types of user, while focusing on its core competency of augmentation. ■ Strong customer experience: Reference customers scored DataRobot in the top quartile for overall experience with a vendor and in the top half for both overall product capabilities and inclusion of product enhancements and requests into subsequent releases. DataRobot’s customer-facing data scientists, assigned to each client to jump-start initiatives, provide a unique approach to supporting and onboarding clients. ■ Market responsiveness: DataRobot’s market responsiveness is strong. Despite being a relative newcomer to the data science and ML market, DataRobot has a solid installed base. In addition, the company is quickly gaining market traction. Cautions ■ Sales execution and pricing: Feedback from reference customers about DataRobot’s pricing indicates that it is high, which makes it difficult to scaling the use of its software. They scored DataRobot in the lowest quartile for pricing and sales execution. Only 35% of surveyed respondents — the lowest percentage among participating vendors — strongly believe they will make additional investments in DataRobot’s software. ■ Commoditization of augmented analytics: Augmented analytics is becoming increasingly available in both data science and ML platforms and analytics and BI platforms. As these capabilities become commoditized, DataRobot will need to continue to differentiate itself. Open-source automated ML, though nascent, is also a long-term threat. ■ Growth and scalability challenges: DataRobot’s augmented capabilities are strong with regard to model creation, but do not extend across the analytic process to data preparation and operationalization. (Additional capabilities for model management were added in August 2018, after the cutoff date for consideration in this Magic Quadrant and, as such, were not evaluated.) Although customers like the promise of an augmented approach, the need for additional tools and skills for complete analyses is a concern. The market will soon
12 . demand end-to-end augmented capabilities. DataRobot’s concierge service featuring customer-facing data scientists is also difficult to scale. Datawatch (Angoss) Datawatch (https://www.datawatch.com/) is based in Bedford, Massachusetts, U.S. In January 2018, it acquired Angoss and its main data science product components. These include KnowledgeSEEKER, the most basic offering, aimed at citizen data scientists in a desktop context; KnowledgeSTUDIO, which includes many more models and capabilities than KnowledgeSEEKER; and KnowledgeENTERPRISE, a flagship product that includes the full range of capabilities. Angoss has over two decades’ experience, and has loyal customers in services-centric and financial services organizations in particular. Often praised for its ease of use and intuitive interface, Angoss should benefit from Datawatch’s extensive experience in data management and preparation. However, the inherent risk and integration uncertainties associated with every acquisition have impacted the company’s scores for Completeness of Vision and Ability to Execute. In theory, Angoss and Datawatch could emerge as a strong combination in this market, but the integration seems to still be a work in progress, which contributes to Datawatch’s position as a Niche Player. Strengths ■ Ease of use and product coherence: Customers continue to commend Angoss’ intuitive interface and well-rounded product functionality. The platform is well-suited to citizen data scientists looking for technological depth, reliability and a quick path to productivity. But the corollary of that strength is an above-average perceived TCO. ■ Customer support: Faithful to its long history, Angoss still strives to build strong customer relationships, as does its new parent. The company’s responsiveness and genuine interest in its clients’ business outcomes is the basis for its customers’ loyalty — something that “acquisition jitters” could compromise, if not handled properly. ■ Additional analytical functionality: Angoss has one of the few platforms that offer strong additional and well-integrated capabilities, such as an optimization engine (with KnowledgeOPTIMIZER) and text analytics capabilities (through KnowledgeREADER). Cautions ■ Acquisition uncertainties: Every acquisition — even those that are thoughtful and logical (from a technology perspective) — entails significant risks. One of the most important, in this case, could be the risk of losing critical data science talent that is difficult to replace.
13 . Another is that the acquisition might delay important roadmap elements that are anxiously expected by clients, such as model operationalization capabilities. Potential customers should seek reassurance about Angoss’ roadmap execution before investing in its platform. ■ Vision and innovation: Despite progress from a development perspective, Angoss is losing ground in relation to advances such as deep learning and augmented ML functionality. The company also needs to improve its support for, and integration of, open-source capabilities. ■ Data preparation: Another area where clients have been asking for better support and functionality is data preparation and management. Given Datawatch’s strong expertise, the Angoss data science platform should see significant improvement in 2019. Domino Domino (Domino Data Lab) (https://www.dominodatalab.com/) is headquartered in San Francisco, California, U.S. The Domino Data Science Platform represents a comprehensive end- to-end solution designed for expert data scientists. The platform incorporates both open- source and proprietary tool ecosystems, while providing capabilities for collaboration, reproducibility, and centralization of model development and deployment. Domino, which was founded in 2013, has moved from the Visionaries quadrant to the Niche Players quadrant. Its market presence continues to grow, but with a focus on the expert data science community. Low customer feedback scores and reliance on many components for comprehensive capabilities contribute to its new position in the Magic Quadrant. Strengths ■ Good product strategy: Reference customers scored Domino in the top quartile for product strategy. Product bundling and configuration is straightforward. The roadmap focuses on collaboration and accessibility, building a tool- and platform-agnostic ecosystem, and driving the end-to-end analytic process through to operationalization, with the goal of making data science a scalable enterprise capability. ■ Open-source and proprietary tool integration: Domino emphasizes the ability to use open- source and proprietary software within a consistent container. Breadth and ease of open- source integration are strengths. Strong open-source support enables data scientists to use their tools of choice within the platform. Again this year, Domino received the highest overall score for flexibility, extensibility and openness. ■ Collaboration and scalability: Domino’s reference customers especially praised the ease of collaboration in a shared environment for project teams using various tools, in conjunction with the ability to scale up computing resources as needed.
14 .Cautions ■ Narrow focus on expert data scientists: In a market that is changing quickly and increasingly focused on multiple types of user with varying skills levels, Domino’s approach of targeting expert data scientists by bringing multiple tools together in one environment reduces its platform’s reach. Additionally, its platform is less differentiated than in previous years. It caters to sophisticated audiences, but fails to connect with growing citizen user groups. ■ Poor operations support: Domino has faced challenges scaling its operation support. Relative to other vendors in this Magic Quadrant, reference customers put Domino in the lowest quartile for analytic support, including training and guidance with technique selection. They also placed Domino in the lower half for service and support and overall integration and deployment. ■ Lack of some capabilities: Domino lacks several key capabilities in terms of data access and data preparation, automation and augmentation, user interface for nonexperts and “precanned” solutions. These shortcomings make Domino’s tool hard to use for nonexpert data scientists who need a user-friendly, guided approach to data preparation and exploration, and to creating and operationalizing models. Google Google (https://www.google.com/) , a subsidiary of Alphabet, is based in Mountain View, California, U.S. Its core ML platform offerings include Cloud ML Engine, Cloud AutoML, the open-source TensorFlow, and the recently announced BigQuery ML. Its ML components require other Google components for end-to-end capabilities, such as Google Cloud Dataprep, Google Datalab, Google BigQuery, Google Cloud Dataflow, Google Cloud Dataproc, Google Data Studio, Kubeflow and Google Kubernetes Engine. Most of these components require the presence of the Google Cloud Platform (GCP). The following ML tools were not in general availability by the cutoff point for full inclusion in this evaluation: Cloud AutoML, Google Data Studio, Kubeflow Pipelines, AI Hub and BigQuery ML. As such, these tools could not be included in the Ability to Execute scoring, but did contribute to Google’s position for Completeness of Vision. Strengths ■ Breadth of offerings: Google offers a rich ecosystem of AI products and solutions, ranging from hardware (Tensor Processing Unit [TPU]) and crowdsourcing (Kaggle) to world-class ML components for processing unstructured data like images, video and text. Google is also one of the pioneers of automated ML (with Cloud AutoML). It excels even more with its industry-leading open-source TensorFlow offering for deep neural nets. ■ Scalability and speed: Most of Google’s ML components are meant to run at scale on its high-performing public could environment, GCP. It offers a fully managed environment in
15 . which ML can be implemented at scale, which is rivaled by few other companies at this point. Google’s TensorFlow and Kubernetes are the most popular choices for many AI startups and for cutting-edge university AI and data science teams. ■ Geared toward developers: With Cloud AutoML and most of its other tools, Google offers a high-quality software solution for non-data scientists, especially software developers. GCP users familiar with GCP tools will have increasingly less reason to look elsewhere for ML capabilities. Matters will improve even further with the addition of upcoming AutoML capabilities and on-premises functionality (such as those in the Kubernetes-based Kubeflow). Cautions ■ Lack of an end-to-end, coherent and easy-to-use ML offering: The third strength noted above also represents the biggest drawback for two large groups of data science constituents: core data scientists who lack familiarity with GCP and (even more so) business-oriented citizen data scientists. For those users, the learning curve for some Google tools can be steep. It is very fragmented and easily becomes overwhelming, especially for the many casual data scientists who spend only 15% to 20% of their time on core ML projects. ■ Currently limited on-premises capabilities: Users who require full on-premises capabilities can only use the open-sourced TensorFlow and Google’s Kubernetes-based Kubeflow. Yet most other development tools and prebuilt ML components reside fully in the cloud and are therefore less suitable for the many organizations that prefer to conduct ML capabilities on- premises. ■ Low-level instrumentalization, reuse and project transparency: Even GCP developers will find little support for the creation and reuse of long ML pipelines. The tool chain and concepts simply do not yet allow for the notion of end-to-end ML projects. Google also makes heavy use of the open-source arsenal of ML components, yet offers little or no coherent project management support. Google’s new Kubeflow Pipelines may address some of these deficiencies, but it could not be evaluated as its announcement came after the cutoff date for this Magic Quadrant. H2O.ai H2O.ai (http://h2o.ai/) is based in Mountain View, California, U.S. and offers the free open- source H2O Open-Source Machine Learning (H2O, Sparkling Water and H2O4GPU) and a commercial product called H2O Driverless AI. H2O’s core strength is its high-performing ML components, which are tightly integrated within several competing platforms evaluated in this Magic Quadrant.
16 .H2O.ai has lost some ground in terms of Ability to Execute relative to other vendors in this Magic Quadrant, largely due to comparatively low scores from reference customers for several critical capabilities. Although H2O.ai’s Completeness of Vision still is strong, competitors are catching up in several key innovation areas. This has resulted in its new status as a Visionary. Strengths ■ High-performance ML components: H2O.ai’s open-source ML components are effectively an industry standard, with many other platforms integrating them (for example, those of Alteryx, Dataiku, Domino, IBM, KNIME, RapidMiner and TIBCO Software). H2O.ai’s components are highly optimized and parallelized for CPU multicore and multinode configurations. H2O4GPU offers a software layer for significant GPU acceleration. ■ Innovation: With its deep learning layer (Deep Water), its GPU layer and automation capabilities (H2O Driverless AI), H2O.ai outpaces most of its competitors in the employment of cutting-edge technology capabilities. No other vendor in this Magic Quadrant got higher marks from reference customers in the area of product roadmap and future vision. ■ Automation: With its commercial product, H2O Driverless AI, H2O.ai established a premier position in the automated ML domain, being rivaled by only a select few. The product provides automated feature engineering, model selection and hyperparameter tuning. Driverless AI is highly scalable, but also very compute-intensive, and exports the complete pipeline as either MOJO/POJO objects or a Python-scoring pipeline. H2O.ai also offers an open-source version of augmented data science and ML, called AutoML, which seems far less powerful, but can be used from within its open-source software platform. Cautions ■ Steep learning curve for nondevelopers to use open-source offering: The open-source version of H2O is still highly notebook-centric and therefore caters more to data-science- savvy developers and coding-heavy data scientists. Usability is improved when there is a point-and-click interface, as with offerings from KNIME and RapidMiner. H2O.ai’s H2O Driverless AI product is much simpler to use, but is a distinct product from its open-source offering. ■ Little native interoperability between H2O Driverless AI and open-source platform: H2O.ai’s open-source product line is not fully interoperable with H2O Driverless AI, which is an impediment to potential collaboration by differently skilled enterprise users. ■ Lack of some product capabilities: H2O.ai’s surveyed reference customers identified deficiencies in critical capabilities like platform, project and model management, and a significant lack of features for data access and preparation, compared with other platforms in this Magic Quadrant. H2O.ai retained excellent marks only in the critical capability
17 . categories of ML, performance and delivery. IBM IBM (https://www.ibm.com/us-en/) is based in Armonk, New York, U.S. For this Magic Quadrant we evaluated two platforms: SPSS (including SPSS Modeler and SPSS Statistics) and Watson Studio, an offering that incorporates and builds on IBM’s previous Data Science Experience (DSX) product. IBM remains a Visionary, but has lost ground in terms of both Completeness of Vision and Ability to Execute, relative to other vendors. IBM has defined a clear product strategy and roadmap for the two platforms evaluated in this Magic Quadrant, but needs to prove that its new approach can deliver consistent customer success over time. Strengths ■ Strong visibility and mind share: IBM remains a frontrunner in terms of market share, with 9.5% of the data science platform software market. It is a very visible vendor in the data science and ML market. Its strategy, focused on the complete analytic pipeline, enables both expert and citizen data scientists to be productive. ■ Comprehensive roadmap and product integration: Watson Studio and its roadmap promise to deliver extensive openness, hybrid cloud support and strong analytic capabilities for both expert and citizen data scientists across the full analytic pipeline. Watson Studio provides a new, more modern approach, while continuing not only to support, but also to extend, the capabilities of SPSS. IBM has also delivered a new interface for its SPSS products that is cleaner, more appealing and, most importantly, integrates SPSS Modeler into Watson Studio. ■ Watson Studio customer experience and operations: Reference customers for Watson Studio gave excellent scores for their overall experience. Scores were also strong for IBM’s plans to make further investments and its inclusion of requested product enhancements. Scores for Watson Studio’s service and support and integration and deployment were excellent as well, but those from SPSS customers were in the bottom quartile. Cautions ■ Further revamp of multipronged approach and evolving strategy: Watson Studio shows promise and is indicative of the direction of modern data science platforms. But in light of recent strategy shifts and rebranding, IBM needs to show consistency and long-term commitment to its strategy. ■ Multiple products required for complete capabilities: Multiple IBM products are required to obtain complete end-to-end capabilities. Multiple components potentially increase
18 . complexity and cause confusion. They could also increase licensing costs. ■ Capability shortcomings across both platforms: SPSS received low scores for flexibility, extensibility and openness, automation and augmentation, and collaboration. Although, overall, Watson Studio provides stronger capabilities than SPSS, capabilities for data preparation, data exploration and visualization, delivery and precanned solutions are lacking. KNIME KNIME (https://www.knime.com/) is based in Zurich, Switzerland. It provides the KNIME Analytics Platform on a fully open-source basis for free, while a commercial extension, KNIME Server, offers more advanced functions, such as team, automation and deployment capabilities. KNIME remains a Leader in this Magic Quadrant. This is largely due to strong assessments by its customers, its competitive product offerings and its vision, which is one of the most balanced in this market. Strengths ■ Well-balanced execution and vision: With a wealth of well-rounded functionality, KNIME maintains its reputation for being the market’s “Swiss Army knife.” Its for-free and open- source KNIME Analytics Platform covers 85% of critical capabilities, and KNIME’s vision and roadmap are as good as, or better than, those of most of its competitors. ■ Sophistication with clear product bundling and low TCO: Even with advanced features like ML automation, hybrid cloud, new model management (KNIME Model Process Factory) and deployment, KNIME’s product segmentation has been clear and simple for many years. This simplicity does not prevent data scientists from exploiting advanced analytics features (such as deep learning) — it simply makes that power accessible to citizen data scientists, too. ■ Ease of use by those with intermediate skills: KNIME’s platform addresses the intermediate user skills spectrum very well, and KNIME recently began to place increased emphasis on two of the other skills segments. Less skilled users will appreciate the automated ML offerings included in KNIME Server. Developers will appreciate KNIME’s forthcoming Python integration, which will offer them two-way integration via the ability to call the KNIME API from within Python/Jupyter in order to use many of the thousands of KNIME modules and solutions. ■ Low barrier to entry and TCO: As in prior years, reference customers identified their main reasons for selecting the KNIME platform as its low TCO, predictable costs and value for money.
19 .Cautions ■ Performance and scalability: Although KNIME has taken significant steps to improve its performance and scalability, these capabilities remain by far the chief customer concern in terms of overall product capabilities. KNIME received one of the lowest overall scores from customer references in this category. ■ Limited visibility as modern data science platform: KNIME does not have a reputation for offering a cutting-edge data science platform. This is due much more to its conservative marketing strategy and passive go-to-market approach than its real capabilities. Performance constraints and a reputation as a desktop tool also contribute to the platform sometimes being overlooked for large-scale data science deployments. ■ Limited traction in IoT domain: Real-time analytics and capabilities linked to IoT data often require a significant amount of development investment. Despite having a decent breadth of use cases, KNIME’s focus appears not to have been on asset-centric industries (which account for most IoT work). As the market evolves to embrace multiple data sources, this lack of focus could impair the company’s vision. ■ Relatively small team: Gartner has reservations about KNIME’s strategy of having a relatively small team of fewer than 60 full-time equivalents. Given the growing complexity of the ML stack, a small team is unlikely to be able to cope with the massive innovation that is transforming this market. MathWorks MathWorks (http://www.mathworks.com/) is headquartered in Natick, Massachusetts, U.S. Its two major products are MATLAB and Simulink, but only MATLAB met the inclusion criteria for this Magic Quadrant. MathWorks’ move from the Challengers quadrant to the Visionaries quadrant is essentially due to the company’s remarkable strength in relation to the increasingly demanding needs of asset- centric industries. To serve this growing market, MathWorks has strengthened the coherence of the MATLAB platform for its engineering-focused audience by seamlessly integrating advanced functionality for the treatment of unconventional data sources (images, video and IoT data). Although MathWorks focuses on asset-centric industries, it also has customers in the financial services sector. Strengths ■ Platform coherence and ease of use: Built from an engineering perspective, MATLAB offers a seamless experience, with operationalization as a fully integrated step. Mainly focused on industrial applications, MathWorks takes account of field personnel and subject matter
20 . experts’ experiences through a “citizen engineer” approach aimed at democratizing deployment of its platform. Support for this community is also secured through a rich community ecosystem. ■ Data preparation: In its analytics workflow, MATLAB does not explicitly separate ML and deep learning techniques, as algorithms are considered fit for solving specific problems. MathWorks’ platform therefore offers sophisticated data preparation and labeling capabilities for data that will be used to create deep learning models. ■ Advanced functionality: MathWorks has integrated into MATLAB a range of advanced techniques that can be used in complex use cases. Examples are interpreted notebook experiences (through MATLAB Live Editor), pretrained deep-learning models, embedded real- time analytics, streaming capabilities and simulation techniques (through digital-twin models). Cautions ■ Lack of focus on service-centric organizations: Although nothing prevents nonengineers from using MATLAB, MathWorks’ vision for data science does not focus on marketing, sales or customer service. Data science teams whose primary focus is marketing, sales or customer service should seek alternative platforms. ■ Full cloud platform support: MATLAB offers strong support for AWS and Microsoft Azure, but still lacks comprehensive support for GCP. Although environments such as the pervasive TensorFlow are supported within MATLAB, data science teams that rely on the tight integration between TensorFlow and GCP might not benefit from the performance usually attested in Google-only frameworks. ■ AutoML capabilities: MathWorks has considerably improved its access to open-source libraries and environments in the latest version of MATLAB. However, compared with some of its more visionary competitors, MathWorks still lacks a clear integration vision in relation to AutoML capabilities. Microsoft Microsoft (http://www.microsoft.com/) is based in Redmond, Washington, U.S. It provides a number of software products for data science and ML. In the cloud, it offers Azure Machine Learning (including Azure Machine Learning Studio), Azure Data Factory, Azure HDInsight, Azure Databricks and Power BI. For on-premises workloads, Microsoft offers Machine Learning Server. Only Azure Machine Learning met the inclusion criteria for this Magic Quadrant, although Microsoft’s broader offerings did influence our assessments of Azure Machine Learning’s extended capabilities and Microsoft’s Completeness of Vision.
21 .Microsoft remains a Visionary, having maintained a strong commitment to breadth and ease of open-source technology integration and excellence in relation to deep learning. Azure Machine Learning is not an option for the many data science teams and use cases that require a strictly on-premises product. Strengths ■ Cloud infrastructure approach: Although a significant number of on-premises devotees remain, more organizations are migrating to cloud and hybrid approaches to data science. Microsoft’s first-class cloud approach with Azure Machine Learning provides a fully managed, high-performing environment. The cloud platform also offers advantages in terms of performance tuning, scalability and agile support for open-source technology. ■ Extensive component and partner offerings: The Azure ecosystem offers a wide range of components for data science use cases such as streaming analytics (Azure Stream Analytics), the IoT (Azure IoT Hub) and deep learning (Microsoft Cognitive Toolkit and various open-source frameworks). Azure Databricks offers Microsoft customers first-class support for Apache Spark and integration with Azure tools, and several reference customers praised the platform’s automatic scaling and performance optimization. ■ Support for diverse data science personas: The Azure Machine Learning service offers a code-focused approach for expert data scientists, and Azure Machine Learning Studio offers a highly rated GUI for citizen data scientists. Azure Cognitive Services offers strong functionality and pretrained models for developers. Expert data scientists and data engineers will be drawn to the Azure Databricks offering. Further integration with Power BI brings entry-level ML to masses of business analysts. Cautions ■ Cloud-only applicability: Microsoft’s cloud-only approach with Azure Machine Learning continues to limit certain capabilities and reduces the product’s appeal to some data science teams. The Azure Machine Learning service, a hybrid product for creating and deploying models on-premises, in the cloud and on the edge, was released in December 2018 — after the cutoff date for inclusion in this evaluation. ■ Automation and augmentation: Microsoft needs to keep pace with innovators in automating and assisting with data science tasks. New and improved features will be needed to maintain a competitive position in the fast-moving citizen data science market and to appeal to expert data scientists who use automated ML to accelerate their work. New automated ML capabilities in the Azure Machine Learning service were released in December 2018 — after the evaluation period for this Magic Quadrant. ■ Coherence: Although the Azure ecosystem offers diverse tools and approaches for data
22 . science, many users find the number of components overwhelming and are frustrated by the overall user experience. RapidMiner RapidMiner (https://rapidminer.com/) is based in Boston, Massachusetts, U.S. Its platform includes RapidMiner Studio, RapidMiner Server, RapidMiner Cloud, RapidMiner Real-Time Scoring and RapidMiner Radoop. RapidMiner remains a Leader by striking a good balance between ease of use and data science sophistication. Its platform’s approachability is praised by citizen data scientists, while the richness of its core data science functionality, including its openness to open-source code and functionality, make it appealing to experienced data scientists, too. Strengths ■ Sophisticated simplicity: Features such as Auto Model, augmented analytics capabilities such as Turbo Prep, and an above-average UI make RapidMiner Studio a favorite of citizen data scientists. More advanced users appreciate the richness of RapidMiner’s functionality, including the ability to access and reuse open-source capabilities, which increases their productivity and enables them to build and manage large numbers of models. ■ Advanced features: Ease of use does not preclude the presence of power. Beyond deep learning and GPU support, RapidMiner’s platform now includes data augmentation functionality and enhanced time series features. The company has also been focusing on explainability, from both a model and an analytics process perspective. In addition to helping explain models’ behaviors, providing more transparency at the process level from development to deployment (by clearly setting out the steps of the analytical pipeline and providing the analytical logic linking those steps), enables greater cross-role collaboration. ■ Coherent end-to-end platform: Reference customers made many complimentary comments about the coherence of RapidMiner’s user experience — from its scalable repository management to its real-time scoring. Elements contributing to the continuum include RapidMiner Studio (for model development); RapidMiner Server (for sharing, collaborating on, deploying and maintaining models); RapidMiner Cloud (including repository and execution services destined to host automodeling capabilities); and RapidMiner Real-Time Scoring (introduced in 2018 to provide a low-latency model execution engine). Cautions ■ Data preparation and visualization: Turbo Prep’s introduction mostly facilitates the data preparation process for citizen data scientists, so advanced users still find that RapidMiner’s
23 . data preparation capabilities do not match the sophistication of other analytics components of the platform. Also, despite decent progress by RapidMiner with regard to data visualization offerings, users still find they need to rely on complementary capabilities for the more powerful visualization options. ■ License and pricing models: The simplification of RapidMiner’s pricing process over the past year has still not answered some concerns of RapidMiner’s customers who have been facing complicated pricing schemes and difficult-to-navigate pricing conditions. This has slowed the rapid growth in some organizations’ adoption of the platform. ■ Model operationalization: RapidMiner’s enhanced model management and repository features have made its lack of full model operationalization capabilities even more prominent. Given the fluidity of its environment, we expect RapidMiner to devote more resources to the operationalization cycle of the data science process by including functionalities such as production ensemble model monitoring and business key performance indicator (KPI) validation and monitoring. SAP SAP (http://www.sap.com/) is based in Walldorf, Germany. It offers SAP Predictive Analytics (PA). This platform has a number of components, including Data Manager for dataset preparation and feature engineering, Automated Modeler for citizen data scientists, Expert Analytics for more advanced ML, and Predictive Factory for operationalization. SAP PA is tightly integrated with SAP HANA. SAP’s data science offering is closely tied to the company’s expanding Intelligent Enterprise vision and SAP Leonardo. We considered these when assessing SAP’s Completeness of Vision, but they did not contribute to SAP’s Ability to Execute position in this Magic Quadrant. SAP remains a Niche Player due to low customer satisfaction scores, a lack of thought leadership in key innovation areas, and declining mind share in a highly competitive market. Strengths ■ Suitability for SAP-centric organizations and data science operations: Many SAP customers identify alignment with existing data and analytics investments as a key reason for choosing its platform. SAP PA is well-suited to handling very large datasets via SAP HANA and deploying models to SAP’s wide range of applications. SAP PA received excellent scores from reference customers for delivery and platform/project management and a strong score for model management. The SAP client base and ecosystem form SAP’s sizable niche. ■ Intelligent Enterprise vision: SAP’s vision of a unified ML fabric across all its applications
24 . accords with the impending reality that the vast majority of business users will consume ML via applications with embedded intelligence. The SAP PA roadmap is closely tied to the company’s overall strategy for SAP Leonardo Machine Learning Foundation and establishing a new end-to-end life cycle for data science. This strategy will resonate with customers already heavily invested in SAP systems and applications, and with prospective customers seeking a way to bring AI to the masses. ■ Support for diverse data science professionals: SAP PA offers environments tailored for expert data scientists (Expert Analytics) and citizen data scientists (Automated Modeler). Business analysts among SAP’s reference customers identified automation for quick and easy prototyping as a major strength of the platform. SAP PA’s interfaces for C++ and Java will complement developers who might leverage pretrained models and AI services in the Leonardo ecosystem. Cautions ■ Product suite transition: In recent years, SAP has received consistently low scores from reference customers for most critical capabilities, and its data science offering will undergo significant changes in the near future as SAP executes its broader AI strategy. SAP needs to catch up to its competitors in key innovation areas, such as agile support for increasingly demanded open-source technologies, deep learning, streaming and the IoT. ■ Customer experience and mind share: SAP still needs to improve aspects of its customer experience. Reference customers gave low scores for SAP’s inclusion of requested product enhancements, documentation, contract negotiation and account management. SAP continues to struggle to gain mind share for PA as dedicated data science vendors continue to disrupt this market and other large vendors develop new products. SAP appears on a low percentage of the shortlists, seen by Gartner, of those choosing a data science and ML platform, relative to other vendors in this Magic Quadrant. ■ Coherence: Although SAP PA benefits from HANA’s functionality and other components, its reference customers gave it a low overall score for coherence. A fragmented toolchain is frustrating for users and results in convoluted workflows. Improving the coherence of the data science platform and creating a unified user experience will be crucial as SAP PA evolves within the larger SAP AI ecosystem. SAS SAS (http://www.sas.com/) is based in Cary, North Carolina, U.S. It provides many software products for analytics and data science. For this Magic Quadrant, we evaluated SAS Enterprise Miner (EM) and SAS Visual Data Mining and Machine Learning (VDMML).
25 .SAS retains its long-held status as a Leader. Although the company faces threats on multiple fronts from other large vendors, maturing disruptors and open-source solutions, it retains a strong presence in the market. SAS’s Completeness of Vision is in the same class as many highly innovative competitors, but the company is falling behind in key areas such as deep learning and contributions to the open-source community. Its Ability to Execute is hampered by high and sometimes unpredictable costs, which cause existing and prospective customers to explore other options. Like other veterans of the data science market, in addition to focusing on new clients, SAS is embracing the challenge of supporting legacy customers and users while adapting to a rapidly changing landscape. Strengths ■ Incumbent market presence and trusted brand: SAS’s long market presence and considerable staying power have earned it much respect from customers. Many reference customers praised its products’ quality, stability and reliability. That solidity might have come at the expense of a few advances (such as quick adoption of open-source capabilities), but it has not prevented SAS from innovating and staying on a par with many of its newer competitors. ■ Robustness of SAS EM: SAS EM’s reliability throughout the analytics and data science life cycle is recognized throughout the market. From data ingestion and preparation to model production and deployment, the platform continues to deliver dependable results. SAS is well-placed to replicate that remarkable on-premises strength in a multicloud environment. ■ Interface and data engagement capabilities of SAS VDMML: SAS VDMML received excellent scores for user interface and data exploration and visualization. It also received strong scores for data preparation and automation and augmentation. SAS VDMML appeals to citizen data scientists as well as code-focused data scientists and developers. ■ Operational excellence: SAS’s comprehensive worldwide support infrastructure is unmatched. Customers choose SAS for its robust, enterprise-grade platform capabilities, which range from exploration to modeling and deployment. SAS also offers significant analytic and industry expertise, which customers rely on. Reference customers gave high scores to SAS’s documentation, customer and analytic support, and overall service and support. Cautions ■ Pricing and sales execution: SAS’s reference customers gave scores for product evaluation and contract negotiation experience that were in the bottom quartile. In addition, SAS’s pricing remains a concern for existing and prospective customers — Gartner clients frequently investigate less costly alternatives. Free open-source data science platforms are
26 . increasingly used, along with SAS products, as a way of controlling costs, especially for new projects. ■ Coherence: SAS’s full complement of products is complex and often confusing. Offering two platforms that are not fully interoperable and that have numerous additional components available increases confusion and complexity in terms of managing, deploying and using SAS’s products. The coexistence of SAS Viya and SAS 9 perpetuates the perception of a lack of cohesion. Although SAS has made progress in this regard, migration is still perceived as an issue for those that want to exploit Viya’s capabilities but are not currently on that architecture. ■ Unfashionable interface of SAS EM: SAS EM has not kept up with the times and, though still effective, has a dated UI. Its UI contrasts with those of modern competitors, which are more intuitive and cleaner. SAS VDMML offers a much more modern UI, representing SAS’s future direction. ■ Flexibility and additional support for open source: In its latest release, SAS has added more tools and support for open source, but customers would like to see both SAS EM and VDMML continue to extend first-class support for open-source tools, libraries and frameworks. SAS also needs to continue to improve support for Docker and containerization. TIBCO Software TIBCO Software (https://www.tibco.com/) is based in Palo Alto, California, U.S. Through the acquisition of enterprise reporting and modern BI platform vendors (Jaspersoft and Spotfire), descriptive and predictive analytics platform vendors (Statistica and Alpine Data), and a streaming analytics vendor (StreamBase Systems), TIBCO has built a well-rounded and powerful analytics platform. TIBCO has moved from the Challengers quadrant to the Leaders quadrant, thanks to a well- orchestrated integration strategy that contributes to its Ability to Execute, and its efforts to keep pace with the rate of innovation in this rapidly changing market. TIBCO has distinctive skill at serving asset-centric industries. In addition to having end-to-end development and deployment capabilities, TIBCO successfully addresses the underserved data science IoT analytics domain, partly as a result of its process-centric roots. Strengths ■ Successful consolidation: On a single platform, TIBCO brings together powerful visualization capabilities, strong descriptive analytics and visionary predictive analytics features (from Statistica and Alpine Data, now rebranded as Spotfire Data Science). At the same time, TIBCO has maintained its platform’s necessary extensibility to open-source environments.
27 . Open-source code, for example, can be developed within the platform or in an outside environment and then seamlessly integrated into the data science pipeline’s workflow. ■ “Connected Intelligence” and IoT: In addition to a strong set of connectors and APIs for machine data, real-time data capture and model scoring, TIBCO has invested in IoT edge analytics to give developers tools for distributing and monitoring models on edge devices and gateways. In addition, the combination of TIBCO Streaming and Statistica is a robust differentiator for TIBCO’s Connected Intelligence strategy. ■ End-to-end data science process: The overall ease of use of TIBCO’s platform (often praised by reference customers) should not overshadow the sophistication and completeness of its functionality. Visual workflows spanning the full data science process, from data ingestion to model management, provide a strong basis for effective collaboration by all roles (data scientists, business analysts, citizen data scientists, process engineers and so on). Cautions ■ Performance and stability: Some reference customers identified cases where the performance of TIBCO’s platform was suboptimal, and remarked that this temporarily slowed their development process. Upcoming improved integration with external cloud services, along with development of hybrid analytical workflows, could alleviate this problem. ■ Data management: TIBCO offers strong data access and visualization capabilities, but automated and more integrated data preparation and management capabilities should be an integral part of the platform, given its wide reach. We expect TIBCO to invest in this important functionality for its upcoming release. ■ Incomplete operationalization focus: TIBCO’s model management and deployment capabilities have improved greatly over the past year, but many gaps remain. Given TIBCO’s strength across a wide range of industries, model operationalization capabilities beyond deployment — that is, for the full governance and data science process for models in production — will be crucial. Vendors Added and Dropped We review and adjust our inclusion and exclusion criteria for Magic Quadrants as markets change. As a result of these adjustments, the mix of vendors in any Magic Quadrant may change over time. A vendor’s appearance in a Magic Quadrant one year and not the next does not necessarily indicate that we have changed our opinion of that vendor. It may be a reflection of a change in the market and, therefore, changed evaluation criteria, or of a change of focus by that vendor.
28 .Added ■ Google ■ DataRobot ■ Datawatch (Angoss) Dropped ■ Teradata, which is revamping its data science and ML offering ■ Angoss, which was acquired by Datawatch in January 2018 Inclusion and Exclusion Criteria We made some changes to the inclusion criteria for this edition of the Magic Quadrant. The inclusion process included requirements for vendors to meet a revenue threshold and identify reference customers. A stack ranking process assessed how well products support the most typical use case scenarios for data science and ML, namely: ■ Business exploration: This is the classic scenario of “exploring the unknown” that requires extensive data preparation, exploration and visualization capabilities combining new and existing data sources and types. This scenario could also include the incorporation of “smart” capabilities to guide the data preparation, use of visualization and analysis that incorporate ML techniques “under the covers.” ■ Advanced prototyping: This scenario describes projects where data science and, especially, novel ML solutions are used to significantly improve traditional analytic approaches. Traditional approaches can be the use of human judgment, exact solutions, decade-old heuristic approaches or traditional data mining approaches. All scenarios are considered that utilize some or all of the following: ■ Many more data sources ■ Novel analytic approaches (such as deep neural nets, ensembles and natural language processing) ■ Large-scale computing infrastructure ■ Specialized computer science and ML skills ■ Production refinement: This is the scenario on which many data science teams spend the
29 . majority of their time. In this scenario, the organization, having implemented several data science solutions and delivered them to the business, has shifted its focus to improving and updating the existing models. ■ Nontraditional data science: This new use case represents a movement toward incorporating capabilities into the data science and ML platform that specifically support citizen data scientists and/or developers. Gartner defines a citizen data scientist as a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics. We used the following 15 critical capabilities to score the vendors’ capabilities across the four use-case scenarios: ■ Data access: How well does the product support data access across many types of data (such as tables, images, graphs, logs, time series, audio and texts)? ■ Data preparation: Does the product have a significant array of noncoding or coding data preparation features? ■ Data exploration and visualization: Does the product allow for a range of exploratory steps, including interactive visualization? ■ Automation and augmentation: Does the product facilitate the automation of feature generation and hyperparameter tuning? ■ User interface (UI): Does the product have a coherent “look and feel” and have an intuitive interface, ideally one supporting a visual pipelining component or visual composition framework (VCF)? ■ Machine learning (ML): How broad are the ML approaches that are easily accessible and shipped (prepackaged) with the product, along with support for modern ML approaches like ensemble techniques (boosting, bagging and random forests) and modern dimension reduction schemes? ■ Other advanced analytics: How are other methods from the fields of statistics, optimization, simulation, and text and image analytics, integrated into the development environment? ■ Flexibility, extensibility and openness: How can various open-source libraries be integrated into the platform? How can users create their own functions? How does the platform work with notebooks? ■ Performance and scalability: How can desktop, server and cloud deployments be controlled?