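Apache Linkis Data Processing in Practice - Li Meng (李孟)
Linkis builds a computing middleware layer between upper-layer applications and underlying engines. This talk on Linkis data processing in practice covers two areas: metadata and computing tasks. Metadata falls into three categories: data dictionary, data lineage, and data characteristics. Linkis provides metadata management for data assets through two services, Linkis DataSource and Apache Atlas. DolphinScheduler launches Linkis computing tasks: a DolphinScheduler Shell task configures the relevant parameters through LinkisDolphinSchedulerClient and launches the corresponding task.
Li Meng (李孟): data architect at 仙翁, Apache Linkis Committer, CSDN blog expert, WeDataSphere community contributor, Exchangis contributor, Streamis contributor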
2 .Apache Linkis Data Processing Practice 李孟 2022.10.22
3 .Background The original big data platform framework was built on two basic components: Beam for data processing and a Hive Hook for collecting metadata
4 .Background Shortcomings • only a single mode of external access • high development cost • complex metadata access
6 .Selection Apache Livy Apache Zeppelin Netflix Genie openLooKeng Apache Linkis
7 .Selection Linkis builds a computing middleware layer that decouples upper-layer applications from underlying engines, with the ability to connect, expand, manage, orchestrate and reuse • Computing Governance Services • Public Enhancement Services • Microservice Governance Services • Complete component base • Community
8 .Scenario and Value DataSphereStudio(DSS) Linkis Scriptis DolphinScheduler Exchangis Streamis ......
9 .Scenario and Value Provide the basis for data processing • data assets • data model • data development • data scheduling
10 .Scenario and Value
11 .Architecture-Linkis
12 .Architecture-Linkis
13 .Architecture-DSS
14 .Data Governance-DataSource • Provides the basic functions of a data source service • Exposes basic information about each data source environment • Supports dynamic, parameter-driven lookup of databases and tables for multiple data source types • Supports grouping data sources by user, creation source, and creating system • Supports connection tests for different data source types • Provides a metadata query service for a given data source
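The data source capabilities listed on this slide can be pictured with a minimal Python sketch. It is an illustration only, not the Linkis DataSource API: the `DataSource` and `DataSourceRegistry` classes, their fields, and the sample entries are all hypothetical names chosen to mirror the bullets (grouping by user and creating system, and a connection test per data source type).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DataSource:
    # Hypothetical fields that mirror the slide, not a Linkis schema.
    name: str
    source_type: str      # e.g. "mysql" or "hive"
    owner: str            # user who owns the data source
    create_system: str    # system that registered it
    params: Dict[str, str] = field(default_factory=dict)


class DataSourceRegistry:
    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}
        # One connection tester per data source type.
        self._testers: Dict[str, Callable[[DataSource], bool]] = {}

    def register_tester(self, source_type: str,
                        tester: Callable[[DataSource], bool]) -> None:
        self._testers[source_type] = tester

    def add(self, source: DataSource) -> None:
        self._sources[source.name] = source

    def list_by(self, owner: Optional[str] = None,
                create_system: Optional[str] = None) -> List[DataSource]:
        # Filter by user and/or creating system; None means "any".
        return [
            s for s in self._sources.values()
            if (owner is None or s.owner == owner)
            and (create_system is None or s.create_system == create_system)
        ]

    def test_connection(self, name: str) -> bool:
        source = self._sources[name]
        tester = self._testers.get(source.source_type)
        if tester is None:
            raise ValueError(f"no connection tester for {source.source_type!r}")
        return tester(source)


registry = DataSourceRegistry()
# A stand-in tester: a real one would open a connection to the source.
registry.register_tester("mysql", lambda s: "host" in s.params)
registry.add(DataSource("orders_db", "mysql", "alice", "Exchangis",
                        {"host": "db.example.internal"}))
registry.add(DataSource("logs", "hive", "bob", "DSS"))
```

Dispatching the connection test on `source_type` keeps support for a new data source type down to registering one more tester function.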
15 .Data Governance-DataSource
16 .Data Governance-Apache Atlas Atlas is a data governance and metadata framework tightly coupled to the Hadoop ecosystem.
17 .Data Governance-Apache Atlas • type system • graph engine • ingest/export • api • messaging • metadata sources • atlas admin ui • tag based policies
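To make the type system and API components more concrete, here is a hedged sketch of how a client might assemble the JSON body for Atlas's v2 entity endpoint. It only builds the payload and sends nothing. The helper name `build_entity_payload` and the sample database, table, cluster, and owner values are assumptions for illustration, and a real `hive_table` entity would need further attributes, such as a reference to its database.

```python
import json

# Path of the Atlas REST v2 entity endpoint, relative to the Atlas server URL.
ATLAS_ENTITY_PATH = "/api/atlas/v2/entity"


def build_entity_payload(type_name: str, qualified_name: str,
                         name: str, owner: str) -> dict:
    """Wrap entity attributes in the envelope the v2 entity endpoint expects."""
    return {
        "entity": {
            "typeName": type_name,
            "attributes": {
                # qualifiedName is the unique key within a type.
                "qualifiedName": qualified_name,
                "name": name,
                "owner": owner,
            },
        }
    }


# Hypothetical sample values; Hive tables conventionally use db.table@cluster.
payload = build_entity_payload(
    type_name="hive_table",
    qualified_name="sales.orders@primary",
    name="orders",
    owner="etl_user",
)
body = json.dumps(payload, indent=2)
```

In practice a Hive Hook emits this kind of metadata automatically through the messaging layer, so hand-built payloads are mainly useful for sources that have no hook.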
18 .Data Governance-Apache Atlas
19 .Data Governance-Apache Atlas
20 .Data Governance-Assets Data asset management to make data traceable, usable, and trusted • asset overview • asset catalog
21 .Data Governance-Assets
22 .Data Governance-DataWarehouse Data warehouse standard specification management. • Subject Domain Management • Data warehouse management • Modifier management • Statistical cycle management
23 .Data Governance-DataWarehouse
24 .Data Governance-DataModel A data model describes three things: data structures, data operations, and data constraints. • table management • dimension management • metric management • indicator management • tag management
25 .Data Processing-Exchangis A data exchange tool that supports synchronizing structured and unstructured data between heterogeneous data sources. • Lightweight data source management • Highly stable, fast-responding execution of data synchronization tasks • Integrated with DSS workflows to provide a one-stop big data development portal
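The three parts of a data model can be shown in a few lines of Python. This is a generic sketch, not the DSS data model service: `Column` and `Table` are hypothetical classes in which the column definitions are the data structure, `insert` is a data operation, and the checks inside `insert` are the data constraints.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Column:
    # Data structure: a named, typed field.
    name: str
    dtype: type
    nullable: bool = True


class Table:
    def __init__(self, name: str, columns: List[Column]) -> None:
        self.name = name
        self.columns: Dict[str, Column] = {c.name: c for c in columns}
        self.rows: List[Dict[str, Any]] = []

    def insert(self, row: Dict[str, Any]) -> None:
        # Data operation, guarded by the data constraints below.
        unknown = set(row) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        for col in self.columns.values():
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    raise ValueError(f"{col.name} must not be null")
            elif not isinstance(value, col.dtype):
                raise TypeError(f"{col.name} expects {col.dtype.__name__}")
        self.rows.append(dict(row))


# Hypothetical table definition used only for this illustration.
orders = Table("orders", [
    Column("order_id", int, nullable=False),
    Column("region", str),
])
orders.insert({"order_id": 1, "region": "east"})
```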
26 .Data Processing-Exchangis
27 .Data Processing-DolphinScheduler Apache DolphinScheduler is an open source, distributed, easily extensible system for visual DAG workflow task scheduling.
28 .Data Processing-DolphinScheduler • AppConn is the core concept of DSS; it makes it quick and easy to integrate various upper-layer web systems. • With the DolphinScheduler integration in place, the DSS publishing function can publish workflows to the scheduler
29 .Data Processing-DolphinScheduler For DolphinScheduler to schedule the workflow node jobs of DataSphere Studio properly, you also need to install the dss-dolphinscheduler-client plugin, which executes DSS workflow node jobs.
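The core idea of DAG workflow scheduling, running each task only after the tasks it depends on, can be sketched with Python's standard `graphlib` module (Python 3.9+). This is not DolphinScheduler code; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow: Dict[str, Set[str]] = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "quality_check": {"clean"},
    "report": {"aggregate", "quality_check"},
}


def execution_order(dag: Dict[str, Set[str]]) -> List[str]:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
```

Tasks with no dependency between them, such as `aggregate` and `quality_check` here, may appear in either order, which is the freedom a distributed scheduler uses to run them in parallel.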