Introduction to Cloudera CDH

Original post · Data Analysis · Author: ilsyx · Date: 2017-08-30 01:40:41

 

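After muddling through a big data course for a little over 30 days, I have gone from being completely lost to understanding a thing or two, so I want to organize what I have learned.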

 

CDH's counterparts are HDP (the Hortonworks Data Platform) and stock Apache Hadoop.

 

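This post covers the concepts around CDH.

CDH is a product released by the company Cloudera; the official site is https://www.cloudera.com/

The official documentation at https://www.cloudera.com/documentation.html shows that CDH is one member of the Cloudera Enterprise product family.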

 

 

See the Introduction in the Cloudera Enterprise documentation (5.12 is the latest version at the time of writing): https://www.cloudera.com/documentation/enterprise/latest/topics/introduction.html

 

Cloudera provides a scalable, flexible, integrated platform that makes it easy to manage rapidly increasing volumes and varieties of data in your enterprise. Cloudera products and solutions enable you to deploy and manage Apache Hadoop and related projects, manipulate and analyze your data, and keep that data secure and protected.

Cloudera provides the following products and tools:

  • CDH—The Cloudera distribution of Apache Hadoop and other related open-source projects, including Apache Impala (incubating) and Cloudera Search. CDH also provides security and integration with numerous hardware and software solutions.
  • Apache Impala (incubating)—A massively parallel processing SQL engine for interactive analytics and business intelligence. Its highly optimized architecture makes it ideally suited for traditional BI-style queries with joins, aggregations, and subqueries. It can query Hadoop data files from a variety of sources, including those produced by MapReduce jobs or loaded into Hive tables. The YARN resource management component lets Impala coexist on clusters running batch workloads concurrently with Impala SQL queries. You can manage Impala alongside other Hadoop components through the Cloudera Manager user interface, and secure its data through the Sentry authorization framework.
  • Cloudera Search—Provides near real-time access to data stored in or ingested into Hadoop and HBase. Search provides near real-time indexing, batch indexing, full-text exploration and navigated drill-down, as well as a simple, full-text interface that requires no SQL or programming skills. Fully integrated in the data-processing platform, Search uses the flexible, scalable, and robust storage system included with CDH. This eliminates the need to move large data sets across infrastructures to perform business tasks.
  • Cloudera Manager—A sophisticated application used to deploy, manage, monitor, and diagnose issues with your CDH deployments. Cloudera Manager provides the Admin Console, a web-based user interface that makes administration of your enterprise data simple and straightforward. It also includes the Cloudera Manager API, which you can use to obtain cluster health information and metrics, as well as configure Cloudera Manager.
  • Cloudera Navigator—End-to-end data management and security for the CDH platform. Cloudera Navigator Data Management enables administrators, data managers, and analysts to explore vast data collections in Hadoop. Cloudera Navigator Encrypt and simplifies the storage and management of encryption keys. The robust auditing, data management, lineage management, lifecycle management, and encryption key management in Cloudera Navigator allow enterprises to adhere to stringent compliance and regulatory requirements.
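The Cloudera Manager entry above mentions an API for obtaining cluster information. As a rough sketch of what calling it might look like, the Python below builds a REST URL, adds HTTP Basic authentication, and pulls cluster names out of a `/clusters` response. The port (7180), the API version (`v17`), the host name, and the credentials are all assumptions of mine, not something stated in the documentation quoted here, so check them against your own installation.

```python
import base64
import json
import urllib.request

# Assumptions (not from the quoted documentation): Cloudera Manager listens on
# port 7180 and exposes API version v17. Adjust both for your installation.
CM_PORT = 7180
API_VERSION = "v17"


def build_url(host, path, port=CM_PORT, version=API_VERSION):
    """Build a REST URL such as http://host:7180/api/v17/clusters."""
    return "http://{}:{}/api/{}/{}".format(host, port, version, path.lstrip("/"))


def basic_auth_header(user, password):
    """Return the HTTP Basic Authorization header value for the credentials."""
    token = base64.b64encode("{}:{}".format(user, password).encode("utf-8"))
    return "Basic " + token.decode("ascii")


def cluster_names(payload):
    """Extract cluster names from a decoded /clusters response.

    The response is assumed to be a JSON object with an "items" list whose
    entries each carry a "name" field.
    """
    return [item["name"] for item in payload.get("items", [])]


def fetch_clusters(host, user, password):
    """Call GET /clusters on a live server and return the cluster names."""
    request = urllib.request.Request(build_url(host, "clusters"))
    request.add_header("Authorization", basic_auth_header(user, password))
    with urllib.request.urlopen(request, timeout=10) as response:
        return cluster_names(json.loads(response.read().decode("utf-8")))

# Example call (hypothetical host and credentials, needs a running server):
#   fetch_clusters("cm-host.example.com", "admin", "admin")
```

Only `fetch_clusters` touches the network; the other helpers are pure functions, so they can be tried out without a running Cloudera Manager.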

 

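After reading this description, I have a rough picture: Cloudera provides CDH, Apache Impala, Cloudera Search, Cloudera Manager, and Cloudera Navigator. Of these, CDH itself includes Apache Impala and Cloudera Search. In short, Cloudera provides three main pieces: CDH, Cloudera Manager, and Cloudera Navigator.

Later chapters of the documentation give a brief introduction to each of the three.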
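To make the grouping concrete, here is a small Python sketch of the relationship described in the paragraph above. The dictionary layout and the function name are purely my own illustration; Cloudera does not define any such structure.

```python
# The three top-level pieces, each mapped to the products it bundles.
# This layout is an illustration only, not something defined by Cloudera.
CLOUDERA_PRODUCTS = {
    "CDH": ["Apache Impala", "Cloudera Search"],
    "Cloudera Manager": [],
    "Cloudera Navigator": [],
}


def all_products(products):
    """Flatten the mapping into the five product names, parents first."""
    names = []
    for parent, bundled in products.items():
        names.append(parent)
        names.extend(bundled)
    return names
```

Iterating over `CLOUDERA_PRODUCTS` gives the three main pieces, while `all_products(CLOUDERA_PRODUCTS)` gives the full list of five products and tools from the documentation.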

 

CDH Overview

CDH delivers the core elements of Hadoop

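The Introduction notes that information about the individual CDH components is beyond the scope of the Cloudera documentation. I will write up each component separately as I start using it.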

https://www.cloudera.com/documentation/enterprise/latest/images/xcdh.png.pagespeed.ic.PezntPgX3c.png

 

Cloudera Manager 5 Overview

With Cloudera Manager, you can easily deploy and centrally operate the complete CDH stack and other managed services.

Put plainly, Cloudera Manager (CM) simplifies installing and managing CDH.

Terminology

https://www.cloudera.com/documentation/enterprise/latest/images/xcm_model.jpg.pagespeed.ic.jHN6w5pstZ.jpg

 

 

Architecture

https://www.cloudera.com/documentation/enterprise/latest/images/xcm_arch.png.pagespeed.ic.X2FeSVECvw.png

 

 

Cloudera Navigator Data Management Overview

Cloudera Navigator Data Management is a complete solution for data governance, auditing, and related data management tasks that is fully integrated with the Hadoop platform.

This explanation is rather abstract; one answer in the FAQ later in the documentation puts it more plainly:

Is Cloudera Navigator a module of Cloudera Manager?

Not exactly. Cloudera Navigator is installed separately, after Cloudera Manager is installed, and it interacts behind the scenes with Cloudera Manager to deliver some of its core functionality. Cloudera Manager is used by cluster administrators to manage the cluster and all its services. Cloudera Navigator is used by administrators but also by security and governance teams, data stewards, and others to audit, trace data lineage from source raw data through final form, and perform other comprehensive data governance and stewardship tasks.

 

 

If you do not need data security auditing and similar features, you can skip installing Cloudera Navigator.

 

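With the CDH concepts sorted out, the next step is installation. I will cover that in a separate post. There are plenty of installation guides online, but I plan to follow the steps in the official documentation: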

https://www.cloudera.com/documentation/enterprise/latest/topics/introduction.html

 

 

From the ITPUB blog, link: http://blog.itpub.net/11780477/viewspace-2144302/. Please credit the source when reposting; otherwise legal action may be pursued.
