The Hadoop Ecosystem - KEYWORD

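Original post · Data Analysis · Author: leniz · Posted: 2016-12-19 08:23:09
Big data is not only about the sheer volume of data; the map of tools built around it is just as impressive. Every time I read about it I run into a flood of new terms (perhaps old ones to veterans), so I want to record them here.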

  • Hadoop Common: The common utilities that support the other Hadoop modules. (Essentially a shared collection of libraries and interfaces.)
  • Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data. (The underlying distributed file system.)
  • Hadoop YARN: A framework for job scheduling and cluster resource management. (A resource management framework introduced in Hadoop 2; the name stands for Yet Another Resource Negotiator.)
  • Hadoop MapReduce: A YARN-based system for parallel processing of large data sets. (A distributed data processing model and execution environment.)
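To make the MapReduce model more concrete, here is a minimal single-process Python sketch of the classic word-count job. It only simulates the three phases (map, shuffle/group by key, reduce) in memory; it does not use Hadoop's real Java API, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values emitted for the same key together."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A reducer sees its keys in sorted order; mimic that here.
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a (hypothetical) one-reducer job would."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["HDFS stores data", "MapReduce processes data", "YARN schedules MapReduce"]
    print(word_count(docs))
    # → {'data': 2, 'hdfs': 1, 'mapreduce': 2, 'processes': 1, 'schedules': 1, 'stores': 1, 'yarn': 1}
```

In a real cluster the map and reduce functions run as many parallel tasks on different nodes (scheduled by YARN, reading input from HDFS), and the shuffle moves data over the network; the sketch keeps only the data flow between the phases.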


  • Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters, which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health, such as heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner. (A web-based Hadoop management platform.)
  • Avro: A data serialization system. (Supports efficient, cross-language RPC and persistent data storage.)
  • Cassandra: A scalable multi-master database with no single points of failure.
  • Chukwa: A data collection system for managing large distributed systems.
  • HBase: A scalable, distributed database that supports structured data storage for large tables. (A distributed, column-oriented database. HBase uses HDFS as its underlying storage and supports both MapReduce-style batch computation and point queries, i.e. random reads.)
  • Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. (A distributed data warehouse: Hive manages data stored in HDFS and provides SQL-style access to it through HiveQL.)
  • Mahout: A scalable machine learning and data mining library. (A library of machine learning algorithms.)
  • Pig: A high-level data-flow language and execution framework for parallel computation. (A data-flow language, Pig Latin, that runs on top of MapReduce and HDFS.)
  • Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
  • Tez: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use cases. Tez is being adopted by Hive, Pig and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop MapReduce as the underlying execution engine.
  • ZooKeeper: A high-performance coordination service for distributed applications.
  • Sqoop: A tool for efficiently transferring bulk data between structured data stores (such as relational databases) and HDFS. (An ETL tool.)
  • Oozie: A service for running and scheduling Hadoop jobs (such as MapReduce, Pig, Hive and Sqoop jobs). (Roughly a workflow scheduling and job-monitoring system.)
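Both Tez and Oozie organize work as a DAG (directed acyclic graph): a task may start only after every task it depends on has finished. The Python sketch below illustrates just that dependency rule using Kahn's topological-sort algorithm. It is not how Tez or Oozie are configured or implemented, and the pipeline task names (`sqoop_import`, `clean_logs`, and so on) are hypothetical.

```python
from collections import deque
from typing import Dict, List


def execution_order(deps: Dict[str, List[str]]) -> List[str]:
    """Return an order in which every task runs after all of its upstream tasks.

    `deps` maps each task to the list of tasks it depends on.
    Raises ValueError if the graph contains a cycle (i.e. is not a DAG).
    """
    # Collect every task, including ones that appear only as dependencies.
    tasks = set(deps)
    for upstream in deps.values():
        tasks.update(upstream)

    # Count unfinished dependencies and record which tasks wait on which.
    remaining = {task: 0 for task in tasks}
    downstream: Dict[str, List[str]] = {task: [] for task in tasks}
    for task, upstream in deps.items():
        for up in upstream:
            remaining[task] += 1
            downstream[up].append(task)

    # Kahn's algorithm: start from tasks with no dependencies (sorted for determinism).
    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order: List[str] = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in sorted(downstream[task]):
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("dependency graph has a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical pipeline: import, clean, join, aggregate, then export.
    pipeline = {
        "sqoop_import": [],
        "clean_logs": [],
        "join": ["sqoop_import", "clean_logs"],
        "aggregate": ["join"],
        "export_report": ["aggregate"],
    }
    print(execution_order(pipeline))
    # → ['clean_logs', 'sqoop_import', 'join', 'aggregate', 'export_report']
```

The sketch produces one valid serial order; a real DAG engine would instead run every task whose dependencies are satisfied (here `clean_logs` and `sqoop_import`) in parallel on the cluster.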

Source: "ITPUB Blog", link: . Please credit the source when reposting; otherwise legal liability will be pursued.
