
A detailed guide to installing a multi-node Hadoop cluster

Original post | NoSQL | Author: flzhang | Date: 2015-09-01 16:01:41

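I often use tools to build large clusters automatically, and even a small environment has ten or more machines. Automated deployment saves effort, but it hides the work the tooling does behind the scenes. Building a small cluster by hand is a good way to learn the details and principles of the setup. To review how the Hadoop processes coordinate with each other, I built a 3-node cluster today and recorded the steps.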
1 Building the cluster
Basic environment
IP               Host        Processes
192.168.0.110    elephant    namenode, datanode, nodemanager
192.168.0.110    tiger       datanode, nodemanager
192.168.0.110    horse       resourcemanager, datanode, nodemanager, jobhistoryserver

1.1 Install the CDH5 yum repository
Download the CDH5 repo file and move it into the yum repository directory:
wget http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/cloudera-cdh5.repo
sudo mv cloudera-cdh5.repo /etc/yum.repos.d/
1.2 Install the components on each node
1. Install the namenode and datanode
Install the namenode on elephant:
sudo yum install --assumeyes hadoop-hdfs-namenode
Install the datanode on elephant, tiger and horse:

sudo yum install --assumeyes hadoop-hdfs-datanode
2. Install the resourcemanager and nodemanager
Install the resourcemanager on horse:
sudo yum install --assumeyes hadoop-yarn-resourcemanager

Install the nodemanager on elephant, tiger and horse:
sudo yum install --assumeyes hadoop-yarn-nodemanager
3. Install the MapReduce framework
Install MapReduce on elephant, tiger and horse:
sudo yum install --assumeyes hadoop-mapreduce
4. Install the jobhistoryserver
Install the jobhistoryserver on horse:
sudo yum install --assumeyes hadoop-mapreduce-historyserver
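The per-host package choices above can be collected in one place. The following is a minimal sketch, assuming the host names from the table; packages_for_host is a hypothetical helper, and the script only prints the yum command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: map each host in the table to the CDH5 packages installed in section 1.2.
# packages_for_host is a hypothetical helper, not part of Hadoop or CDH.
packages_for_host() {
  case "$1" in
    elephant) echo "hadoop-hdfs-namenode hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce" ;;
    tiger)    echo "hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce" ;;
    horse)    echo "hadoop-hdfs-datanode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-mapreduce hadoop-mapreduce-historyserver" ;;
    *)        echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# Print (rather than run) the install command for the host given as $1,
# defaulting to this machine's short hostname.
host="${1:-$(hostname -s 2>/dev/null)}"
if pkgs=$(packages_for_host "$host"); then
  echo "sudo yum install --assumeyes $pkgs"
fi
```

Running it with a host name as the first argument shows the command for that host, which can then be checked against the steps above before it is executed.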

1.3 Edit the configuration files
Edit the configuration files on elephant.
1 Copy the template files
sudo cp core-site.xml /etc/hadoop/conf/
sudo cp hdfs-site.xml /etc/hadoop/conf/
sudo cp yarn-site.xml /etc/hadoop/conf/
sudo cp mapred-site.xml /etc/hadoop/conf/
2 sudo vi core-site.xml
name value
fs.defaultFS hdfs://elephant:8020
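The name/value pair above is stored as a property element inside the file. As a minimal sketch of that layout, the script below writes such a file into a scratch directory; CONF_OUT is a hypothetical variable, and the result would still need to be reviewed and merged into /etc/hadoop/conf/core-site.xml by hand.

```shell
#!/usr/bin/env bash
# Sketch: write the core-site.xml setting described above into a scratch directory.
# CONF_OUT is a hypothetical variable used only for this illustration.
CONF_OUT="${CONF_OUT:-$(mktemp -d)}"

cat > "$CONF_OUT/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://elephant:8020</value>
  </property>
</configuration>
EOF

echo "wrote $CONF_OUT/core-site.xml"
```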

3 sudo vi hdfs-site.xml
dfs.namenode.name.dir file:///disk1/dfs/nn,file:///disk2/dfs/nn
dfs.datanode.data.dir file:///disk1/dfs/dn,file:///disk2/dfs/dn
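Each value above is a comma-separated list of file:// URIs, and every entry has to exist as a local directory (they are created in section 1.4). A small sketch for turning such a value into plain paths, with dirs_from_value as a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: turn a comma-separated list of file:// URIs, as used by
# dfs.namenode.name.dir and dfs.datanode.data.dir, into local paths.
# dirs_from_value is a hypothetical helper name.
dirs_from_value() {
  local value="$1" uri
  local IFS=','
  for uri in $value; do
    # Strip the file:// scheme; what remains is the absolute local path.
    printf '%s\n' "${uri#file://}"
  done
}

dirs_from_value "file:///disk1/dfs/nn,file:///disk2/dfs/nn"
dirs_from_value "file:///disk1/dfs/dn,file:///disk2/dfs/dn"
```

The printed paths can be fed to mkdir -p, which keeps the directory list and the configuration from drifting apart.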


4 sudo vi yarn-site.xml
yarn.resourcemanager.hostname horse
yarn.application.classpath (keep the default value from the template)
yarn.nodemanager.aux-services mapreduce_shuffle
-- enables the MapReduce framework on YARN
yarn.nodemanager.local-dirs file:///disk1/nodemgr/local,file:///disk2/nodemgr/local

yarn.nodemanager.log-dirs /var/log/hadoop-yarn/containers
yarn.nodemanager.remote-app-log-dir /var/log/hadoop-yarn/apps
yarn.log-aggregation-enable true
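The yarn-site.xml settings listed above could be generated as follows. This is only a sketch: prop and CONF_OUT are hypothetical names, and the output leaves out yarn.application.classpath, so it is meant to be merged into the template rather than copied over it.

```shell
#!/usr/bin/env bash
# Sketch: generate the yarn-site.xml properties from the name/value pairs above.
# prop and CONF_OUT are hypothetical names used only for this illustration.
CONF_OUT="${CONF_OUT:-$(mktemp -d)}"

# Emit one <property> element for a name/value pair.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

{
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  prop yarn.resourcemanager.hostname horse
  # yarn.application.classpath is omitted: the text keeps the template default.
  prop yarn.nodemanager.aux-services mapreduce_shuffle
  prop yarn.nodemanager.local-dirs 'file:///disk1/nodemgr/local,file:///disk2/nodemgr/local'
  prop yarn.nodemanager.log-dirs /var/log/hadoop-yarn/containers
  prop yarn.nodemanager.remote-app-log-dir /var/log/hadoop-yarn/apps
  prop yarn.log-aggregation-enable true
  echo '</configuration>'
} > "$CONF_OUT/yarn-site.xml"

echo "wrote $CONF_OUT/yarn-site.xml"
```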

5 sudo vi mapred-site.xml
mapreduce.framework.name yarn
mapreduce.jobhistory.address horse:10020
mapreduce.jobhistory.webapp.address horse:19888
yarn.app.mapreduce.am.staging-dir /user

6 Reduce the JVM heap sizes
export HADOOP_NAMENODE_OPTS="-Xmx64m"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx64m"
export HADOOP_DATANODE_OPTS="-Xmx64m"
export YARN_RESOURCEMANAGER_OPTS="-Xmx64m"
export YARN_NODEMANAGER_OPTS="-Xmx64m"
export HADOOP_JOB_HISTORYSERVER_OPTS="-Xmx64m"
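The text does not say which file holds these exports. One common place is /etc/hadoop/conf/hadoop-env.sh, but that is an assumption here. The sketch below writes the six settings to a scratch file (ENV_FILE is a hypothetical variable) and confirms that the file can be sourced.

```shell
#!/usr/bin/env bash
# Sketch: collect the heap settings in an env file and confirm they load.
# ENV_FILE is a hypothetical scratch path; on the cluster the assumed target
# would be /etc/hadoop/conf/hadoop-env.sh.
ENV_FILE="${ENV_FILE:-$(mktemp)}"

for var in HADOOP_NAMENODE_OPTS HADOOP_SECONDARYNAMENODE_OPTS HADOOP_DATANODE_OPTS \
           YARN_RESOURCEMANAGER_OPTS YARN_NODEMANAGER_OPTS HADOOP_JOB_HISTORYSERVER_OPTS; do
  echo "export $var=\"-Xmx64m\""
done > "$ENV_FILE"

# Source the file in a subshell so the current shell's environment is untouched.
( . "$ENV_FILE" && echo "NameNode JVM options: $HADOOP_NAMENODE_OPTS" )
```

A 64 MB heap suits a small test cluster like this one; a cluster holding real data would need larger values.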
7 Copy all the configuration files to the tiger and horse hosts
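The copy step can be sketched with scp. The text does not name a tool or a user, so both are assumptions: the sketch assumes root can ssh to tiger and horse, and it only prints the commands unless RUN is set to an empty value. push_conf, RUN and CONF_DIR are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: push the four site files from elephant to the other two hosts.
# RUN defaults to "echo" so the commands are only printed; set RUN= to execute.
# Assumption: root can ssh to tiger and horse; adjust the user as needed.
RUN="${RUN-echo}"
CONF_DIR=/etc/hadoop/conf

push_conf() {
  local host
  for host in tiger horse; do
    $RUN scp "$CONF_DIR/core-site.xml" "$CONF_DIR/hdfs-site.xml" \
             "$CONF_DIR/yarn-site.xml" "$CONF_DIR/mapred-site.xml" \
             "root@$host:$CONF_DIR/"
  done
}

push_conf
```

If the heap settings from step 6 live in a file under the same directory, that file would need to be added to the list as well.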

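1.4 Create the required directories
1 On elephant, create the directories for the nodemanager, namenode and datanode. The datanode and nodemanager directories are needed on tiger and horse as well, since they use the same configuration.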
$ sudo mkdir -p /disk1/dfs/nn
$ sudo mkdir -p /disk2/dfs/nn
$ sudo mkdir -p /disk1/dfs/dn
$ sudo mkdir -p /disk2/dfs/dn
$ sudo mkdir -p /disk1/nodemgr/local
$ sudo mkdir -p /disk2/nodemgr/local
2 Set the directory ownership
$ sudo chown -R hdfs:hadoop /disk1/dfs/nn
$ sudo chown -R hdfs:hadoop /disk2/dfs/nn
$ sudo chown -R hdfs:hadoop /disk1/dfs/dn
$ sudo chown -R hdfs:hadoop /disk2/dfs/dn
$ sudo chown -R yarn:yarn /disk1/nodemgr/local
$ sudo chown -R yarn:yarn /disk2/nodemgr/local
3 Verify the directories and their ownership
$ ls -lR /disk1
$ ls -lR /disk2
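Reading the ls -lR output by eye is easy to get wrong, so the ownership check can also be scripted. A minimal sketch, with check_owner as a hypothetical helper and stat -c as provided by GNU coreutils:

```shell
#!/usr/bin/env bash
# Sketch: report directories whose owner:group differs from what step 2 set.
# check_owner is a hypothetical helper; stat -c is the GNU coreutils form.
check_owner() {
  local dir="$1" expected="$2" actual
  actual=$(stat -c '%U:%G' "$dir" 2>/dev/null) || { echo "MISSING $dir"; return 1; }
  if [ "$actual" = "$expected" ]; then
    echo "OK      $dir ($actual)"
  else
    echo "WRONG   $dir ($actual, expected $expected)"
    return 1
  fi
}

failures=0
for d in /disk1/dfs/nn /disk2/dfs/nn /disk1/dfs/dn /disk2/dfs/dn; do
  check_owner "$d" hdfs:hadoop || failures=$((failures + 1))
done
for d in /disk1/nodemgr/local /disk2/nodemgr/local; do
  check_owner "$d" yarn:yarn || failures=$((failures + 1))
done
echo "$failures directories need attention"
```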

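1.5 Format HDFS and start the HDFS processes
1 Start the namenode and check for errors
1) On elephant, format the namenode:
sudo -u hdfs hdfs namenode -format
If you are asked whether to re-format, enter Y.
Start the namenode: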
sudo service hadoop-hdfs-namenode start
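Re-formatting erases the existing namenode metadata, so it can help to check first whether a name directory has already been formatted. A formatted name directory contains a current/VERSION file. The sketch below tests for it; is_formatted is a hypothetical helper, not a Hadoop command.

```shell
#!/usr/bin/env bash
# Sketch: detect whether a NameNode directory has already been formatted.
# A formatted name directory contains current/VERSION.
# is_formatted is a hypothetical helper, not a Hadoop command.
is_formatted() {
  [ -f "$1/current/VERSION" ]
}

for dir in /disk1/dfs/nn /disk2/dfs/nn; do
  if is_formatted "$dir"; then
    echo "$dir is already formatted; re-formatting would erase its metadata"
  else
    echo "$dir is not formatted yet"
  fi
done
```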
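2) Check the namenode log
Manually:
The startup message prints the path of a .out file; the matching .log file is in the same directory.
less /var/log/hadoop-hdfs/hadoop-hdfs-namenode-elephant.log
Through the web UI:
Open the namenode web UI at http://elephant:50070
and choose Utilities->Logs.
2 Start the datanodes and check for errors
1) Start the datanode on elephant, tiger and horse: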
sudo service hadoop-hdfs-datanode start
2) Check the datanode log
Manually:
less /var/log/hadoop-hdfs/hadoop-hdfs-datanode-tiger.log
Through the web UI:
Open the datanode web UI at http://tiger:50075 and select the datanode log.
The same methods work for the logs on the other nodes, such as horse.
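Paging through a whole log with less is slow when only the problems matter. The sketch below filters a daemon log down to its latest WARN, ERROR and FATAL entries. log_problems is a hypothetical helper, and it assumes the default log line layout of date, time, level, then class and message.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent WARN/ERROR/FATAL entries from a daemon log.
# log_problems is a hypothetical helper; it assumes the default log layout
# "date time LEVEL class: message".
log_problems() {
  local file="$1" count="${2:-20}"
  grep -E '^[0-9-]+ [0-9:,]+ (WARN|ERROR|FATAL) ' "$file" | tail -n "$count"
}

log=/var/log/hadoop-hdfs/hadoop-hdfs-datanode-tiger.log
if [ -r "$log" ]; then
  log_problems "$log"
else
  echo "log not readable: $log"
fi
```

The same function works for the namenode log by passing its path instead.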


1.6 Create directories on HDFS for YARN and MapReduce
$ sudo -u hdfs hadoop fs -mkdir /tmp
$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
$ sudo -u hdfs hadoop fs -mkdir /user
$ sudo -u hdfs hadoop fs -mkdir /user/training
$ sudo -u hdfs hadoop fs -chown training /user/training
$ sudo -u hdfs hadoop fs -mkdir /user/history
$ sudo -u hdfs hadoop fs -chmod 1777 /user/history
$ sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history
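Mode 1777, used above for /tmp and /user/history, is 777 plus the sticky bit. The sketch below shows the same mode on a local scratch directory as an analogy; it does not touch HDFS.

```shell
#!/usr/bin/env bash
# Sketch: what mode 1777 means, shown on a local scratch directory.
# The leading 1 is the sticky bit: everyone may create files in the directory,
# but only a file's owner (or the superuser) may delete or rename it.
scratch=$(mktemp -d)
chmod 1777 "$scratch"

# %a prints the octal mode, %A the symbolic form (the trailing t is the sticky bit).
stat -c '%a %A %n' "$scratch"
```

This is why many users can share these directories without being able to remove each other's files.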
1.7 Start the YARN and MapReduce processes
1 Start the resourcemanager on horse
sudo service hadoop-yarn-resourcemanager start
2 Start the nodemanager on all nodes
sudo service hadoop-yarn-nodemanager start
3 Start the historyserver on horse
sudo service hadoop-mapreduce-historyserver start

1.8 Test the cluster
1 Upload a test file to HDFS
$ hadoop fs -mkdir -p elephant/shakespeare
$ hadoop fs -put shakespeare.txt elephant/shakespeare
2 Check in the namenode web UI that the file was uploaded
Choose Utilities->"Browse the file system" and navigate to the directory to see the file.
3 Test MapReduce
On elephant:
$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount elephant/shakespeare elephant/output
Open the resourcemanager web UI to see which hosts ran the ApplicationMaster, mapper and reducer tasks.
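To spot-check the job's result, a local approximation of wordcount can be run on a small input. This is only a sketch: local_wordcount is a hypothetical helper that splits on whitespace and prints each word with its count, tab-separated and sorted in byte order.

```shell
#!/usr/bin/env bash
# Sketch: a local approximation of the wordcount example for spot checks.
# local_wordcount is a hypothetical helper; it splits on whitespace and prints
# "word<TAB>count" sorted by byte order.
local_wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c | awk '{ print $2 "\t" $1 }'
}

printf 'to be or not to be\n' | local_wordcount
```

Piping the first lines of shakespeare.txt through local_wordcount and comparing a few words with the files under elephant/output gives a quick sanity check, assuming the job wrote its results there as in the command above.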

Source: ITPUB Blog, link: http://blog.itpub.net/750077/viewspace-1788595/. If you repost this article, please credit the source; otherwise legal liability will be pursued.
