
Setting Up a Hadoop 3.2.0 Distributed Cluster on CentOS 7

Original | Hadoop | Author: chenfeng | 2020-10-14 14:06:12

I. Environment Overview

1. The cluster consists of four CentOS 7 Linux virtual machines, with roles distributed as follows:

192.168.0.1  runs NameNode, ResourceManager, and SecondaryNameNode

192.168.0.2  runs NodeManager and DataNode

192.168.0.3  runs NodeManager and DataNode

192.168.0.4  runs NodeManager and DataNode


2. Configure hostname resolution (every node)

Edit the hosts file and add the mappings for the master and worker nodes. (The original calls this "DNS", but it is the local hosts file, not a DNS server.)

#vi /etc/hosts

192.168.0.1   mdw2  hadoop01

192.168.0.2   mdw3  hadoop02

192.168.0.3   mdw4  hadoop03

192.168.0.4   mdw5  hadoop04
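These mappings assume each machine's own hostname is set to match (the shell prompts later in this article show mdw2 on the master). If needed, set it per node with the standard CentOS 7 command (not in the original article):

# hostnamectl set-hostname mdw2   #run on 192.168.0.1; use mdw3/mdw4/mdw5 on the other nodes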


3. Disable the firewall (every node)

# systemctl stop firewalld

#Disable it on boot as well

# systemctl disable firewalld
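To confirm it is stopped (optional check, not in the original article):

# firewall-cmd --state   #prints "not running" once firewalld is stopped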


4. Configure passwordless SSH login

For the detailed procedure, see:

https://www.cnblogs.com/shireenlee4testing/p/10366061.html
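In outline (a minimal sketch of the usual approach; the linked post has the details), generate a key pair on the master as root and push the public key to every node, the master included:

# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# for h in hadoop01 hadoop02 hadoop03 hadoop04; do ssh-copy-id root@$h; done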


5. Configure the Java environment (every node)

For the detailed procedure, see:

https://www.cnblogs.com/shireenlee4testing/p/10368961.html
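The gist (a sketch; the archive and directory names below are placeholders, not from the article): extract a JDK under /opt and link it to /opt/jdk, which the profile settings below assume:

# tar -zxvf jdk-8uXXX-linux-x64.tar.gz -C /opt   #hypothetical archive name
# ln -s /opt/jdk1.8.0_XXX /opt/jdk               #so that JAVA_HOME=/opt/jdk resolves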



II. Building the Fully Distributed Hadoop Cluster

1. Download the Hadoop package, extract it, and configure the Hadoop environment variables

# wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz


#Extract to the /opt directory
# tar -zxvf hadoop-3.2.0.tar.gz -C /opt

#Link /opt/hadoop-3.2.0 to /opt/hadoop to simplify later configuration
# ln -s /opt/hadoop-3.2.0 /opt/hadoop


#Configure the Hadoop and Java environment variables
# vi /etc/profile

#Hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop


#jdk

export JAVA_HOME=/opt/jdk

export PATH=$PATH:$JAVA_HOME/bin
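Reload the profile so the new variables take effect in the current shell (a standard step, implicit in the original):

# source /etc/profile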


2. Set the JAVA_HOME parameter in the Hadoop environment scripts

#Enter the etc/hadoop directory under the Hadoop install
# cd /opt/hadoop/etc/hadoop

#In hadoop-env.sh, mapred-env.sh, and yarn-env.sh, add or modify the following parameter:

# vi hadoop-env.sh 

............................................................

............................................................

# The java implementation to use. By default, this environment

# variable is REQUIRED on ALL platforms except OS X!

 export JAVA_HOME=/opt/jdk

 

# vi mapred-env.sh

............................................................

............................................................

 # Specify the log4j settings for the JobHistoryServer

# Java property: hadoop.root.logger

#export HADOOP_JHS_LOGGER=INFO,RFA


export JAVA_HOME=/opt/jdk




# vi yarn-env.sh

............................................................

............................................................

# YARN Services parameters

###

# Directory containing service examples

# export YARN_SERVICE_EXAMPLES_DIR = $HADOOP_YARN_HOME/share/hadoop/yarn/yarn-service-examples

# export YARN_CONTAINER_RUNTIME_DOCKER_RUN_OVERRIDE_DISABLE=true


export JAVA_HOME=/opt/jdk



#Verify that the Hadoop setup took effect

# hadoop version

Hadoop 3.2.0

Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf

Compiled by sunilg on 2019-01-08T06:08Z

Compiled with protoc 2.5.0

From source with checksum d3f0795ed0d9dc378e2c785d3668f39

This command was run using /opt/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar


3. Modify the Hadoop configuration files

In the etc/hadoop directory under the Hadoop install, modify core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, and workers, adjusting the values to your environment.

# cat /opt/hadoop/etc/hadoop/core-site.xml 

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<!-- Put site-specific property overrides in this file. -->




<configuration>

  <property>

      <!-- HDFS NameNode address -->

      <name>fs.defaultFS</name>

      <value>hdfs://hadoop01:9000</value>

  </property>

  <property>

      <!-- Directory for temporary files; create /home/hadoop/tmp on every node first -->
      <name>hadoop.tmp.dir</name>
      <value>/home/hadoop/tmp</value>

 </property>

 </configuration>
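As the comment above notes, hadoop.tmp.dir must exist; create it on every node:

# mkdir -p /home/hadoop/tmp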


# cat /opt/hadoop/etc/hadoop/hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<!-- Put site-specific property overrides in this file. -->


<configuration>

      <property>

         <!-- NameNode web UI address -->

          <name>dfs.namenode.http-address</name>

          <value>hadoop01:50070</value>

      </property>

      <property>

          <name>dfs.namenode.name.dir</name>

          <value>file:/opt/hadoop/dfs/name</value>

     </property>

     <property>

         <name>dfs.datanode.data.dir</name>

         <value>file:/opt/hadoop/dfs/data</value>

     </property>

     <property>

        <!-- Replication factor; 3 is the default -->

        <name>dfs.replication</name>

         <value>3</value>

     </property>



    <property>

      <name>dfs.webhdfs.enabled</name>

      <value>true</value>

     </property>


     <property>

      <name>dfs.permissions</name>

      <value>false</value>

      <description>When false, permission checks are skipped when creating files on HDFS. Convenient, but guard against accidental deletion.</description>

  </property>


</configuration>
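The name and data directories configured above are created automatically when the daemons first start, but pre-creating them surfaces permission problems early (optional):

# mkdir -p /opt/hadoop/dfs/name /opt/hadoop/dfs/data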



# cat /opt/hadoop/etc/hadoop/mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>



<!-- Put site-specific property overrides in this file. -->




<configuration>

      <property>

          <name>mapreduce.framework.name</name>

          <value>yarn</value> <!-- run MapReduce on YARN -->

      </property>

      <property>

          <name>mapreduce.jobhistory.address</name>

          <value>hadoop01:10020</value>

      </property>

     <property>

         <name>mapreduce.jobhistory.webapp.address</name>

         <value>hadoop01:19888</value>

     </property>

    <property>

        <name>mapreduce.application.classpath</name>

        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>

     </property>

</configuration>


# cat /opt/hadoop/etc/hadoop/yarn-site.xml

<?xml version="1.0"?>




<configuration>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>hadoop01:8088</value>

<description>For external access, replace the hostname with the machine's real external IP; otherwise it defaults to localhost:8088.</description>

</property>

<property>

<name>yarn.scheduler.maximum-allocation-mb</name>

<value>2048</value>

<description>Maximum memory allocation per container request, in MB; the default is 8192 MB.</description>

</property>

<property>

<name>yarn.nodemanager.vmem-check-enabled</name>

<value>false</value>

<description>Skip the virtual-memory check. Useful when running on virtual machines; prevents containers from being killed spuriously later.</description>

</property>


<property>

<name>yarn.nodemanager.env-whitelist</name>

<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>

</property>



<property>  

    <name>yarn.resourcemanager.address</name>  

    <value>hadoop01:8032</value>  

</property> 

<property>

    <name>yarn.resourcemanager.scheduler.address</name>  

    <value>hadoop01:8030</value>  

</property>

<property>

    <name>yarn.resourcemanager.resource-tracker.address</name>  

    <value>hadoop01:8031</value>  

</property>



</configuration>


The workers file lists the hosts on which the start scripts launch the DataNode and NodeManager daemons over SSH:

# cat /opt/hadoop/etc/hadoop/workers

hadoop02

hadoop03

hadoop04



4. Configure the startup scripts: add the HDFS and YARN user definitions

Add the HDFS user definitions: edit the following scripts and insert these lines at the blank second line (right after the shebang):

# vi /opt/hadoop/sbin/start-dfs.sh 


HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root


# vi /opt/hadoop/sbin/stop-dfs.sh


HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root


Add the YARN user definitions: edit the following scripts and insert these lines at the blank second line as well:

# vi /opt/hadoop/sbin/start-yarn.sh 


YARN_RESOURCEMANAGER_USER=root

HDFS_DATANODE_SECURE_USER=root

YARN_NODEMANAGER_USER=root


# vi /opt/hadoop/sbin/stop-yarn.sh


YARN_RESOURCEMANAGER_USER=root

HDFS_DATANODE_SECURE_USER=root

YARN_NODEMANAGER_USER=root


Note: without these definitions, the start scripts abort with errors about missing user definitions (for example, that HDFS_NAMENODE_USER is not defined).


5. Copy the configured directories to the worker nodes


# scp -r /opt/hadoop-3.2.0 root@hadoop02:/opt/

# scp -r /opt/hadoop-3.2.0 root@hadoop03:/opt/

# scp -r /opt/hadoop-3.2.0 root@hadoop04:/opt/


# scp -r /opt/hadoop root@hadoop02:/opt/

# scp -r /opt/hadoop root@hadoop03:/opt/

# scp -r /opt/hadoop root@hadoop04:/opt/
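Note that scp -r dereferences the /opt/hadoop symlink, so the second set of copies transfers the whole tree again. A lighter alternative (using the passwordless SSH from step I.4) is to recreate the symlink remotely instead:

# ssh root@hadoop02 "ln -s /opt/hadoop-3.2.0 /opt/hadoop"
# ssh root@hadoop03 "ln -s /opt/hadoop-3.2.0 /opt/hadoop"
# ssh root@hadoop04 "ln -s /opt/hadoop-3.2.0 /opt/hadoop"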


6. Initialize & start


#Format the NameNode (first time only; reformatting an existing cluster causes DataNode clusterID mismatches)

[root@hadoop01 hadoop-3.2.0]# /opt/hadoop/bin/hdfs namenode -format


#Start all daemons

[root@hadoop01 hadoop-3.2.0]# /opt/hadoop/sbin/start-all.sh

Starting namenodes on [hadoop01]
Last login: Mon Oct 12 16:22:06 CST 2020 on pts/1
Starting datanodes
Last login: Mon Oct 12 16:22:32 CST 2020 on pts/1
Starting secondary namenodes [mdw2]
Last login: Mon Oct 12 16:22:34 CST 2020 on pts/1
Starting resourcemanager
Last login: Mon Oct 12 16:22:40 CST 2020 on pts/1
Starting nodemanagers
Last login: Mon Oct 12 16:22:47 CST 2020 on pts/1


7. Verify that Hadoop started successfully

#Master node

[root@mdw2 ~]# jps

5089 NameNode

5625 ResourceManager

99770 Jps

5372 SecondaryNameNode


#Worker nodes

# jps

56978 NodeManager

80172 Jps

56862 DataNode


Check the Hadoop cluster status:

[root@mdw2 ~]# hadoop dfsadmin -report

WARNING: Use of this script to execute dfsadmin is deprecated.

WARNING: Attempting to execute replacement "hdfs dfsadmin" instead.


Configured Capacity: 160982630400 (149.93 GB)

Present Capacity: 131017445376 (122.02 GB)

DFS Remaining: 131017408512 (122.02 GB)

DFS Used: 36864 (36 KB)

DFS Used%: 0.00%

Replicated Blocks:

        Under replicated blocks: 0

        Blocks with corrupt replicas: 0

        Missing blocks: 0

        Missing blocks (with replication factor 1): 0

        Low redundancy blocks with highest priority to recover: 0

        Pending deletion blocks: 0

Erasure Coded Block Groups: 

        Low redundancy block groups: 0

        Block groups with corrupt internal blocks: 0

        Missing block groups: 0

        Low redundancy blocks with highest priority to recover: 0

        Pending deletion blocks: 0


-------------------------------------------------

Live datanodes (3):


Name: 192.168.0.2:9866 (mdw3)

Hostname: mdw3

Decommission Status : Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 10945437696 (10.19 GB)

DFS Remaining: 42715426816 (39.78 GB)

DFS Used%: 0.00%

DFS Remaining%: 79.60%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Oct 14 13:55:20 CST 2020

Last Block Report: Wed Oct 14 11:53:54 CST 2020

Num of Blocks: 0



Name: 192.168.0.3:9866 (mdw4)

Hostname: mdw4

Decommission Status : Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 10945388544 (10.19 GB)

DFS Remaining: 42715475968 (39.78 GB)

DFS Used%: 0.00%

DFS Remaining%: 79.60%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Oct 14 13:55:21 CST 2020

Last Block Report: Wed Oct 14 12:57:21 CST 2020

Num of Blocks: 0



Name: 192.168.0.4:9866 (mdw5)

Hostname: mdw5

Decommission Status : Normal

Configured Capacity: 53660876800 (49.98 GB)

DFS Used: 12288 (12 KB)

Non DFS Used: 8074358784 (7.52 GB)

DFS Remaining: 45586505728 (42.46 GB)

DFS Used%: 0.00%

DFS Remaining%: 84.95%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Wed Oct 14 13:55:20 CST 2020

Last Block Report: Wed Oct 14 12:17:55 CST 2020

Num of Blocks: 0



To start the ResourceManager on its own:


[root@mdw2 hadoop]# yarn-daemon.sh start resourcemanager

WARNING: Use of this script to start YARN daemons is deprecated.

WARNING: Attempting to execute replacement "yarn --daemon start" instead.

[root@mdw2 hadoop]# jps

35411 NameNode

35691 SecondaryNameNode

38558 Jps

38319 ResourceManager


8. Web UI access

http://192.168.0.1:50070/  (NameNode web UI, per dfs.namenode.http-address)

http://192.168.0.1:8088/  (ResourceManager web UI, per yarn.resourcemanager.webapp.address)
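As an end-to-end smoke test, you can run the example MapReduce job bundled with the 3.2.0 distribution (an optional check, not in the original article):

# hadoop jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 10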



Fix for the NodeManager process failing to start on the worker nodes:

The NodeManager error log looks like this:

INFO org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)


Cause: a yarn-site.xml misconfiguration.


By default, the YARN ResourceManager service addresses point to 0.0.0.0.


On a server, 0.0.0.0 refers to the local machine, so each NodeManager looks for the ResourceManager services locally; the worker nodes do not run those services, which live on the ResourceManager master node.

Therefore, for a cluster, these yarn-site.xml settings must not be left at their defaults.


Fix:

On every node in the cluster, edit yarn-site.xml and set the address of the ResourceManager master. Detailed configuration:

# vi /opt/hadoop/etc/hadoop/yarn-site.xml, and add the following between <configuration> and </configuration>:


<property>

    <name>yarn.resourcemanager.address</name>

    <value>hadoop01:8032</value>

</property>

<property>

    <name>yarn.resourcemanager.scheduler.address</name>

    <value>hadoop01:8030</value>

</property>

<property>

    <name>yarn.resourcemanager.resource-tracker.address</name>

    <value>hadoop01:8031</value>

</property>
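After updating yarn-site.xml on every node, restart YARN so the new addresses take effect:

# /opt/hadoop/sbin/stop-yarn.sh
# /opt/hadoop/sbin/start-yarn.sh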


Once the NodeManager starts normally, the log looks like this:

2020-10-13 14:15:53,762 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@14b030a0{/static,jar:file:/opt/hadoop-3.2.0/share/hadoop/yarn/hadoop-yarn-common-3.2.0.jar!/webapps/static,AVAILABLE}

2020-10-13 14:15:55,165 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@2b5183ec{/,file:///tmp/jetty-0.0.0.0-8042-node-_-any-5774776794028847658.dir/webapp/,AVAILABLE}{/node}

2020-10-13 14:15:55,186 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@5eb2172{HTTP/1.1,[http/1.1]}{0.0.0.0:8042}

2020-10-13 14:15:55,186 INFO org.eclipse.jetty.server.Server: Started @5011ms

2020-10-13 14:15:55,186 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app node started at 8042

2020-10-13 14:15:55,210 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node ID assigned is : mdw3:24558

2020-10-13 14:15:55,218 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.0.1:8031

2020-10-13 14:15:55,223 INFO org.apache.hadoop.util.JvmPauseMonitor: Starting JVM pause monitor

2020-10-13 14:15:55,323 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Sending out 0 NM container statuses: []

2020-10-13 14:15:55,349 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]

2020-10-13 14:15:55,520 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -955208939

2020-10-13 14:15:55,521 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for container-tokens, got key with id -1467324462

2020-10-13 14:15:55,522 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as mdw3:24558 with total resource of <memory:8192, vCores:8>










Source: ITPUB blog, http://blog.itpub.net/15498/viewspace-2726822/. Please credit the source when reposting.
