
Hadoop Pseudo-Distributed Installation

Category: Hadoop | Author: hpls | Posted: 2011-07-29 18:56:02

I installed Hadoop in pseudo-distributed mode on a test machine; these notes record the steps for future reference.

 

1. Download the JDK and install it to /usr/java/jdk1.6.0_26/
2. Download openssh-5.5p1.tar.gz and install it to /usr/local/hdpssh
3. Download hadoop-0.20.203.0rc1.tar.gz and extract it to /data3/hadoop-0.20.203.0


With the software installed, move on to configuration.

Step 1: SSH configuration
Edit the openssh configuration file /usr/local/hdpssh/etc/sshd_config:
Port 30433                # port 22 is taken by a gateway service on this host, so listen on 30433 instead
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
Subsystem sftp /usr/local/hdpssh/libexec/sftp-server

Start sshd:
/usr/local/hdpssh/sbin/sshd -f /usr/local/hdpssh/etc/sshd_config


Create a new SSH key with an empty passphrase to enable passwordless login:

[root@localhost]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@localhost]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
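A common snag at this step: sshd (with the default StrictModes yes) silently rejects key login when ~/.ssh or authorized_keys is group- or world-writable. A minimal sketch that tightens the permissions sshd expects (SSH_DIR defaulting to ~/.ssh is an assumption for illustration):

```shell
# Tighten key-file permissions; sshd's StrictModes (default: yes)
# refuses pubkey auth when these are group/world-writable.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"                  # ssh-keygen normally creates this
touch "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
# Print the resulting modes for a quick sanity check.
stat -c '%a %n' "$SSH_DIR" "$SSH_DIR/authorized_keys"
```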

Test:
[root@localhost]# ssh -p 30433 -i ~/.ssh/id_rsa localhost 
Last login: Thu Jul 28 10:41:18 2011 from localhost

[root@localhost]# who
xiaozhen pts/3        2011-07-28 09:41 (***.106.182.***)
root     pts/8        2011-07-28 12:35 (localhost)

SSH configuration is complete.


Step 2: Configure Hadoop for pseudo-distributed mode
[root@localhost]# vi conf/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.6.0_26/
export JRE_HOME=/usr/java/jdk1.6.0_26/jre
export HADOOP_HEAPSIZE=512
export HADOOP_SSH_OPTS="-p 30433 -i /root/.ssh/id_rsa"


[root@localhost]# vi conf/core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data3/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

[root@localhost]# vi conf/hdfs-site.xml

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data3/hadoop/filesystem/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/data3/hadoop/data</value>
    <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list, data is stored in all named directories.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication, applied when the file is created.</description>
  </property>
</configuration>

[root@localhost]# vi conf/mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

Format the namenode:
bin/hadoop namenode -format
Start the services:
bin/start-all.sh

Check that the DFS service is working:
bin/hadoop dfs -ls /
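If DFS commands fail, a quick first check is whether all five daemons actually started. A small sketch (daemon names are the 0.20.x set; in practice you would feed it the output of `jps`) that flags any that are missing:

```shell
# Daemons expected in a pseudo-distributed 0.20.x setup.
EXPECTED="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# check_daemons <jps-output>: report which expected daemons are absent.
check_daemons() {
    missing=""
    for d in $EXPECTED; do
        echo "$1" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then
        echo "all daemons running"
    else
        echo "missing:$missing"
    fi
}

# Real usage would be: check_daemons "$(jps)"
check_daemons "$(printf '101 NameNode\n102 DataNode\n103 SecondaryNameNode\n104 JobTracker\n105 TaskTracker')"
# → all daemons running
```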

Check Hadoop's status through the web UIs:
Cluster status: http://localhost:50070/dfshealth.jsp
Job status:     http://localhost:50030/jobtracker.jsp


Problems encountered at startup:

1. DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/tmg/testdir/core-site.xml could only be replicated to 0 nodes, instead of 1

 

Two steps resolved the problem, though the root cause still needs further study:

Take HDFS out of safe mode: hadoop dfsadmin -safemode leave
Stop the firewall: /etc/init.d/iptables stop
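Stopping iptables entirely is heavy-handed for anything beyond a test box. A gentler alternative, sketched below, generates ACCEPT rules for only Hadoop's ports (the port list is an assumption based on the 0.20.x defaults); it prints the commands rather than executing them, so they can be reviewed and run as root:

```shell
# Emit iptables commands opening only the ports Hadoop uses
# (NameNode 9000, JobTracker 9001, plus the daemon/web-UI ports).
hadoop_fw_rules() {
    for port in 9000 9001 50010 50020 50030 50060 50070 50075; do
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}
hadoop_fw_rules
```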

 

2. Starting Hadoop with ./start-all.sh reports:
localhost: Unrecognized option: -jvm
localhost: Could not create the Java virtual machine.

When Hadoop is started as root, the launcher passes the -jvm option by default; it has to be removed.

Looking at the hadoop/bin/hadoop source:

 if [[ $EUID -eq 0 ]]; then
    HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
 else
    HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
 fi


Change it to, commenting out the -jvm line as well (leaving it active would append -jvm anyway and reproduce the error):
 #if [[ $EUID -eq 0 ]]; then
 #   HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
 #else
     HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
 #fi
 

Step 3: Run the example code

Copy the example files into the input directory:

bin/hadoop dfs -put conf input

 

Run the example:
bin/hadoop jar hadoop-examples-0.20.203.0.jar grep input output 'dfs[a-z.]+'
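What the example job does is simple: extract every string matching the regex dfs[a-z.]+ from the input files and count the occurrences. The same logic can be sketched locally with plain grep (the sample input is invented here for illustration):

```shell
# Mimic the grep example job: extract matches of dfs[a-z.]+ and
# count them, most frequent first.
sample=$(mktemp)
printf 'dfs.name.dir\ndfs.data.dir\ndfs.replication\ndfs.name.dir\n' > "$sample"

grep -oE 'dfs[a-z.]+' "$sample" | sort | uniq -c | sort -rn
rm -f "$sample"
```

Since dfs.name.dir appears twice in the sample, it sorts to the top, mirroring the count-per-match output the job writes to part-00000.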

11/07/28 14:15:13 INFO mapred.FileInputFormat: Total input paths to process : 15
11/07/28 14:15:13 INFO mapred.JobClient: Running job: job_201107281127_0011
11/07/28 14:15:14 INFO mapred.JobClient:  map 0% reduce 0%
11/07/28 14:15:31 INFO mapred.JobClient:  map 13% reduce 0%
11/07/28 14:15:43 INFO mapred.JobClient:  map 26% reduce 0%
11/07/28 14:15:53 INFO mapred.JobClient:  map 40% reduce 8%
11/07/28 14:16:02 INFO mapred.JobClient:  map 53% reduce 13%
11/07/28 14:16:11 INFO mapred.JobClient:  map 66% reduce 13%
11/07/28 14:16:14 INFO mapred.JobClient:  map 66% reduce 17%
11/07/28 14:16:20 INFO mapred.JobClient:  map 80% reduce 22%


View the results:
bin/hadoop dfs -ls output

Found 3 items
-rw-r--r--   1 root supergroup          0 2011-07-28 14:17 /user/root/output/_SUCCESS
drwxr-xr-x   - root supergroup          0 2011-07-28 14:16 /user/root/output/_logs
-rw-r--r--   1 root supergroup         82 2011-07-28 14:17 /user/root/output/part-00000

 

Download the file locally to inspect the result:

bin/hadoop dfs -get output/part-00000 my_result_log

 

Stop the services:
bin/stop-all.sh

Command reference:
http://hadoop.apache.org/common/docs/r0.18.2/cn/hdfs_shell.html



Source: ITPUB blog, http://blog.itpub.net/8044/viewspace-1120336/
