Implementing wordCount

Hadoop | Author: 潇洒妞妞妞 | Date: 2015-11-18 20:49:52
I'm a little embarrassed to admit that getting from the first lesson to a working wordCount program took me about three days. Partly, other things kept coming up while I was studying; partly, my knowledge and my problem-solving skills still need work. I was following Wu Chao's (吴超) video course, but my environment was different: I was working on Hadoop 2.4.0, and Hadoop 2.x differs from Hadoop 1.x in quite a few ways.<br />
Whenever I hit a problem I kept going around in circles, yet in the end I still had to face it. From now on I need to remind myself to circle less and confront problems directly. With that said, here are the problems I ran into over these few days.<br />
<br />
<strong>1. Could not locate executable null\bin\winutils.exe in the Hadoop binaries.</strong><br />
Hadoop 2.4.0 really does not ship a winutils.exe file. I searched Baidu frantically and found a workaround, which is to add a few lines to the program:<br />
<div> <div id="code603" class="codeText">
<pre>
// Point hadoop.home.dir at the working directory and create an empty
// placeholder ./bin/winutils.exe there (needs import java.io.File).
File workaround = new File(".");
System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
new File("./bin").mkdirs();
new File("./bin/winutils.exe").createNewFile();
</pre>
</div> </div>
This fakes a winutils.exe file, and the error did go away. But is the problem really solved? (I just checked: with everything else configured correctly, I deleted the real winutils.exe and used this code instead, and the program still ran to completion. So what is winutils.exe actually for?)<br />
<br />
<strong>2. hadoop Cannot run program "cygpath": CreateProcess error=2, ?????????</strong><br />
This one is laughable, for two reasons. In the video, Wu Chao hit a file permission problem and imported a source package into his project to get around it, so I imported it too. That is the first joke: my program never had a permission problem, so why import the package at all? I was just following along mechanically. After the import, this error appeared every time. I searched Baidu, found similar questions, and the answer given was to install Cygwin and configure the environment. That is the second joke: I went straight to Baidu without thinking, spent far too long hunting for the single best answer, and did not even try the steps I found, such as installing Cygwin. When I finally did install Cygwin and configure the environment, the problem was still there, even though the posts claimed that was all it took. After going around in circles for so long, I finally thought to read the code the error pointed at, and the offending code turned out to be inside the package I had imported. Nothing more to say. I deleted the package, the error disappeared, and a new problem showed up.<br />
<br />
<strong>3. java.lang.ClassCastException: org.apache.hadoop.conf.Configuration cannot be cast to org.apache.hadoop.mapred.JobConf</strong><br />
The earlier problems were environment problems. This one is a code problem, and it is what finally led me to untangle all of this.<br />
A wordCount program has to specify its <strong>file input</strong>; that step is mandatory. In the video this is done with FileInputFormat.setInputPaths(job, IN_PATH). What I typed was FileInputFormat.setInputPaths((JobConf)conf, IN_PATH). The two calls appear to differ only in the first argument, so I assumed it was a source-level difference between Hadoop 2.x and Hadoop 1.x and carried on. That was completely wrong.<br />
To get rid of the exception I changed FileInputFormat.setInputPaths((JobConf)conf, IN_PATH) to FileInputFormat.setInputPaths(new JobConf(conf), IN_PATH), and then the fourth problem appeared.<br />
<br />
<strong>4. java.io.IOException: No input paths specified in job</strong><br />
This is precisely the line that sets the input, so why does it report that there is no input? It occurred to me that I might have imported the wrong class, so I went back to the video to check, and indeed I had. The correct class is org.apache.hadoop.mapreduce.lib.input.FileInputFormat, but I had imported org.apache.hadoop.mapred.FileInputFormat. With the old class, the path was being set on a throwaway JobConf copy rather than on the job itself, which is why the job saw no input paths. The two classes share the same method name, which misled me, and my "experience" told me the different parameter was just a version difference. What a trap.<br />
<br />
<strong>5. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z</strong><br />
For this one I searched Baidu again. Several blog posts said to put hadoop.dll and winutils.exe into the bin directory of the Hadoop installation on Windows and set the environment variables. I downloaded those two files built for Hadoop 2.2 and restarted Eclipse, but it did not work, and several more attempts failed too. At that point I was cursing those blog posts for being useless. After stewing for a while I downloaded the Hadoop 2.6 builds, restarted Eclipse, and the program actually ran. I was thrilled.<br />
<br />
<strong>6. Splitting each line of the file into individual words</strong><br />
When the results came out my face fell: the program did not count individual words at all; it treated each whole line as a single word. I checked the code many times and grew more anxious with every pass. The cause turned out to be the delimiter. The code that splits the words was String[] split = value.toString().split("\t"); but when I edited the file in vi, the words were separated by spaces, not tabs. After changing the code to String[] split = value.toString().split(" "); and rerunning, the word counts came out.<br />
<br />
I'm recording the code here for future reference.<br />
MyMapper.java<br />
<div> <div id="code386" class="codeText">
<pre>
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper&lt;LongWritable, Text, Text, LongWritable&gt; {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Words in the input file are separated by spaces, not tabs.
        String[] split = value.toString().split(" ");
        System.out.println("the split size is " + split.length);
        for (String word : split) {
            System.out.println("the word is " + word);
            // Emit (word, 1) for every word on the line.
            context.write(new Text(word), new LongWritable(1L));
        }
    }
}
</pre>
</div> </div>
MyReducer.java<br />
<br />
<div> <div id="code673" class="codeText">
<pre>
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer&lt;Text, LongWritable, Text, LongWritable&gt; {
    @Override
    protected void reduce(Text k2, Iterable&lt;LongWritable&gt; v2s, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this word.
        long sum = 0L;
        for (LongWritable v2 : v2s) {
            sum += v2.get();
        }
        context.write(k2, new LongWritable(sum));
    }
}
</pre>
</div> </div>
WordCount.java<br />
<br />
<div> <div id="code431" class="codeText">
<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount {

    static String IN_PATH = "hdfs://192.168.15.100:9000/test";
    static String OUT_PATH = "hdfs://192.168.15.100:9000/test1";

    public static void main(String[] args) throws Exception {
        /*File workaround = new File(".");
        System.getProperties().put("hadoop.home.dir", workaround.getAbsolutePath());
        new File("./bin").mkdirs();
        new File("./bin/winutils.exe").createNewFile();*/
        Configuration conf = new Configuration();
        /*JobConf jobConf = new JobConf(conf);*/
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        Job job = Job.getInstance(conf, WordCount.class.getSimpleName());

        // Input path (FileInputFormat from org.apache.hadoop.mapreduce.lib.input)
        FileInputFormat.setInputPaths(job, IN_PATH);
        // FileInputFormat.setInputPaths(new JobConf(conf), IN_PATH);
        // Class that parses the input data
        job.setInputFormatClass(TextInputFormat.class);

        // Custom Mapper class
        job.setMapperClass(MyMapper.class);

        // Key and value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Partitioning
        job.setPartitionerClass(HashPartitioner.class);
        job.setNumReduceTasks(1);

        // Custom Reducer class
        job.setReducerClass(MyReducer.class);

        // Key and value types of the reducer output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // Output path
        FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
        // Class that formats the output
        job.setOutputFormatClass(TextOutputFormat.class);

        // Submit the job and wait for it to finish
        job.waitForCompletion(true);
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</pre>
</div> </div>
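Problem 6 came down to the delimiter passed to String.split, and that part can be checked without a Hadoop cluster. The following is a minimal plain-Java sketch, not part of the original job: the class name SplitDemo and its helper methods are made up for illustration. It compares the tab and space delimiters on a sample line, then counts words using the regex "\\s+", which is one possible alternative to a single space (an assumption of mine, not what the job above uses) for input that may contain repeated spaces or tabs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop required) of how the delimiter changes the tokens.
class SplitDemo {

    // Tokenize one line with the given regex delimiter, as the mapper does with String.split.
    static List<String> tokens(String line, String delimiterRegex) {
        return Arrays.asList(line.split(delimiterRegex));
    }

    // Mimic mapper plus reducer in memory: one count per word, summed per key.
    static Map<String, Long> countWords(List<String> lines, String delimiterRegex) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split(delimiterRegex)) {
                if (!word.isEmpty()) {              // skip empty tokens from blank lines
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "hello world hello";
        System.out.println(tokens(line, "\t").size());  // 1: the whole line is one token
        System.out.println(tokens(line, " ").size());   // 3: one token per word
        // Two spaces between words: "\\s+" treats any run of whitespace as one separator.
        System.out.println(countWords(Arrays.asList(line, "hello  hadoop"), "\\s+"));
    }
}
```

With a single space as the delimiter, a line such as "hello  hadoop" (two spaces) yields an empty token between the two words, which the job would then count as a word of its own. Whether that matters depends on how the input file was written.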

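Looking back at problems 3 and 4, the change to new JobConf(conf) made the exception go away but set the input path on a fresh copy of the configuration, which the job never reads. The sketch below is only an analogy in plain Java: java.util.Properties stands in for Hadoop's configuration objects, and the class ConfigCopyDemo, its methods, and the key name "input.dir" are all hypothetical, not Hadoop API.

```java
import java.util.Properties;

// Analogy only: java.util.Properties stands in for Hadoop's Configuration/JobConf.
class ConfigCopyDemo {

    // Hypothetical key name, chosen for illustration.
    static final String INPUT_DIR_KEY = "input.dir";

    // Mimics a copy constructor: the result is independent of the original.
    static Properties copyOf(Properties original) {
        Properties copy = new Properties();
        copy.putAll(original);
        return copy;
    }

    // Mimics a static setter that writes the path into whichever object it is handed.
    static void setInputPath(Properties conf, String path) {
        conf.setProperty(INPUT_DIR_KEY, path);
    }

    public static void main(String[] args) {
        Properties jobConf = new Properties();

        // Path written to a temporary copy that is discarded right away.
        setInputPath(copyOf(jobConf), "/test");
        System.out.println(jobConf.getProperty(INPUT_DIR_KEY)); // null

        // Path written to the object the job actually reads.
        setInputPath(jobConf, "/test");
        System.out.println(jobConf.getProperty(INPUT_DIR_KEY)); // /test
    }
}
```

The same reasoning explains why passing the Job object to the new-API FileInputFormat works: the setting lands in the configuration that the submitted job itself carries.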
Source: ITPUB博客, link: http://blog.itpub.net/30545764/viewspace-1840809/. If you repost this article, please credit the source; otherwise legal action may be pursued.
