HBase Table Export and Import

Original post | Data Analysis | Author: javenzhen | Date: 2015-09-14 15:06:32

1. Inspect the table test that will be used for this test

    hbase(main):002:0> scan 'test'
    ROW COLUMN+CELL
     row-01 column=cf1:id, timestamp=1442020353563, value=1
     row-01 column=cf1:name, timestamp=1442020382276, value=aaa
     row-02 column=cf1:id, timestamp=1442020360143, value=2
     row-02 column=cf1:name, timestamp=1442020388494, value=bbb
     row-03 column=cf1:id, timestamp=1442020364496, value=3
     row-03 column=cf1:name, timestamp=1442020393616, value=ccc
     row-04 column=cf1:id, timestamp=1442020369002, value=4
     row-04 column=cf1:name, timestamp=1442020398557, value=ddd
     row-05 column=cf1:id, timestamp=1442020373493, value=5
     row-05 column=cf1:name, timestamp=1442020404131, value=eee
    5 row(s) in 0.7520 seconds

2. Export the test table using the export program in the HBase JAR, run through Hadoop

First, check which help options the HBase JAR offers:

    grid@master1:~$ hadoop jar /usr/local/hbase/hbase-0.94.27.jar -h
    Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/collect/Multimap
            at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:43)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Multimap
            at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
            ... 6 more
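The command fails with java.lang.NoClassDefFoundError: com/google/common/collect/Multimap. Multimap belongs to Guava, a library HBase depends on, and its JAR is not on Hadoop's classpath. Copy it from HBase's lib directory into Hadoop's lib directory:
    grid@master1:~$ cp /usr/local/hbase/lib/guava-11.0.2.jar /usr/local/hadoop/lib
Run the command again; it still fails: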
    grid@master1:~$ hadoop jar /usr/local/hbase/hbase-0.94.27.jar -h
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/zookeeper/KeeperException
            at java.lang.Class.getDeclaredMethods0(Native Method)
            at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
            at java.lang.Class.getMethod0(Class.java:2856)
            at java.lang.Class.getMethod(Class.java:1668)
            at org.apache.hadoop.util.ProgramDriver$ProgramDescription.<init>(ProgramDriver.java:56)
            at org.apache.hadoop.util.ProgramDriver.addClass(ProgramDriver.java:99)
            at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:47)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    Caused by: java.lang.ClassNotFoundException: org.apache.zookeeper.KeeperException
            at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
            ... 12 more
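It still fails, but with a different error, which indicates that copying the Guava JAR fixed the first problem. This time the missing class is ZooKeeper's org.apache.zookeeper.KeeperException. Copy the ZooKeeper JAR into Hadoop's lib directory as well:
    grid@master1:~$ cp /usr/local/zookeeper/zookeeper-3.4.5.jar /usr/local/hadoop/lib
Run the command again. -h is not a program name the driver knows, so it reports that and lists the valid program names: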
    grid@master1:~$ hadoop jar /usr/local/hbase/hbase-0.94.27.jar -h
    Unknown program '-h' chosen.
    Valid program names are:
      CellCounter: Count cells in HBase table
      completebulkload: Complete a bulk data load.
      copytable: Export a table from local cluster to peer cluster
      export: Write table data to HDFS.
      import: Import data written by Export.
      importtsv: Import data in TSV format.
      rowcounter: Count rows in HBase table
      verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is changed after being appended to the log.
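
Copying JARs into /usr/local/hadoop/lib one at a time works, but each missing dependency only shows up on the next run. A common alternative is to put the needed JARs on HADOOP_CLASSPATH for a single command. The following is a minimal sketch of that idea, not something this walkthrough actually did; the helper name jars_to_classpath is hypothetical, and the directories are the ones used in this post.

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated classpath made of every
# *.jar file found directly inside the directories given as arguments.
# The body runs in a subshell so its variables do not leak to the caller.
jars_to_classpath() (
  classpath=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      # Skip the unexpanded pattern when a directory holds no JARs.
      [ -e "$jar" ] || continue
      classpath="${classpath:+$classpath:}$jar"
    done
  done
  printf '%s\n' "$classpath"
)

# Intended use (commented out so the snippet runs without a cluster):
# HADOOP_CLASSPATH="$(jars_to_classpath /usr/local/hbase/lib /usr/local/zookeeper)" \
#   hadoop jar /usr/local/hbase/hbase-0.94.27.jar -h
```

If the hbase launcher script is on the PATH, the output of `hbase classpath` can likely be assigned to HADOOP_CLASSPATH in the same way; that is an assumption about the local installation rather than something tested here.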
3. Run the actual HBase export
    grid@master1:~$ hadoop jar /usr/local/hbase/hbase-0.94.27.jar export -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec -D mapred.output.compression.type=BLOCK test /backup/test 2147483647
    15/09/13 01:35:49 INFO mapreduce.Export: versions=2147483647, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
    Exception in thread "main" java.lang.reflect.InvocationTargetException
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:51)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    Caused by: java.lang.NoClassDefFoundError: com/google/protobuf/Message
            at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addHBaseDependencyJars(TableMapReduceUtil.java:657)
            at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:694)
            at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:151)
            at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:228)
            at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:94)
            at org.apache.hadoop.hbase.mapreduce.Export.createSubmittableJob(Export.java:95)
            at org.apache.hadoop.hbase.mapreduce.Export.main(Export.java:188)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
            at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
            ... 10 more
    Caused by: java.lang.ClassNotFoundException: com.google.protobuf.Message
            at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
            at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
            at java.security.AccessController.doPrivileged(Native Method)
            at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
            at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
            ... 23 more
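Another failure, which is getting tedious. This time the missing class is com.google.protobuf.Message from the protobuf JAR. Copy that JAR into Hadoop's lib directory:
    grid@master1:~$ cp /usr/local/hbase/lib/protobuf-java-2.4.0a.jar /usr/local/hadoop/lib
Then run the export again: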
    grid@master1:~$ hadoop jar /usr/local/hbase/hbase-0.94.27.jar export -D mapred.output.compress=true -D mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec -D mapred.output.compression.type=BLOCK test /backup/test 2147483647
    .....................................................
    15/09/13 01:48:57 INFO mapred.JobClient: Running job: job_201509122338_0002
    15/09/13 01:48:58 INFO mapred.JobClient: map 0% reduce 0%
    15/09/13 01:49:18 INFO mapred.JobClient: map 100% reduce 0%
    15/09/13 01:49:23 INFO mapred.JobClient: Job complete: job_201509122338_0002
    15/09/13 01:49:23 INFO mapred.JobClient: Counters: 29
    .....................................................

The MapReduce web UI shows one job, export_test, which performed the export.
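
The positional arguments of the export command are easy to misread on one long line. Judging from the Export log line (versions=2147483647, starttime=0, endtime=9223372036854775807), they are the table name, the HDFS output directory, and the maximum number of versions to export per cell; 2147483647 is the largest 32-bit signed integer, so in effect every version is exported. The sketch below assembles the same command from named parts. The function name export_command is hypothetical, and it only prints the command rather than running it. The codec property is spelled mapred.output.compression.codec, the name Hadoop's file output format reads.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the export command used in this post.
#   $1 = HBase table name
#   $2 = HDFS output directory
#   $3 = max versions per cell (defaults to 2^31 - 1, i.e. all versions)
# The body runs in a subshell so its variables do not leak to the caller.
export_command() (
  table=$1
  outdir=$2
  max_int=$(( (1 << 31) - 1 ))
  versions=${3:-$max_int}
  # printf reuses the format for each argument, joining them with spaces.
  printf '%s ' hadoop jar /usr/local/hbase/hbase-0.94.27.jar export \
    -D mapred.output.compress=true \
    -D mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
    -D mapred.output.compression.type=BLOCK
  printf '%s %s %s\n' "$table" "$outdir" "$versions"
)

export_command test /backup/test
```

Passing a third argument, for example `export_command test /backup/test 1`, would limit the export to the newest version of each cell.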

4. Prepare for the import by truncating the test table. As the output shows, truncate first disables the table, then drops it, and finally creates it again.

    hbase(main):004:0> truncate 'test'
    Truncating 'test' table (it may take a while):
     - Disabling table...
     - Dropping table...
     - Creating table...
    0 row(s) in 3.5160 seconds

    hbase(main):005:0> scan 'test'
    ROW COLUMN+CELL
    0 row(s) in 0.0270 seconds
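
The three steps that truncate reports can also be written out by hand. The sketch below prints the equivalent HBase shell statements, assuming the table has only the column family cf1 seen in the scan output. The function name truncate_steps is hypothetical, and the snippet only prints text; it does not touch any table.

```shell
#!/bin/sh
# Hypothetical helper: print the HBase shell statements that mirror the
# disable / drop / create sequence reported by truncate.
#   $1 = table name, remaining arguments = column family names
# The body runs in a subshell so its variables do not leak to the caller.
truncate_steps() (
  table=$1
  shift
  printf "disable '%s'\n" "$table"
  printf "drop '%s'\n" "$table"
  # create needs the column families again, since drop discards the schema.
  families=""
  for cf in "$@"; do
    families="$families, '$cf'"
  done
  printf "create '%s'%s\n" "$table" "$families"
)

truncate_steps test cf1
```

A hand-written create like this only restores the family names with default settings, so any non-default column family options would have to be repeated explicitly.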
5. Run the import to load the data back into the test table
    grid@master1:~/sh$ hadoop jar /usr/local/hbase/hbase-0.94.27.jar import test /backup/test
    .....................................................................
    15/09/13 02:01:14 INFO mapreduce.TableOutputFormat: Created table instance for test
    15/09/13 02:01:14 INFO input.FileInputFormat: Total input paths to process : 1
    15/09/13 02:01:15 INFO mapred.JobClient: Running job: job_201509122338_0003
    15/09/13 02:01:16 INFO mapred.JobClient: map 0% reduce 0%
    15/09/13 02:01:36 INFO mapred.JobClient: map 100% reduce 0%
    15/09/13 02:01:41 INFO mapred.JobClient: Job complete: job_201509122338_0003
    ..................................................................

The UI now shows one more job, import_test.


6. Scan the test table again: all the records have been restored

    hbase(main):006:0> scan 'test'
    ROW COLUMN+CELL
     row-01 column=cf1:id, timestamp=1442020353563, value=1
     row-01 column=cf1:name, timestamp=1442020382276, value=aaa
     row-02 column=cf1:id, timestamp=1442020360143, value=2
     row-02 column=cf1:name, timestamp=1442020388494, value=bbb
     row-03 column=cf1:id, timestamp=1442020364496, value=3
     row-03 column=cf1:name, timestamp=1442020393616, value=ccc
     row-04 column=cf1:id, timestamp=1442020369002, value=4
     row-04 column=cf1:name, timestamp=1442020398557, value=ddd
     row-05 column=cf1:id, timestamp=1442020373493, value=5
     row-05 column=cf1:name, timestamp=1442020404131, value=eee
    5 row(s) in 0.0680 seconds
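
Comparing two scans by eye becomes unreliable beyond a handful of rows. One way to check the round trip is sketched here, under the assumption that the scan output from before the export and from after the import has each been saved to a text file. The file names and function names are hypothetical. The idea is to keep only the cell lines, which carry row key, column, timestamp, and value, and compare those.

```shell
#!/bin/sh
# Hypothetical helper: print only the cell lines of a saved scan output,
# with leading whitespace removed. Prompt, header, and the timing footer
# are dropped because they differ between runs.
cell_lines() {
  sed -n -e 's/^[[:space:]]*//' -e '/column=/p' "$1"
}

# Succeeds (exit status 0) when both saved scans contain the same cells.
same_cells() {
  [ "$(cell_lines "$1")" = "$(cell_lines "$2")" ]
}

# Intended use, with hypothetical file names:
# same_cells scan_before.txt scan_after.txt && echo "tables match"
```

Because the timestamps are part of each cell line, this check also confirms that the import kept the original timestamps, as the two scans in this post show.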


Source: ITPUB Blog, http://blog.itpub.net/12219480/viewspace-1799128/. If you repost this article, please credit the source; otherwise legal liability may be pursued.
