ITPub博客

首页 > 大数据 > Hadoop > Hive常用的SQL命令操作

Hive常用的SQL命令操作

Hadoop 作者:kjk007 时间:2013-05-10 14:16:00 0 删除 编辑

创建表 hive> CREATE TABLE pokes (foo INT, bar STRING); 创建表并创建索引字段ds hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING); 显示所有表 hive> SHOW TABLES; 按正条件(正则表达式)显示表, hive> SHOW TABLES '.*s'; 表添加一列 hive> ALTER TABLE pokes ADD COLUMNS (new_col INT); 添加一列并增加列字段注释 hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment'); 更改表名 hive> ALTER TABLE events RENAME TO 3koobecaf; 删除列 hive> DROP TABLE pokes; 元数据存储 将文件中的数据加载到表中 hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes; 加载本地数据,同时给定分区信息 hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15'); 加载DFS数据 ,同时给定分区信息 hive> LOAD DATA INPATH '/user/myname/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15'); The above command will load data from an HDFS file/directory to the table. Note that loading data from HDFS will result in moving the file/directory. As a result, the operation is almost instantaneous. SQL 操作 按先件查询 hive> SELECT a.foo FROM invites a WHERE a.ds=''; 将查询数据输出至目录 hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds=''; 将查询结果输出至本地目录 hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a; 选择所有列到本地目录 hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a; hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100; hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a; hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a; hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds=''; hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a; hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a; 将一个表的统计结果插入另一个表中 hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(1) WHERE a.foo > 0 GROUP BY a.bar; hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar; JOIN hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo; 将多表数据插入到同一表中 FROM src INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100 INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 and src.key < 200 INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 and src.key < 300 INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300; 将文件流直接插入文件 hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09'; This streams the data in the map phase through the script /bin/cat (like hadoop streaming). Similarly - streaming can be used on the reduce side (please see the Hive Tutorial or examples)

原作者: bigsea

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/22654626/viewspace-1111196/,如需转载,请注明出处,否则将追究法律责任。

上一篇: 没有了~
下一篇: 没有了~
请登录后发表评论 登录
全部评论

注册时间:2009-10-05