hive读取hbase数据
简单的例子看这个:https://blog.csdn.net/dominic_tiger/article/details/70237542 和 https://blog.csdn.net/xiewenbo/article/details/24627323
之前创建时报错
FAILED: Error in metadata: java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException org.apache.hadoop.hive.hbase.HBaseSerDe: columns has 3 elements while hbase.columns.mapping has 4 elements (counting the key if implicit))
解决方法:
在创建hive/hbase相关联的表时,hbase表结构默认会有一个字段key,如果没有一个显示的字段\’key\’那么在创建表的进修,会自己创建,这样hive对应的表就会出现问题,所以在hive对应的表里一定要加上key这个字段,为了避免这个问题,在hbase表结构里可以显示的添加\’key\’字段,这样不容易出问题。
CREATE TABLE hperson1(key string,id string, name string,email string) STORED BY \'org.apache.hadoop.hive.hbase.HBaseStorageHandler\' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf0:id,cf1:name,cf2:email") TBLPROPERTIES ("hbase.table.name" = "hbperson1");
或者:
CREATE TABLE hperson1(key string,id string, name string,email string) STORED BY \'org.apache.hadoop.hive.hbase.HBaseStorageHandler\' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf0:id,cf1:name,cf2:email") TBLPROPERTIES ("hbase.table.name" = "hbperson1");
1.------------------------------------- hive> CREATE EXTERNAL TABLE tm05(key int,ROWKEY string ,BUS_NO int,CITY_NO int,DEPT_NO int,DOWN_PASSENGER int,FILA_NO int,GROUP_NO int, INS_TIME string, IS_UP_DOWN int ,LABEL_NO int ,LINE_NO string ,MACH_NO int,SITE_TIME string ,UP_PASSENGER int) > STORED BY \'org.apache.hadoop.hive.hbase.HBaseStorageHandler\' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:ROWKEY,cf1:BUS_NO,cf1:CITY_NO,cf1:DEPT_NO,cf1:DOWN_PASSENGER,cf1:FILA_NO,cf1:GROUP_NO,
cf1:INS_TIME,cf1:IS_UP_DOWN,cf1:LABEL_NO,cf1:LINE_NO,cf1:MACH_NO,cf1:SITE_TIME,cf1:UP_PASSENGER") > TBLPROPERTIES ("hbase.table.name" = "tm_temp05", "hbase.mapred.output.outputtable" = "tm_temp05"); 2.-------------------------------------------------------------------------------------------------------- CREATE EXTERNAL TABLE tm01(key int,ROWKEY string ,BUS_NO int,CITY_NO int,DEPT_NO int,DOWN_PASSENGER int,FILA_NO int,GROUP_NO int, INS_TIME string, IS_UP_DOWN int ,LABEL_NO int ,LINE_NO string ,MACH_NO int,SITE_TIME string ,UP_PASSENGER int) > STORED BY \'org.apache.hadoop.hive.hbase.HBaseStorageHandler\' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:ROWKEY,cf1:BUS_NO,cf1:CITY_NO,cf1:DEPT_NO,cf1:DOWN_PASSENGER,cf1:FILA_NO,cf1:GROUP_NO, cf1:INS_TIME,cf1:IS_UP_DOWN,cf1:LABEL_NO,cf1:LINE_NO,cf1:MACH_NO,cf1:SITE_TIME,cf1:UP_PASSENGER") > TBLPROPERTIES ("hbase.table.name" = "tm_temp05", "hbase.mapred.output.outputtable" = "tm_temp05"); 3.--------------------------------------------------------------------------------------------------------------------- CREATE EXTERNAL TABLE tm02(key string,ROWKEY string ,BUS_NO int,CITY_NO int,DEPT_NO int,DOWN_PASSENGER int,FILA_NO int,GROUP_NO int, INS_TIME string, IS_UP_DOWN int ,LABEL_NO int ,LINE_NO string ,MACH_NO int,SITE_TIME string ,UP_PASSENGER int) > STORED BY \'org.apache.hadoop.hive.hbase.HBaseStorageHandler\' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:ROWKEY,cf1:BUS_NO,cf1:CITY_NO,cf1:DEPT_NO,cf1:DOWN_PASSENGER,cf1:FILA_NO,cf1:GROUP_NO, cf1:INS_TIME,cf1:IS_UP_DOWN,cf1:LABEL_NO,cf1:LINE_NO,cf1:MACH_NO,cf1:SITE_TIME,cf1:UP_PASSENGER") > TBLPROPERTIES ("hbase.table.name" = "tm_temp05", "hbase.mapred.output.outputtable" = "tm_temp05");
方案1和方案2结果都一样
方案3:
总结:在必须要使用一个无用列(不使用无用列key会报错),建议使用方案1或方案2.
上面创建的表在hive里是外部表,数据没有存到hadoop的hdfs上。下面创建将外部表的数据放到hadoop的hdfs上
A.//---创建表 CREATE TABLE mt01(ROWKEY string ,BUS_NO int,CITY_NO int,DEPT_NO int,DOWN_PASSENGER int,FILA_NO int,GROUP_NO int, INS_TIME string, IS_UP_DOWN int ,LABEL_NO int ,LINE_NO string ,MACH_NO int,SITE_TIME string ,UP_PASSENGER int) B.//---将查询的数据放到上面创建表里 INSERT INTO mt01 (ROWKEY,BUS_NO,CITY_NO,DEPT_NO,DOWN_PASSENGER,FILA_NO,GROUP_NO, INS_TIME, IS_UP_DOWN ,LABEL_NO,LINE_NO,MACH_NO,SITE_TIME,UP_PASSENGER) ( select ROWKEY,BUS_NO,CITY_NO,DEPT_NO,DOWN_PASSENGER,FILA_NO,GROUP_NO, INS_TIME, IS_UP_DOWN ,LABEL_NO,LINE_NO,MACH_NO,SITE_TIME,UP_PASSENGER from tm01 )
start-hbase.sh