Hive的基本操作（创建与修改）

必备知识

数据类型

基本类型

类型	写法
字符	char, varchar, string✔
整数	tinyint, smallint, int✔, bigint✔
小数	float, double, numeric(m,n), decimal(m,n)✔
布尔值	boolean✔
时间	date✔, timestamp✔

复杂类型(集合类型)

1、数组：array<T>	面向用户提供的原始数据做结构化映射样例： [] / |156,1778,42,138|	=> 描述同一个维度数据2、键值对：map<K,V>		样例： |LogicJava:88,mysql:89|3、结构体：struct<name1:value1,name2:value2,....>样例： 类json格式【以{}开头结尾，且结构稳定】 => 结构化数据

【创建】表操作

一：hive建表【基本语法】

语法组成

组成一：建表 = 基本格式 + 行格式 + 额外处理
组成二：上传数据

*基本格式

create table if not exists TABLE_NAME(FIELD_NAME DATA_TYPE,FIELD_NAME DATA_TYPE,....
)[comment '描述备注']

*行格式

形式一：row format delimited

1、应用场景：面向文本，非结构化与半结构化数据2、模拟数据：123,张三,16853210211116,true,26238.5,阅读;跑步;唱歌,java:98;mysql:54,province:南京;city:江宁3、案例演示：create table if not exists TABLE_NAME(id int,name string,time bigint,isPartyMember boolean,hobby array<string>,scores map<string,int>,address struct<province:string,city:string>)row format delimitedfields terminated by ','collection items terminated by ';'map keys terminated by ':'lines terminated by '\n'4、讲解：fields terminated by ','				列分隔符【字段: id,name...】collection items terminated by ';'		集合项内部间的分隔符map keys terminated by ':'				键值对[map]分隔符lines terminated by '\n'				行分隔符【默认，一般可以省略】

形式二：row format serde ‘CLASS_PATH’

1、应用场景：面向结构化数据，即：结构清晰的数据2、CLASS_PATH有以下几种选择：选择一：CSV【简单类型】数据呈现："1","2","Football""2","2","Soccer""3","2","Baseball & Softball"代码：create table if not exists TABLE_NAME(id string,page string,word string)row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'with serdeproperties('separatorChar'=',','quoteChar'='"','escapeChar'='\\')选择二：regex【正则】数据呈现：123,张三,16853210211116,true,26238.5,阅读;跑步;唱歌,java:98;mysql:54,province:南京;city:江宁代码：create table if not exists TABLE_NAME(id int,name string,time bigint,isPartyMember boolean,hobby array<string>,scores map<string,int>,address struct<province:string,city:string>)row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'with serdeproperties('input.regex'='^(//d+),(.*?),(//d+),(true|false),(\\d+\\.?\\d+?)$')选择三：JsonSerDe数据呈现：{"name":"henry","age":22,"gender":"male","phone":"18014499655"}代码：create table if not exists json(name string,age int,gender string,phone string)row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'

*额外处理

1、store【存储】基本语法：stored as '存储格式'存储格式：textfile✔,orc,parquet,sequencefile,...案例：stored as textfile2、tblproperties【表属性】(通用)：案例【实际情况具体分析】：tblproperties('skip.header.line.count'='1'			【跳过表头，即：第一行】...)

*上传数据入表

方法一【不建议用】：hdfs dfs -put employee.txt /hive312/warehouse/yb12211.db/inner_table_employee方法二【有校验过程】：✔需知：local ：表示数据在虚拟机本地缺少local ：表示数据在hdfs上overload ：覆盖缺少overload ：追加第一种【本地虚拟机】：load data local inpath '/root/file/employee.txt'overwrite into table yb12211.inner_table_employee;第二种【hdfs】：load data inpath '/hive_data/hive_cha01/employee/employee.txt'overwrite into table yb12211.inner_table_employee;方法三【只用于【外部表】】：✔基本格式：location 'hdfs中存放文件的【目录】的路径'			外部挂载

针对性实践操作

案例一：/*1|henry|1.81|1995-03-18|江苏,南京,玄武,北京东路68号|logicjava:88,javaoop:76,mysql:80,ssm:82|beauty,money,joke2|arill|1.59|1996-7-30|安徽,芜湖,南山,西湖东路68号|logicjava:79,javaoop:58,mysql:65,ssm:85|beauty,power,sleeping3|mary|1.72|1995-09-02|山东,青岛,长虹,天山东路68*/drop table if exists students;create table if not exists students(number int,name string,height decimal(3,2),birthday date,house struct<province:string,city:string,district:string,street:string>,scores map<string,int>,hobby array<string>)row format delimitedfields terminated by "|"collection items terminated by ","map keys terminated by ":"stored as textfile;load data inpath '/zhou/students.txt'overwrite into table zhou.students;案例二：/*user_id,auction_id,cat_id,cat1,property,buy_mount,day
786295544,41098319944,50014866,50022520,21458:86755362;13023209:3593274;10984217:21985;122217965:3227750;21477:28695579;22061:30912;122217803:3230095,2,123434123*/drop table if exists sam_mum_baby_trade;create external table if not exists sam_mum_baby_trade(user_id bigint,auction_id bigint,cat_id bigint,cat1 bigint,property map<bigint,bigint>,buy_mount int,day bigint)row format delimitedfields terminated by ","collection items terminated by ";"map keys terminated by ":"stored as textfiletblproperties ('skip.header.line.count'='1');load data inpath '/zhou/sam_mum_baby_trade.csv'into table zhou.sam_mum_baby_trade;案例三:/*"1","2","Football""2","2","Soccer""3","2","Baseball & Softball"*/drop table if exists categories;create table if not exists categories(id string,page string,word string)row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'with serdeproperties('separatorChar'=',','quoteChar'='"','escapeChar'='\\')stored as textfile;load data inpath '/zhou/categories.csv'overwrite into table zhou.categories;select * from categories;案例四：/*{"name":"henry","age":22,"gender":"male","phone":"18014499655"}*///Jsondrop table if exists json;create table if not exists json(name string,age int,gender string,phone string)row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'stored as textfile;load data inpath '/zhou/json.log'overwrite into table zhou.json;案例五：/*125;男;2015-9-7 1:52:22;1521.84883;男;2014-9-18 5:24:42;6391.45652;女;2014-5-4 5:56:45;9603.79*/create external table if not exists test1w(user_id int,user_gender string,order_time timestamp,order_amount decimal(6,2))row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'with serdeproperties('input.regex'='(\\d+);(.*?);(\\d{4}-\\d{1,2}-\\d{1,2} \\d{1,2}:\\d{1,2}:\\d{1,2});(\\d+\.?\\d+?)')stored as textfilelocation '/zhou/test1w';select * from test1w;

二：hive建表【高阶语法】

1：CTAS

【本质】：在原有表的基础上查询并创建新表
基本语法：create table if not exists NEW_TABLE_NAME as select ... from OLD_TABLE_NAME ...
案例：原有的表：hive_ext_regex_test1w语句：create table if not exists hive_ext_test_before2015 asselect * from hive_ext_regex_test1wwhere year(order_time)<=2015;

2：CTE

【本质】：对表进行层层筛选，最终形成新表
基本语法：as with....select...
案例：场景：2015年之前的所有数据 以及 2015年之后男性5个以上订单数或5w以上订单总额的订单数据。原有的表：hive_ext_regex_test1w语句：create table hive_test_before2015_and_male_over5or5w aswithbefore2015 as (select * from hive_ext_regex_test1wwhere year(order_time)<=2015),agg_male_over5or5w as (select user_idfrom hive_ext_regex_test1wwhere year(order_time)>2015 and user_gender='男'group by user_idhaving count(*)>=5 or sum(order_amount)>=50000),male_over5or5w as (select A.*from  hive_ext_regex_test1w Ainner join agg_male_over5or5w Bon year(A.order_time)>2015 and A.user_id=B.user_id)select * from before2015union all							【注意：union all => 将表并在一起且不去重】select * from male_over5or5w;

3：CTL

【本质】：复制原表的表结构
基本语法：create table NEW_TABLE_NAME like OLD_TABLE_NAME;
案例：create table hive_test1w_like like hive_ext_regex_test1w;

【修改】表操作

提前需知

1、查看表字段基本信息：desc 表名;2、查看表字段详细信息：desc formatted 表名;	=> 由此可查看表中可修改的属性3、查看建表流程：show create 表名;

基本语法

alter table TABLE_NAMErename to NEW_NAME;set tblproperties('key'='value')		-- 修改表属性：包括各种分隔符,SerDe,...ser fileformat FORMAT;					-- 修改文件格式change old_name new_name TYPE;			-- 修改字段名column(field_name TYPE)					-- 添加列

Hive的基本操作（创建与修改）

必备知识

数据类型

基本类型

复杂类型(集合类型)

【创建】表操作

一：hive建表【基本语法】

语法组成

*基本格式

*行格式

形式一：row format delimited

形式二：row format serde ‘CLASS_PATH’

*额外处理

*上传数据入表

针对性实践操作

二：hive建表【高阶语法】

1：CTAS

2：CTE

3：CTL

【修改】表操作

提前需知

基本语法

最新新闻

热搜词