您的位置:首页 > 科技 > 能源 > Hive的基本操作(创建与修改)

Hive的基本操作(创建与修改)

2024/12/23 15:23:48 来源:https://blog.csdn.net/qq_73339471/article/details/140286688  浏览:    关键词:Hive的基本操作(创建与修改)

必备知识

数据类型

基本类型

类型写法
字符char, varchar, string✔
整数tinyint, smallint, int✔, bigint✔
小数float, double, numeric(m,n), decimal(m,n)✔
布尔值boolean✔
时间date✔, timestamp✔

复杂类型(集合类型)

1、数组:array<T>	面向用户提供的原始数据做结构化映射样例: [] / |156,1778,42,138|	=> 描述同一个维度数据2、键值对:map<K,V>		样例: |LogicJava:88,mysql:89|3、结构体:struct<name1:value1,name2:value2,....>样例: 类json格式【以{}开头结尾,且结构稳定】 => 结构化数据

【创建】表操作

一:hive建表【基本语法】

语法组成

组成一:建表 = 基本格式 + 行格式 + 额外处理
组成二:上传数据
*基本格式
create table if not exists TABLE_NAME(FIELD_NAME DATA_TYPE,FIELD_NAME DATA_TYPE,....
)[comment '描述备注']
*行格式
形式一:row format delimited
1、应用场景:面向文本,非结构化与半结构化数据2、模拟数据:123,张三,16853210211116,true,26238.5,阅读;跑步;唱歌,java:98;mysql:54,province:南京;city:江宁3、案例演示:create table if not exists TABLE_NAME(id int,name string,time bigint,isPartyMember boolean,hobby array<string>,scores map<string,int>,address struct<province:string,city:string>)row format delimitedfields terminated by ','collection items terminated by ';'map keys terminated by ':'lines terminated by '\n'4、讲解:fields terminated by ','				列分隔符【字段: id,name...】collection items terminated by ';'		集合项内部间的分隔符map keys terminated by ':'				键值对[map]分隔符lines terminated by '\n'				行分隔符【默认,一般可以省略】
形式二:row format serde ‘CLASS_PATH’
1、应用场景:面向结构化数据,即:结构清晰的数据2、CLASS_PATH有以下几种选择:选择一:CSV【简单类型】数据呈现:"1","2","Football""2","2","Soccer""3","2","Baseball & Softball"代码:create table if not exists TABLE_NAME(id string,page string,word string)row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'with serdeproperties('separatorChar'=',','quoteChar'='"','escapeChar'='\\')选择二:regex【正则】数据呈现:123,张三,16853210211116,true,26238.5,阅读;跑步;唱歌,java:98;mysql:54,province:南京;city:江宁代码:create table if not exists TABLE_NAME(id int,name string,time bigint,isPartyMember boolean,hobby array<string>,scores map<string,int>,address struct<province:string,city:string>)row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'with serdeproperties('input.regex'='^(//d+),(.*?),(//d+),(true|false),(\\d+\\.?\\d+?)$')选择三:JsonSerDe数据呈现:{"name":"henry","age":22,"gender":"male","phone":"18014499655"}代码:create table if not exists json(name string,age int,gender string,phone string)row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'
*额外处理
1、store【存储】基本语法:stored as '存储格式'存储格式:textfile✔,orc,parquet,sequencefile,...案例:stored as textfile2、tblproperties【表属性】(通用):案例【实际情况具体分析】:tblproperties('skip.header.line.count'='1'			【跳过表头,即:第一行】...)
*上传数据入表
方法一【不建议用】:hdfs dfs -put employee.txt /hive312/warehouse/yb12211.db/inner_table_employee方法二【有校验过程】:✔需知:local :表示数据在虚拟机本地缺少local :表示数据在hdfs上overload :覆盖缺少overload :追加第一种【本地虚拟机】:load data local inpath '/root/file/employee.txt'overwrite into table yb12211.inner_table_employee;第二种【hdfs】:load data inpath '/hive_data/hive_cha01/employee/employee.txt'overwrite into table yb12211.inner_table_employee;方法三【只用于【外部表】】:✔基本格式:location 'hdfs中存放文件的【目录】的路径'			外部挂载

针对性实践操作

案例一:/*1|henry|1.81|1995-03-18|江苏,南京,玄武,北京东路68号|logicjava:88,javaoop:76,mysql:80,ssm:82|beauty,money,joke2|arill|1.59|1996-7-30|安徽,芜湖,南山,西湖东路68号|logicjava:79,javaoop:58,mysql:65,ssm:85|beauty,power,sleeping3|mary|1.72|1995-09-02|山东,青岛,长虹,天山东路68*/drop table if exists students;create table if not exists students(number int,name string,height decimal(3,2),birthday date,house struct<province:string,city:string,district:string,street:string>,scores map<string,int>,hobby array<string>)row format delimitedfields terminated by "|"collection items terminated by ","map keys terminated by ":"stored as textfile;load data inpath '/zhou/students.txt'overwrite into table zhou.students;案例二:/*user_id,auction_id,cat_id,cat1,property,buy_mount,day
786295544,41098319944,50014866,50022520,21458:86755362;13023209:3593274;10984217:21985;122217965:3227750;21477:28695579;22061:30912;122217803:3230095,2,123434123*/drop table if exists sam_mum_baby_trade;create external table if not exists sam_mum_baby_trade(user_id bigint,auction_id bigint,cat_id bigint,cat1 bigint,property map<bigint,bigint>,buy_mount int,day bigint)row format delimitedfields terminated by ","collection items terminated by ";"map keys terminated by ":"stored as textfiletblproperties ('skip.header.line.count'='1');load data inpath '/zhou/sam_mum_baby_trade.csv'into table zhou.sam_mum_baby_trade;案例三:/*"1","2","Football""2","2","Soccer""3","2","Baseball & Softball"*/drop table if exists categories;create table if not exists categories(id string,page string,word string)row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'with serdeproperties('separatorChar'=',','quoteChar'='"','escapeChar'='\\')stored as textfile;load data inpath '/zhou/categories.csv'overwrite into table zhou.categories;select * from categories;案例四:/*{"name":"henry","age":22,"gender":"male","phone":"18014499655"}*///Jsondrop table if exists json;create table if not exists json(name string,age int,gender string,phone string)row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'stored as textfile;load data inpath '/zhou/json.log'overwrite into table zhou.json;案例五:/*125;男;2015-9-7 1:52:22;1521.84883;男;2014-9-18 5:24:42;6391.45652;女;2014-5-4 5:56:45;9603.79*/create external table if not exists test1w(user_id int,user_gender string,order_time timestamp,order_amount decimal(6,2))row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'with serdeproperties('input.regex'='(\\d+);(.*?);(\\d{4}-\\d{1,2}-\\d{1,2} \\d{1,2}:\\d{1,2}:\\d{1,2});(\\d+\.?\\d+?)')stored as textfilelocation '/zhou/test1w';select * from test1w;

二:hive建表【高阶语法】

1:CTAS

【本质】:在原有表的基础上查询并创建新表
基本语法:create table if not exists NEW_TABLE_NAME as select ... from OLD_TABLE_NAME ...
案例:原有的表:hive_ext_regex_test1w语句:create table if not exists hive_ext_test_before2015 asselect * from hive_ext_regex_test1wwhere year(order_time)<=2015;

2:CTE

【本质】:对表进行层层筛选,最终形成新表
基本语法:as with....select...
案例:场景:2015年之前的所有数据 以及 2015年之后男性5个以上订单数或5w以上订单总额的订单数据。原有的表:hive_ext_regex_test1w语句:create table hive_test_before2015_and_male_over5or5w aswithbefore2015 as (select * from hive_ext_regex_test1wwhere year(order_time)<=2015),agg_male_over5or5w as (select user_idfrom hive_ext_regex_test1wwhere year(order_time)>2015 and user_gender='男'group by user_idhaving count(*)>=5 or sum(order_amount)>=50000),male_over5or5w as (select A.*from  hive_ext_regex_test1w Ainner join agg_male_over5or5w Bon year(A.order_time)>2015 and A.user_id=B.user_id)select * from before2015union all							【注意:union all => 将表并在一起且不去重】select * from male_over5or5w;

3:CTL

【本质】:复制原表的表结构
基本语法:create table NEW_TABLE_NAME like OLD_TABLE_NAME;
案例:create table hive_test1w_like like hive_ext_regex_test1w;

【修改】表操作

提前需知

1、查看表字段基本信息:desc 表名;2、查看表字段详细信息:desc formatted 表名;	=> 由此可查看表中可修改的属性3、查看建表流程:show create 表名;

基本语法

alter table TABLE_NAMErename to NEW_NAME;set tblproperties('key'='value')		-- 修改表属性:包括各种分隔符,SerDe,...ser fileformat FORMAT;					-- 修改文件格式change old_name new_name TYPE;			-- 修改字段名column(field_name TYPE)					-- 添加列

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com