您的位置:首页 > 文旅 > 美景 > 微信公众号和小程序哪个好_建站公司塔山双喜_网络营销知名企业_网络营销推广策划的步骤是什么

微信公众号和小程序哪个好_建站公司塔山双喜_网络营销知名企业_网络营销推广策划的步骤是什么

2024/12/22 20:39:45 来源:https://blog.csdn.net/tiantang2renjian/article/details/143404460  浏览:    关键词:微信公众号和小程序哪个好_建站公司塔山双喜_网络营销知名企业_网络营销推广策划的步骤是什么
微信公众号和小程序哪个好_建站公司塔山双喜_网络营销知名企业_网络营销推广策划的步骤是什么

一、目的

在数据质量模块,需要对原始数据的准确性进行统计

二、Hive中原有代码

2.1 表结构

--42、数据准确性统计表 dwd_data_accuracy
create  table  if not exists  hurys_db.dwd_data_accuracy(data_type               int        comment '1:转向比,2:统计,3:评价,4:区域,5:过车,6:静态排队,7:动态排队,8:轨迹,9:事件数据,10:事件资源',device_no               string     comment '设备编号',field_name              string     comment '字段名',data_unreasonable_rate  float      comment '数据不合理率',data_null_rate          float      comment '数据空值率'
)
comment '数据准确性统计表'
partitioned by (day string)
stored as orc
;

2.2 SQL代码

insert  overwrite  table  hurys_db.dwd_data_accuracy  partition(day)
selectt1.data_type,t1.device_no,t1.field_name,round((sum(case when t1.field_value is not null then 1 else 0 end)/t2.count_device_all),2)  data_unreasonable_rate,round((sum(case when t1.field_value is null then 1 else 0 end)/t2.count_device_all),2) data_null_rate ,t1.day
from hurys_db.dwd_data_clean_record_queue as t1
left join (selectdevice_no,day,count(device_no) count_device_allfrom hurys_db.ods_queuewhere day='2024-09-04'group by device_no, day) as  t2
on t2.device_no=t1.device_no and t2.day=t1.day
where t2.count_device_all is not null
group by t1.data_type, t1.device_no, t1.field_name, t2.count_device_all, t1.day

三、ClickHouse中现有代码

3.1 表结构

--42、八大类基础数据准确性统计表(长期存储)
create  table  if not exists  hurys_jw.dwd_data_accuracy(data_type               Int32            comment '1:转向比,2:统计,3:评价,4:区域,5:过车,6:静态排队,7:动态排队,8:轨迹,9:事件数据,10:事件资源',device_no               String           comment '设备编号',field_name              String           comment '字段名',data_unreasonable_rate  Decimal(10, 6)   comment '数据不合理率',data_null_rate          Decimal(10, 6)   comment '数据空值率',day                     Date             comment '日期'
)
ENGINE = MergeTree
PARTITION BY day
PRIMARY KEY day
ORDER BY day
SETTINGS index_granularity = 8192;

3.2 SQL代码

--静态排队
selectdata_type,device_no,field_name,round(count_field_unreasonable / count_device_all,6) data_unreasonable_rate,round(count_field_null / count_device_all,6) data_null_rate,cast(day as String) day
from (selectt1.data_type,t1.device_no,t1.field_name,sum(case when field_name is not null  then 1 else 0 end) count_field_unreasonable,sum(case when field_name is null  then 1 else 0 end) count_field_null,t2.count_device_all,t1.day
from hurys_jw.dwd_data_clean_record_queue as t1
left join (selectdevice_no,DATE(create_time) day,count(device_no) count_device_allfrom hurys_jw.ods_queuewhere day='2024-10-22'group by device_no, day) as  t2
on t2.device_no=t1.device_no and t2.day=t1.day
where t2.count_device_all > 0
group by t1.data_type, t1.device_no, t1.field_name, t2.count_device_all, t1.day)
;

3.3 Kettle任务

3.3.1 newtime


3.3.2 替换NULL值


3.3.3 静态排队


select
       data_type,
       device_no,
       field_name,
       round(count_field_unreasonable / count_device_all,6) data_unreasonable_rate,
       round(count_field_null / count_device_all,6) data_null_rate,
       cast(day as String) day
from (select
       t1.data_type,
       t1.device_no,
       t1.field_name,
       sum(case when field_name is not null  then 1 else 0 end) count_field_unreasonable,
       sum(case when field_name is null  then 1 else 0 end) count_field_null,
       t2.count_device_all,
       t1.day
from hurys_jw.dwd_data_clean_record_queue as t1
left join (select
                device_no,
                DATE(create_time) day,
                count(device_no) count_device_all
           from hurys_jw.ods_queue
           where day > ?
           group by device_no, day
          ) as  t2
on t2.device_no=t1.device_no and t2.day=t1.day
where t2.count_device_all > 0
group by t1.data_type, t1.device_no, t1.field_name, t2.count_device_all, t1.day)
;

其他clickhouse输入控件代码类似

3.3.4 字段选择


3.3.5 clickhouse输出


3.3.6 执行任务


3.3.7 海豚调度(1天1次)

搞定!就是Hive中原有SQL语句和ClickHouse现有SQL语句很大不同,改造起来有点烦,尤其碰上管卡!

版权声明:

本网仅为发布的内容提供存储空间,不对发表、转载的内容提供任何形式的保证。凡本网注明“来源:XXX网络”的作品,均转载自其它媒体,著作权归作者所有,商业转载请联系作者获得授权,非商业转载请注明出处。

我们尊重并感谢每一位作者,均已注明文章来源和作者。如因作品内容、版权或其它问题,请及时与我们联系,联系邮箱:809451989@qq.com,投稿邮箱:809451989@qq.com