
1. Preface

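My girlfriend recently moved into a sales role, and her manager handed her a task: find contact phone numbers for 500 target customers within one week. I watched her go through Tianyancha, Qichacha and Aiqicha, then copy purchasing contacts and bid-award details by hand from various procurement platforms. As a techie who likes crawling around the web in his spare time, I had a sudden thought: I'll just write a program and pull all that data down for you in one go.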

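A man's word is like an arrow on the bowstring: once it is loosed there is no calling it back. Having bragged about it, I had to get the data out even if it cost me a night's sleep.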

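They say you go from beginner to expert, and from crawling the web to crawling into a grave. Everyone else crawls with Python; today we crawl with Java.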

2. Analyzing the requirements

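At work my colleagues are the product managers; today my girlfriend is my temporary product manager. Since we are building a product, we first have to understand the requirements.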

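After she walked me through her descriptions and ideas, this battle-hardened code monkey understood right away what data she wanted. First, we need to know which tender projects exist and when each one was published; both appear on the announcement list page.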

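Beyond the tender project information, we also need each project's purchase price and contact person, which are shown on the detail page once you click into an announcement.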

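Then there is the transaction information published after a bid is awarded, which is used to analyze who the competitors are.

Once we know where the information lives, the basic approach follows: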

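Crawl the full list of tender announcements page by page and store the rows in the database in batches.
Use the data from the paged announcement list to query the detail of each announcement.
Pull the purchasing department's contact information and the award transaction information out of the detail and store them.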
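
The three steps above can be sketched as one loop. This is a minimal sketch, not the code actually used for the crawl: `CrawlPipeline`, `Notice`, `Detail` and the four callbacks are hypothetical names, the fetching and storing are left to the caller, and it assumes an empty page marks the end of the list.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.IntFunction;

/** Hypothetical sketch of the three-step crawl; none of these names come from the real site. */
final class CrawlPipeline {

    /** One row of the announcement list (fields are assumptions). */
    record Notice(String projectNo, String projectName) {}

    /** Contact data extracted from one announcement detail (fields are assumptions). */
    record Detail(String projectNo, String contactName, String contactPhone) {}

    /**
     * Walks the list page by page until an empty page comes back (or maxPages is hit),
     * saving each page of notices and then the detail of every notice on it.
     * Returns the number of notices processed.
     */
    static int crawl(IntFunction<List<Notice>> listPage,
                     Function<Notice, Detail> fetchDetail,
                     Consumer<List<Notice>> saveNotices,
                     Consumer<Detail> saveDetail,
                     int maxPages) {
        int total = 0;
        for (int page = 1; page <= maxPages; page++) {
            List<Notice> notices = listPage.apply(page);
            if (notices.isEmpty()) {
                break;                                   // assumed end-of-list signal
            }
            saveNotices.accept(notices);                 // step 1: batch-store the page
            for (Notice notice : notices) {
                Detail detail = fetchDetail.apply(notice);   // step 2: query the detail
                saveDetail.accept(detail);                   // step 3: store contact/award data
            }
            total += notices.size();
        }
        return total;
    }
}
```

Passing the fetch and save steps in as functions keeps the loop testable with fake data before any real request is sent.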

3. Finding the data and analyzing the fields

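Well now. Open the page source and anyone familiar with front-end work can tell at a glance that this is a single-page application written in Vue. If it were a server-rendered page, we would have to locate DOM nodes and parse the HTML. Since it is a Vue app, with the front end and back end fully separated, life is easier: we can go straight to the data API.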
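
Calling such a JSON API from Java needs nothing beyond the JDK's built-in `java.net.http` client. The sketch below is only an illustration: the URL is a placeholder, and the `pageNo` / `pageSize` parameter names and the use of GET for the list are assumptions; the real values have to be copied from the browser's network panel.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class ListApiClient {
    // Placeholder endpoint, NOT the real site: replace with the URL seen in the network panel.
    static final String LIST_URL = "https://example.invalid/api/notice/list";

    /** Builds the request for one page of the list (parameter names are assumptions). */
    static HttpRequest listRequest(int pageNo, int pageSize) {
        URI uri = URI.create(LIST_URL + "?pageNo=" + pageNo + "&pageSize=" + pageSize);
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    /** Sends the request and returns the raw JSON body; any non-200 status is treated as an error. */
    static String fetchPage(HttpClient client, int pageNo, int pageSize)
            throws IOException, InterruptedException {
        HttpResponse<String> response =
                client.send(listRequest(pageNo, pageSize), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }
}
```

The returned string still has to be parsed with a JSON library of your choice; that part is left out here because the JDK does not ship one.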


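The API is easy enough to find; the real work is figuring out which field carries which piece of information. Honestly, the field naming is outrageous. At first I thought these were some kind of special words. Look closer: aren't these fields just named after pinyin initials?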

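I reckon the developers who wrote these APIs either really struggle with English or got lazy. My guess is that the site was outsourced anyway, and for the developers and the page alike, "it runs" was good enough.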

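Now for the detail API. It is a GET request with a few parameters, and one look tells you they are taken from the paged announcement list above. What follows is matching up fields one by one, working out each field's format, and noting which fields may be empty.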

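Of course, I stepped into a big pit here: the value the detail API expects in its ggLeiXing field did not match the field value returned by the paged list. It took a good while of digging to discover that the detail API actually uses the ggXingZhi field from the paged list.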
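
To keep that pitfall from coming back, the mapping can live in one place. A minimal sketch, with assumptions flagged: only `ggLeiXing` and `ggXingZhi` are names from the API described above; `ggGuid`, `bdGuid` and `guid` are guessed from the table columns used later, and `DetailQuery` / `ListRow` are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class DetailQuery {
    /** Subset of a list row needed for the detail call; names other than ggLeiXing/ggXingZhi are guesses. */
    record ListRow(String ggGuid, String bdGuid, String guid, String ggLeiXing, String ggXingZhi) {}

    /** Builds the query string for the detail GET request from one list row. */
    static String toQueryString(ListRow row) {
        Map<String, String> params = new LinkedHashMap<>();   // keeps insertion order
        params.put("ggGuid", row.ggGuid());
        params.put("bdGuid", row.bdGuid());
        params.put("guid", row.guid());
        // The pitfall: the detail parameter is NAMED ggLeiXing,
        // but it must carry the list row's ggXingZhi VALUE.
        params.put("ggLeiXing", row.ggXingZhi());
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

The list row's own `ggLeiXing` is deliberately kept in the record but never sent, so the mismatch stays visible in the code.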

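Developers tripping up developers, and the unprofessional ones take it to a whole new level. A word of complaint to whoever wrote this API: this one field cost me an extra 20 minutes of staying up.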

4. Creating the tables and starting the crawl

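Two tables are designed here. One is the paged-list table, which stores the project information together with the parameters needed to call the detail API; the other is the detail table.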

create table `t_purchase_overview`
(
    `id`           bigint primary key auto_increment comment 'primary key',
    `project_no`   varchar(32)  not null default '' comment 'project number',
    `project_name` varchar(64)  not null default '' comment 'project name',
    `publish_time` datetime     null comment 'publish time',
    `gg_guid`      varchar(128) not null default '' comment 'announcement GUID',
    `gg_type`      tinyint      null     default 0 comment 'announcement type',
    `bd_guid`      varchar(128) not null default '' comment 'bid section GUID',
    `guid`         varchar(128) not null default '' comment 'GUID',
    `data_source`  tinyint      null     default 0 comment 'data source: 0-"app/home/detail", 1-"app/etl/detail"',
    `create_time`  datetime     not null default current_timestamp comment 'creation time',
    `update_time`  datetime     null comment 'update time'
) comment 'purchase data overview (paged list)';

create index `idx_tpo_project_no` on t_purchase_overview (project_no);

create table `t_purchase_bulletin`
(
    `id`               bigint primary key auto_increment comment 'primary key',
    `project_no`       varchar(32)    not null default '' comment 'project number',
    `project_name`     varchar(64)    not null default '' comment 'project name',
    `publish_time`     datetime       null comment 'announcement publish time',
    `price`            decimal(15, 2) not null default 0 comment 'tender price',
    `purchase_company` varchar(32)    null comment 'purchasing organization',
    `company_address`  varchar(128)   null comment 'purchasing organization address',
    `contact_name`     varchar(8)     null comment 'contact person',
    `contact_phone`    varchar(36)    null comment 'contact phone',
    `trans_info`       varchar(1024)  null comment 'transaction info (JSON format)',
    `create_time`      datetime       not null default current_timestamp comment 'creation time',
    `update_time`      datetime       null comment 'update time'
) comment 'purchase bulletin (detail)';

create index `idx_tpb_create_time` on t_purchase_bulletin (create_time);
create index `idx_tpb_publish_time` on t_purchase_bulletin (publish_time);
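
For the "store in batches" part, plain JDBC is enough. The following is a sketch under assumptions rather than the code used in the actual crawl: `OverviewDao` and the `Overview` record are hypothetical, the column list is taken from `t_purchase_overview` above, and obtaining the `Connection` (driver, URL, credentials) is left to the caller.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.sql.Types;
import java.time.LocalDateTime;
import java.util.List;

final class OverviewDao {
    /** One row for t_purchase_overview; the record itself is a hypothetical helper. */
    record Overview(String projectNo, String projectName, LocalDateTime publishTime,
                    String ggGuid, int ggType, String bdGuid, String guid, int dataSource) {}

    // id, create_time and update_time are left to the table defaults.
    static final String INSERT_SQL =
            "insert into t_purchase_overview "
          + "(project_no, project_name, publish_time, gg_guid, gg_type, bd_guid, guid, data_source) "
          + "values (?, ?, ?, ?, ?, ?, ?, ?)";

    /** Inserts one page of list rows in a single JDBC batch and returns the per-row update counts. */
    static int[] insertBatch(Connection conn, List<Overview> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Overview o : rows) {
                ps.setString(1, o.projectNo());
                ps.setString(2, o.projectName());
                if (o.publishTime() == null) {
                    ps.setNull(3, Types.TIMESTAMP);      // publish_time is nullable
                } else {
                    ps.setTimestamp(3, Timestamp.valueOf(o.publishTime()));
                }
                ps.setString(4, o.ggGuid());
                ps.setInt(5, o.ggType());
                ps.setString(6, o.bdGuid());
                ps.setString(7, o.guid());
                ps.setInt(8, o.dataSource());
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

One batch per list page keeps round trips low without holding the whole crawl in memory.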