1. First, locate the hot-board data to scrape
Find the URL that serves the hot-board data. The response is JSON:
https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc
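Before writing the extraction code, it can help to look at the shape of the response. The sketch below does that on a hand-made sample rather than a live request. The field names Title, ClusterId and HotValue are the ones the script in step 2 reads; the sample values and the `describe_payload` helper are made up for illustration, and the real response may carry more fields.

```python
# Hypothetical sample mimicking the layout the script in step 2 relies on:
# a top-level 'data' list whose entries carry Title, ClusterId and HotValue.
# All values here are invented for illustration.
sample_payload = {
    "data": [
        {"Title": "Example topic A", "ClusterId": 1001, "HotValue": "500"},
        {"Title": "Example topic B", "ClusterId": 1002, "HotValue": "300"},
    ]
}


def describe_payload(payload):
    """Return the number of entries and the sorted field names of the first one."""
    entries = payload.get("data", [])
    first_keys = sorted(entries[0].keys()) if entries else []
    return len(entries), first_keys


count, keys = describe_payload(sample_payload)
print(f"{count} entries, fields: {keys}")
```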
2. Extract the data with Python
import requests
import pandas as pd

# Send a browser-like User-Agent with the request
head = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36',
}
url = 'https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc'
response = requests.get(url, headers=head)
print(response.status_code)
json_data = response.json()
# print(json_data)

# Empty lists used to store the extracted data
title_list = []
url_list = []
hot_list = []

# The URL in the response is longer than necessary;
# the trending path followed by the ClusterId is enough to open the page
for data in json_data['data']:
    title = data['Title']
    cluster_id = data['ClusterId']
    hot = data['HotValue']
    title_list.append(title)
    url_list.append(f"https://www.toutiao.com/trending/{cluster_id}")
    hot_list.append(hot)
# print(f"Titles: {title_list}\nLinks: {url_list}\nHot values: {hot_list}")

# Assemble the lists into a DataFrame
ID = range(1, len(title_list) + 1)
df = pd.DataFrame({
    'ID': ID,
    '热榜标题': title_list,  # hot-board title
    '热度值': hot_list,      # hot value
    '热榜链接': url_list,    # hot-board link
})

# Path of the output file
output_path = r'C:\Users\MAG\Desktop\python之路\python基础使用\toutiao.csv'
try:
    df.to_csv(output_path, index=False)
    print("CSV file saved successfully.")
except Exception as e:
    print("An error occurred while saving the CSV file:")
    print(e)
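The loop above indexes each entry directly, so a single entry without a Title, ClusterId or HotValue would stop the script with a KeyError. A more defensive variant could look like the sketch below. The `extract_rows` helper and the sample payload are hypothetical, and skipping incomplete entries is only one possible policy, not something the original script does.

```python
def extract_rows(payload):
    """Collect title, link and hot value per entry, skipping incomplete entries."""
    rows = []
    for entry in payload.get("data", []):
        title = entry.get("Title")
        cluster_id = entry.get("ClusterId")
        hot = entry.get("HotValue")
        # Skip entries that lack any of the three fields instead of raising KeyError
        if title is None or cluster_id is None or hot is None:
            continue
        rows.append({
            "title": title,
            "url": f"https://www.toutiao.com/trending/{cluster_id}",
            "hot": hot,
        })
    return rows


# Invented sample: the second entry has no ClusterId and is dropped
sample = {
    "data": [
        {"Title": "Example topic A", "ClusterId": 1001, "HotValue": "500"},
        {"Title": "Example topic B", "HotValue": "300"},
    ]
}
for row in extract_rows(sample):
    print(row)
```

A list of dictionaries like this can be passed directly to `pd.DataFrame(rows)`, which replaces the three parallel lists with one structure.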
3. Check the contents written to the CSV file
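One way to check the result without opening a spreadsheet is to read the file back with pandas. The sketch below is an illustration under assumptions: `preview_csv` is a hypothetical helper, and it writes a two-row invented sample to a temporary directory instead of touching the real toutiao.csv. The sample is saved with `encoding='utf-8-sig'`, which adds a byte-order mark that Excel on Windows often needs to display Chinese column names correctly; the original script uses the default encoding.

```python
import os
import tempfile

import pandas as pd


def preview_csv(path, n=5):
    """Read a saved CSV back and return its first n rows."""
    # 'utf-8-sig' reads files both with and without a byte-order mark
    return pd.read_csv(path, encoding="utf-8-sig").head(n)


# Invented two-row sample using the same column names as the script
sample_df = pd.DataFrame({
    "ID": [1, 2],
    "热榜标题": ["Example topic A", "Example topic B"],
    "热度值": [500, 300],
    "热榜链接": [
        "https://www.toutiao.com/trending/1001",
        "https://www.toutiao.com/trending/1002",
    ],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "toutiao_sample.csv")
    sample_df.to_csv(sample_path, index=False, encoding="utf-8-sig")
    preview = preview_csv(sample_path)

print(preview)
```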
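Summary:
Use requests to fetch the data;
create lists and store the extracted fields in them;
use pandas to assemble the lists into a DataFrame;
specify the output file and save the data to it.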