About the example code:
Pick a novel on a fiction website and save each chapter as its own .txt file, using the chapter title as the filename.
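import os
import re
from urllib.parse import urljoin

import requests
from lxml import etree

# Link to one novel (its catalog / table-of-contents page)
Anovellink = 'https://www.hongxiu.com/book/18899519001291804#Catalog'
# Folder where the chapter files are written; create it if it is missing
save_dir = 'E:/novelpython/'
os.makedirs(save_dir, exist_ok=True)

# Source code of the catalog page
ContentsPageCode = requests.get(Anovellink).text
# Parsed catalog page
ContentsPage = etree.HTML(ContentsPageCode)
# Links to every chapter listed in the catalog
href = ContentsPage.xpath('//*[@id="j-catalogWrap"]/div[2]/div/ul/li/a/@href')

for link in href:
    # Absolute chapter URL (urljoin also copes with "//host/..." style links)
    linkaddress = urljoin('https://www.hongxiu.com', link)
    # Source code of the chapter page
    Chapterpagecode = requests.get(linkaddress).text
    # Parsed chapter page
    Chapterpage = etree.HTML(Chapterpagecode)
    # List of paragraph texts
    Literallist = Chapterpage.xpath('//div[@class="ywskythunderfont"]/p/text()')
    # Chapter title
    title = Chapterpage.xpath('//h1[@class ="j_chapterName"]/text()')[0].strip()
    # Windows forbids \ / : * ? " < > | in filenames, so replace them with "_"
    filename = re.sub(r'[\\/:*?"<>|]', '_', title)
    # "with" flushes and closes the file once the chapter is written
    with open(os.path.join(save_dir, filename + '.txt'), 'w', encoding='utf-8') as file:
        for paragraph in Literallist:
            file.write(paragraph + '\n')
    print(title + ': chapter downloaded')

print('All chapters downloaded')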
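The two XPath expressions can be checked offline before sending any requests to the site. The sketch below runs them on small hand-written HTML snippets. The helper names `parse_catalog` and `parse_chapter` are hypothetical, and the sample markup is invented to match the XPaths, not copied from hongxiu.com; the real pages may be structured differently or may have changed since the crawler was written.

```python
from lxml import etree

# Hypothetical helpers: the XPath expressions come from the crawler,
# but the sample HTML below is invented to fit them.

def parse_catalog(html):
    """Return the chapter hrefs found on a catalog page."""
    page = etree.HTML(html)
    return page.xpath('//*[@id="j-catalogWrap"]/div[2]/div/ul/li/a/@href')

def parse_chapter(html):
    """Return (title, paragraphs) for a chapter page."""
    page = etree.HTML(html)
    title = page.xpath('//h1[@class="j_chapterName"]/text()')[0].strip()
    paragraphs = page.xpath('//div[@class="ywskythunderfont"]/p/text()')
    return title, paragraphs

# Assumed catalog layout: the chapter list sits in the SECOND child div.
CATALOG_SAMPLE = """
<div id="j-catalogWrap">
  <div>Volume header</div>
  <div><div><ul>
    <li><a href="/chapter/1/101">Chapter 1</a></li>
    <li><a href="/chapter/1/102">Chapter 2</a></li>
  </ul></div></div>
</div>
"""

# Assumed chapter layout: title in an h1, paragraphs as <p> inside one div.
CHAPTER_SAMPLE = """
<h1 class="j_chapterName"> Chapter 1 </h1>
<div class="ywskythunderfont"><p>First paragraph.</p><p>Second paragraph.</p></div>
"""

if __name__ == "__main__":
    print(parse_catalog(CATALOG_SAMPLE))
    print(parse_chapter(CHAPTER_SAMPLE))
```

Two things worth knowing about these expressions: `@class="..."` only matches when the class attribute is exactly that string (an element with several classes needs `contains(@class, "...")`), and `p/text()` returns only text sitting directly inside each `<p>`, so text wrapped in nested tags such as `<span>` would be missed.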
Sample result: