
Scraping the novel 《雪中悍刀行》 to a txt file

Date: 2024-04-02 06:51:41


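The TV series only airs one episode a day, which is not enough, so let's scrape the original novel and read that instead.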

The script mainly uses the requests and BeautifulSoup modules.

import requests
from bs4 import BeautifulSoup

Preparation: fetch and parse the page for the novel's first chapter

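# Only the path is given in the source; prepend the site's scheme and host before running.
url = '/chapter/189169/3431546.html'
response = requests.get(url)
response.encoding = 'utf-8'
html = response.text
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(html, 'html.parser')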
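The same fetch step is repeated for every page, so it can be wrapped in a small helper. The following is a minimal sketch rather than the author's code: the name `fetch_html`, the optional `session` parameter, and the 10-second timeout are all assumptions added for illustration.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Download a page and return its text decoded as UTF-8.

    `session` and `timeout` are illustrative parameters, not part of the
    original script.
    """
    http = session or requests.Session()
    response = http.get(url, timeout=timeout)
    # Fail early on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    response.encoding = 'utf-8'
    return response.text
```

Passing one shared `requests.Session()` to every call reuses the underlying connection, which helps when many chapters are downloaded from the same host.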

Try extracting the title and body of the first chapter

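# The chapter body is in the div whose itemprop is "acticleBody" (spelling kept from the original code).
div = soup.find('div', itemprop="acticleBody")
content = div.get_text()
# The chapter title is in the div with class "title_txtbox".
div = soup.find('div', class_="title_txtbox")
title = div.get_text()
print(title)
print(content)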
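The extraction above can be packaged as a function that takes the page HTML and returns the title and body. This is a sketch under assumptions: the function name `extract_chapter` and the `ValueError` on a missing element are not in the original, while the two selectors are the ones the original code uses.

```python
from bs4 import BeautifulSoup


def extract_chapter(html):
    """Return (title, content) for a chapter page.

    Raises ValueError when either expected div is missing, instead of
    failing later with an AttributeError on None.
    """
    soup = BeautifulSoup(html, 'html.parser')
    title_div = soup.find('div', class_='title_txtbox')
    body_div = soup.find('div', itemprop='acticleBody')
    if title_div is None or body_div is None:
        raise ValueError('chapter page does not have the expected layout')
    title = title_div.get_text(strip=True)
    # Put each text fragment of the body on its own line.
    content = body_div.get_text('\n', strip=True)
    return title, content
```

Because the function works on a plain HTML string, it can be checked against a saved page without making any network request.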

Fetch the table of contents for all chapters and store the entries in a list. The contents page is the one requested below:

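# As above, only the path is given in the source.
response = requests.get('/showchapter/189169.html')
response.encoding = 'utf-8'
html = response.text
soup = BeautifulSoup(html, 'html.parser')
# Drill down to the chapter links: the volume list, then its divs,
# then the <li> items inside the fourth div.
volume_list = soup.find('div', class_="volume-list")
volume_divs = volume_list.find_all('div')
chapters = volume_divs[3].find_all('li')

Loop over the table of contents and write each chapter to the txt file

# Process the first 160 chapters, as in the original loop.
for item in chapters[:160]:
    url = item.a['href']
    response = requests.get(url)
    response.encoding = 'utf-8'
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', itemprop="acticleBody")
    content = div.get_text()
    div = soup.find('div', class_="title_txtbox")
    title = div.get_text()
    # Append each chapter to the same file.
    with open('雪中悍刀行.txt', 'a+', encoding='utf-8') as f:
        f.write(title)
        f.write(content)
        f.write('\n')
    print("Written: " + title)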
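The loop could be made a little more robust along the following lines. This is only a sketch: `absolute_links`, `save_book`, the `fetch` and `parse` callables, the one-second delay, and the `https://example.com` host in the usage note are hypothetical names and values, not part of the original script. It resolves chapter links that may be relative, opens the output file once, and pauses between requests.

```python
import time
from urllib.parse import urljoin


def absolute_links(base_url, hrefs):
    """Resolve each href against base_url; absolute URLs pass through unchanged."""
    return [urljoin(base_url, href) for href in hrefs]


def save_book(chapter_urls, fetch, parse, out_path, delay=1.0):
    """Download every chapter and write them all to one file.

    fetch(url) must return the page HTML, and parse(html) must return a
    (title, content) pair. Both are supplied by the caller.
    """
    with open(out_path, 'w', encoding='utf-8') as f:
        for url in chapter_urls:
            title, content = parse(fetch(url))
            f.write(title + '\n' + content + '\n\n')
            # Pause between requests to avoid hammering the server.
            time.sleep(delay)
```

For example, `absolute_links('https://example.com/showchapter/189169.html', ['/chapter/189169/3431546.html'])` yields a full URL on the placeholder host. Opening the file in `'w'` mode also means a rerun starts from an empty file instead of appending duplicate chapters.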