
Web Crawler - Python - Scraping Tianya Job Posts

Date: 2019-01-22 01:37:38


This script uses urllib to request pages, BeautifulSoup to parse them, and xlwt3 to write the results to an Excel file.

import urllib.request
from bs4 import BeautifulSoup
import time                      # imported in the original but never used
import xlwt3
from xlrd import open_workbook   # imported in the original but never used

wExcel = xlwt3.Workbook()
sheet1 = wExcel.add_sheet('my', cell_overwrite_ok=True)
num = 0                          # next free row in the worksheet
fo = open(r'contents.txt', 'a', encoding='utf-8')  # opened in the original but never written to

def getconten(url):
    """Fetch a page and return it as text, trying utf-8 first and gbk as a fallback."""
    opener = urllib.request.build_opener()
    try:
        content = opener.open(url).read()
        return content.decode('utf-8')
    except Exception:
        try:
            content = opener.open(url).read()
            return content.decode('gbk')
        except Exception:
            print('decode fail!')
            return None

def getdetail(url):
    """Return the body text of a single post, or None if it cannot be fetched or parsed."""
    con = getconten(url)
    if con:
        soup = BeautifulSoup(con, 'html.parser')
        job = soup.find('div', 'bbs-content clearfix')
        if job:
            return job.get_text()
    return None

def getonepage(url):
    """Scrape one listing page: write each row's cells plus the post body into the sheet."""
    global num
    content = getconten(url)
    if not content:
        return
    soup = BeautifulSoup(content, 'html.parser')
    for tr in soup.find_all('tr', 'bg'):
        oneitem = []
        j = 0
        detailurl = tr.td.a['href']
        # The site's base URL was lost from the original post; prepend the forum's
        # domain here to make the relative href absolute.
        detailurl = '' + detailurl
        detailcon = getdetail(detailurl)
        for item in tr.strings:
            item = item.strip()
            if item:
                oneitem.append(item)
                sheet1.write(num, j, item)
                j = j + 1
        sheet1.write(num, j, detailcon)
        num = num + 1

if __name__ == '__main__':
    # The base URL is missing here too; this is only the listing page's path and query.
    mainpage = '/list.jsp?item=763&sub=2'
    getonepage(mainpage)
    wExcel.save('res0.xls')

    # Follow the pagination links for up to 30 more pages.
    i = 0
    soup = BeautifulSoup(getconten(mainpage), 'html.parser')
    currentpage = soup.find('div', 'links').a.find_next_sibling('a')
    nextpage = '' + currentpage['href']
    while i < 30:
        print(nextpage)
        getonepage(nextpage)
        print('one page finished!')
        con = getconten(nextpage)
        if con:
            soup = BeautifulSoup(con, 'html.parser')
            currentpage = soup.find('div', 'links').a.find_next_sibling('a').find_next_sibling('a')
            nextpage = '' + currentpage['href']
            i = i + 1
        else:
            break
    wExcel.save('res.xls')
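One quirk worth calling out: when the utf-8 decode fails, getconten downloads the page a second time just to try gbk. Below is a minimal sketch of an alternative that downloads the bytes once and tries each candidate encoding on them in turn; the helper name fetch_page and its default encoding list are illustrative, not part of the original script.

import urllib.request

def fetch_page(url, encodings=('utf-8', 'gbk')):
    # Download once, then try each candidate encoding on the same bytes.
    # fetch_page and the encodings tuple are hypothetical, not from the original post.
    try:
        raw = urllib.request.urlopen(url).read()
    except Exception as e:
        print('request failed:', e)
        return None
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    print('decode fail!')
    return None

Reusing the downloaded bytes removes the redundant network round trip, and keeping the fallback encodings in a tuple makes it easy to add further candidates such as gb18030.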
