
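Web Scraping in Practice, Study Notes 4: The urllib3 Network Request Module (GET/POST Request Examples, File Uploads, IP Proxies, JSON, Binary Data, Timeouts)

Date: 2018-10-27 07:34:54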

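1 Introduction to the urllib3 module

urllib3 is a third-party network request module (it must be installed separately) and is more capable than Python's built-in urllib.

1.1 About urllib3

urllib3 is a powerful, well-organized HTTP client library for Python. It provides many important features that the Python standard library lacks, for example: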

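- Thread safety
- Connection pooling
- Client-side SSL/TLS verification
- File uploads with multipart encoding
- Helpers for retrying requests and handling HTTP redirects
- Support for gzip and deflate encoding
- Support for HTTP and SOCKS proxies
- 100% test coverage

1.1.1 Installing urllib3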

pip install urllib3
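After installing, a quick import check confirms that the module is available. This is only a minimal sketch; the variable name `version` is chosen for illustration.

```python
import urllib3

# __version__ holds the installed release as a string.
version = urllib3.__version__
print("urllib3 version:", version)
```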

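2 Sending network requests

2.1 Sending a GET request

To send a request with urllib3, first create a PoolManager object, then call its request() method to send the request.

The syntax of the request() method is as follows.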

request(method,url,fields=None,headers=None,**urlopen_kw)

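method: required; the request method, such as GET, POST, or PUT.
url: required; the URL to request.
fields: optional; the request parameters.
headers: optional; the request headers.

2.1.1 GET request example (retrieving response information)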
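Where the `fields` values end up depends on the request method. The sketch below approximates what urllib3 does internally, assuming default settings: for GET (and HEAD, DELETE, OPTIONS) the fields are URL-encoded into the query string, while for POST they are encoded into a multipart/form-data body. It runs offline and sends nothing; the variable names are illustrative.

```python
from urllib.parse import urlencode

import urllib3

fields = {"name": "xiaoli", "age": "1"}

# GET-style methods: fields are appended to the URL as a query string.
query = urlencode(fields)
print(query)

# POST-style methods: by default fields are encoded as multipart/form-data.
body, content_type = urllib3.encode_multipart_formdata(fields)
print(content_type.split(";")[0])
```

Passing `encode_multipart=False` to request() switches the POST body to application/x-www-form-urlencoded instead.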

import urllib3

urllib3.disable_warnings()  # suppress SSL warnings
url = "/"
http = urllib3.PoolManager()
get = http.request('GET', url)  # returns an HTTPResponse object
print(get.status)  # prints 200
# info() returns the response headers as a dict-like object; loop over it to print each header
response_header = get.info()
for key in response_header:
    print(key, ":", response_header.get(key))
# Accept-Ranges : bytes
# Cache-Control : no-cache
# Connection : keep-alive
# Content-Length : 227
# Content-Type : text/html
# Date : Mon, 21 Mar 12:12:23 GMT
# P3p : CP=" OTI DSP COR IVA OUR IND COM ", CP=" OTI DSP COR IVA OUR IND COM "
# Pragma : no-cache
# Server : BWS/1.1
# Set-Cookie : BD_NOT_HTTPS=1; path=/; Max-Age=300, BIDUPSID=E864BF1D7795F2742A7BC13B95F89493; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=., PSTM=1647864743; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=., BAIDUID=E864BF1D7795F27482D1B67B4F266616:FG=1; max-age=31536000; expires=Tue, 21-Mar-23 12:12:23 GMT; domain=.; path=/; version=1; comment=bd
# Strict-Transport-Security : max-age=0
# Traceid : 1647864743283252404214760038623219429901
# X-Frame-Options : sameorigin
# X-Ua-Compatible : IE=Edge,chrome=1

2.1.2 Sending a POST request
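To experiment with the GET flow without depending on an external site, the same calls can be pointed at a throwaway local server. This is a minimal sketch, assuming the standard-library http.server is an acceptable stand-in; `EchoHandler` and the `/hello` path are hypothetical names, not part of the original notes.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the requested path as JSON."""

    def do_GET(self):
        payload = json.dumps({"path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/hello"

http = urllib3.PoolManager()
resp = http.request("GET", url, fields={"name": "xiaoli"})
status = resp.status
content_type = resp.headers["Content-Type"]
echoed = json.loads(resp.data.decode("utf-8"))
print(status, content_type, echoed)

server.shutdown()
server.server_close()
```

The echoed path shows that the `fields` passed to a GET request were appended to the URL as a query string.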

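import urllib3

url = "/post"
params = {'name': 'xiaoli', 'age': '1'}
http = urllib3.PoolManager()
post = http.request('POST', url, fields=params, retries=5)  # retries: number of retry attempts, 3 by default
print("Result:", post.data.decode('utf-8'))
# 'unicode_escape' turns \uXXXX escapes in the raw text into readable characters
print("Result (when it contains Chinese):", post.data.decode('unicode_escape'))

2.2 Handling the server's response

2.2.1 Handling JSON returned by the server

If the server returns JSON and only some of its fields are needed, first convert the JSON data to a dictionary, then read the value for the desired key directly.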
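Besides form fields, a POST request can carry a raw body through the `body` argument of request(). The sketch below sends a JSON body to a throwaway local server so that it runs without an external service; `EchoPostHandler` and the `/post` path on localhost are hypothetical stand-ins, not the endpoint used in the notes.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class EchoPostHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler: echoes the parsed JSON request body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received = json.loads(self.rfile.read(length).decode("utf-8"))
        payload = json.dumps({"json": received}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging


server = HTTPServer(("127.0.0.1", 0), EchoPostHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/post"

http = urllib3.PoolManager()
body = json.dumps({"name": "xiaoli", "age": 1}).encode("utf-8")
resp = http.request(
    "POST", url, body=body, headers={"Content-Type": "application/json"}
)
result = json.loads(resp.data.decode("utf-8"))
print(result["json"])

server.shutdown()
server.server_close()
```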

import urllib3
import json

url = "/post"
params = {'name': 'xiaoli', 'age': '1'}
http = urllib3.PoolManager()
post = http.request('POST', url, fields=params, retries=5)  # retries: number of retry attempts, 3 by default
# Convert the response data to a dictionary. json.loads() already decodes
# \uXXXX escapes, so no separate 'unicode_escape' pass is needed for Chinese text.
post_json_EN = json.loads(post.data.decode('utf-8'))
print("Value for name:", post_json_EN.get('form').get('name'))
# Value for name: xiaoli

2.2.2 Handling binary data returned by the server (images)
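The JSON step can be tried offline on a hand-written payload. The sketch below uses a made-up byte string shaped like the response above (a `form` object with `name` and `age`) to show that json.loads() decodes escaped Chinese characters on its own, and that chained get() calls with defaults avoid errors on missing keys. The `city` key is hypothetical.

```python
import json

# Stand-in for post.data: JSON bytes with \uXXXX-escaped Chinese characters.
raw = b'{"form": {"name": "\\u5c0f\\u674e", "age": "1"}}'

data = json.loads(raw.decode("utf-8"))
name = data["form"]["name"]

# Supplying defaults keeps the lookup safe when a key is absent.
city = data.get("form", {}).get("city", "unknown")
print(name, city)
```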

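import urllib3

urllib3.disable_warnings()
url = 'https://img-/200123063865.png'
http = urllib3.PoolManager()
get = http.request('GET', url)  # send the request for the image
print(get.data)  # raw bytes of the image
# Open the target file in binary mode and write the data; the with block closes it automatically
with open('./p.png', 'wb') as f:
    f.write(get.data)

2.2.3 Setting request headers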
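A server sometimes answers an image request with an HTML error page, so it can help to check the bytes before saving them. The sketch below is one possible check, assuming the file is expected to be a PNG: every PNG file starts with the same 8-byte signature. `save_if_png` is a hypothetical helper, and the two byte strings stand in for `get.data`.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def save_if_png(data: bytes, path: str) -> bool:
    """Hypothetical helper: write data to path only if it starts with the PNG signature."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True


# Stand-ins for response data: a PNG-like payload and an HTML error page.
fake_png = PNG_SIGNATURE + b"\x00" * 16
fake_html = b"<html>403 Forbidden</html>"

target = os.path.join(tempfile.mkdtemp(), "p.png")
saved_png = save_if_png(fake_png, target)
saved_html = save_if_png(fake_html, target + ".bad")
print(saved_png, saved_html)
```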

import urllib3

urllib3.disable_warnings()
url = '/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
http = urllib3.PoolManager()
get = http.request('GET', url, headers=headers)
print(get.data.decode('utf-8'))

2.2.4 Setting a timeout
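When every request should carry the same headers, they can be set once on the PoolManager instead of on each call. A minimal sketch, assuming the same User-Agent string as above; it only builds the objects and sends nothing.

```python
import urllib3

ua = (
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
)

# make_headers() builds a header dict from keyword shortcuts.
default_headers = urllib3.make_headers(user_agent=ua)

# Headers given here are used for requests that do not pass their own.
http = urllib3.PoolManager(headers=default_headers)
print(http.headers)
```

Note that a `headers` argument passed to request() replaces these defaults rather than being merged with them.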

import urllib3  # import the urllib3 module

urllib3.disable_warnings()  # suppress SSL warnings
baidu_url = '/'  # Baidu test URL for the timeout request
python_url = '/'  # Python test URL for the timeout request
http = urllib3.PoolManager()  # create the connection pool manager
try:
    r = http.request('GET', baidu_url, timeout=0.01)  # send a GET request with a 0.01 second timeout
except Exception as error:
    print('Baidu timeout:', error)
# Baidu timeout: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002690D2057F0>, 'Connection to timed out. (connect timeout=0.01)'))
http2 = urllib3.PoolManager(timeout=0.1)  # create a pool manager with a default timeout of 0.1 seconds
try:
    r = http2.request('GET', python_url)  # send a GET request
except Exception as error:
    print('Python timeout:', error)
# Python timeout: HTTPSConnectionPool(host='', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002690D21A910>, 'Connection to timed out. (connect timeout=0.1)'))

2.2.5 Setting an IP proxy
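A single number applies the same limit to both connecting and reading. For finer control, urllib3 also accepts a Timeout object with separate connect and read limits, and a Retry object in place of an integer retry count. The values below are arbitrary examples, and the sketch only constructs the objects without sending a request.

```python
import urllib3

# Separate limits: 2 s to establish the connection, 5 s between reads.
timeout = urllib3.Timeout(connect=2.0, read=5.0)

# At most 2 retries, with a growing delay between attempts.
retries = urllib3.Retry(total=2, backoff_factor=0.5)

http = urllib3.PoolManager(timeout=timeout, retries=retries)
print(timeout.connect_timeout, timeout.read_timeout, retries.total)
```

When the retries are exhausted, the request raises `urllib3.exceptions.MaxRetryError`, which can be caught instead of the generic `Exception`.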

import urllib3  # import the urllib3 module

url = "/ip"  # test URL for the proxied request
# Firefox request headers
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/0101 Firefox/77.0'}
# Create the proxy manager
proxy = urllib3.ProxyManager('http://120.27.110.143:80', headers=headers)
r = proxy.request('GET', url, timeout=2.0)  # send the request
print(r.data.decode())  # print the result

2.3 Uploading

2.3.1 Uploading text
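Some proxies require a username and password. One way to supply them is through the `proxy_headers` argument of ProxyManager, built with make_headers(). In the sketch below the proxy address `http://127.0.0.1:8080` and the credentials `user:pass` are placeholders, not values from the notes, and no request is sent.

```python
import urllib3

# Placeholder credentials; make_headers() produces a Proxy-Authorization header.
auth_headers = urllib3.make_headers(proxy_basic_auth="user:pass")

# Placeholder proxy address; constructing the manager does not open a connection.
proxy = urllib3.ProxyManager("http://127.0.0.1:8080", proxy_headers=auth_headers)
print(auth_headers)
print(proxy.proxy.host, proxy.proxy.port)
```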

import urllib3
import json

with open('./test.txt') as f:  # open the text file
    data = f.read()  # read the file
url = "/post"
http = urllib3.PoolManager()
post = http.request('POST', url, fields={'filedield': ('upload.txt', data)})
files = json.loads(post.data.decode('utf-8'))['files']  # get the uploaded file content
print(files)  # print the uploaded text
# {'filedield': '在学习中寻找快乐！'}

2.3.2 Uploading an image file
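To see what a file field looks like on the wire, the multipart body can be built offline with urllib3's own encoder. This is a sketch for inspection only; the field name `filefield` and the sample text are illustrative stand-ins for the field and file used above.

```python
import urllib3

text = "hello from test.txt"  # stand-in for the file contents

# A (filename, data) tuple marks the field as a file upload.
body, content_type = urllib3.encode_multipart_formdata(
    {"filefield": ("upload.txt", text)}
)
print(content_type.split(";")[0])
print(body.decode("utf-8"))
```

The part headers include the file name, and urllib3 guesses the part's Content-Type from the file extension.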

import urllib3

with open('p.png', 'rb') as f:  # read the image as bytes
    data = f.read()
url = "/post"
http = urllib3.PoolManager()
# Send the image upload request; the Content-Type matches the PNG file being sent
post = http.request('POST', url, body=data, headers={'Content-Type': 'image/png'})
print(post.data.decode())
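Rather than hard-coding the Content-Type, it can be derived from the file name with the standard-library mimetypes module. A minimal sketch; `guess_content_type` is a hypothetical helper and the fallback value is an assumption.

```python
import mimetypes


def guess_content_type(filename, default="application/octet-stream"):
    """Hypothetical helper: map a file name to a MIME type, with a generic fallback."""
    return mimetypes.guess_type(filename)[0] or default


headers = {"Content-Type": guess_content_type("p.png")}
fallback = guess_content_type("file_without_extension")
print(headers, fallback)
```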
