General approach

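Redis serves as the queue. I bought a 蘑菇代理 (Mogu proxy) plan, but its API can only be called once every 5 seconds. We fetch IPs from the API and push them onto the left end of a Redis list, then pop one from the right end whenever we need it. If a request succeeds, the IP is proven usable and is pushed back onto the left end; if three requests in a row fail, we treat that proxy IP as dead.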
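The rotation described above can be sketched without Redis at all. This is only an illustration under assumptions: a `collections.deque` stands in for the Redis list, the addresses are made up, and `try_request` is a hypothetical callable that returns True when a request through the given proxy succeeds.

```python
from collections import deque

MAX_FAILURES = 3  # same threshold as in the description above


def fetch_with_pool(pool, try_request):
    """Take a proxy from the right end; put it back on the left if it works.

    `pool` is a deque standing in for the Redis list. `try_request` is a
    hypothetical callable: try_request(proxy) -> True on success.
    """
    while pool:
        proxy = pool.pop()              # like RPOP
        for _ in range(MAX_FAILURES):
            if try_request(proxy):
                pool.appendleft(proxy)  # like LPUSH: goes to the back of the queue
                return proxy
        # Three failures in a row: the proxy is dropped, move on to the next one
    return None


pool = deque()
for p in ["1.1.1.1:80", "2.2.2.2:80", "3.3.3.3:80"]:  # made-up addresses
    pool.appendleft(p)                  # new entries go in on the left

dead = {"2.2.2.2:80"}                   # pretend this proxy has expired
used = [fetch_with_pool(pool, lambda p: p not in dead) for _ in range(4)]
print(used)
print(list(pool))
```

In this run the dead proxy falls out of the pool after its three failed attempts, and the two working proxies then take turns.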

The code is as follows:

import requests
import json
import redis
import time

r = redis.Redis(host='127.0.0.1', port=6379, db=3)
num = r.llen('the_ip')
print(num)
while True:
    if num < 5:
        # Pool is running low: ask the proxy API for two more IPs
        ip = requests.get('http://piping.mogumiao.com/proxy/api/get_ip_al?appKey=b9bfb84c7ca34fec9f51b3a9dca147e5&count=2&expiryDate=0&format=1').text
        print(ip)
        result = json.loads(ip)
        code = result['code']
        if code == '0':
            # Success: push every returned "ip:port" onto the left end of the list
            for i in result['msg']:
                ip = i['ip'] + ':' + i['port']
                print(ip)
                r.lpush('the_ip', ip)
            num = r.llen('the_ip')
        elif code == '3001':
            print('Extracting too often; only one extraction every 5 seconds!')
            time.sleep(5)
        else:
            print('Error calling the IP API, error code: ' + code)
            time.sleep(5)  # back off instead of retrying in a tight loop
    else:
        print('The IP pool is full')
        num = r.llen('the_ip')
        time.sleep(3)

The code above keeps roughly 5 proxy IPs in the Redis pool at all times.
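The response handling in that loop can also be pulled out into a small function that is easy to check on its own. This is a sketch, assuming the response shape implied by the code above (a string `code`, and on success a `msg` list of objects with `ip` and `port`). The sample payloads are made up for illustration and are not real API output, and `parse_proxies` is a hypothetical helper name.

```python
import json


def parse_proxies(text):
    """Return (code, proxies) from the raw API response text.

    `proxies` is a list of "ip:port" strings, empty unless code is "0".
    """
    payload = json.loads(text)
    code = payload["code"]
    if code != "0":
        return code, []
    return code, [f"{item['ip']}:{item['port']}" for item in payload["msg"]]


# Made-up payloads in the assumed shape
sample_ok = '{"code": "0", "msg": [{"ip": "10.0.0.1", "port": "8080"}]}'
sample_busy = '{"code": "3001", "msg": "rate limited"}'
print(parse_proxies(sample_ok))
print(parse_proxies(sample_busy))
```

Returning the code alongside the list lets the caller decide whether to sleep (code 3001) or just report the error.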
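# Consumer: fetch pages through proxies taken from the pool
import requests
import json
import redis
import time
from lxml import etree

r = redis.Redis(host='127.0.0.1', port=6379, db=3)

def get_source(url, header, data=None):
    # Take a proxy from the right end of the list; wait while the pool is empty
    raw = r.rpop('the_ip')
    while raw is None:
        time.sleep(1)
        raw = r.rpop('the_ip')
    ip = raw.decode('utf8')
    print('Took IP', ip)
    proxies = {'http': 'http://' + ip}
    if data is None:
        n = 0
        while True:
            try:
                source = requests.get(url, headers=header, proxies=proxies, timeout=5).content
                # Success means the proxy still works, so put it back on the left
                r.lpush('the_ip', ip)
                print('Request succeeded, returned IP', ip)
                return source
            except requests.RequestException:
                n += 1
                print('Request failed ' + str(n) + ' time(s)')
                if n == 3:
                    # Three failures: drop this proxy and retry with another one
                    return get_source(url, header)
    else:
        # Request that carries a payload; no retries here
        source = requests.get(url, headers=header, proxies=proxies, data=data, timeout=5).content
        r.lpush('the_ip', ip)
        return source


header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36'}
while True:
    source = get_source('http://www.ip111.cn/', header).decode('utf8')
    show = etree.HTML(source).xpath('//tr[2]/td[2]/text()')
    print(show)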

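The code above repeatedly requests a site that shows the current IP, so you can watch the proxy IP change. Each request takes the next proxy in rotation, so the proxies last longer and you don't have to worry about a single proxy IP getting banned from overuse.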
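One thing to watch when taking proxies out of the pool: `rpop` returns None when the list is empty. A possible alternative, sketched here as an assumption rather than as part of the original setup, is redis-py's blocking `brpop`, which waits up to `timeout` seconds and returns a `(key, value)` pair, or None on timeout. `take_proxy` and `FakeClient` are hypothetical names; `FakeClient` only exists so the sketch runs without a Redis server.

```python
def take_proxy(client, key="the_ip", timeout=10):
    """Wait for a proxy to become available; return None on timeout."""
    item = client.brpop(key, timeout=timeout)
    if item is None:
        return None
    _key, value = item          # brpop yields the list name and the popped value
    return value.decode("utf8")


class FakeClient:
    """Stand-in for redis.Redis so this sketch runs without a server."""

    def __init__(self, values):
        self.values = list(values)

    def brpop(self, key, timeout=0):
        # A real client would block here; the fake one answers immediately
        if not self.values:
            return None
        return key.encode("utf8"), self.values.pop()


client = FakeClient([b"1.1.1.1:80"])   # made-up address
print(take_proxy(client))
print(take_proxy(client))
```

With a real connection, the call would be `take_proxy(r)` using the `r` client from the scripts above.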

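Copyright notice: This is an original article by mypath, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://www.cnblogs.com/mypath/p/9024674.html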