Designing Your Own Proxy IP Pool

mypath 2021-08-07

General approach

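Redis serves as the queue. I bought a Mogu proxy (蘑菇代理) plan, but its API can only be called once every 5 seconds. We fetch IPs from the API and push them onto the left end of a Redis list; when a proxy is needed, we pop one from the right end. If the request succeeds, the IP is proven usable and is pushed back on the left. If three requests in a row fail, the proxy IP is considered dead and is not put back.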
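The push-left, pop-right rotation can be tried without a Redis server. The following is a minimal sketch that uses an in-memory `deque` as a stand-in for the Redis list (`appendleft` plays the role of `LPUSH`, `pop` the role of `RPOP`); the proxy addresses are made up for illustration.

```python
from collections import deque

# In-memory stand-in for the Redis list: appendleft ~ LPUSH, pop ~ RPOP.
pool = deque()

# Hypothetical proxy addresses, for illustration only.
for proxy in ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]:
    pool.appendleft(proxy)

used = []
for _ in range(6):
    proxy = pool.pop()       # take the oldest entry from the right end
    used.append(proxy)
    pool.appendleft(proxy)   # pretend the request succeeded: return it on the left

print(used)
```

Because a returned proxy goes to the opposite end from where it was taken, every other proxy in the list is used before it comes up again, which is what gives the round-robin behavior.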

The code is as follows:

import json
import time

import redis
import requests

# Mogu proxy API endpoint; count=2 asks for two proxies per call
API_URL = 'http://piping.mogumiao.com/proxy/api/get_ip_al?appKey=b9bfb84c7ca34fec9f51b3a9dca147e5&count=2&expiryDate=0&format=1'

r = redis.Redis(host='127.0.0.1', port=6379, db=3)
num = r.llen('the_ip')
print(num)
while True:
    if num < 5:
        # The pool is running low: fetch more proxies from the API
        ip = requests.get(API_URL).text
        print(ip)
        result = json.loads(ip)
        code = result['code']
        if code == '0':
            for i in result['msg']:
                ip = i['ip'] + ':' + i['port']
                print(ip)
                r.lpush('the_ip', ip)  # new proxies enter on the left
            num = r.llen('the_ip')
        elif code == '3001':
            # API message: "Fetching too frequently, only one fetch every 5 seconds!"
            time.sleep(5)
        else:
            print('Error calling the IP API, error code: ' + code)
            time.sleep(5)  # wait before retrying instead of hammering the API
    else:
        print('The IP pool is full')
        num = r.llen('the_ip')
        time.sleep(3)

The code above keeps roughly 5 proxy IPs in the Redis pool at all times.
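The response handling in that loop can be isolated and checked without calling the API. This is a sketch only: `parse_proxies` is a hypothetical helper, and the response shape (a string `code` plus a `msg` list of `ip`/`port` entries) is an assumption inferred from how the script above reads the response, not taken from the provider's documentation.

```python
import json


def parse_proxies(text):
    """Return (code, proxies) parsed from an API response body.

    Assumed shape, inferred from the refill script:
    {"code": "0", "msg": [{"ip": "...", "port": "..."}]}
    """
    result = json.loads(text)
    code = str(result["code"])
    if code != "0":
        # Any non-zero code (e.g. "3001", fetching too frequently) yields no proxies
        return code, []
    return code, [f"{item['ip']}:{item['port']}" for item in result["msg"]]


# Made-up sample response, for illustration only
sample = '{"code": "0", "msg": [{"ip": "10.0.0.1", "port": "8000"}]}'
print(parse_proxies(sample))
```

Returning the code alongside the list lets the caller decide whether to sleep (rate limit) or log an error, as the loop above does.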

import redis
import requests
from lxml import etree

r = redis.Redis(host='127.0.0.1', port=6379, db=3)


def get_source(url, header, data=None):
    # Block until a proxy is available, then take it from the right end of the list
    ip = r.brpop('the_ip')[1].decode('utf8')
    print('Took IP', ip)
    n = 0
    while True:
        try:
            source = requests.get(url, headers=header, proxies={'http': 'http://' + ip},
                                  data=data, timeout=5).content
            # Success: the proxy works, so return it to the left end of the list
            r.lpush('the_ip', ip)
            print('Request succeeded, returned IP', ip)
            return source
        except requests.RequestException:
            n += 1
            print('Request failed ' + str(n) + ' time(s)')
            if n == 3:
                # Three failures: drop this proxy and start over with the next one
                return get_source(url, header, data)


header = {'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36"}
while True:
    source = get_source('http://www.ip111.cn/', header).decode('utf8')
    # The page reports the IP address the request appeared to come from
    show = etree.HTML(source).xpath('//tr[2]/td[2]/text()')
    print(show)

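The code above repeatedly requests a site that shows the current IP, so you can watch the proxy IP change. Each request takes the next proxy in rotation, so every proxy lasts longer and you need not worry about one proxy IP being banned from overuse.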
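The "return on success, discard after three failures" rule can also be exercised offline. The sketch below again uses a `deque` in place of the Redis list and a fake fetch function in place of a real HTTP request; `fetch_with_rotation`, `fake_fetch`, and the addresses are all hypothetical names for illustration. Unlike the script above, which waits for the pool to be refilled, this version raises an error once the pool is empty.

```python
from collections import deque


def fetch_with_rotation(pool, fetch, max_failures=3):
    """Try proxies from `pool` until `fetch(proxy)` succeeds.

    `pool` is a deque standing in for the Redis list; `fetch` is any
    callable that raises an exception when the request fails.
    """
    while pool:
        proxy = pool.pop()                  # like RPOP
        for _ in range(max_failures):
            try:
                result = fetch(proxy)
            except Exception:
                continue                    # retry with the same proxy
            pool.appendleft(proxy)          # like LPUSH: the proxy still works
            return result
        # max_failures errors in a row: the proxy is dropped, move on to the next
    raise RuntimeError("no working proxy left in the pool")


def fake_fetch(proxy):
    # Simulated request: only one (made-up) proxy is alive
    if proxy != "10.0.0.2:8000":
        raise ConnectionError(proxy)
    return "ok via " + proxy


pool = deque(["10.0.0.2:8000", "10.0.0.1:8000"])  # the rightmost entry is taken first
print(fetch_with_rotation(pool, fake_fetch))
print(list(pool))
```

After the call, the dead proxy is gone from the pool and the working one has been put back, which mirrors what the Redis list should look like after a failed and a successful proxy.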