【Python 3 Web Scraping】Recognizing Captchas with Yundama (云打码)
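I originally set out to recognize captchas with tesserocr, but its recognition rate was low, so I learned to use the Yundama (云打码) cloud captcha-solving service instead.
The steps are as follows: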
1. Register an account, then go to http://www.yundama.com/apidoc/YDM_SDK.html and download the Python HTTP example (PythonHTTP).
2. Unzip the download. It contains several files; since I am using Python 3.5, open YDMHTTPDemo3.x.
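3. Edit the following fields in the demo. The username and password are those of your own account. The appid and appkey are shown in the developer console; the first time you use it, you must create a new software entry before an appid and appkey are issued.
In the developer console, the software code (软件代码) is the appid and the communication key (通讯密钥) is the appkey.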
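4. With all the information filled in, run the code. It will most likely return 1007; you can look up the cause at http://www.yundama.com/apidoc/YDM_ErrorCode.html.
The account simply has no balance yet: recharge it in the user console, run the code again, and you will see the recognition result.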
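Because an empty balance is what triggers the 1007 error in step 4, it can help to check the balance before uploading any captcha. The following is a minimal sketch: `ensure_balance` and its `min_balance` threshold are hypothetical names, and it assumes a client object whose `balance()` method behaves like `YDMHttp.balance()` in this article (a negative int is an error code, anything else is the remaining points). The `int()` conversion is there in case the balance comes back as a string, which is an assumption about the JSON reply.

```python
def ensure_balance(client, min_balance=1):
    """Hypothetical helper: fail fast before decoding if the account cannot pay.

    `client` is assumed to expose balance() like YDMHttp.balance():
    a negative value is an API error code, otherwise it is the point balance.
    """
    balance = int(client.balance())  # assumption: value may arrive as str or int
    if balance < 0:
        # Negative values are error codes from the API (e.g. bad credentials)
        raise RuntimeError('Yundama returned error code {}'.format(balance))
    if balance < min_balance:
        # Zero points: recharge in the user console before uploading captchas
        raise RuntimeError('Balance too low ({}); please recharge'.format(balance))
    return balance
```

Calling `ensure_balance(yundama)` right after constructing the client turns a silent 1007 later on into an immediate, readable error.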
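Once these steps are done, we can use the Yundama platform to recognize captchas. For convenience, you can create a YDMDemo.py that stores the account credentials and other settings, so that calling it only requires passing in the captcha image file.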
```python
import json
import time

import requests


class YDMHttp:
    """Thin client for the Yundama HTTP API, adapted from YDMHTTPDemo3.x."""
    apiurl = 'http://api.yundama.com/api.php'

    def __init__(self, username, password, appid, appkey):
        self.username = username
        self.password = password
        self.appid = str(appid)
        self.appkey = appkey

    def request(self, fields, files=None):
        # POST the form fields (and optional files) and parse the JSON reply
        response = self.post_url(self.apiurl, fields, files)
        return json.loads(response)

    def balance(self):
        # Returns the remaining points, or a negative error code
        data = {'method': 'balance', 'username': self.username, 'password': self.password,
                'appid': self.appid, 'appkey': self.appkey}
        response = self.request(data)
        if response:
            if response.get('ret', 0) < 0:
                return response['ret']
            return response['balance']
        return -9001

    def login(self):
        # Returns the user id (uid), or a negative error code
        data = {'method': 'login', 'username': self.username, 'password': self.password,
                'appid': self.appid, 'appkey': self.appkey}
        response = self.request(data)
        if response:
            if response.get('ret', 0) < 0:
                return response['ret']
            return response['uid']
        return -9001

    def upload(self, filename, codetype, timeout):
        # Uploads the captcha image; returns its captcha id (cid), or a negative error code
        data = {'method': 'upload', 'username': self.username, 'password': self.password,
                'appid': self.appid, 'appkey': self.appkey,
                'codetype': str(codetype), 'timeout': str(timeout)}
        files = {'file': filename}
        response = self.request(data, files)
        if response:
            if response.get('ret', 0) < 0:
                return response['ret']
            return response['cid']
        return -9001

    def result(self, cid):
        # Returns the recognized text, or '' if it is not ready yet
        data = {'method': 'result', 'username': self.username, 'password': self.password,
                'appid': self.appid, 'appkey': self.appkey, 'cid': str(cid)}
        response = self.request(data)
        return response.get('text', '') if response else ''

    def decode(self, filename, codetype, timeout):
        # Upload the image, then poll once per second for up to `timeout` seconds
        cid = self.upload(filename, codetype, timeout)
        if cid > 0:
            for _ in range(timeout):
                result = self.result(cid)
                if result != '':
                    return cid, result
                time.sleep(1)
            return -3003, ''  # timed out waiting for a result
        return cid, ''

    def report(self, cid):
        # Reports a wrong recognition result for the given cid
        data = {'method': 'report', 'username': self.username, 'password': self.password,
                'appid': self.appid, 'appkey': self.appkey, 'cid': str(cid), 'flag': '0'}
        response = self.request(data)
        if response:
            return response['ret']
        return -9001

    def post_url(self, url, fields, files=None):
        # Open each file path in binary mode for the multipart upload, then close it
        opened = {key: open(path, 'rb') for key, path in (files or {}).items()}
        try:
            res = requests.post(url, files=opened, data=fields)
        finally:
            for f in opened.values():
                f.close()
        return res.text


def use_ydm(filename):
    username = ''     # username
    password = ''     # password
    app_id = 1        # software ID (appid)
    app_key = ''      # software key (appkey)
    code_type = 1004  # captcha type
    timeout = 60      # timeout in seconds
    yundama = YDMHttp(username, password, app_id, app_key)  # initialize the client
    balance = yundama.balance()  # query the balance
    print('Your remaining points: {}'.format(balance))
    cid, result = yundama.decode(filename, code_type, timeout)  # start recognition
    print('Recognition result: {}'.format(result))
    return result
```
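The class also provides `report()`, which `use_ydm` never calls. A recognized captcha can still be rejected by the target site, so one possible pattern is to report the wrong answer and try again. This is a minimal sketch, not part of the Yundama SDK: `solve_with_retry` and the `verify` callback (which would submit the text to the site and return True if it was accepted) are hypothetical. It treats -3003, the code `decode()` returns on timeout, as retryable, and raises on any other negative code, since retrying will not fix problems such as an empty balance.

```python
def solve_with_retry(client, filename, verify, codetype=1004, timeout=60, attempts=3):
    """Hypothetical helper built on YDMHttp.decode() and YDMHttp.report().

    verify(text) is a caller-supplied function (assumed) that submits the
    recognized text to the target site and returns True if it is accepted.
    Returns the accepted text, or None if every attempt failed.
    """
    for _ in range(attempts):
        cid, text = client.decode(filename, codetype, timeout)
        if cid == -3003:
            continue  # decode() timed out waiting for a result; try again
        if cid < 0:
            # Other negative codes (bad credentials, no balance...) will not go away on retry
            raise RuntimeError('Yundama returned error code {}'.format(cid))
        if verify(text):
            return text
        client.report(cid)  # tell the platform this recognition was wrong
    return None
```

If the target site issues a fresh captcha after each failed submission, `verify` would also need to save the new image to `filename` before the next attempt.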