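Web crawler: setting up a pyspider environment
1. Setup done at SH
1.1. It seems Python versions can be switched the same way as JDK versions: the PATH environment variable decides which Python is used. Only these two PATH entries matter: "C:\Program Files\Python??\Scripts\" and "C:\Program Files\Python??\"
2. Running log:
2.1. At the start, with Python37 x64 installed, the PATH environment variable looked like this: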
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Program Files\Python37\Scripts\;C:\Program Files\Python37\;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files\TortoiseSVN\bin;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\Program Files\Git\cmd;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.3.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;G:\NVidia\cuda_win7\bin;D:\Program Files (x86)\MATLAB\R2015b\runtime\win32;D:\Program Files (x86)\MATLAB\R2015b\bin;D:\Program Files (x86)\MATLAB\R2015b\polyspace\bin;C:\Program Files (x86)\MATLAB\MATLAB Runtime\v90\runtime\win32;D:\OpenCV_something\opencv-3.4.6-vc14_vc15\build\x86_zz\release\vc14\bin;D:\Program Files\nodejs\;C:\Program Files\Microsoft SQL Server\120\Tools\Binn\
Laid out one entry per line, it looks like this:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp
C:\Program Files\Python37\Scripts\
C:\Program Files\Python37\
C:\Windows\system32
C:\Windows
C:\Windows\System32\Wbem
C:\Windows\System32\WindowsPowerShell\v1.0\
C:\Program Files\dotnet\
C:\Program Files\Microsoft SQL Server\130\Tools\Binn\
C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\
C:\Program Files\Microsoft SQL Server\100\Tools\Binn\
C:\Program Files\Microsoft SQL Server\100\DTS\Binn\
C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\
C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\
C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\
C:\Program Files\TortoiseSVN\bin
C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\
C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\
C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\
C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\
C:\Program Files\Git\cmd
C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.3.0\
C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common
C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR
G:\NVidia\cuda_win7\bin
D:\Program Files (x86)\MATLAB\R2015b\runtime\win32
D:\Program Files (x86)\MATLAB\R2015b\bin
D:\Program Files (x86)\MATLAB\R2015b\polyspace\bin
C:\Program Files (x86)\MATLAB\MATLAB Runtime\v90\runtime\win32
D:\OpenCV_something\opencv-3.4.6-vc14_vc15\build\x86_zz\release\vc14\bin
D:\Program Files\nodejs\
C:\Program Files\Microsoft SQL Server\120\Tools\Binn\
ZC: It turns out only two entries are Python-related: "C:\Program Files\Python37\Scripts\" and "C:\Program Files\Python37\"
With those two removed, the PATH is:
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\dotnet\;C:\Program Files\Microsoft SQL Server\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\Tools\Binn\;C:\Program Files\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\100\Tools\Binn\VSShell\Common7\IDE\;C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\;C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn\;C:\Program Files\TortoiseSVN\bin;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn\;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio\;C:\Program Files\Git\cmd;C:\Program Files\NVIDIA Corporation\Nsight Compute 2019.3.0\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;G:\NVidia\cuda_win7\bin;D:\Program Files (x86)\MATLAB\R2015b\runtime\win32;D:\Program Files (x86)\MATLAB\R2015b\bin;D:\Program Files (x86)\MATLAB\R2015b\polyspace\bin;C:\Program Files (x86)\MATLAB\MATLAB Runtime\v90\runtime\win32;D:\OpenCV_something\opencv-3.4.6-vc14_vc15\build\x86_zz\release\vc14\bin;D:\Program Files\nodejs\;C:\Program Files\Microsoft SQL Server\120\Tools\Binn\;
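The manual step above (split PATH, drop the Python entries, join it back) can be sketched in Python. This is only an illustration: the helper names split_path and drop_python_entries are made up here, and matching on the word "python" is an assumption that happens to fit the PATH in these notes.

```python
def split_path(path_value, sep=";"):
    """Split a PATH-style string into its non-empty entries."""
    return [entry for entry in path_value.split(sep) if entry.strip()]


def drop_python_entries(path_value, sep=";"):
    """Return path_value without the entries that mention 'python'.

    Assumption: Python install folders have 'python' in their name,
    which holds for the two Python37 entries in these notes.
    """
    kept = [e for e in split_path(path_value, sep) if "python" not in e.lower()]
    return sep.join(kept)


if __name__ == "__main__":
    # Shortened sample of the PATH shown above.
    sample = (
        "C:\\Windows\\system32;"
        "C:\\Program Files\\Python37\\Scripts\\;"
        "C:\\Program Files\\Python37\\;"
        "C:\\Program Files\\Git\\cmd"
    )
    for entry in split_path(drop_python_entries(sample)):
        print(entry)
```

The sketch only builds a new string; it does not change the environment of the running system.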
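Then install python-3.5.4.exe (this is the 32-bit installer)
After python-3.5.4.exe finished installing, PATH apparently was not updated automatically, and pycurl does not come bundled…
Add the entries manually in CMD (this only affects the current CMD session):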
set path=%path%;"D:\Python\Python35-32\Scripts\";"D:\Python\Python35-32\"
PS: all the whl files below were downloaded from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycurl
Install pycurl manually:
pip install pycurl
Error: Command "python setup.py egg_info" failed with error code 10 in C:\Users\ADMINI~1\AppData\Local\Temp\pip-build-mpbd0qzy\pycurl\
A search suggested installing two other things first, but those would not install either, so I downloaded pycurl-7.43.0.3-cp27-cp27m-win32.whl (from https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycurl) and then
followed this article: "pip install pycurl error: Complete output from command python setup.py egg_info: Please specify --curl-dir=/path/to/built/libcurl", by 血染&征袍 on 博客园 (https://www.cnblogs.com/xueranzp/p/5010656.html)
pip install wheel
pip install D:\IDE\Python\pycurl-7.43.0.3-cp27-cp27m-win32.whl
Error: "pycurl-7.43.0.3-cp27-cp27m-win32.whl is not a supported wheel on this platform." A search showed that the wheel has to match the Python version. I am on Python 3.5.4 (32-bit), so the matching file is pycurl-7.43.0.3-cp35-cp35m-win32.whl
pip install D:\IDE\Python\pycurl-7.43.0.3-cp35-cp35m-win32.whl
pip install pycurl
ZC: pycurl is installed
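The cp27 versus cp35 mix-up can be caught before calling pip by reading the tags out of the wheel file name. Below is a minimal sketch, assuming the usual name-version-pythontag-abitag-platformtag.whl layout; parse_wheel_tags and wheel_matches are hypothetical helpers, and the check is deliberately simpler than what pip really does.

```python
import struct
import sys


def parse_wheel_tags(filename):
    """Return the (python, abi, platform) tags from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def wheel_matches(filename, version=None, bits=None):
    """Rough check whether a wheel suits a given CPython on Windows.

    Simplified on purpose: pip's real compatibility check covers more cases.
    """
    if version is None:
        version = sys.version_info[:2]
    if bits is None:
        bits = struct.calcsize("P") * 8  # pointer size: 32 or 64
    python_tag, _abi_tag, platform_tag = parse_wheel_tags(filename)
    wanted_python = {"cp%d%d" % (version[0], version[1]), "py%d" % version[0]}
    wanted_platform = {"any", "win32" if bits == 32 else "win_amd64"}
    python_ok = bool(wanted_python & set(python_tag.split(".")))
    platform_ok = bool(wanted_platform & set(platform_tag.split(".")))
    return python_ok and platform_ok


if __name__ == "__main__":
    for name in (
        "pycurl-7.43.0.3-cp27-cp27m-win32.whl",
        "pycurl-7.43.0.3-cp35-cp35m-win32.whl",
    ):
        # Check against Python 3.5, 32-bit, the setup used in these notes.
        print(name, wheel_matches(name, version=(3, 5), bits=32))
```

Called without version and bits, wheel_matches checks against the interpreter that runs the script.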
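PS: for the whl files below I only ran "pip install xxxxx.whl" and did not follow up with "pip install xxxxx" (unlike pycurl, which needed a final "pip install pycurl")
pip install pyspider
Reference: "windows 下安装pyspider" (installing pyspider on Windows), by 幽篁晓筑 on 博客园 (https://www.cnblogs.com/woods1815/p/9637856.html)
It would not install because some dependencies failed to download, so once more I went to https://www.lfd.uci.edu/~gohlke/pythonlibs/#pycurl to fetch them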
Collecting pyspider
  Using cached https://files.pythonhosted.org/packages/d0/97/d6062c928f53d899ff2a8538fed11d4d425ba3d27c96248a2c601c1c9fef/pyspider-0.3.10.tar.gz
Collecting Flask>=0.10 (from pyspider)
  Downloading https://files.pythonhosted.org/packages/9b/93/628509b8d5dc749656a9641f4caf13540e2cdec85276964ff8f43bbb1d3b/Flask-1.1.1-py2.py3-none-any.whl (94kB)
    86% |███████████████████████████▊ | 81kB 5.4kB/s eta 0:00:03Excep
C:\Users\Administrator>pip install D:\IDE\Python\Flask-1.1.1-py2.py3-none-any.whl
Processing d:\ide\python\flask-1.1.1-py2.py3-none-any.whl
Collecting Werkzeug>=0.15 (from Flask==1.1.1)
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/ce/42/3aeda98f96e85fd26180534d36570e4d18108d62ae36f87694b476b83d6f/Werkzeug-0.16.0-py2.py3-none-any.whl (327kB)
    37% |████████████ | 122kB 3.7kB/s eta 0:00:56Exception:
pip install D:\IDE\Python\Werkzeug-0.16.0-py2.py3-none-any.whl
Then work back up the dependency chain (install each downloaded wheel, then retry its parent) until "pip install pyspider" itself
C:\Users\Administrator>pip install pyspider
Collecting pyspider
  Using cached https://files.pythonhosted.org/packages/d0/97/d6062c928f53d899ff2a8538fed11d4d425ba3d27c96248a2c601c1c9fef/pyspider-0.3.10.tar.gz
Requirement already satisfied: Flask>=0.10 in d:\python\python35-32\lib\site-packages (from pyspider)
Requirement already satisfied: Jinja2>=2.7 in d:\python\python35-32\lib\site-packages (from pyspider)
Collecting chardet>=2.2 (from pyspider)
  Cache entry deserialization failed, entry ignored
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
    7% |██▌ | 10kB 2.0kB/s eta 0:01:03Exception:
pip install D:\IDE\Python\chardet-3.0.4-py2.py3-none-any.whl
pip install D:\IDE\Python\cssselect-1.1.0-py2.py3-none-any.whl
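The "install the local wheels first, then retry the parent package" routine above could be scripted roughly as follows. This is a sketch only: build_install_commands is a hypothetical helper, the wheel list simply repeats the files named in these notes, and the commands are printed rather than executed.

```python
import ntpath
import sys

# Wheels named in these notes, deepest dependency first (Flask needs Werkzeug).
LOCAL_WHEELS = [
    "Werkzeug-0.16.0-py2.py3-none-any.whl",
    "Flask-1.1.1-py2.py3-none-any.whl",
    "chardet-3.0.4-py2.py3-none-any.whl",
    "cssselect-1.1.0-py2.py3-none-any.whl",
]


def build_install_commands(wheel_dir, wheels, final_package):
    """Return pip command lines: local wheels first, then the package itself."""
    commands = [
        [sys.executable, "-m", "pip", "install", ntpath.join(wheel_dir, name)]
        for name in wheels
    ]
    commands.append([sys.executable, "-m", "pip", "install", final_package])
    return commands


if __name__ == "__main__":
    for cmd in build_install_commands("D:\\IDE\\Python", LOCAL_WHEELS, "pyspider"):
        # Printed only; pass each list to subprocess.check_call to actually run it.
        print(" ".join(cmd))
```

Using "python -m pip" ties each install to the interpreter running the script, which helps when several Python versions are installed side by side.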
pyspider all
Got a warning: [W 200108 13:46:23 run:413] phantomjs not found, continue running without it.
Got an error:
ValueError: Invalid configuration: - Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead.
Unzipping phantomjs-2.1.1-windows.zip gives phantomjs.exe; copy phantomjs.exe into the Python root directory (here that is "D:\Python\Python35-32\")
With phantomjs.exe in place, the error is still there:
Deprecated option 'domaincontroller': use 'http_authenticator.domain_controller' instead
Fix: see the CSDN blog post by qq_37253540, "ValueError: Invalid configuration: - Deprecated option domaincontroller: use http_authenticator" (https://blog.csdn.net/qq_37253540/article/details/88196994)
Quoted from that post: after installing the crawler framework pyspider, running the pyspider all command and opening http://localhost:5000 produces the error above. The cause is that WsgiDAV released a 3.x pre-release. The fix: locate the installed pyspider package, open webdav.py in its webui folder, and edit line 209. Change
'domaincontroller': NeedAuthController(app),
to:
'http_authenticator':{ 'HTTPAuthenticator':NeedAuthController(app), },
Then run pyspider all again, and the page opens at http://localhost:5000.
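To make the shape of that edit explicit, here is a small sketch of the same change applied to a plain dictionary. It mirrors the fix quoted from the blog post and is not WsgiDAV's own API: migrate_webdav_config is a hypothetical helper, and the string "controller" stands in for NeedAuthController(app). Note that the error message itself names http_authenticator.domain_controller, while the quoted fix uses the key HTTPAuthenticator.

```python
def migrate_webdav_config(config):
    """Return a copy of config with 'domaincontroller' moved as in the quoted fix.

    The nested key name 'HTTPAuthenticator' is taken from the blog post,
    not checked against the WsgiDAV documentation.
    """
    migrated = dict(config)
    if "domaincontroller" in migrated:
        controller = migrated.pop("domaincontroller")
        migrated["http_authenticator"] = {"HTTPAuthenticator": controller}
    return migrated


if __name__ == "__main__":
    # 'controller' is a placeholder for NeedAuthController(app).
    old_config = {"mount_path": "/dav", "domaincontroller": "controller"}
    print(migrate_webdav_config(old_config))
```

The original dictionary is left untouched, so the before and after forms can be compared side by side.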
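ZC: Whether "pyspider all" comes up seems to be pure luck? The first few attempts did not start, so I force-killed the python.exe, pyspider.exe and phantomjs.exe processes (not sure whether any other process needs killing by hand…). Retrying still did not seem to work, although no error was reported. After a few more rounds of killing the processes and restarting, it was OK again… (the pyspider "Dashboard" page came up too)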
3. At HOME I tried installing pyspider with Python 3.7 x86. Installation went fine, but startup fails:
F:\IDE\Python\pyspider_something>pyspider all
Traceback (most recent call last):
  File "C:\Python\Python37-32\Scripts\pyspider-script.py", line 11, in <module>
    load_entry_point('pyspider==0.3.10', 'console_scripts', 'pyspider')()
  File "c:\python\python37-32\lib\site-packages\pkg_resources\__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "c:\python\python37-32\lib\site-packages\pkg_resources\__init__.py", line 2793, in load_entry_point
    return ep.load()
  File "c:\python\python37-32\lib\site-packages\pkg_resources\__init__.py", line 2411, in load
    return self.resolve()
  File "c:\python\python37-32\lib\site-packages\pkg_resources\__init__.py", line 2417, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "c:\python\python37-32\lib\site-packages\pyspider\run.py", line 231
    async=True, get_object=False, no_input=False):
        ^
SyntaxError: invalid syntax
F:\IDE\Python\pyspider_something>
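ZC: I looked it up on Baidu; the explanation was: "async and await have been reserved keywords since Python 3.7 (see: What's New In Python 3.7), so async cannot be used as a function parameter name."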
ZC: Better to stick with Python 3.5 x86…
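The keyword clash is easy to reproduce without pyspider. A minimal sketch follows: the function name fetch and the helper compiles are invented for the demo, and only the async=True parameter is taken from the traceback above.

```python
import keyword
import sys

# A function signature using 'async' as a parameter name, like run.py line 231.
SNIPPET = "def fetch(url, async=True):\n    return url\n"


def compiles(source):
    """Return True if source compiles on the running interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


if __name__ == "__main__":
    print("Python", sys.version_info[:2])
    print("async is a keyword:", keyword.iskeyword("async"))
    print("snippet compiles:", compiles(SNIPPET))
```

On Python 3.5 the snippet compiles; on 3.7 and later it is rejected with a SyntaxError, which matches the failure seen with pyspider 0.3.10.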