When developing web crawlers in Python, exception handling is key to keeping the program running stably. Here are some common approaches:
- Use `try-except` blocks: wrap code that may raise an exception in `try` and `except` blocks to catch and handle it.

  ```python
  import requests

  try:
      response = requests.get('http://example.com')
      response.raise_for_status()  # Raises HTTPError for 4xx/5xx status codes
  except requests.exceptions.HTTPError as e:
      print(f"HTTP Error: {e}")
  except requests.exceptions.RequestException as e:
      print(f"Request Exception: {e}")
  except Exception as e:
      print(f"Unexpected Error: {e}")
  else:
      print("Request successful")  # Handle the successful response here
  ```
- Use the `logging` module: record exception details with `logging` so they can be analyzed and debugged later.

  ```python
  import logging

  import requests

  logging.basicConfig(filename='spider.log', level=logging.ERROR)

  try:
      response = requests.get('http://example.com')
      response.raise_for_status()
  except requests.exceptions.HTTPError as e:
      logging.error(f"HTTP Error: {e}")
  except requests.exceptions.RequestException as e:
      logging.error(f"Request Exception: {e}")
  except Exception as e:
      logging.error(f"Unexpected Error: {e}")
  else:
      print("Request successful")  # Handle the successful response here
  ```
- Use a `finally` block: code in `finally` runs whether or not an exception occurred, which makes it the right place to clean up resources (see the session-cleanup sketch after this list).

  ```python
  import requests

  try:
      response = requests.get('http://example.com')
      response.raise_for_status()
  except requests.exceptions.HTTPError as e:
      print(f"HTTP Error: {e}")
  except requests.exceptions.RequestException as e:
      print(f"Request Exception: {e}")
  except Exception as e:
      print(f"Unexpected Error: {e}")
  else:
      print("Request successful")  # Handle the successful response here
  finally:
      print("Request completed")  # Runs on success and on failure
  ```
- Use `asyncio` and `aiohttp` for asynchronous crawling: the same `try-except` pattern works inside coroutines.

  ```python
  import asyncio

  import aiohttp

  async def fetch(session, url):
      try:
          async with session.get(url) as response:
              response.raise_for_status()  # Raises ClientResponseError for 4xx/5xx status codes
              return await response.text()
      except aiohttp.ClientError as e:
          print(f"Client Error: {e}")
      except Exception as e:
          print(f"Unexpected Error: {e}")

  async def main():
      async with aiohttp.ClientSession() as session:
          html = await fetch(session, 'http://example.com')
          if html is not None:  # fetch returns None when the request failed
              print(html)

  asyncio.run(main())  # Creates, runs, and closes the event loop
  ```
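To make the resource-cleanup point in the `finally` item concrete, here is a minimal sketch in which the resource is a `requests.Session` rather than a print statement; the session itself is an illustrative assumption, while the URL and exception structure are kept from the examples above.

```python
import requests

session = requests.Session()  # A real resource that should always be released

try:
    response = session.get('http://example.com')
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print(f"Request Exception: {e}")
else:
    print("Request successful")
finally:
    session.close()  # Runs on success and on failure, so the connection pool is always released
```

In practice, `with requests.Session() as session:` achieves the same cleanup through a context manager, which is how the `aiohttp` example above handles its session.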
These techniques handle the various exceptions that can come up while crawling and keep the program stable and reliable; a sketch that combines several of them follows.
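As a closing example, the pieces above can be combined into one reusable helper that logs failures and always releases its resources. This is a minimal sketch, not a definitive implementation: the function name `fetch_page`, the `timeout=10` argument, and the log file name are assumptions added for illustration.

```python
import logging

import requests

logging.basicConfig(filename='spider.log', level=logging.ERROR)

def fetch_page(url):
    """Fetch a URL and return its text, or None if the request failed."""
    session = requests.Session()
    try:
        # timeout=10 is an illustrative assumption; without a timeout,
        # a hung server can stall the crawler indefinitely
        response = session.get(url, timeout=10)
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        logging.error(f"HTTP Error for {url}: {e}")
    except requests.exceptions.RequestException as e:
        logging.error(f"Request Exception for {url}: {e}")
    else:
        return response.text
    finally:
        session.close()  # Always release the connection pool
    return None

html = fetch_page('http://example.com')
if html is not None:
    print("Request successful")
```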