Python Requests: How To Retry Failed Requests

The Requests library makes sending HTTP requests simple in Python. But while Requests streamlines the fundamentals, handling errors and retries requires a bit more work. In this guide, you'll learn strategies for retrying failed requests in Python using Requests.

Why Retry Failed Requests?

When building robust web scrapers and crawlers in Python, dealing with errors and failures is essential. Requests may fail for several reasons:

  • Network errors – Connection issues or timeouts.
  • Overloaded servers – Too much traffic causes failures.
  • Intermittent server problems – Bugs, downtime.
  • Rate limiting – Hitting usage thresholds.

Rather than crashing on the first failure, we want scrapers to retry requests before giving up. This improves resilience.

Some common errors like rate limiting (429 Too Many Requests) are temporary, so retries with delays can help overcome them. Other errors like 500 may indicate an intermittent server issue where retries could succeed.
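For instance, here's a minimal sketch of handling a single 429 manually by waiting for the duration the server suggests in its Retry-After header (the 5-second fallback is an arbitrary choice for illustration):

import time
import requests

url = "https://example.com"

response = requests.get(url)
if response.status_code == 429:
    # The server may say how long to wait via the Retry-After header.
    # (Retry-After can also be an HTTP date; this sketch assumes seconds.)
    wait = int(response.headers.get("Retry-After", 5))
    time.sleep(wait)
    response = requests.get(url)

Doing this by hand for every request gets tedious fast, which is why the rest of this guide focuses on reusable retry logic.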

Handling errors gracefully with retries is crucial for production-grade web scrapers in Python.

Strategies for Retrying Failed Requests

There are a few key strategies we can use to implement retries in Python:

  • Specify retry conditions – Only retry certain errors, not all.
  • Limit total retries – Prevent infinite retries with a cap.
  • Use delays – Pause between retries to avoid overload.
  • Implement backoff – Gradually increase delay between retries.

Later we'll see Python code examples applying these strategies using the Requests library.
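To get a feel for the backoff strategy in particular, here's a sketch of how the delays grow when the wait roughly doubles after each attempt (plain arithmetic, not library code):

backoff_factor = 1

for attempt in range(1, 4):
    # Delay roughly doubles with each retry: 1s, 2s, 4s
    delay = backoff_factor * (2 ** (attempt - 1))
    print(f"Retry {attempt}: wait {delay}s")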

Common Request Failure Errors

When a request fails, the server returns an HTTP status code explaining what happened. Some common errors include:

  • 400 Bad Request – Request is malformed.
  • 403 Forbidden – No permission to access resource.
  • 404 Not Found – Resource does not exist.
  • 429 Too Many Requests – Rate limited.
  • 500 Internal Server Error – Generic server error.
  • 502 Bad Gateway – Invalid response from upstream server.
  • 503 Service Unavailable – Server is temporarily overloaded.

We generally want to retry on 429, 500, 502, and 503 errors since they indicate temporary issues. 403 and 404 errors typically require modifying the request rather than simple retries.
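A simple helper (the names here are just illustrative) captures this rule of thumb:

RETRYABLE_STATUS_CODES = {429, 500, 502, 503}

def should_retry(status_code):
    # Retry only rate limits and transient server-side problems
    return status_code in RETRYABLE_STATUS_CODES

print(should_retry(503))  # True
print(should_retry(404))  # False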

Now let's look at Python code to implement retries on failures.

Retrying Requests in Python

There are two main ways to retry failed requests in Python:

  1. Use the built-in session and adapter.
  2. Write a custom decorator/wrapper.

We'll provide examples of both approaches.

Option 1: Session + Adapter (Recommended)

Requests allows setting up a session with a custom adapter that handles retries for all requests made through the session.

First we define a retry strategy using the Retry class from urllib3 (the library Requests is built on) together with the HTTPAdapter class from Requests:

from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503], 
    backoff_factor=1
)

adapter = HTTPAdapter(max_retries=retry_strategy)

The Retry object defines our retry rules:

  • total – Max retry attempts
  • status_forcelist – List of status codes to retry on
  • backoff_factor – Controls the delay between retries; the wait grows exponentially (roughly doubling) with each attempt.

We pass the retry_strategy into an HTTPAdapter, which handles the retries.
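If you need finer control, Retry accepts further options. For example, a variant that also restricts retries to idempotent HTTP methods and honors the server's Retry-After header (parameter names follow recent urllib3 releases, where allowed_methods replaced the older method_whitelist):

from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry_strategy = Retry(
    total=3,
    status_forcelist=[429, 500, 502, 503],
    backoff_factor=1,
    allowed_methods=["GET", "HEAD"],    # only retry idempotent methods
    respect_retry_after_header=True,    # honor the server's Retry-After delay
)

adapter = HTTPAdapter(max_retries=retry_strategy)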

Next we create a Requests Session and mount the adapter:

import requests

session = requests.Session()

# Mount the adapter so requests to these schemes use the retry rules
session.mount("https://", adapter)
session.mount("http://", adapter)

response = session.get("https://example.com")

Now any requests through this session will retry up to 3 times on 429, 500, 502, and 503 errors. Easy!
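In practice, it's worth pairing the session with a timeout and catching the exceptions Requests raises once the retries are exhausted. Continuing with the session defined above:

from requests.exceptions import ConnectionError, RetryError

try:
    response = session.get("https://example.com", timeout=10)
    print(response.status_code)
except (ConnectionError, RetryError) as err:
    # Raised when retries run out: ConnectionError for network failures,
    # RetryError when the status-based retries are exhausted
    print(f"Request failed after retries: {err}")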

Option 2: Custom Decorator

For more control, we can write a custom decorator to handle retrying failed requests:

from time import sleep
from functools import wraps

import requests

RETRIES = 3
DELAY = 1 # Seconds
STATUS_FORCELIST = [429, 500, 502, 503]

def retry(func):

    @wraps(func)
    def wrapped(*args, **kwargs):
        attempts = 0

        while attempts < RETRIES:
            try:
                return func(*args, **kwargs)
            except requests.exceptions.RequestException as e:
                # Connection errors and timeouts carry no response, so only
                # retry when the server returned one of the listed status codes
                if e.response is None or e.response.status_code not in STATUS_FORCELIST:
                    raise

                print(f"Retrying {func.__name__} in {DELAY}s")
                sleep(DELAY)
                attempts += 1
        # Final attempt after the retries are used up; errors now propagate
        return func(*args, **kwargs)

    return wrapped

@retry
def make_request(url):
    response = requests.get(url)
    # Raise an HTTPError for 4xx/5xx responses so the decorator can react to them
    response.raise_for_status()
    return response

response = make_request("https://example.com")

Here we define a retry decorator that retries on the specified status codes up to 3 times, with a 1-second delay between attempts. Because Requests doesn't raise exceptions for HTTP error codes by default, the decorated function calls raise_for_status() so that failed responses surface as exceptions the decorator can catch. The decorator can then add retry behavior to any request function.

This is useful for granular control over retry logic that can be applied to multiple functions.
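For example, the same decorator can wrap other request helpers without changing their bodies (the endpoint below is just a placeholder):

@retry
def post_data(url, payload):
    response = requests.post(url, json=payload)
    response.raise_for_status()  # surface HTTP errors so the decorator can retry them
    return response

response = post_data("https://example.com/api", {"name": "test"})

Keep in mind that retrying non-idempotent requests such as POST can trigger duplicate side effects on the server, so apply the decorator selectively.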

Avoiding Blocks with Retries

While retrying requests helps overcome errors, websites may still block scrapers that send repeated traffic. An error like 403 Forbidden often indicates you've been blocked.

To avoid blocks while retrying, it helps to:

  • Use proxies – Rotate different IP addresses.
  • Throttle requests – Add delays between requests.
  • Randomize user-agent/headers – Change request headers each retry.
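Here's a minimal sketch of the last two ideas, rotating the User-Agent header and spacing out requests (the user-agent strings and URLs are just examples):

import random
import time

import requests

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

urls = ["https://example.com/page1", "https://example.com/page2"]

for url in urls:
    # Rotate the User-Agent and pause between requests to look less bot-like
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    response = requests.get(url, headers=headers)
    time.sleep(random.uniform(1, 3))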

However, implementing all these tactics can be complex. That's where a service like ZenRows comes in handy.

Instead of building your own scraper infrastructure, ZenRows handles proxies, headers, throttling, and more so you can make requests without getting blocked.

For example:

import requests

url = "https://example.com"

# Use the ZenRows API as a proxy (replace api_key and password with your credentials)
proxy = "http://api_key:password@proxy.zenrows.com:8001"

proxies = {"http": proxy, "https": proxy}

response = requests.get(url, proxies=proxies)

Now your requests route through ZenRows, which rotates proxies automatically, helping you avoid blocks even when retrying failed requests.
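You can also combine this with the retry session from earlier so that any retried request goes through the proxy as well. A sketch, using the same placeholder credentials:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

retry_strategy = Retry(total=3, status_forcelist=[429, 500, 502, 503], backoff_factor=1)
adapter = HTTPAdapter(max_retries=retry_strategy)

proxy = "http://api_key:password@proxy.zenrows.com:8001"

session = requests.Session()
session.mount("https://", adapter)
session.proxies = {"http": proxy, "https": proxy}

response = session.get("https://example.com")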

Conclusion

Handling errors and retries is vital for reliable web scraping in Python. Use built-in sessions or custom wrappers to implement retry logic when requests fail. Strategies like throttling and proxies help avoid blocks when retrying. Services like ZenRows provide these features out of the box so you can focus on writing scrapers rather than infrastructure.

By gracefully retrying failed requests, your Python scrapers will become resilient to errors and blocks – crucial for maintaining robust crawlers in production.
