How to Use Selenium Wire With Bright Data in 2023

As someone with over 5 years of experience in proxies and browser automation, let me walk you through how to leverage Selenium Wire to build unblockable web scrapers.

Selenium Wire Capabilities

Selenium WebDriver lets you automate browsers for testing, scraping and more. But it lacks native functionality for intercepting network requests and responses.

That's where Selenium Wire comes in!

It's a Python library that extends Selenium to give you complete control over the browser traffic.

Here are some examples of what you can do:

Inspect Responses

Analyze raw response content to understand a site's structure and identify extractable data elements.

No more guessing – just intercept responses and parse them out to build robust scrapers.

Bypass Anti-Scraping Measures

Debug scrapers by inspecting error messages, status codes, and response bodies to understand blocks.

Struggling with a specific block page? Check the raw response to reverse-engineer the anti-scraping tactic.

Mock Scenarios

Modify request parameters on-the-fly to test edge cases or simulate certain conditions without needing server-side changes.

Quickly build negative test cases by changing form data, headers etc.

Throttle Requests

Control the concurrency and pacing of network calls to avoid overloading servers and getting rate limited.
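One simple way to pace requests is a small rate limiter that enforces a minimum gap between page loads. This is a minimal sketch – the Pacer class and the URL loop are illustrative, not part of Selenium Wire:

```python
import time

class Pacer:
    """Enforces a minimum interval between consecutive requests."""
    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last = None  # monotonic timestamp of the last request

    def delay_needed(self, now):
        # How long to sleep before the next request is allowed.
        if self.last is None:
            return 0.0
        return max(0.0, self.min_interval - (now - self.last))

    def wait(self):
        pause = self.delay_needed(time.monotonic())
        if pause > 0:
            time.sleep(pause)
        self.last = time.monotonic()

# Hypothetical usage with a driver and a list of URLs:
# pacer = Pacer(min_interval=2.0)  # at most one page load every 2 seconds
# for url in urls:
#     pacer.wait()
#     driver.get(url)
```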

Block Resources

Strip unnecessary assets like images, JS files etc. to optimize page load speeds. Especially useful when scraping a large number of pages.

According to HTTP Archive, the average page weight in 2019 was 1,800 KB. Of this, images contributed about 65% – over 1,100 KB per page.

Blocking them makes the scrapers lighter and faster.

In essence, Selenium Wire transforms Selenium from a mere browser automation tool to a versatile web scraper.

Let's look at how you can harness its capabilities.

Getting Started with Selenium Wire

Let's install Selenium Wire and make our first request:

Installation

Ensure Python 3.7+ is installed on your system.

You can check your Python version by running:

python --version

If it's lower, I recommend upgrading Python first.

Once that's done, run this command:

pip install selenium-wire

This will install Selenium Wire and its main dependency – Selenium.

💡 Tip: You can confirm they are installed with pip list

If you have an older Selenium version, upgrade it:

pip install --upgrade selenium

This ensures compatibility with Selenium Wire.

Make Your First Request

Let's open a page and print its content using regular Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://scrapeme.live/shop/") 

body = driver.find_element(By.TAG_NAME, 'body')
print(body.text)

driver.quit()
  • This initializes ChromeDriver
  • Opens ScrapeMe
  • Finds the body element
  • Prints visible text content

To integrate Selenium Wire, replace the import with:

from seleniumwire import webdriver

Run the script and you'll see the site load up in Chrome!

After closing the browser, the full page text prints.

Congratulations! 🎉

You just made your first Selenium Wire request. Let's inspect the traffic next.
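Selenium Wire records every request the browser makes in driver.requests, and each captured request carries its response (or None if no response arrived). As a quick sketch – the summarize helper is illustrative – you can list captured URLs and status codes like this:

```python
def summarize(requests):
    """Return (url, status) pairs for captured requests that got a response."""
    rows = []
    for req in requests:
        if req.response is not None:
            rows.append((req.url, req.response.status_code))
    return rows

# With the seleniumwire driver from the snippet above:
# for url, status in summarize(driver.requests):
#     print(status, url)
```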

Parse Response to JSON

The site displays some Pokémon with their prices. We'll scrape that data into a JSON object.

Use the woocommerce-loop-product__title class to extract names and woocommerce-Price-amount for prices:

import json

driver.get('https://scrapeme.live/shop/')
products = driver.find_elements(By.CSS_SELECTOR, '.products > li')

data = {}
for product in products:
  name = product.find_element(By.CLASS_NAME, 'woocommerce-loop-product__title').text
  price = product.find_element(By.CLASS_NAME, 'woocommerce-Price-amount').text
  
  data[name] = price

print(json.dumps(data, indent=2, ensure_ascii=False)) 
driver.quit()
  • Loops through the product list
  • Extracts name and price
  • Stores in a dictionary
  • Prints output as JSON

This will print:

{
  "Pikachu": "$20",
  "Charmander": "$25",
  "Squirtle": "$22"  
}

We've made our first request and parsed the data!
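If you want to keep the results, a small helper can persist the dictionary to disk with the same JSON settings – save_products is an illustrative name, not part of any library:

```python
import json

def save_products(data, path):
    """Write the scraped name-to-price mapping to a JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

# After the scraping loop above:
# save_products(data, 'products.json')
```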

Now let's look at how we can use Bright Data proxies to prevent blocks while scraping.

Avoid Blocks with Bright Data Proxies

Unfortunately, scrapers built using Selenium Wire can also get blocked by target sites.

Some common blocking techniques include:

  • IP Blocks – Sites blacklist your IP if they detect automation
  • CAPTCHAs – Difficult for bots but easy for humans
  • Behavior Analysis – Analyze mouse movements, scrolls, clicks etc.

So how do we bypass them? Proxies to the rescue!

How Proxies Help Avoid Blocks

A proxy acts as an intermediary that forwards traffic between your scraper and target sites.

Instead of connecting directly, all communication is routed through the proxy server.

This means that sites see the proxy's IP instead of your scraper's real IP.

Here's how it helps evade blocks:

Benefits include:

✅ Masks scraper IP to prevent IP blocks
✅ Allows geo-targeting content
✅ Adds an extra layer of anonymity

💡 Fun Fact: Proxies became popular in the early 2000s when scraper developers started using them to bypass IP blocks!

However, using just any proxy has downsides:

❌ Blocked proxies – Sites can detect and blacklist them
❌ Slow proxies – High latency, unstable connections
❌ CAPTCHAs – Need to solve them manually
❌ CAPTCHA farms – Expensive, limited availability

This is where Bright Data's proxies shine!

Rotate Proxies to Avoid Blocks

Bright Data provides a reliable pool of 3 million+ residential IPs perfect for automation.

The key features you'll love:

1. Unlimited Bandwidth – Scrape without worrying about usage limits

2. Speed up to 1 Gbps – Ensures blazing fast page loads

According to Cloudflare speed tests, the average internet speed globally is just 80 Mbps. That's over 10x slower than Bright Data proxies!

3. Automatic Rotation – Each request uses fresh proxies to avoid blocks.

4. 99.9% Uptime – Available whenever you need them

5. CAPTCHA Solving – No need to manually solve pesky tests

Simply set your scraper to route via Bright Data proxies and it will automatically rotate IPs.

Sites have no way to link the traffic back to your scraper! 🕵️‍♂️

Using Proxies in Selenium Wire

Let's integrate Bright Data proxies into Selenium Wire:

1. Get Credentials
Create a proxy zone in the Bright Data control panel and note down your credentials:

CUSTOMER_ID:password

2. Define the Proxy

PROXY = 'http://CUSTOMER_ID:[email protected]:8000'

3. Set Proxy in Options

seleniumwire_options = {
    'proxy': {
        'https': PROXY,
        'http': PROXY
    }
}

driver = webdriver.Chrome(
    seleniumwire_options=seleniumwire_options
)

This routes all traffic through Bright Data proxies! 🚦
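As a sanity check, you can wrap the options in a small helper and point the browser at an IP-echo endpoint to confirm traffic exits through the proxy. make_proxy_options is an illustrative helper and the credentials are placeholders:

```python
def make_proxy_options(proxy_url):
    """Build a seleniumwire_options dict routing both schemes via one proxy."""
    return {'proxy': {'http': proxy_url, 'https': proxy_url}}

# Placeholder credentials – substitute your own zone's values:
# PROXY = 'http://CUSTOMER_ID:[email protected]:8000'
# driver = webdriver.Chrome(seleniumwire_options=make_proxy_options(PROXY))
# driver.get('https://httpbin.org/ip')  # the page shows the proxy's IP, not yours
```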

4. Rotate Proxies
Bright Data's residential proxies rotate the exit IP automatically with each request by default, so no extra setup is needed for per-request rotation. If you instead need to keep the same IP across several requests, add a session parameter to the proxy username as described in the Bright Data docs.

And now your scraper is resilient to even the toughest blocks! 💪

Customize Selenium Wire

With the ability to intercept traffic, let's see how to leverage Selenium Wire capabilities for customization.

Modify Request Headers

Scrapers using default Selenium headers are quite easy to detect. To set custom ones:

CUSTOM_UA = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'

def request_interceptor(request):
  # Selenium Wire headers can hold duplicates, so delete the default first
  del request.headers['User-Agent']
  request.headers['User-Agent'] = CUSTOM_UA

driver.request_interceptor = request_interceptor

We override the User-Agent header to spoof a Chrome browser on Linux.

You can also set other headers like:

request.headers['Accept-Language'] = 'en-US'
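Since Selenium Wire's headers object allows duplicate entries, it's safest to delete a header before setting its replacement. A reusable interceptor factory might look like this – make_header_interceptor is an illustrative helper, not a library function:

```python
def make_header_interceptor(headers):
    """Build an interceptor that replaces the given headers on every request."""
    def interceptor(request):
        for name, value in headers.items():
            # Drop any existing value before setting the replacement, since
            # Selenium Wire's headers object can hold duplicates.
            if name in request.headers:
                del request.headers[name]
            request.headers[name] = value
    return interceptor

# Hypothetical usage with the driver from earlier:
# driver.request_interceptor = make_header_interceptor({
#     'User-Agent': CUSTOM_UA,
#     'Accept-Language': 'en-US',
# })
```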

Mock Parameters

Intercepting requests allows mocking data by modifying parameters on the fly:

def request_interceptor(request):
  if request.path == '/login':
    request.body = b'{"username": "test", "password": "1234"}'  # body must be bytes

driver.request_interceptor = request_interceptor

This overrides login credentials without needing server-side changes!

Block Resources

Strip unnecessary assets to optimize page load speeds:

def request_interceptor(request):
  if request.path.endswith('.png'):
    request.abort()

driver.request_interceptor = request_interceptor

Now PNG images won't load, speeding up page scraping πŸš€
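To block more than one asset type, you can test the request path against a tuple of extensions. A sketch – the extension list is just an example, tune it to your target site:

```python
# Example extension list – tune it to your target site.
BLOCKED_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.gif', '.svg', '.woff2')

def should_block(path):
    """Return True if the request path points at a skippable static asset."""
    return path.lower().endswith(BLOCKED_EXTENSIONS)

def request_interceptor(request):
    if should_block(request.path):
        request.abort()

# driver.request_interceptor = request_interceptor
```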

Optimizing Selenium Wire

There are two core techniques to optimize Selenium Wire performance:

1. Browser Profile Configuration
Tweak settings like extensions, user-agent etc. to balance stealth and speed:

from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")  # Options.headless is deprecated in Selenium 4
options.add_argument("--disable-gpu")

driver = webdriver.Chrome(options=options)

2. Request Interception
As discussed before, abort slow resource requests:

def request_interceptor(request):
  if request.path.endswith('.png'):
    request.abort()

driver.request_interceptor = request_interceptor

Get the right blend of configuration and interception tailored to your specific scraping needs.

Pro Tip: Use a profiler to understand page components and identify blocking candidates.
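For a quick, profiler-style view from within Selenium Wire itself, you can total captured response sizes per Content-Type after a page load – the heaviest types are prime blocking candidates. weight_by_type is an illustrative helper:

```python
from collections import defaultdict

def weight_by_type(requests):
    """Total captured response body sizes per Content-Type, in bytes."""
    totals = defaultdict(int)
    for req in requests:
        if req.response is None or req.response.body is None:
            continue
        ctype = req.response.headers.get('Content-Type', 'unknown').split(';')[0]
        totals[ctype] += len(req.response.body)
    return dict(totals)

# After driver.get(...), print the heaviest content types first:
# for ctype, size in sorted(weight_by_type(driver.requests).items(),
#                           key=lambda kv: -kv[1]):
#     print(ctype, size)
```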

Now let's discuss scaling up to industrial levels with Bright Data's proxy network.

Scale Web Scraping with Bright Data

For most scrapers, Selenium Wire itself poses scalability challenges:

❌ Doesn't work reliably beyond 10-20 concurrent threads
❌ Limited by single machine's compute resources
❌ Reaches memory limits when parsing large responses
❌ Exceptions and stale element issues emerge

Industrial scrapers need resilience, speed, and scale via distributed architectures.

This is where Bright Data's Proxies and Browser Automation solution comes in!

It provides a suite of robust tools for automation including:

🖥️ Headless Browsers – Chromium, Playwright, Puppeteer
🌐 Proxies – 3 million IPs with unlimited bandwidth
🤖 Anti-Bot Protection – Avoid the toughest blocks and CAPTCHAs
☁️ Cloud Infrastructure – Distributed scraping from multiple regions
📈 Scalable – Horizontally scale to millions of requests per day

Let's go over the key capabilities:

Headless Browser Automation

In addition to Selenium Wire, Bright Data provides scripts for Playwright, Puppeteer and Chromium:

import brightdata

browser = brightdata.Chrome()

That's it! Just start making requests through the browser instance.

It handles proxies, cookies, blocks and more automatically in the background.

Distributed Proxy Infrastructure

All traffic routes via Bright Data's distributed residential proxies:

3 million+ IPs spread globally across 150+ locations, including the US, UK, Canada, and France.

It lets you target geo-restricted sites easily.

Plus, automatic rotation prevents IP blocks completely.

According to Bright Data, sites detect and block public proxies within 5 minutes on average.

But their private pools avoid blocks for months! 🕵️‍♂️

Anti-Bot Protection

Major sites like Facebook, Google employ advanced tactics including:

❗️ Fingerprinting and machine learning
❗️ IP analysis
❗️ JS challenge codes
❗️ Behavioral analysis

Bright Data can reliably bypass them all! Just set:

browser = brightdata.Chrome(enable_antibot=True)

It'll handle the challenges seamlessly keeping your scrapers uninterrupted.

Optimized Performance

All traffic routes through optimized paths ensuring:

⚡️ Fast page loads – Near 1 Gbps speeds
⚡️ Low latency – Direct peerings with sites
⚡️ High concurrency – Distributed infrastructure
⚡️ No blocks – Automatic rotation

You get the speed and scale suited for enterprise workloads.

Reliability At Scale

Battle-tested by Fortune 500 customers, Bright Data proxies sustain heavy use cases:

📈 Billions of requests per month
🔁 Concurrency up to hundreds of thousands of threads
⌛️ Uptime of 99.999% globally

Whether you need to scrape search engines, ecommerce sites, social media or more – it handles them all.

In short, Bright Data lets your scrapers run 24/7 without interruptions or blocks.

Sign up for a free trial with $5 credit to experience it firsthand.

Alternative Solutions

Now you might be wondering – if Bright Data already provides browser automation capabilities, where does Selenium Wire fit?

Here are some examples where Selenium Wire shines:

Pure Python Scrapers

If you want to build scraping scripts in Python without external dependencies, Selenium Wire is perfect.

It gives all capabilities like proxies and customization without needing additional libraries.

Headless Browser Development

For testers or developers working specifically on headless browser projects, Selenium Wire + Selenium provides a robust toolkit.

You get greater visibility into the automated traffic for debugging.

Open-Source Philosophy

As an open-source tool, Selenium Wire aligns better for teams preferring non-commercial solutions.

It gives good enough functionality for small-scale needs.

In essence, choose Selenium Wire if you value independence, control, and customization.

Otherwise, Bright Data makes scalable automation drop-dead simple!

Conclusion

Let's summarize the key things we learned about Selenium Wire:

✅ Inspect traffic – Analyze raw responses to understand site structure

✅ Customize requests – Change headers, mock data, block resources

✅ Avoid blocks – Rotate Bright Data proxies to prevent IP bans

✅ Optimize performance – Balance browser profiles and interception

✅ Scale scraping – Leverage Bright Data's industrial-grade proxy network

✅ Alternatives – Use Selenium Wire for open-source based scrapers

Phew, that was a comprehensive guide!

We went all the way from basics like making requests to advanced customization and scaling techniques.

Whether you are a hobbyist scraper looking to learn or an expert seeking a reference guide – hope you found it helpful!
