How to Change Selenium User Agents [2023 Guide]

The user agent string identifies your client software (browser, OS, and device) to servers. Selenium's default user agents clearly expose automation, often leading to blocks.

We'll cover:

  • Data on the scale of user agent blocking across the web.
  • Examples of real-world browser user agents to mimic.
  • Implementing intelligent user agent rotation at scale.
  • Generating valid user agents from real browser data.
  • Limitations of manual rotation and superior alternatives.
  • Matching other fingerprints like headers for complete cloaking.

The Scale of User Agent Blocking in Web Scraping

With bot mitigation a $7.5+ billion industry projected to surpass $19 billion by 2027, according to Grand View Research, vast sums are invested in blocking scrapers and crawlers based on fingerprints like user agents.

According to SiteLock, over 30% of websites now block traffic from common scraping tools and non-organic user agents. Nearly all mainstream sites analyze user agent strings to distinguish real users from scrapers.

Without proper user agent management, scrapers suffer blocked IPs, CAPTCHAs, and failed data extraction. Matching real user agents is now mandatory for success.

Next, let's examine examples of real browser user agents to model after.

Examples of Standard Browser User Agents

To appear human, our user agents should match genuine browser versions.

Here are some examples of valid user agents from popular browsers and platforms:

Chrome on Windows 10

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36

Firefox on Windows 10

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0

Chrome on macOS Ventura

Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36

Chrome on Android

Mozilla/5.0 (Linux; Android 10) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Mobile Safari/537.36

Safari on iOS 16

Mozilla/5.0 (iPhone; CPU iPhone OS 16_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Mobile/15E148 Safari/604.1

For robust browser user agent lists, refer to resources like WhatIsMyBrowser.

Now let's examine how to configure Selenium to use these real user agents.

Setting a Custom User Agent with Selenium in Python

To override Selenium's default user agent, we can pass a custom string during browser initialization.

For Chrome:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # run without a visible window
# Override the default user agent
options.add_argument('user-agent=MyCustomUserAgent')

driver = webdriver.Chrome(options=options)

For Firefox:

from selenium import webdriver

options = webdriver.FirefoxOptions()
# Firefox reads its user agent from this preference
options.set_preference('general.useragent.override', 'MyCustomUserAgent')

driver = webdriver.Firefox(options=options)

We simply pass our desired user agent value during driver configuration.
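You can verify that the override took effect by reading the user agent back from the live browser, which works the same way with either driver:

print(driver.execute_script('return navigator.userAgent'))
# Prints: MyCustomUserAgent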

Now you can make Selenium use any user agent you want!

But intelligently rotating user agents is best practice for web scrapers. Let's examine how to implement that next.

Implementing Intelligent User Agent Rotation

Using an identical user agent for every request creates an easy pattern for defenses to detect. The key is adding variance.

Here's one way to implement random user agent rotation in Python:

import random
from selenium import webdriver

agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 16_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.2 Mobile/15E148 Safari/604.1',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
]

for i in range(10):
    # Pick a random user agent from the pool for this session
    user_agent = random.choice(agents)

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')  # run without a visible window
    options.add_argument(f'user-agent={user_agent}')

    driver = webdriver.Chrome(options=options)

    driver.get('https://example.com')
    # Scrape page...

    driver.quit()

For each request, we randomly select a user agent from our pool to appear as a different device and browser.

This prevents the suspicious pattern of thousands of identical user agents scraping in succession.

Scaling User Agent Rotation for Large Scrapers

While suitable for smaller scripts, production web scraping requires automating user agent management at scale.

Here are some tips for enterprise-grade user agent rotation:

1. Generate User Agents On-Demand

Rather than a static array, use libraries like FakerJS to generate unlimited user agents on the fly.
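FakerJS is a JavaScript library; for a Python scraper, the fake-useragent package offers the same on-demand approach. A minimal sketch, assuming fake-useragent is installed (pip install fake-useragent):

from fake_useragent import UserAgent

ua = UserAgent()

# Each access returns a fresh real-world user agent string
print(ua.random)  # any browser
print(ua.chrome)  # Chrome only

Each value can then be passed to options.add_argument() exactly as shown earlier.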

2. Draw From Real Visitor Data

Services like ScraperAPI run traffic through real browser farms, providing organic user agents.

3. Analyze Target Site Patterns

Fingerprint real visitor user agents first. Replicate those patterns in your bot.

4. Continuously Refine Based on Metrics

Let success rate guide user agent selection and rules. Adjust constantly.

With the right architecture, you can achieve seamless user agent rotation to scrape confidently at scale.

Next, let's explore ways to build a varied list of valid user agents to draw from.

Generating Diverse User Agent Lists From Real Browsers

To imitate organic users, our user agents should originate from real browsers in the wild.

Here are strategies for building a diverse pool:

Browser Testing Platforms

Tools like BrowserStack and SauceLabs provide access to thousands of real browser VMs with associated user agents.

Open Source Lists

Projects like FakerJS and fake-useragent ship curated pools of real-world user agent strings.

Commercial Proxy Networks

Bright Data (formerly Luminati) and Oxylabs route traffic through millions of residential devices, providing associated user agents.

Traffic Analysis

Inspect network traffic directly to fingerprint user agents from a target site's real visitors.
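If you run a site with a similar audience, for example, its access logs already contain the user agents real visitors send. A rough sketch, assuming logs in the common combined format where the user agent is the final quoted field:

import re
from collections import Counter

ua_counts = Counter()
pattern = re.compile(r'"([^"]*)"$')  # last quoted field on each line

with open('access.log') as f:
    for line in f:
        match = pattern.search(line.strip())
        if match:
            ua_counts[match.group(1)] += 1

# The most common user agents among real visitors
for ua, count in ua_counts.most_common(10):
    print(count, ua)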

Automated Validation

Use tools like WhichBrowser and Yauaa to clean and validate scraped user agent lists.
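Those tools are full parsers; as a lightweight first pass, even a naive sanity filter catches obviously malformed entries. A rough sketch, not a substitute for a real parser:

def looks_valid(ua: str) -> bool:
    # Nearly all modern browser user agents start with Mozilla/5.0
    # and include at least one engine or browser token
    tokens = ('AppleWebKit', 'Gecko', 'Chrome', 'Safari', 'Firefox')
    return ua.startswith('Mozilla/5.0') and any(t in ua for t in tokens)

# 'agents' is the pool built in the rotation example above
agents = [ua for ua in agents if looks_valid(ua)]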

Combining these sources yields the most robust user agent list for organic scraping.

User Agent Management Best Practices

Beyond setting user agents directly, some additional tips include:

Match User Agent with Other Headers

Align Accept, Encoding, Language etc. headers with user agent for consistency.
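Selenium itself cannot rewrite most request headers, but the third-party selenium-wire package can intercept and modify them. A sketch, assuming selenium-wire is installed (pip install selenium-wire); the header value is illustrative:

from seleniumwire import webdriver

user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'

options = webdriver.ChromeOptions()
options.add_argument(f'user-agent={user_agent}')

driver = webdriver.Chrome(options=options)

def interceptor(request):
    # Keep Accept-Language consistent with a US Windows/Chrome user agent
    del request.headers['Accept-Language']
    request.headers['Accept-Language'] = 'en-US,en;q=0.9'

driver.request_interceptor = interceptor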

Iteratively Improve Rotation Rules

Constantly refine based on metrics, blocks encountered, and site changes.

Leverage Browser Farms

Increase volume of human traffic to mask scraping activity.

Use Residential Proxies

Route through residential IPs to create complete browser fingerprints.
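Chrome accepts an upstream proxy via a command-line switch. A minimal sketch; the endpoint is a placeholder for your residential provider's gateway:

from selenium import webdriver

options = webdriver.ChromeOptions()
# Placeholder address; substitute your residential proxy gateway
options.add_argument('--proxy-server=http://proxy.example.com:8000')

driver = webdriver.Chrome(options=options)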

Analyze Target Site Directly

Fingerprint real visitor user agents before mimicking.

Investing in thoughtful user agent architecture lowers risk and sustains access.

Limitations of Manual User Agent Rotation

While useful, manually rotating user agents has some notable downsides:

  • Easy to reuse agents predictably without sufficient variance.
  • Generating truly randomized, real data is challenging.
  • Doesn't integrate organically with other fingerprints like geolocation.
  • Still detectable if other headers don't match.
  • Requires constant maintenance and optimization.

For these reasons, many scrapers leverage superior alternatives:

Browser Automation

Tools like Playwright and Puppeteer provide organic user agents by driving real browsers.

Proxy Services

APIs from Bright Data, Oxylabs, and GeoSurf proxy traffic through residential devices to spoof all fingerprints.

Both options lift the burden of manual user agent management for scraping.

Matching User Agents to Other Fingerprints

To fully replicate human visitors, the user agent must fit with all other fingerprint data:

  • IP Geolocation – The locale implied by the user agent should match the proxy's geography.
  • Accept Headers – Advertised MIME types must align with the claimed browser.
  • Cookies – Save and resend cookies like a real session.
  • TLS Fingerprint – Handshake parameters should match the claimed browser.
  • Canvas – Rendering output should match the claimed browser and OS.

With alignment across every fingerprint dimension, the spoof becomes very difficult for defenses to detect.
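One practical way to maintain that alignment is to treat the settings as a bundle: choose a profile, then apply every value from it together instead of randomizing each dimension independently. A rough sketch with illustrative profile values:

from selenium import webdriver

# Keep user agent, language, and proxy geography consistent with each other
profile = {
    'user_agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36',
    'lang': 'en-US',
    'proxy': 'http://us.proxy.example.com:8000',  # US exit to match the en-US locale
}

options = webdriver.ChromeOptions()
options.add_argument(f"user-agent={profile['user_agent']}")
options.add_argument(f"--lang={profile['lang']}")
options.add_argument(f"--proxy-server={profile['proxy']}")

driver = webdriver.Chrome(options=options)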

Conclusion

Mimicking organic user agents is crucial when scraping production sites.

With the right tools and diligent architecture, you can spoof user agents effectively across all your browser automation projects.

Let me know if you have any other questions about optimizing user agent rotation and management for your web scrapers!
