cURL is a powerful command-line tool for transferring data using various protocols. One of the most common uses of cURL is for web scraping and accessing web APIs. However, websites often block scrapers and limit API access through IP-based restrictions.
Using a proxy server with cURL routes your requests through an intermediary server, so the target site sees the proxy's IP address instead of yours. This makes it harder for websites to identify and block your traffic, which is why proxies are a common part of sustained web scraping.
In this comprehensive guide, you'll learn:
- What a proxy server is and how it works with cURL
- Steps to set up a cURL proxy on Windows, Linux, and Mac
- Authentication with username and password
- Rotating proxies to avoid blocks
- Extracting data from responses
- Best practices for using proxies
So let's get started!
What Is a Proxy Server?
A proxy server acts as an intermediary between your machine and the destination server you want to access. When you use a proxy, your requests go to the proxy server first, which then forwards them to the target website. The website's response travels back to the proxy, which relays it to you.
This provides an extra layer of separation between your IP address and the website's server. The website only sees the IP of the proxy server, not your actual public IP address.
Proxies are commonly used to:
- Access geographically restricted content
- Bypass firewalls and internet filters
- Improve page load speeds through caching
- Scrape data without getting blocked
There are many types of proxy servers available, including public proxies, private proxies, rotating proxies, and more.
How to Use a Proxy Server with cURL
Let's look at how to set up a proxy server with cURL to route your requests and bypass blocks.
cURL Proxy Command Syntax
The basic cURL syntax for using a proxy server is:
curl --proxy [PROTO]://[HOST]:[PORT] [URL]
- PROTO: Proxy protocol (for example, http or socks5)
- HOST: Proxy server hostname or IP address
- PORT: Proxy port number
- URL: The target URL to access through the proxy
curl --proxy http://192.168.0.1:8080 https://example.com
This command uses the proxy server at IP 192.168.0.1 on port 8080 to send the request to https://example.com.
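The same proxy can also be supplied through the short -x flag or through an environment variable, which is convenient when several commands in a session should share it. The sketch below assumes a POSIX-style shell; fetch_with_flag and fetch_with_env are hypothetical helper names, and the proxy address is the placeholder from the example above.

```shell
# Placeholder proxy address taken from the example above
proxy_url="http://192.168.0.1:8080"

fetch_with_flag() {
    # -x is the short form of --proxy
    curl -x "$proxy_url" "$1"
}

fetch_with_env() {
    # For https:// targets, cURL reads the proxy from the https_proxy variable;
    # setting it on the same line limits it to this one command
    https_proxy="$proxy_url" curl "$1"
}
```

Calling fetch_with_flag https://example.com or fetch_with_env https://example.com should route the request through the same proxy. An explicit --proxy or -x on the command line takes precedence over the environment variable.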
Set Up a cURL Proxy Server
Follow these steps to use a proxy server with cURL:
- Find a proxy server – You can use free public proxies or paid proxies designed for web scraping. Check the proxy's IP address, port, and protocol.
- Replace the proxy details in the cURL command:
curl --proxy http://[IP]:[PORT] https://example.com
- Run the command in a Terminal or Command Prompt window.
For example, to use the proxy 220.127.116.11 on port 8118:
curl --proxy "http://220.127.116.11:8118" "https://httpbin.org/ip"
The response will contain the proxy server's IP rather than your own public IP.
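Free public proxies in particular are often slow or offline, so it can help to test one before relying on it. The following is a minimal sketch, assuming a POSIX-style shell; check_proxy is a hypothetical helper name and the 10-second limit is an arbitrary choice.

```shell
check_proxy() {
    # --silent hides the progress meter; --max-time caps the whole transfer at 10 seconds.
    # cURL exits with a non-zero status if the proxy cannot be reached or times out.
    if curl --silent --max-time 10 --proxy "$1" "https://httpbin.org/ip" > /dev/null; then
        echo "ok $1"
    else
        echo "failed $1"
    fi
}
```

For instance, check_proxy "http://220.127.116.11:8118" prints a line starting with ok or failed, depending on whether the request completed through that proxy.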
Proxy Authentication with Username and Password
Some proxies require authentication to access them. cURL supports passing a username and password to connect to authenticated proxies.
Use the --proxy-user option to provide the username and password:
curl --proxy http://proxy-url.com:8080 --proxy-user username:password https://target-url.com
You can also include the credentials directly in the proxy URL:
curl --proxy http://username:password@proxy-url.com:8080 https://target-url.com
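Additionally, some proxies may expect an explicit authentication header, which you can send with the --proxy-header option:
curl --proxy http://proxy-url.com:8080 --proxy-user username:password --proxy-header "Proxy-Authorization: Basic encoded" https://target-url.com
Here, encoded is the Base64-encoded string of username:password. With Basic authentication, cURL normally builds this header itself from --proxy-user, so the explicit header is only needed in unusual setups.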
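As a small illustration of where the encoded value comes from, the sketch below builds it with the base64 utility, assuming that utility is available on your system. The credentials are the placeholder username:password pair used above.

```shell
# printf is used instead of echo so that no trailing newline ends up in the encoded value
encoded=$(printf '%s' 'username:password' | base64)

# Prints: Proxy-Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
echo "Proxy-Authorization: Basic $encoded"
```

Base64 is an encoding, not encryption, so anyone who sees this header can recover the credentials.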
Extract Data from cURL Proxy Responses
When accessing web pages through a proxy with cURL, you'll often want to extract information from the returned HTML, JSON, or other formatted data.
The jq command-line tool can help parse and filter JSON data returned by APIs. For example:
curl -x http://192.168.0.1:8080 https://api.coindesk.com/v1/bpi/currentprice.json | jq .bpi.USD.rate
This sends a request via the proxy, then uses jq to extract just the current BTC price value.
For HTML content, you can use grep, sed, or other text processing tools to extract data from the cURL response.
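As a rough sketch of the sed approach, the helper below pulls the page title out of HTML read from standard input. extract_title is a hypothetical name, and the pattern assumes the title element sits on a single line; regular expressions are not a real HTML parser, so treat this as a quick-and-dirty technique.

```shell
extract_title() {
    # Print the text between <title> and </title>, keeping only the first match
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' | head -n 1
}

# Local sample input, so the helper can be tried without a network connection
printf '<html><head><title>Example Domain</title></head></html>\n' | extract_title
# prints: Example Domain
```

With a live request, the same helper could be used as curl -s -x http://192.168.0.1:8080 https://example.com | extract_title.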
Rotate Proxies with cURL
Rotating proxies helps distribute requests across multiple IP addresses. This prevents your traffic from looking like it's coming from a single source.
To use rotating proxies with cURL:
- Get a list of proxies from a provider such as Luminati (now Bright Data) or Oxylabs.
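- Save the proxies into a text file, proxies.txt, with one proxy per line in a form that cURL accepts for --proxy (for example, the [PROTO]://[HOST]:[PORT] syntax shown earlier).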
- Use a command like this to pick a random proxy from the file each time:
PROXY=$(sed -n "$((RANDOM % $(wc -l < proxies.txt) + 1))p" proxies.txt)
curl --proxy "$PROXY" https://example.com
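Random selection does not help when the chosen proxy happens to be down. One possible complement, sketched below, is to walk through the list in order and fall back to the next proxy whenever a request fails. fetch_with_rotation is a hypothetical helper name, and the 10-second timeout is an arbitrary choice.

```shell
fetch_with_rotation() {
    url=$1
    file=$2
    while IFS= read -r proxy; do
        # Skip blank lines in the proxy list
        [ -n "$proxy" ] || continue
        # --fail makes cURL return a non-zero status on HTTP errors (4xx/5xx);
        # stdin is redirected so cURL cannot consume the rest of the proxy list
        if curl --silent --fail --max-time 10 --proxy "$proxy" "$url" < /dev/null; then
            return 0
        fi
        echo "proxy failed: $proxy" >&2
    done < "$file"
    # Every proxy in the file failed
    return 1
}
```

For example, fetch_with_rotation https://example.com proxies.txt prints the first successful response body and returns a non-zero status only if every proxy in the file fails.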