Web Crawling with a Temporary IP in Selenium (Ubuntu 16.04)

04 Jun 2020


Some sites block specific IP addresses. This post walks through accessing such sites from Selenium with a temporary IP address, using Tor as a local proxy.

Environment

For reference, these are the versions I used in my test environment.

  • Ubuntu 16.04
  • selenium 3.8.0
  • python 3.6.0
  • firefox 57.0.1
  • geckodriver 0.19.1

Crawling with temporary IP address

FIRST STEP: Install Tor

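First, let's install Tor. Note that the tor package from apt is the Tor daemon, which runs a local SOCKS proxy; it is not the Tor Browser.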

sudo apt-get update
sudo apt-get install tor
sudo /etc/init.d/tor restart

Run netstat -ntlp to confirm that Tor is listening. As the output below shows, there is now a listener on 127.0.0.1:9050.

$ netstat -ntlp
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:9050          0.0.0.0:*               LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
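
The same check can be scripted. The sketch below is a minimal example that assumes Tor's SOCKS listener is at 127.0.0.1:9050, as in the netstat output above. The helper name port_is_open is made up for illustration. It only confirms that something accepts TCP connections on that port, not that the listener is Tor.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError if nothing is listening
        # or if the attempt times out
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 127.0.0.1:9050 is the address shown in the netstat output
    print("Tor SOCKS port reachable:", port_is_open("127.0.0.1", 9050))
```

Running this before starting the browser gives an early, readable failure when Tor is not up, instead of a proxy error inside Firefox.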

SECOND STEP: Forward IP address with WebDriver

First, let's check the current IP address without any proxy.

from selenium import webdriver

driver = webdriver.Firefox()

driver.get('http://icanhazip.com/')

print(driver.page_source)

driver.quit()

icanhazip.com is a simple site that returns the IP address of the client making the request. The code prints the response using WebDriver's page_source attribute.

The result is shown below. The IP address in this case starts with 13.125.

<html><head><link rel="alternate stylesheet" type="text/css" href="resource://content-accessible/plaintext.css" title="Wrap Long Lines"></head><body><pre>
13.125.XX.XXX
</pre></body></html>
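
Because page_source returns the address wrapped in HTML, it can be handy to pull out just the IP. The following is a rough sketch, not part of the original setup: extract_ip is a hypothetical helper, and 203.0.113.7 is a documentation-only address standing in for a real result.

```python
import ipaddress
import re


def extract_ip(page_source):
    """Return the first valid IPv4 address found in page_source, or None."""
    for candidate in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", page_source):
        try:
            # ipaddress rejects out-of-range octets such as 999.1.1.1
            return str(ipaddress.IPv4Address(candidate))
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    # Placeholder HTML shaped like the output above
    sample = "<html><head></head><body><pre>\n203.0.113.7\n</pre></body></html>"
    print(extract_ip(sample))
```

With a live driver, the call would be extract_ip(driver.page_source), which makes it easy to compare the address before and after enabling the proxy.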

Now we route the browser's traffic through the Tor SOCKS proxy on port 9050, the listener created when we installed Tor.

from selenium import webdriver

# Point Firefox at the local Tor SOCKS proxy
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
profile.set_preference("network.proxy.socks", "127.0.0.1")
profile.set_preference("network.proxy.socks_port", 9050)

profile.update_preferences()

# Start Firefox with the proxied profile
driver = webdriver.Firefox(firefox_profile=profile)

driver.get('http://icanhazip.com/')

print(driver.page_source)

driver.quit()

The result from this execution is as follows:

. . .  
<span class="cf-footer-item"><span data-translate="your_ip">Your IP</span>:
77.247.181.162
</span>
. . .  

My IP address changed to 77.247.181.162! The request now leaves through a Tor exit node, so the site sees that node's address instead of mine.
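
The proxy preferences can also be wrapped in a small helper so they are defined in one place. This is a sketch under a few assumptions: apply_tor_proxy and RecordingProfile are names invented here, and the two extra preferences (network.proxy.socks_version and network.proxy.socks_remote_dns) are not in the setup above. They are standard Firefox preferences that I add so that hostname lookups also go through the proxy. RecordingProfile only stands in for webdriver.FirefoxProfile so the example runs without a browser.

```python
def apply_tor_proxy(profile, host="127.0.0.1", port=9050, remote_dns=True):
    """Set SOCKS proxy preferences on profile and return them as a dict."""
    prefs = {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
    }
    if remote_dns:
        # Assumption beyond the original post: resolve hostnames via the proxy too
        prefs["network.proxy.socks_remote_dns"] = True
    for name, value in prefs.items():
        profile.set_preference(name, value)
    return prefs


class RecordingProfile:
    """Stand-in for webdriver.FirefoxProfile; it just records preferences."""

    def __init__(self):
        self.prefs = {}

    def set_preference(self, name, value):
        self.prefs[name] = value


if __name__ == "__main__":
    demo = RecordingProfile()
    apply_tor_proxy(demo)
    for name, value in sorted(demo.prefs.items()):
        print(name, "=", value)
```

In a real script, the object passed in would be a webdriver.FirefoxProfile(), since the helper only relies on its set_preference method.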

Forward IP address with RemoteWebDriver

RemoteWebDriver works much like a local WebDriver.
The main difference is that it connects to a Selenium server, which must be set up and running, so that the browser can run on a separate machine.

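from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# The proxy profile is the same as in the local WebDriver example
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.socks", "127.0.0.1")
profile.set_preference("network.proxy.socks_port", 9050)

# Only this line differs: connect to a Selenium server instead of a local browser
driver = webdriver.Remote('http://127.0.0.1:4444/wd/hub', DesiredCapabilities.FIREFOX, browser_profile=profile)
driver.get('http://icanhazip.com/')
print(driver.page_source)
driver.quit()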

The only change is that webdriver.Remote replaces webdriver.Firefox: it takes the Selenium server URL and the desired capabilities, and the profile is passed as browser_profile.


