
I am developing a Python program that uses Selenium (the WebDriver Python bindings) and PhantomJS (a headless WebKit browser scriptable through a JavaScript API) to load and interact with websites.

When I run this program on a local Ubuntu machine and network, it loads the websites correctly and I can dump all of their HTML:

print webdriver.page_source

When I run it on the server, this line only prints

<html><head></head><body></body></html>

It looks like the server answered the request with an empty HTML page.
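To tell this empty shell apart from a real page programmatically, a small check along these lines could be run on the page source. This is only a sketch written for Python 3 with the standard library; `looks_blank` and `_BodyProbe` are hypothetical names, not part of Selenium.

```python
from html.parser import HTMLParser


class _BodyProbe(HTMLParser):
    """Records whether anything (a tag or non-blank text) appears inside <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # Any element nested in <body> counts as content.
            self.has_content = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.has_content = True


def looks_blank(page_source):
    """Return True when the document has no tags or text inside <body>."""
    probe = _BodyProbe()
    probe.feed(page_source)
    probe.close()
    return not probe.has_content
```

Calling `looks_blank(driver.page_source)` right after loading a page (with `driver` being whatever name holds the WebDriver instance) would flag the failing sites without having to read the dump by eye.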

This happens on two of the websites, but the program works correctly on a third one, which makes me think it is a networking issue rather than a programming issue. The server is provided by a VPS provider.

From the server, I can ping the host of one of the websites that returns empty HTML, which makes me think that my IP is not blacklisted or banned.

Here is the output of netstat -tulpen (run on the server):

tcp 0 0 0.0.0.0:41207 0.0.0.0:* LISTEN 0 267296 22458/phantomjs
tcp 0 0 0.0.0.0:38457 0.0.0.0:* LISTEN 0 267294 22463/phantomjs
tcp 0 0 0.0.0.0:33667 0.0.0.0:* LISTEN 0 267295 22461/phantomjs

I don't know how to debug this or understand what is happening.

Update: After some testing, I wrote a JS script that uses PhantomJS directly to dump the HTML content of a page and log errors.

It gives

FAIL to load the address Error creating SSL context (error:140A90C4:SSL routines:func(169):reason(196))
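The hexadecimal code in that message can be unpacked to see which part of OpenSSL complained. The sketch below assumes the packed error layout used by OpenSSL 1.x (library in the top 8 bits, then 12 bits of function code, then 12 bits of reason code); `decode_openssl_error` is a hypothetical helper name.

```python
def decode_openssl_error(code_hex):
    """Split a packed OpenSSL 1.x error code into library, function and reason.

    Assumes the 8/12/12-bit layout used by OpenSSL 1.x; newer releases
    pack error codes differently.
    """
    code = int(code_hex, 16)
    return {
        "lib": (code >> 24) & 0xFF,
        "func": (code >> 12) & 0xFFF,
        "reason": code & 0xFFF,
    }


print(decode_openssl_error("140A90C4"))
# → {'lib': 20, 'func': 169, 'reason': 196}
```

The function and reason numbers match the `func(169):reason(196)` part of the message, and library 20 is OpenSSL's SSL library. In the OpenSSL 1.0.x headers these values appear to correspond to `SSL_CTX_new` and "null ssl method passed", which would point at the SSL/TLS protocol PhantomJS asks for rather than at the network, but that reading is an assumption worth checking against the headers of the OpenSSL version installed on the server.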

So it could be related to PhantomJS or something that blocks it.

NanoPish

1 Answer

After determining that the bug seems to come from PhantomJS, I played with its options and parameters.
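For reference, PhantomJS command-line options can be passed from Selenium through `service_args`. The sketch below is not the exact set of options I tried: `phantomjs_service_args` and `make_driver` are hypothetical helpers, `--ssl-protocol` and `--ignore-ssl-errors` are PhantomJS command-line flags, and `webdriver.PhantomJS` only exists in older Selenium releases (it was removed in Selenium 4).

```python
def phantomjs_service_args(ssl_protocol="any", ignore_ssl_errors=False):
    """Build the command-line flags handed to the PhantomJS process."""
    args = ["--ssl-protocol={}".format(ssl_protocol)]
    if ignore_ssl_errors:
        args.append("--ignore-ssl-errors=true")
    return args


def make_driver(service_args):
    """Start PhantomJS through Selenium with the given flags.

    Assumes an older Selenium release that still ships webdriver.PhantomJS
    and a phantomjs binary on the PATH.
    """
    # Imported lazily so the helper above can be used without Selenium installed.
    from selenium import webdriver

    return webdriver.PhantomJS(service_args=service_args)
```

A call such as `make_driver(phantomjs_service_args("tlsv1"))` would start the browser with a specific protocol, which is a quick way to test whether an SSL error depends on the negotiated protocol.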

It seems that the version I was running on the server (1.9.8) is broken for some of the websites I need to interact with.

I installed 2.1.1 (the version I run on my local machine) on the server, and it now works well.
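To catch this kind of mismatch earlier, the program could check the installed PhantomJS version at startup. This is a minimal sketch, assuming `phantomjs --version` prints a plain dotted version such as `2.1.1`; the helper names and the choice of 2.1.1 as the minimum are illustrative.

```python
import subprocess

# Version that worked for me; adjust as needed.
MINIMUM_VERSION = (2, 1, 1)


def parse_version(text):
    """Turn a dotted version string such as '2.1.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def installed_phantomjs_version(binary="phantomjs"):
    """Ask the phantomjs binary for its version (requires it on the PATH)."""
    output = subprocess.check_output([binary, "--version"])
    return parse_version(output.decode("ascii"))


def is_new_enough(version, minimum=MINIMUM_VERSION):
    """Tuples compare element by element, so (1, 9, 8) < (2, 1, 1)."""
    return version >= minimum
```

Running `is_new_enough(installed_phantomjs_version())` on both machines would have shown immediately that the server and the local setup were not running the same PhantomJS.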

NanoPish