Rewriting a script to work with a hosting provider

There are all kinds of reasons to rewrite a script, or a package of scripts, that you wrote. But would you do it just to suit the hosting provider you selected? I'd guess most people would say no, and that is usually the wise call. I, however, had already spent $42 on a month of hosting with soyoustart.com, a subsidiary of ovh.com.

Caveat: I used a library like dryscrape because the target webpage built its HTML with JavaScript, so the page had to be rendered by a JavaScript engine before there was any HTML to scrape. Otherwise, the argument for just writing a simple bot with the standard Python libraries is valid.
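For a page whose content is already present in the HTML the server sends, a standard-library-only sketch like the one below is usually enough. The names `LinkCollector`, `extract_links` and `fetch`, and the sample HTML, are hypothetical illustrations, not part of my original script. Note that `fetch` only downloads the raw HTML and never runs the page's JavaScript, which is exactly why it would not have worked for the site I was scraping.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=30):
    """Download a page's raw HTML; JavaScript on the page is NOT executed."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    sample = '<html><body><a href="/a.mp4">A</a><a href="/b.mp4">B</a></body></html>'
    print(extract_links(sample))  # ['/a.mp4', '/b.mp4']
```

On a live static page you would call `extract_links(fetch(url))`; on a JavaScript-built page the same call would only see the pre-render skeleton.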

I recently wrote a scraper script that uses a Python library called dryscrape. The script was ready to go: I had tested it locally, on a Raspberry Pi, and on a Digital Ocean instance.
I got a server with soyoustart because I wanted 2 TB of disk space, as I was scraping a lot of digital media. Getting the environment up and running was easy enough, but when I tried to install dryscrape I got:

g++: internal compiler error: Segmentation fault (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
Makefile.webkit_server:1006: recipe for target 'build/Version.o' failed
make[1]: *** [build/Version.o] Error 4
make[1]: Leaving directory '/tmp/pip-build-jI5qGh/webkit-server/src'
Makefile:38: recipe for target 'sub-src-webkit_server-pro-make_first-ordered' failed
make: *** [sub-src-webkit_server-pro-make_first-ordered] Error 2
error: [Errno 2] No such file or directory: 'src/webkit_server'

My first fix was to try different Linux distributions, and different versions of each distribution; none of that worked. Next I tried every alternative way of installing dryscrape that I could find, including brew (normally a macOS package manager) running on Linux. Again and again I hit the same compile error. I tried different versions of gcc/g++ and different sources.list files to see if that would help. Argh!

Next I tried Selenium, tricking Firefox (Iceweasel, Debian's rebranded build of Firefox) into thinking it had a window to run in, but I still had issues.

Finally, I decided to drive PhantomJS from Python instead. I ended up using a new Python environment, Anaconda; after running its installation script, I found a way to get PhantomJS working with it. All I then needed to do was rewrite parts of the script to work with PhantomJS.
Proxy support is not as robust, but otherwise everything has worked well.
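A rewrite like this stays small when only the page-rendering step depends on the headless browser. One way to get there is to pass the renderer into the scraper as a parameter, so swapping dryscrape for PhantomJS means writing one new function. This is a minimal sketch under that assumption, not my actual script: `scrape`, `Renderer`, `Parser`, `stub_renderer` and `naive_href_parser` are hypothetical names, and a real renderer wrapping PhantomJS (or anything else) is left out because its setup depends on your environment.

```python
from typing import Callable, Dict, Iterable, List

# A renderer turns a URL into the page's HTML *after* its JavaScript has run.
# A dryscrape wrapper, a PhantomJS wrapper or a test stub could each fit here.
Renderer = Callable[[str], str]
# A parser pulls whatever you are after (e.g. media URLs) out of that HTML.
Parser = Callable[[str], List[str]]


def scrape(urls: Iterable[str], render: Renderer, parse: Parser) -> Dict[str, List[str]]:
    """Render every URL with the chosen backend and parse the result.

    Only `render` knows which headless browser is in use, so switching
    backends means writing one new function, not touching this loop.
    """
    results: Dict[str, List[str]] = {}
    for url in urls:
        html = render(url)
        results[url] = parse(html)
    return results


def stub_renderer(url: str) -> str:
    """Hypothetical stand-in backend: returns canned HTML instead of a browser."""
    return f'<a href="{url}/video.mp4">video</a>'


def naive_href_parser(html: str) -> List[str]:
    """Hypothetical toy parser: grabs the values of href="..." attributes."""
    parts = html.split('href="')[1:]
    return [part.split('"', 1)[0] for part in parts]


if __name__ == "__main__":
    print(scrape(["http://example.com"], stub_renderer, naive_href_parser))
    # {'http://example.com': ['http://example.com/video.mp4']}
```

The stub renderer also makes the parsing logic testable without any browser installed, which would have helped while the real backend refused to compile.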

Summary: I learned a lot, of course (-: Never give up!
In my opinion, exercises like this are never a waste; I learned so much while trying all the different alternatives. The compile error itself, I think, traces back to OVH's (soyoustart.com's) build of the operating system they have you install on the server you order. That is a rare occurrence, and overall I think OVH is a hell of a lot of bang for your buck in hosting.
