Bots are useful, even in your personal life.

Bots are everywhere, and they aren’t such a bad thing. Sure, people write them to be spammy and clog up parts of the web. Webmasters tend to hate bots because spammers use them to invade their communities and wreak havoc at times.

Bots are great, though. They are the backbone of companies like Google: they let Google keep its information on websites across the web up to date, so it can decide which results are relevant when you run a search.

One bot I wrote notified me every time a new list of books came out on a specific website. Granted, the site hosted copyright-infringing books, but I still thought it was a neat thing to try out. I should add that I am a member of Safaribooksonline.com, so this experiment was purely for entertainment value.

Below is the code for the notifier bot, which emailed me whenever the now-defunct it-ebooks.info posted new books.

#!/usr/bin/env python3
# Notifier bot: emails me when the it-ebooks.info front page changes.

import hashlib
import smtplib
from urllib.request import urlopen

from bs4 import BeautifulSoup
from sqlalchemy import create_engine, text

# Fetch the home page and hash only the cells that list the newest books.
response = urlopen("http://it-ebooks.info/")
soup = BeautifulSoup(response.read(), "html.parser")
tag = soup.find_all("td", attrs={"class": "top"})
print(tag)
current_hash = hashlib.md5(str(tag).encode("utf-8")).hexdigest()

engine = create_engine('mysql+pymysql://user:password@localhost/database_name')

# engine.begin() opens a transaction and commits it when the block exits.
with engine.begin() as conn:
    # The row with the highest id holds the hash from the last change.
    last_id = conn.execute(text("SELECT MAX(id) FROM itebooks")).scalar()
    print(last_id)
    row = conn.execute(
        text("SELECT * FROM itebooks WHERE id = :id"), {"id": last_id}
    ).fetchone()

    # Column 1 is the stored hash ("checker"). An empty table counts as a
    # change, so the first run stores a hash.
    previous_hash = row[1] if row is not None else None
    print(previous_hash)
    changed = current_hash != previous_hash
    if changed:
        conn.execute(
            text("INSERT INTO itebooks (checker) VALUES (:checker)"),
            {"checker": current_hash},
        )

if changed:
    print("time to update")
    msg = 'Subject: it-ebooks has updated\n\nit-ebooks has updated.'

    server = smtplib.SMTP('smtp.gmail.com', 587)  # port 587 with STARTTLS
    server.ehlo()
    server.starttls()
    server.ehlo()
    server.login('from@gmail.com', 'password')
    server.sendmail('from@gmail.com', 'to@gmail.com', msg)
    server.quit()
else:
    print("no need to update")

This script ran as a cron job, which I set to run every 8 hours. The logic was simple: the bot fetched the home page, parsed out a specific part of it, and fed that into a hashing algorithm. The resulting hash was then compared with the previous hash stored in the database. If the hash was the same, nothing had changed; if it was different, the script stored the new hash and sent an email to a specific address to let me know new books were on the site. Kinda fun actually. (:
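To make the hash-and-compare step easier to see on its own, here is a minimal sketch of just that part. It is not the original script: it assumes an in-memory SQLite database as a stand-in for MySQL so it runs anywhere, and the function names `fingerprint` and `check_for_update` are made up for illustration. The table and column names are borrowed from the script.

```python
import hashlib
import sqlite3


def fingerprint(fragment: str) -> str:
    """Return the MD5 hex digest of a page fragment."""
    return hashlib.md5(fragment.encode("utf-8")).hexdigest()


def check_for_update(conn: sqlite3.Connection, fragment: str) -> bool:
    """Store the fragment's hash if it differs from the newest stored one.

    Returns True when the fragment changed or nothing was stored yet.
    """
    # Assumption: SQLite stand-in for the MySQL table used by the script.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS itebooks "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, checker TEXT NOT NULL)"
    )
    # The newest row holds the hash recorded at the last change.
    row = conn.execute(
        "SELECT checker FROM itebooks ORDER BY id DESC LIMIT 1"
    ).fetchone()
    current = fingerprint(fragment)
    if row is not None and row[0] == current:
        return False
    conn.execute("INSERT INTO itebooks (checker) VALUES (?)", (current,))
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
results = [
    check_for_update(conn, "<td class='top'>Book A</td>"),  # first run
    check_for_update(conn, "<td class='top'>Book A</td>"),  # unchanged page
    check_for_update(conn, "<td class='top'>Book B</td>"),  # new listing
]
print(results)  # [True, False, True]
```

Only the first and third calls report a change, which is when the real bot would send its email. Hashing only the parsed fragment, rather than the whole page, keeps unrelated page changes from triggering a notification.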

Bots are simply programs that run in the background, go out and either post or retrieve information, and then do something with it. In this case, I built a bot that checked whether part of a web page had changed by comparing it against a value stored in a db, and emailed me if it had.
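The "do something with it" step here was the email. As a rough sketch, the notification could be built with the standard library's `email.message.EmailMessage`, which fills in proper headers. The helper name `build_notification` is hypothetical, and the addresses are the same placeholders used in the script.

```python
from email.message import EmailMessage


def build_notification(sender: str, recipient: str, site: str) -> EmailMessage:
    """Build a short plain-text email saying that a site has updated."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"{site} has updated"
    msg.set_content(f"{site} has updated.")
    return msg


# Placeholder addresses, as in the script above.
note = build_notification("from@gmail.com", "to@gmail.com", "it-ebooks")
print(note["Subject"])
```

A message built this way can be handed to `smtplib.SMTP.send_message()` after logging in, instead of passing a bare string to `sendmail()`.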

Carefully engineered bots are great for cutting work that would take a human days or even years down to mere seconds or minutes. Great for automating work and things like dating (-;

2 thoughts on “Bots are useful, even in your personal life.”

    1. I think they were taken down and have been for the past month or so. I’ll admit I enjoyed the books they provided but they were outright copyright infringing and doing so in a way they couldn’t place blame on a 3rd party, so, shutdown.
