Cron job to run python script raises error

1

I'm trying to write a cron job to periodically run a Python script I've written, which adds data to a database I'm building. The script works when I run python /Users/me/Desktop/pythonScript/script.py from the terminal, but the cron job does not. I've run chmod a+x /Users/me/Desktop/pythonScript/script.py to make the script executable, and the script begins with #!/usr/bin/python.

I've added the result of $PATH as the PATH variable in my crontab, as advised here, as well as adding SHELL and HOME variables.

crontab -l presently returns this:

PATH="/Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/cole/anaconda/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/TeX/texbin"

SHELL="/bin/bash"

HOME = "/Users/me/Desktop/pythonScript/"

* * * * * python script.py
* * * * * env > /tmp/cronenv

The first job is supposed to run my script script.py, while the second writes the cron environment to the file /tmp/cronenv. This file looks like this:

SHELL=/bin/bash
USER=me

PATH=/Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/cole/anaconda/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/TeX/texbin
PWD=/Users/cole
SHLVL=1
HOME=/Users/cole
LOGNAME=cole
_=/usr/bin/env
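One way to compare a dump like the one above with the interactive shell's environment is a small helper that parses `env` output and lists the keys whose values differ. This is only a sketch: `parse_env_dump` and `diff_env` are hypothetical names, and it assumes one `KEY=value` pair per line.

```python
def parse_env_dump(text):
    """Parse `env`-style output (one KEY=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        env[key] = value
    return env


def diff_env(cron_env, shell_env):
    """Return {key: (cron_value, shell_value)} for every key whose values differ."""
    keys = set(cron_env) | set(shell_env)
    return {
        key: (cron_env.get(key), shell_env.get(key))
        for key in sorted(keys)
        if cron_env.get(key) != shell_env.get(key)
    }
```

From a terminal, something like `diff_env(parse_env_dump(open('/tmp/cronenv').read()), dict(os.environ))` (after `import os`) would then show where the two environments disagree.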

However, my database is not updating and when I search for cron in my system.log file, I find the following error messages:

Nov  5 20:24:00 Coles-MacBook-Air-2 cron[3301]: no path for address 0x11a77b000
Nov  5 20:24:00 Coles-MacBook-Air-2 cron[3302]: no path for address 0x11a77b000
Nov  5 20:25:00 Coles-MacBook-Air-2 cron[3314]: no path for address 0x11a77b000
Nov  5 20:25:00 Coles-MacBook-Air-2 cron[3315]: no path for address 0x11a77b000

Note there are two for each minute, one for each cronjob, though the second one appears to be working while the first is not. Any suggestions?

As it may be relevant, this is the script:

script.py

#!/usr/bin/python
# -*- coding: utf-8 -*-

import requests
import re
from nltk import word_tokenize
import time
import pickle

saveDir = '/Users/me/Desktop/pythonScript/dbfolder' #the folder where I want to save files 
workingDir = '/Users/me/Desktop/pythonScript/' #location of the script

#this function turns integer values into their url location at a gutenberg mirror
home = 'http://mirror.csclub.uwaterloo.ca/gutenberg/'
fileType = '.txt'
def urlMaker(x):
    url = home
    if int(x) > 10:
        for j in [i for i in range(len(x)-1)]:
            url += x[j]+'/'
        url += x+'/'+x+fileType
    else:
        url = home+'0/'+x+'/'+x+fileType
    return(url)

#this function takes a url and returns the .txt file at that url, as well as a list of cleaned paragraphs over 100 words in length.
def process(url):
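    #requests raises its own ConnectionError, which the built-in ConnectionError does not catch
    try:
        r  = requests.get(url)
    except requests.exceptions.ConnectionError:
        time.sleep(300)
        try:
            r  = requests.get(url)
        except requests.exceptions.ConnectionError:
            time.sleep(600)
            try:
                r  = requests.get(url)
            except requests.exceptions.ConnectionError:
                return(ConnectionError) 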
    toprint = r.text
    text = r.text.lower()
    k = re.search('\Send\Sthe small print!',text)
    l = re.search('the project gutenberg etext of the declaration of independence',text)
    m = re.search('start of (.*) project gutenberg (.*)*', text)
    n = re.search('end of (.*) project gutenberg (.*)*', text)
    o = re.search('http://gutenberg.net.au/licence.html', text)
    p = re.search('this site is full of free ebooks', text)
    x = 0
    lst = []
    if m and n:
        start,end = re.escape(m.group(0)), re.escape(n.group(0))
        text = re.search('{}(.*){}'.format(start, end), text, re.S).group(1)
    elif o and p:
        start,end = re.escape(o.group(0)), re.escape(p.group(0))
        text = re.search('{}(.*){}'.format(start, end), text, re.S).group(1)
    elif l and n:
        start,end = re.escape(l.group(0)), re.escape(n.group(0))
        text = re.search('{}(.*){}'.format(start, end), text, re.S).group(1)
    elif k and n:
        start,end = re.escape(k.group(0)), re.escape(n.group(0))
        text = re.search('{}(.*){}'.format(start, end), text, re.S).group(1)
    else:
        text = text
    if text.split('\n\n') != [text]:
        for i in text.split('\n\n'):
            if i != ''\
            and 'gutenberg' not in i\
            and 'ebook' not in i\
            and 'etext' not in i\
            and len(word_tokenize(i)) > 100:
                lst += [i.replace('\n',' ')]
        x = 1
    if text.split('\r\n\r\n') != [text] and x == 0:
        for i in text.split('\r\n\r\n'):
            if i != ''\
            and 'gutenberg' not in i\
            and 'ebook' not in i\
            and 'etext' not in i\
            and len(word_tokenize(i)) > 100:
                lst += [i.replace('\r\n',' ')]
    return((lst,toprint))

####makes an index dictionary of the titles to the title number
indexUrl = 'http://mirror.csclub.uwaterloo.ca/gutenberg/GUTINDEX.ALL'
r  = requests.get(indexUrl)
index = r.text.lower()
#splits index file by beginning and end
start = re.escape(re.search('~ ~ ~ ~ posting dates for the below ebooks:  1 oct 2017 to 31 oct 2017 ~ ~ ~ ~'\
                            ,index).group(0))
end = re.escape(re.search('<==end of gutindex.all==>',index).group(0))
index = re.search('{}(.*){}'.format(start, end), index, re.S).group(1)

#splits file by pc line breaks
lbPC = re.split('\r\n\r\n',index)

#cleans subtitles from line using PC notation
cleanSubsPC = []
for i in lbPC:
    cleanSubsPC += [i.split('\r\n')[0]]

#splits lines which use MAC notation
lbMAC = []
for i in cleanSubsPC:
    if re.split('\n\n',i) == [i]:
        lbMAC += [i]
    else:
        lbMAC += [x for x in re.split('\n\n',i)]

#cleans subtitles etc. which use MAC linebreaks        
cleanSubsMAC = []
for i in lbMAC:
    cleanSubsMAC += [i.split('\n')[0]]

#builds list of strings containing titles and numbers, cleaned of weird unicode stuff
textPairs = []
for i in cleanSubsMAC:
    if len(i) > 1 and not i =='':
        if not i.startswith('~ ~ ~ ~ posting')\
        and not i.startswith('title and author'):
            try:
                int(i[-1])
                textPairs += [i.replace('â','')\
                     .replace('â\xa0',' ').replace('\xa0',' ')]
            except ValueError:
                pass

#builds dic of key:title pairs
inDic = {}
for i in textPairs:
    inDic[int(re.match('.*?([0-9]+)$', i).group(1))] = i.split('   ')[0].replace(',',' ')

#makes dictionary of urls to access
urls = {}
for x in [x for x in range(1,55863)]:
    urls[x] = urlMaker(str(x))

#this opens a saved dictionary of the collected data, so the script will begin where it left off previously
try:
    with open(workingDir+'gutenburgDic', 'rb') as handle:
        data = pickle.load(handle)
except FileNotFoundError:
    data = {} #first run: no saved dictionary yet, so start with an empty one

#actually iterates through urls, saving data, 100 texts at a time. Also saves raw text files for later use
for i in range(len(data)+1,len(data)+101):
    paragraphs,text = process(urls[i]) #fetch each url once rather than twice
    data[i] = (urls[i],paragraphs)
    with open(saveDir+urls[i].replace('/','.'),'w') as f:
        f.write(text)

#saves updated dictionary of >100 word paragraphs 
with open(workingDir+'gutenburgDic', 'wb') as handle:
    pickle.dump(data, handle, protocol=pickle.HIGHEST_PROTOCOL)
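The three nested try blocks in process could probably be flattened into a loop. The following is a minimal sketch, not the script's actual code: `retry` is a hypothetical helper, and the default delays of 300 and 600 seconds simply mirror the sleeps used above.

```python
import time


def retry(func, delays=(300, 600), exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and then try again.

    One attempt is made per entry in `delays`, plus a final attempt
    whose exception (if any) propagates to the caller.
    """
    for delay in delays:
        try:
            return func()
        except exceptions:
            sleep(delay)
    return func()  # final attempt; a failure here is raised, not swallowed
```

In the script this might be called as `r = retry(lambda: requests.get(url), exceptions=(requests.exceptions.ConnectionError,))`. Unlike the original, a final failure raises instead of returning the exception class, so the caller cannot mistake it for a result.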

Cole Robertson

Posted 2017-11-05T20:31:52.900

Reputation: 121

in the line with: HOME = "Users/me/Desktop/pythonScript/" there should be a "/" before Users – jet – 2017-11-05T20:46:50.647

@jet sorry, that's a typo. There is in the real file. – Cole Robertson – 2017-11-05T20:48:39.317

also in the line: * * * * * python script.py you have to specify the full path to script.py and the path to python as well – jet – 2017-11-05T20:51:03.213

@jet, per this http://krisjordan.com/essays/timesaving-crontab-tips, isn't the point of setting a HOME variable to avoid the need for a long absolute path to the script? Re: the path to python, I'm sorry, I'm a noob, but do you mean python as installed in my applications folder, e.g. /Macintosh HD/Applications/Python 3.6?

– Cole Robertson – 2017-11-05T20:57:31.927

@jet or is it rather /usr/local/bin/python3.6? – Cole Robertson – 2017-11-05T21:04:54.267

type: which python and you will get it – jet – 2017-11-05T21:11:50.507

Ok, I've updated the paths as you've suggested. It still does not work. Still spits out same error in the system log. – Cole Robertson – 2017-11-05T21:29:14.997

I think all the no path for address messages come from the python job. Confirm this by running the env job only and checking system.log. Googling this enigmatic no path for address makes me suspect it may appear when cron runs a tool that resolves a URL (and fails?). Note: I cannot resolve http://mirror.csclub.uwaterloo.ca/ (from your cross post), can you? Also: Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? Short answer: no.

– Kamil Maciorowski – 2017-11-06T07:39:35.417

@TheFurryITSnuggleBuddy I have tried all those options, and none of them work and all produce the same error. When I run CAT /Users/me/Desktop/pythonScript/script.py I see my script in terminal as I would expect. – Cole Robertson – 2017-11-06T09:50:42.943

@KamilMaciorowski When I delete the python job and let my cron run only env job, I still see one no path for address error. If by resolve it you mean does it point to a valid website for me, the answer is yes. Re: cross post, thanks for pointing that out. – Cole Robertson – 2017-11-06T09:54:06.973

@KamilMaciorowski I should add, I tested whether the env job was working by changing the name of tmp/cronenv to tmp/cronenv_test and waiting for cron to run at the minute mark. A new tmp/cronenv appears despite the system log still raising the no path for address error. – Cole Robertson – 2017-11-06T10:06:33.543

Hmmm. I have tried this and run * * * * * $PATH, $SHELL,$HOME,$USER > /Users/me/Desktop/testcron, which produces an empty file. When I run echo $PATH, $SHELL,$HOME,$USER > /Users/me/Desktop/testcron no file appears from within cron (though it does when run from terminal). – Cole Robertson – 2017-11-06T14:36:55.427

@TheFurryITSnuggleBuddy, ok, I've done that. It actually yields the same variable results as the env > tmp/cronenv job. Nothing there seems out of place to me... – Cole Robertson – 2017-11-06T18:28:55.410

So pipe the content of the scripts to a temp file for both jobs ensuring the Cron security context can actually read from each script and check the file with both. I'm out of ideas after that but will be super curious of your solution regardless... ping me back when you get to it. – Pimp Juice IT – 2017-11-06T21:34:17.550
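A check along the lines suggested above could also be done from Python. This is a rough sketch, assuming a POSIX system; `check_script` is a hypothetical helper that reports what the calling user (for example, the cron context it is run from) can do with a file.

```python
import os


def check_script(path):
    """Report whether `path` exists and is readable/executable by the current user."""
    return {
        "exists": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
        "executable": os.access(path, os.X_OK),
    }
```

Running it from a cron job with the output redirected to a file, and again from the terminal, would show whether the two contexts see the script differently.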

Answers

1

I got it to work. I reinstalled python and anaconda, as well as my OS (Sierra). I then followed the suggestion given by @Jet, and updated my crontab to:

PATH="/anaconda3/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/Library/Frameworks/Python.framework/Versions/3.6/bin:/Users/cole/anaconda/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/Library/Frameworks/Python.framework/Versions/3.5/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Library/TeX/texbin"

SHELL="/bin/bash"

HOME = "/Users/me/Desktop/pythonScript/"

* * * * * /anaconda3/bin/python /Users/me/Desktop/pythonScript/script.py

The path to python here is taken from the output of which python in the terminal.
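The same absolute paths can be produced from Python itself, since sys.executable holds the full path of the running interpreter. The sketch below assumes a POSIX system; `cron_line` is a hypothetical helper, not part of any cron tooling.

```python
import os
import shlex
import sys


def cron_line(script, schedule="* * * * *", interpreter=None):
    """Build a crontab entry with absolute paths for interpreter and script."""
    interpreter = interpreter or sys.executable  # the Python running this code
    script = os.path.abspath(script)  # resolve relative paths against the cwd
    return "{} {} {}".format(schedule, shlex.quote(interpreter), shlex.quote(script))
```

Running `print(cron_line('script.py'))` from the script's folder, with the interpreter that should run the job, would give a line that can be pasted into crontab -e.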

Cole Robertson

Posted 2017-11-05T20:31:52.900

Reputation: 121