[1]:
import os
import numpy as np
import pandas as pd
import transportation_tutorials as tt
A variety of things can go wrong when you run a piece of Python code: input files might be missing or corrupt, there might be typos or bugs in code you wrote or code provided by others, etc.
When Python encounters a problem it does not know how to manage on its own, it generally raises an exception. Exceptions range from basic errors to more complicated problems, and the message that accompanies an exception usually carries a good deal of information. For example, consider this error:
[2]:
for i in 1 to 5:
    print(i)
File "<ipython-input-2-7864abb46b1a>", line 1
for i in 1 to 5:
^
SyntaxError: invalid syntax
The SyntaxError tells you that the indicated bit of code isn’t valid for Python, and simply cannot be run. It helpfully also adds a caret marker pointing to the exact place where the problem was found. In this case, the problem is the “to” in the “for” loop, which is found in many other languages, but not in Python.
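For comparison, here is a minimal sketch of a valid Python form of that loop. It assumes the intent was to count from 1 through 5 inclusive; the `values` list is added only for illustration.

```python
# range(1, 6) yields 1, 2, 3, 4, 5; the stop value itself is excluded.
values = []
for i in range(1, 6):
    values.append(i)
    print(i)
```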
Obviously, even if the code is readable as valid Python code, there still may be errors.
[3]:
speeds = {
    'rural highway': 70,
    'urban highway': 55,
    'residential': 30,
}
for i in speed_limits:
    print(speed_limits[i])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-3-a3dd58a8997a> in <module>
5 }
6
----> 7 for i in speed_limits:
8 print(speed_limits[i])
NameError: name 'speed_limits' is not defined
Here, the code itself is valid, but a NameError occurs because there is an attempt to use a variable name that has not been defined previously. The error message itself is pretty self-explanatory. But consider this:
[4]:
road_types = ['rural highway', 'urban highway' 'residential']
[5]:
for i in road_types:
    print(speeds[i])
70
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-5-e2c143b87c0c> in <module>
1 for i in road_types:
----> 2 print(speeds[i])
KeyError: 'urban highwayresidential'
A KeyError occurs when using a key to get a value from a mapping (i.e., a dictionary or a similar object), but the key cannot be found. Usually, the misbehaving key is also shown in the error message, as in this case, although the value of the key may be unexpected. Here, it appears to be the last two items of the list mashed together. This happened due to a missing comma in the definition of the list earlier. When that line with the missing comma was read, it was interpreted as a valid Python instruction: a list with two items, the second item being two string values separated only by whitespace, which implies they are to be concatenated. It is only when this value is ultimately used in the loop that Python discovers there is anything wrong.
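The effect of the missing comma can be checked directly. The sketch below repeats the `speeds` dictionary so that it runs on its own; the names `broken` and `limits` are introduced here only for illustration.

```python
speeds = {
    'rural highway': 70,
    'urban highway': 55,
    'residential': 30,
}

# Adjacent string literals are joined into one string, so this list has two items.
broken = ['rural highway', 'urban highway' 'residential']
print(len(broken), broken[1])

# With the comma restored, the list has three items and every lookup succeeds.
road_types = ['rural highway', 'urban highway', 'residential']
limits = [speeds[name] for name in road_types]
print(limits)
```

Checking the length of a list that was typed by hand is a cheap way to catch this kind of mistake before it turns into a KeyError later on.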
To demonstrate a more complicated example, we can attempt to read a file that does not exist, which will raise an exception like this:
[6]:
pd.read_csv('path/to/non-existant/file.csv')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-6-98079078e677> in <module>
----> 1 pd.read_csv('path/to/non-existant/file.csv')
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
427
428 # Create the parser.
--> 429 parser = TextFileReader(filepath_or_buffer, **kwds)
430
431 if chunksize or iterator:
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
893 self.options['has_index_names'] = kwds['has_index_names']
894
--> 895 self._make_engine(self.engine)
896
897 def close(self):
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
1120 def _make_engine(self, engine='c'):
1121 if engine == 'c':
-> 1122 self._engine = CParserWrapper(self.f, **self.options)
1123 else:
1124 if engine == 'python':
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1851 kwds['usecols'] = self.usecols
1852
-> 1853 self._reader = parsers.TextReader(src, **kwds)
1854 self.unnamed_cols = self._reader.unnamed_cols
1855
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()
FileNotFoundError: [Errno 2] File b'path/to/non-existant/file.csv' does not exist: b'path/to/non-existant/file.csv'
There’s a lot of output here, but the last line of the output is pretty clear by itself: the file does not exist. As a general rule of thumb, when something you are running raises an exception, the message printed at the very bottom of the error output is the first place to look to try to find an explanation for what happened and how to fix it.
Sometimes, however, the explanation for the error is not quite as self-explanatory as the FileNotFoundError.
[7]:
tt.problematic()
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte
During handling of the above exception, another exception occurred:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-7-1b294bfd1ac2> in <module>
----> 1 tt.problematic()
~/Git/python-for-transport/code/transportation_tutorials/data/__init__.py in problematic()
46 # When there are various lines of code intervening,
47 # you might not get to see the relevant problem in the traceback
---> 48 result = pandas.read_csv(filename)
49 return result
50
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
433
434 try:
--> 435 data = parser.read(nrows)
436 finally:
437 parser.close()
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
1137 def read(self, nrows=None):
1138 nrows = _validate_integer('nrows', nrows)
-> 1139 ret = self._engine.read(nrows)
1140
1141 # May alter columns / col_dict
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
1993 def read(self, nrows=None):
1994 try:
-> 1995 data = self._reader.read(nrows)
1996 except StopIteration:
1997 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte
In this case, the error report is less clear. The error type being raised is a UnicodeDecodeError, which gives us a hint of the problem: an attempt to read some kind of Unicode text data from somewhere has failed. But if you don’t know exactly what the problematic function is supposed to do, it might not be obvious what is wrong. It is in this situation that all the other data printed along with the error can be valuable. This other output is called a “traceback”, because it provides the entire path through the code, from the problematic function call, through every sub-function called, to the point where the error is encountered. Every function call is shown with both the name of the file and the name of the function.
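The same path can be inspected from code with the standard library `traceback` module. The following is only a sketch: `load_table` and `parse_row` are hypothetical stand-ins for application functions, not part of `transportation_tutorials`.

```python
import traceback

def load_table():
    # Hypothetical application function that fails inside a nested call.
    return parse_row('not a number')

def parse_row(text):
    return float(text)  # raises ValueError for this input

try:
    load_table()
except ValueError as err:
    # Each entry records the file, line number, and function of one call.
    frames = traceback.extract_tb(err.__traceback__)
    for frame in frames:
        print(frame.filename, frame.lineno, frame.name)
    function_names = [frame.name for frame in frames]
```

The frames are listed from the outermost call to the innermost one, which is the same order used in the printed traceback.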
For the most part, errors are unlikely to arise from bugs in major software packages, such as numpy and pandas. These packages are rigorously tested, and while it is possible to find a bug in them, it is unusual; it is much more likely that bugs or errors will arise from application-specific code. Thus, it can be helpful to scan through all of the various files and functions, and look for items that are related to application-specific files. In this case, we skip over all the lines referencing pandas files, and focus on the other lines, which are found in the transportation_tutorials package:
.../transportation_tutorials/data/__init__.py in problematic()
46 # When there are various lines of code intervening,
47 # you might not get to see the relevant problem in the traceback
---> 48 result = pandas.read_csv(filename)
49 return result
50
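Skipping over library frames can also be done programmatically. The sketch below is an illustration under stated assumptions: it uses the standard library `json` package in place of pandas, `read_settings` is a hypothetical application function, and a frame is treated as library code when its file lives inside the `json` package directory.

```python
import json
import os
import traceback

def read_settings(text):
    # Hypothetical application function that hands bad input to a library.
    return json.loads(text)

# Directory holding the library's source files.
library_dir = os.path.dirname(json.__file__)

try:
    read_settings('{"speed": ')
except ValueError as err:
    frames = traceback.extract_tb(err.__traceback__)
    # Keep only the frames that do not come from the library's own files.
    app_frames = [f for f in frames if not f.filename.startswith(library_dir)]
    for f in app_frames:
        print(f.filename, f.lineno, f.name)
    app_names = [f.name for f in app_frames]
    all_names = [f.name for f in frames]
```

For a real project, the test on `f.filename` would need to match however library code is installed on your system, for example by looking for `site-packages` in the path.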
By default in a Jupyter notebook, when the source code is written in Python, the traceback print out includes the offending line of code plus two lines before and after, to give some context. Sometimes that little snippet is enough to reveal the problem itself, but in this case those lines include some comments, which don’t really help us solve the problem.
If you want to investigate further, you can open the file shown in the traceback in a text editor such as Notepad++, and scroll to the indicated line number. Doing that here reveals some more context that should help diagnose this problem:
.../transportation_tutorials/data/__init__.py in problematic()
42
43 def problematic():
44 filename = data('THIS-FILE-IS-CORRUPT')
45 import pandas
46 # When there are various lines of code intervening,
47 # you might not get to see the relevant problem in the traceback
---> 48 result = pandas.read_csv(filename)
49 return result
Well, that’s helpful… it turns out we are loading a file that is intentionally corrupt, with junk data in part of the file, as might happen on a botched download from a remote server. If only diagnosing all errors were so easy! Unfortunately (or, fortunately, depending on your perspective), in real world applications, code probably won’t attempt to load a file that is intentionally corrupt and so clearly labelled as such.
If you are unable to diagnose or solve a problem yourself, it may make sense to enlist some help from a co-worker or outside professional. When doing so, it is usually valuable not only to report what you were trying to do when a problem occurred, but also to send the entire traceback output from the problem as well. This offers others the chance to follow along through the code, and often problems can be diagnosed easily by looking at the complete traceback, particularly if they also have access to the same source code.
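If the traceback needs to be saved or pasted into a message, one option is to capture it as text with `traceback.format_exc()` from the standard library. This is a minimal sketch; the failing lookup is chosen only to reproduce the KeyError from earlier.

```python
import traceback

speeds = {'rural highway': 70}

try:
    speeds['urban highwayresidential']
except KeyError:
    # format_exc() returns the same text Python would have printed.
    report = traceback.format_exc()

print(report)
```

The `report` string can then be written to a log file or included in full when asking for help.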
For more complicated problems, it may also be beneficial to share additional system information. This is particularly common and generally expected when you report issues with major packages such as numpy or pandas, but it can be useful for debugging other more localized problems as well. You can access some basic information about your system and your Anaconda Python installation by using the conda info command in a console or with the Anaconda Prompt on Windows.
(tt) C:\Users\cfinley>conda info
active environment : tt
active env location : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\envs\tt
shell level : 2
user config file : C:\Users\cfinley\.condarc
populated config files : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\.condarc
C:\Users\cfinley\.condarc
conda version : 4.6.9
conda-build version : 3.12.0
python version : 3.6.5.final.0
base environment : C:\Users\cfinley\AppData\Local\Continuum\anaconda3 (writable)
channel URLs : https://repo.anaconda.com/pkgs/main/win-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/free/win-64
https://repo.anaconda.com/pkgs/free/noarch
https://repo.anaconda.com/pkgs/r/win-64
https://repo.anaconda.com/pkgs/r/noarch
https://repo.anaconda.com/pkgs/msys2/win-64
https://repo.anaconda.com/pkgs/msys2/noarch
https://conda.anaconda.org/conda-forge/win-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/jpn/win-64
https://conda.anaconda.org/jpn/noarch
package cache : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\pkgs
C:\Users\cfinley\.conda\pkgs
C:\Users\cfinley\AppData\Local\conda\conda\pkgs
envs directories : C:\Users\cfinley\AppData\Local\Continuum\anaconda3\envs
C:\Users\cfinley\.conda\envs
C:\Users\cfinley\AppData\Local\conda\conda\envs
platform : win-64
user-agent : conda/4.6.9 requests/2.18.4 CPython/3.6.5 Windows/10 Windows/10.0.17763
administrator : False
netrc file : None
offline mode : False
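Some of the same details can be collected from inside Python using only the standard library. The sketch below is one possible selection of fields, not a replacement for conda info; the `system_info` dictionary and its keys are chosen here for illustration.

```python
import platform
import sys

# Gather a few facts about the running interpreter and operating system.
system_info = {
    'python version': platform.python_version(),
    'implementation': platform.python_implementation(),
    'platform': platform.platform(),
    'executable': sys.executable,
}
for key, value in system_info.items():
    print(f'{key:>16} : {value}')
```

For issues involving pandas specifically, `pd.show_versions()` prints the versions of pandas and its dependencies.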
In simple code or analysis projects, most of the time you’ll just want to avoid having errors in your Python code. However, if you are writing Python functions that are shared with others or will be re-used in multiple places, it may be desirable or necessary to handle errors, instead of just avoiding them. To do so, you can use a try...except statement.
[10]:
try:
    table = pd.read_csv('path/to/non-existant/file.csv')
except:
    table = pd.DataFrame() # set to blank dataframe
print(table)
Empty DataFrame
Columns: []
Index: []
The try...except statement works like this: first, the code in the try block is run. If an exception is raised while running this code, execution immediately jumps to the start of the except block and continues. If no errors are raised, the code in the except block is ignored.
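To make that control flow visible, the sketch below records which steps actually run. The `safe_divide` helper and its `steps` list are hypothetical and exist only for this illustration.

```python
def safe_divide(numerator, denominator):
    # Hypothetical helper used only to illustrate the control flow.
    steps = []
    try:
        steps.append('try started')
        result = numerator / denominator
        steps.append('try finished')  # skipped if the division raises
    except ZeroDivisionError:
        steps.append('except ran')
        result = None
    return result, steps

print(safe_divide(10, 2))
print(safe_divide(10, 0))
```

In the second call, the line after the division never runs, because execution jumps to the except block as soon as the exception is raised.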
As shown above, this code will set the table variable to a blank dataframe for any kind of error. It is also possible (and often preferable) to be more discriminating in error processing, only catching certain types of errors. For example, we may only want to recover like this when the file is missing; if it is corrupt or something else is wrong, we want to know about it. In that case, we can catch only FileNotFoundError, which will work as desired for the missing file:
[11]:
try:
    table = pd.read_csv('path/to/non-existant/file.csv')
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
print(table)
Empty DataFrame
Columns: []
Index: []
The same handler does not catch the error raised for the corrupt file, so that error is still reported:
[12]:
try:
    table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
print(table)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte
During handling of the above exception, another exception occurred:
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-12-c39965b6a338> in <module>
1 try:
----> 2 table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
3 except FileNotFoundError:
4 table = pd.DataFrame() # set to blank dataframe
5 print(table)
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
700 skip_blank_lines=skip_blank_lines)
701
--> 702 return _read(filepath_or_buffer, kwds)
703
704 parser_f.__name__ = name
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
433
434 try:
--> 435 data = parser.read(nrows)
436 finally:
437 parser.close()
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
1137 def read(self, nrows=None):
1138 nrows = _validate_integer('nrows', nrows)
-> 1139 ret = self._engine.read(nrows)
1140
1141 # May alter columns / col_dict
~/anaconda/envs/tt/lib/python3.7/site-packages/pandas/io/parsers.py in read(self, nrows)
1993 def read(self, nrows=None):
1994 try:
-> 1995 data = self._reader.read(nrows)
1996 except StopIteration:
1997 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._string_convert()
pandas/_libs/parsers.pyx in pandas._libs.parsers._string_box_utf8()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 15: invalid start byte
Alternatively, we can write different error handlers for the different kinds of errors we expect to encounter:
[16]:
try:
    table = pd.read_csv(tt.data('THIS-FILE-IS-CORRUPT'))
except FileNotFoundError:
    table = pd.DataFrame() # set to blank dataframe
except UnicodeDecodeError:
    table = pd.DataFrame(['corrupt!'], columns=['data'])
print(table)
data
0 corrupt!
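The same pattern can be wrapped in a reusable function. The sketch below is a hedged illustration that uses only the standard library rather than pandas: `read_text_or_status` is a hypothetical helper, and the small corrupt file is created on the fly so that the example runs on its own.

```python
import os
import tempfile

def read_text_or_status(path):
    # Hypothetical helper: returns (text, status) instead of raising for the
    # two failures we expect; any other exception still propagates.
    try:
        with open(path, encoding='utf-8') as f:
            return f.read(), 'ok'
    except FileNotFoundError:
        return '', 'missing'
    except UnicodeDecodeError as err:
        # "as err" gives access to the exception object and its details.
        return '', f'corrupt: {err.reason}'

with tempfile.TemporaryDirectory() as folder:
    # Write a file containing the byte 0x97, which is not valid UTF-8 here.
    bad_path = os.path.join(folder, 'corrupt.csv')
    with open(bad_path, 'wb') as f:
        f.write(b'road,speed\nrural\x97highway,70\n')
    corrupt_result = read_text_or_status(bad_path)
    missing_result = read_text_or_status(os.path.join(folder, 'absent.csv'))

print(corrupt_result)
print(missing_result)
```

Returning a status alongside the result lets the calling code decide what to do next, while unexpected errors are still raised as usual.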
There are a variety of other advanced techniques for error handling described in the official Python tutorial on this topic.