[1]:
import transportation_tutorials as tt
import pandas as pd
import numpy as np
import statsmodels.api as sm
Construct an ordinary least squares linear regression model to predict the given value of time for each individual in the Jupiter study area data as a function of:

- age,
- gender,
- full-time employment status, and
- household income.
Evaluate this model to answer the questions:
To answer the questions, use the following data files:
[2]:
per = pd.read_csv(tt.data('SERPM8-BASE2015-PERSONS'))
hh = pd.read_csv(tt.data('SERPM8-BASE2015-HOUSEHOLDS'))
[3]:
per.head()
[3]:
| | hh_id | person_id | person_num | age | gender | type | value_of_time | activity_pattern | imf_choice | inmf_choice | fp_choice | reimb_pct | wrkr_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1690841 | 4502948 | 1 | 46 | m | Full-time worker | 5.072472 | M | 1 | 1 | -1 | 0.0 | 0 |
| 1 | 1690841 | 4502949 | 2 | 47 | f | Part-time worker | 5.072472 | M | 2 | 37 | -1 | 0.0 | 0 |
| 2 | 1690841 | 4502950 | 3 | 11 | f | Student of non-driving age | 3.381665 | M | 3 | 1 | -1 | 0.0 | 0 |
| 3 | 1690841 | 4502951 | 4 | 8 | m | Student of non-driving age | 3.381665 | M | 3 | 1 | -1 | 0.0 | 0 |
| 4 | 1690961 | 4503286 | 1 | 52 | m | Part-time worker | 2.447870 | M | 1 | 2 | -1 | 0.0 | 0 |
[4]:
hh.head()
[4]:
| | Unnamed: 0 | hh_id | home_mgra | income | autos | transponder | cdap_pattern | jtf_choice | autotech | tncmemb |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 426629 | 1690841 | 7736 | 512000 | 2 | 1 | MMMM0 | 0 | 0 | 0 |
| 1 | 426630 | 1690961 | 7736 | 27500 | 1 | 0 | MNMM0 | 0 | 0 | 0 |
| 2 | 426631 | 1690866 | 7736 | 150000 | 2 | 0 | HMM0 | 0 | 0 | 0 |
| 3 | 426632 | 1690895 | 7736 | 104000 | 2 | 1 | MMMM0 | 0 | 0 | 0 |
| 4 | 426633 | 1690933 | 7736 | 95000 | 2 | 1 | MNM0 | 0 | 0 | 0 |
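Household income is stored in the household table, while age, gender, and person type are stored in the person table, so the two presumably need to be joined on their shared `hh_id` column before a model can use all four predictors. The following is a minimal sketch of that join. It uses small hand-typed frames that copy a few values from the previews above instead of the full files, and the names `per_demo`, `hh_demo`, and `merged` are illustrative only.

```python
import pandas as pd

# A few person rows copied from the preview of the person table.
per_demo = pd.DataFrame({
    'hh_id': [1690841, 1690841, 1690961],
    'person_id': [4502948, 4502949, 4503286],
    'age': [46, 47, 52],
    'gender': ['m', 'f', 'm'],
    'type': ['Full-time worker', 'Part-time worker', 'Part-time worker'],
    'value_of_time': [5.072472, 5.072472, 2.447870],
})

# The matching household rows, keeping only the columns needed here.
hh_demo = pd.DataFrame({
    'hh_id': [1690841, 1690961],
    'income': [512000, 27500],
})

# Several persons can belong to one household, so the join is
# many-to-one; validate= raises an error if hh_id repeats in hh_demo.
merged = per_demo.merge(
    hh_demo, on='hh_id', how='left', validate='many_to_one'
)
print(merged[['person_id', 'age', 'gender', 'type', 'income', 'value_of_time']])
```

Using `how='left'` keeps every person row even if a household record were missing, in which case `income` would be `NaN` for that person and would need handling before fitting.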