Using Groupby and Transform

[1]:
import transportation_tutorials as tt
import pandas as pd
import numpy as np

Questions

  1. Within each FDOT District, what is the fraction of structurally deficient bridge deck area in each County?
  2. Which county has the highest share of structurally deficient bridge deck area within its FDOT District? (Hint: the correct answer is PALM BEACH.)

Data

To answer the question, use the following data files:

[2]:
districts = pd.read_csv(tt.data('FL-COUNTY-BY-DISTRICT'))
districts.head()
[2]:
County District
0 Charlotte 1
1 Collier 1
2 DeSoto 1
3 Glades 1
4 Hardee 1
[3]:
bridges = pd.read_csv(tt.data('FL-BRIDGES'))

# Recall the necessary cleaning for the bridges data file
bridges = bridges.replace('-', 0)
bridges[['Poor #', 'SD #']] = bridges[['Poor #', 'SD #']].astype(int)
bridges.fillna(0, inplace=True)

bridges.head()
[3]:
County Total # Good # Fair # Poor # SD # Total Area Good Area Fair Area Poor Area SD Area
0 ALACHUA (001) 111 64 47 0 0 64767 55794 8973 0.0 0.0
1 BAKER (003) 89 30 52 7 8 32162 19369 12282 510.0 623.0
2 BAY (005) 122 49 63 10 11 210039 98834 109628 1577.0 10120.0
3 BRADFORD (007) 62 23 37 2 2 9330 5492 3217 620.0 620.0
4 BREVARD (009) 241 160 81 0 0 364138 204179 159959 0.0 0.0

Solution

We need to create a table that answers (1). The first step is to attach the district number for each row (County) in the bridges table. The County field in the bridges table is not just the county name in upper case, but also a three digit code number, but the districts file only has plain names in title case. We’ll need to strip the code numbers from bridges (six characters, including the 3 digits, two parentheses, and a space), and convert the names in districts to all upper case, so there is an exact match for merging.

[4]:
bridges['County'] = bridges['County'].str[:-6]
districts['County'] = districts['County'].str.upper()
[5]:
bridges_2 = pd.merge(
    bridges,
    districts[['County','District']],
    on='County',
)
[6]:
bridges_2.head()
[6]:
County Total # Good # Fair # Poor # SD # Total Area Good Area Fair Area Poor Area SD Area District
0 ALACHUA 111 64 47 0 0 64767 55794 8973 0.0 0.0 2
1 BAKER 89 30 52 7 8 32162 19369 12282 510.0 623.0 2
2 BAY 122 49 63 10 11 210039 98834 109628 1577.0 10120.0 3
3 BRADFORD 62 23 37 2 2 9330 5492 3217 620.0 620.0 2
4 BREVARD 241 160 81 0 0 364138 204179 159959 0.0 0.0 5

The we need to use transform to compute the share of total ‘SD Area’ across the district in each county.

[7]:
bridges_2['SD Area Share in District'] = bridges_2.groupby('District')[['SD Area']].transform(lambda x: (x / x.sum()))
[8]:
bridges_2
[8]:
County Total # Good # Fair # Poor # SD # Total Area Good Area Fair Area Poor Area SD Area District SD Area Share in District
0 ALACHUA 111 64 47 0 0 64767 55794 8973 0.0 0.0 2 0.000000
1 BAKER 89 30 52 7 8 32162 19369 12282 510.0 623.0 2 0.010061
2 BAY 122 49 63 10 11 210039 98834 109628 1577.0 10120.0 3 0.071854
3 BRADFORD 62 23 37 2 2 9330 5492 3217 620.0 620.0 2 0.010012
4 BREVARD 241 160 81 0 0 364138 204179 159959 0.0 0.0 5 0.000000
5 BROWARD 689 535 150 4 7 1192081 952849 238309 923.0 1372.0 4 0.036054
6 CALHOUN 49 19 29 1 1 76300 50437 25863 0.0 0.0 3 0.000000
7 CHARLOTTE 207 172 35 0 1 250385 229102 21284 0.0 1511.0 1 0.105179
8 CITRUS 41 32 9 0 0 21903 19948 1955 0.0 0.0 7 0.000000
9 CLAY 77 29 45 3 3 68282 19034 48860 388.0 388.0 2 0.006266
10 COLLIER 356 281 73 2 2 190375 158006 32207 162.0 162.0 1 0.011277
11 COLUMBIA 113 60 49 4 4 51928 32910 18304 715.0 715.0 2 0.011546
12 DIXIE 55 39 11 5 5 19017 10921 7033 1063.0 1063.0 2 0.017166
13 DUVAL 763 479 262 22 22 1795247 932820 811836 50590.0 50590.0 2 0.816969
14 ESCAMBIA 243 116 111 16 18 585435 259541 233423 92471.0 92585.0 3 0.657368
15 FLAGLER 96 72 23 1 1 70184 65303 4881 0.0 0.0 5 0.000000
16 FRANKLIN 52 6 44 2 2 211234 979 209810 445.0 445.0 3 0.003160
17 GADSDEN 120 63 49 8 8 138044 57793 77427 2824.0 2824.0 3 0.020051
18 GILCHRIST 11 3 8 0 0 4052 670 3382 0.0 0.0 2 0.000000
19 GLADES 59 43 16 0 0 42294 35081 7213 0.0 0.0 1 0.000000
20 GULF 62 18 38 6 6 44915 35451 5690 3774.0 3774.0 3 0.026796
21 HAMILTON 61 30 31 0 0 34327 21766 12560 0.0 0.0 2 0.000000
22 HARDEE 91 64 26 1 1 26997 20293 6548 155.0 155.0 1 0.010789
23 HENDRY 81 63 18 0 0 18807 14827 3980 0.0 0.0 1 0.000000
24 HERNANDO 58 53 5 0 0 29667 27895 1772 0.0 0.0 7 0.000000
25 HIGHLANDS 63 42 20 1 2 19106 10743 8147 216.0 281.0 1 0.019560
26 HILLSBOROUGH 808 654 152 2 2 1361594 1048235 311399 1959.0 1959.0 7 0.260748
27 HOLMES 156 71 77 8 9 86481 53332 32662 487.0 566.0 3 0.004019
28 INDIAN RIVER 122 84 37 1 1 145658 82890 62621 147.0 147.0 4 0.003863
29 JACKSON 146 72 72 2 3 77392 47785 29398 209.0 241.0 3 0.001711
... ... ... ... ... ... ... ... ... ... ... ... ... ...
34 LEON 152 86 64 2 2 103547 63816 39518 212.0 212.0 3 0.001505
35 LEVY 75 31 36 8 8 18031 5843 10466 1722.0 1722.0 2 0.027808
36 LIBERTY 77 22 49 6 6 25453 6217 18611 625.0 625.0 3 0.004438
37 MADISON 59 31 27 1 1 39957 26896 13061 0.0 0.0 2 0.000000
38 MANATEE 199 133 65 1 1 269545 157364 103091 9090.0 9090.0 1 0.632744
39 MARION 69 55 13 1 1 73228 45865 27364 0.0 0.0 5 0.000000
40 MARTIN 128 95 33 0 0 295820 248523 47297 0.0 0.0 4 0.000000
41 MIAMI-DADE 957 804 146 7 13 1743024 1315449 404966 22609.0 33739.0 6 0.322624
42 MONROE 100 55 39 6 6 527717 198916 257963 70838.0 70838.0 6 0.677376
43 NASSAU 102 50 48 3 3 100719 53027 45884 644.0 644.0 2 0.010400
44 OKALOOSA 216 111 92 13 13 269312 77988 171330 19993.0 19993.0 3 0.141953
45 OKEECHOBEE 65 45 20 0 0 28011 22654 5357 0.0 0.0 1 0.000000
46 ORANGE 754 601 150 3 3 1031407 779028 251598 780.0 780.0 5 0.028880
47 OSCEOLA 236 167 69 0 0 203856 160646 43210 0.0 0.0 5 0.000000
48 PALM BEACH 604 516 81 7 7 805336 647923 120878 36535.0 36535.0 4 0.960083
49 PASCO 123 105 18 0 0 94623 88405 6218 0.0 0.0 7 0.000000
50 PINELLAS 414 314 97 3 3 1185162 989323 190285 5554.0 5554.0 7 0.739252
51 POLK 357 273 82 2 3 228351 207445 20013 893.0 1559.0 1 0.108520
52 PUTNAM 68 25 40 3 3 85997 16783 69008 206.0 206.0 2 0.003327
53 SANTA ROSA 190 97 89 4 5 211179 39776 167685 3719.0 3838.0 3 0.027250
54 SARASOTA 295 230 64 1 2 224007 181891 41825 290.0 1608.0 1 0.111931
55 SEMINOLE 143 126 17 0 0 225280 182245 43035 0.0 0.0 5 0.000000
56 SUMTER 76 59 17 0 1 95533 88154 7379 0.0 665.0 5 0.024622
57 SUWANNEE 37 23 11 3 3 28317 18950 7368 1999.0 1999.0 2 0.032282
58 TAYLOR 94 22 63 9 9 24629 10014 12520 2095.0 2095.0 2 0.033832
59 UNION 28 7 20 1 1 12519 5843 5698 978.0 978.0 2 0.015794
60 VOLUSIA 227 144 76 7 7 326665 164435 147618 14612.0 14612.0 5 0.541025
61 WAKULLA 50 21 23 6 6 17783 10011 7649 123.0 123.0 3 0.000873
62 WALTON 219 125 86 8 9 104108 69478 34056 575.0 627.0 3 0.004452
63 WASHINGTON 96 57 34 5 6 72116 46988 23224 1904.0 3220.0 3 0.022862

64 rows × 13 columns

We can use idxmax to get the index of the highest value in this new column, and the answer to part (2).

[9]:
bridges_2.loc[bridges_2['SD Area Share in District'].idxmax()]
[9]:
County                       PALM BEACH
Total #                             604
Good #                              516
Fair #                               81
Poor #                                7
SD #                                  7
Total Area                       805336
Good Area                        647923
Fair Area                        120878
Poor Area                         36535
SD Area                           36535
District                              4
SD Area Share in District      0.960083
Name: 48, dtype: object

Palm Beach County only has a few structurally deficient bridges, but they are big ones, including two that span the intracostal waterway.