Loan list

This link will let you download the full list of loans for the metro area. This might end up being a little big for your computer, depending on the size of the area. The largest, New York, has 229,000 rows. But it gets smaller very quickly. The fifth-largest, Washington, DC, has about 75,000.

Here is a summary by zip code area, which you can search for names of post offices. There are additional columns giving you a little more feel for the demographics in the downloadable version. There will be some small zip codes that will have really weird answers because the count of USPS businesses is less than the count of the number of loans.

(Note: There are 479 extra business addresses across the U.S. buried somewhere in this data file, out of 12.6 million. I believe it comes from rounding the number of vacancies when allocating tracts to zip codes, but I’m not sure. I decided not to worry about it. It’s a 0.004 percent error.)

Local map

How to read this map:

Each area is a zip code area – it may include some postal Zip Codes that are single buildings or companies located there. It’s shaded by an approximate percentage of businesses that got PPP loans in both rounds, through December 2020. The percentage is based on the estimate of USPS business addresses that were not considered vacant as of December 2019.

The dots show you what the dominant ethnic group of the zip code area is.

Clicking on the area gives you a little more information about it, including median income (when available), the number of loans in each round, and the postal name of the zip code.

Lenders

This section provides some data on the lenders in an area. All of the lenders with at least 50 loans in this area are listed separately, and the rest are grouped into a “999” group.

Here is the portion of loans in this area taken up by lenders by size:

This table provides some metrics for each lender:

How to read it:

For each lender, there is a percentage of the loans for each type of zip code by ethnicity. For example, if “White %” says “35%” for Chase, it means that 35 percent of Chase’s loans in this market were to majority White zip codes. This is reapeated for any type of zip code with at least 5 percent of the market’s loans. They won’t add to 100% because you are not seeing the ones that were under 5% of the market.

Then there are two sets of comparisons for each group: The percent of loans compared with the average for that market, and a comparison to the distribution if it were the same as the underlying businesses from the USPS. In those columns, anything over 1.0 means that the group is over-represented. Anything under 1.0 means that it’s under-represented.

For example, if the number for Chase under “Hispanic / Latino” is 1.1, it means that Chase is 10 percent more likely to have loans in Hispanic / Latino areas than the rest of the market. If it’s .9, it means that Chase is 10 percent less likely to have loans in that area.

Prepared by Sarah Cohen, December 22, 2020

---
author: "Sarah Cohen"
date: "`r Sys.Date()`"
output: 
  html_document:
    theme: united
    df_print: paged
    code_download: true
params:
  metro: 31080-31084
  city: Los Angeles
  printcode: false
  csv_link: "https://www.dropbox.com/s/myo1tbisnpxq805/ca_los_angeles.csv?dl=1"
  root_name : ca_los_angeles
title: "PPP Report for `r params$city`"
---

<!-- The beginning of this R Markdown sets up the libraries needed along with the parameters for the rest of the program. 
     I have already created separate csvs of each metro area, based on the names matched up with the metro codes. So once
     I know which metro code I want, this will run the standard stuff for that metro code. 
     
     Ideally, I would set this up so that you could just create in advance all of the metros, but I just don't know how to do that right now.
     I know there's a way so I can figure it out if we need to.
-->


```{r setup, include=FALSE}
## be careful with the cache because it sometimes gets messed up. 
knitr::opts_chunk$set(echo = params$printcode, message=FALSE, warning=FALSE, cache=FALSE)

# for maps
library(leaflet)
library(sf)
library(tigris)
library(leaflet.extras)
library(leaflet.providers)
library(classInt)



# normal
library(tidyverse)
library(janitor)
library(scales)
library(DT)
# for complex search-and-replace to get titles of headings right
library(stringi)

#tigris can take a while
options(tigris_use_cache=TRUE, dplyr.summarise.inform=FALSE, scipen=999)

#this has everything else we need, I think. 
load ("data/zcta_master_map.Rda")


```



## Loan list

This link will let you [download the full list of loans for the metro area](`r params$csv_link`). This might end up being a little big for your computer, depending on the size of the area. The largest, New York, has 229,000 rows. But it gets smaller very quickly. The fifth-largest, Washington, DC, has about 75,000. 


Here is a summary by zip code area, which you can search for names of post offices. There are additional columns giving you a little more feel for the demographics in the downloadable version. There will be some small zip codes that will have really weird answers because the count of USPS businesses is less than the count of the number of loans. 

```{r}

zcta_map %>%
  as.data.frame() %>%
  filter ( metro_code == params$metro) %>%
  arrange ( zcta) %>%
  mutate (`Name` = paste(zcta_name, st_abbrev, sep=", "), 
          `% bus w/ loans` = loans / bus_novacant) %>%
  select ( `Zip code` = zcta, 
           `Name` ,
           `Ethnicity` = zcta_ethnic,
           `Loans` = loans,
           `% bus w/ loans`,
           `Round 1` = loans_r1, 
           `Round 2` = loans_r2, 
           `Amount` = loan_amount,
           `USPS Non-vacant businesses` = bus_novacant, 
           `Non-vacant addresses` = all_novacant,
           `Census businesses` = zbp_bus, 
           `Population` = tot_pop,
           `Households` = households,
           `Pct White non-Hispanic category` = white_nh_group, 
           `White %` = pct_white_nh, 
           `Black %` = pct_black_nh, 
           `Hispanic / Latino %` = pct_hispanic,
           `Asian %` = pct_asian_nh, 
           `Native American %`= pct_aian_nh, 
           `Median income` = median_inc_2018, 
           `% college grad` = pct_college_grad, 
           `Unemployment rate avg 2014-18` = unempl_rate) %>%
  # glimpse()
   datatable(
            rownames=FALSE,
            extensions="Buttons", 
            options = list (
               autoWidth = TRUE, 
               scrollX = TRUE,
               scrollY = TRUE,
               columnDefs = list ( list ( width='150px', targets=c(1,2,13)) ,
                                   list ( visible=FALSE, targets=c(9, 11:13, 18, 20:21))
                                   ) , 
               order = list (list (0, "as") ), 
               #dom = "flBrtip", 
               dom = '<"top"Bif>rt<"bottom"lp><"clear">',
               buttons = c ("copy", "csv", "excel")
            )
) %>%
  formatRound (c(4, 6,7, 9, 10, 11, 12, 13 ), digits=0, mark="," , interval=3) %>%
  formatCurrency ( c(8,  20), digits=0) %>%
  formatPercentage( c( 5, 15:19))

  


```

                                   
(Note: There are 479 extra business addresses across the U.S. buried somewhere in this data file, out of 12.6 million. I believe it comes from rounding the number of vacancies when allocating tracts to zip codes, but I'm not sure. I decided not to worry about it. It's a 0.004 percent error.) 

## Local map

<!--Already loaded the object from tigris and sf, so the zcta_map object is already a multipolygon simple feature.-->


<!-- Step 2: Pull out just the metro area you want right now -- that's will be a function of the metro areas from the list in the previous step. 
After this, we'll have the `approx_center` of the map, the `span_bounds` of the map, and the `breaks` based on Jenks rounded down to the nearest 5 %.  We can also set the palette now that we know the breaks. It sets the palettes as `discrete_pal` using all levels of the ethnicity from the national file; and `blue_pal` as binned from the jenks breaks.  It also creates a point layer for the metro area. 
-->


```{r get_metro_base, warning=FALSE}

metro_map_base <- filter ( zcta_map, metro_code == params$metro)  %>%
  mutate ( pct_loan = round ( pmin ( loans / bus_novacant , 1) , 3 ) )

metro_map_pt <-  st_centroid ( metro_map_base) 


# now get two measures: The bounding box of the map and the jenks breaks
box <- st_bbox (metro_map_base)
approx_center <-
   c ( mean (c(box[1], box[3])), 
         mean(c(box[2], box[4])))

# putting a pretty big range around this -- won't end up out in Asia, but might have to pan a little! 
span_bounds = unname( c (box['xmin']-1, box['ymin']-1, box['xmax']+1, box['ymax']+1))

# jenks breaks - round to 5 groups? 
jenks <- classInt::classIntervals(metro_map_base$pct_loan, n=5, style="jenks")$brks
# ( floor (.75 / .05) * .05) no idea why this works but it does
breaks <- ( ceiling (jenks/ .05) * .05)
#brks <- c( 0, .25, .4, .55, .8, 1)


blue_pal <- colorBin (palette="Blues", 
                     domain = metro_map_base$pct_loan, 
                     breaks,
                     na.color = "white")

#set the colors for the circles based on the larger dataset so they're always the same. 

discrete_pal <- colorFactor(palette="Dark2", domain = zcta_map$zcta_ethnic)

```


<!--Now create the map. -->

How to read this map: 

Each area is a zip code area -- it may include some postal Zip Codes that are single buildings or companies located there. It's shaded by an approximate percentage of businesses that got PPP loans in both rounds, through December 2020.  The percentage is based on the estimate of USPS business addresses that were not considered vacant as of December 2019. 

The dots show you what the dominant ethnic group of the zip code area is. 

Clicking on the area gives you a little more information about it, including median income (when available), the number of loans in each round, and the postal name of the zip code. 


```{r leaflet_map}

m <- leaflet ( metro_map_base) %>%
     addProviderTiles(providers$Esri.WorldStreetMap, 
                      options = providerTileOptions( minZoom = 7))  %>%
     setView( approx_center[1], approx_center[2],  zoom=10) %>%
     setMaxBounds (span_bounds[1], span_bounds[2], span_bounds[3], span_bounds[4] )%>%
     addPolygons ( fillColor = ~blue_pal( pct_loan),
                   #col="#78200E", 
                   stroke =1, 
                   weight=.3, 
                   fillOpacity = .7 , 
                  popup = ~ paste0 ( "Zip Code ", zcta , ", ", zcta_name, ",  ", county_name, "<br>", 
                                    "Zip code ethnicity: " ,  zcta_ethnic, 
                                    "<br>", "% of non-vacant USPS businesses with loans: " ,
                                     percent( pct_loan, accuracy=.1) , 
                                    "<br>", "Loans : Round 1-  ", loans_r1, ", Round 2 - ",  loans_r2, 
                                    "<br> Median Income " , dollar( median_inc_2018))) %>%
     addCircleMarkers ( data= metro_map_pt, 
                        radius = 4, 
                        #radius = ~sqrt( loans / 40) , 
                        color= ~discrete_pal (zcta_ethnic), stroke=FALSE , fillOpacity = .8, 
                     group = "circles") %>%   
  addResetMapButton() %>%
  addLegend ("topright", pal=discrete_pal, values = ~zcta_ethnic, group="circles", title="Ethnicity") %>% 
  addLegend ("bottomleft", pal=blue_pal, 
                           values = ~pct_loan, 
                           labFormat = labelFormat (suffix="%", 
                                                    transform = function (x) {100*x}) , 
                           title = "% of USPS active bus.")
  
m

```

<!-- clean up environment so it's not confusing later -->

```{r}

rm (list = c('m','zcta_map', 'metro_map_pt', 'approx_center', 'box', 'breaks', 'jenks', 'span_bounds', 'blue_pal', 'discrete_pal') ) 

```


## Lenders

<!-- this reads in the csv file from the saved ones, which is much faster than loading all of the loans from the original. Need to fix the treatment of zip codes and naics into character.-->

This section provides some data on the lenders in an area. All of the lenders with at least 50 loans in this area are listed separately, and the rest are grouped into a "999" group.  

<!--
Now I have to automate getting the lenders into a table with an export to Excel button. Instead of calculating market share, which is a really small number for each lender, I'm comparing the percent of loans from the lender by group to the percent in the region. That comes out to the same difference in market share, but it's a little easier to read. 

For each metro area, I'm selecting any lender with at least 100 loans. There aren't that many of them, and they usually make up about 95 percent of the loans in the area. Everything else is being put into an "everyone else" category. 

It would be great to be able to generate this output as HTML Widget to save rather than have to redo it for each since the loan data takes so long to read in. (Better to add the widget to an existing rmd than to do it as its own rmarkdown.)
-->

Here is the portion of loans in this area taken up by lenders by size: 


<!-- Something I have to do: set up a key for the ethnicities so that they create good variable names later on -->

```{r}

ethnic_key <- c( "00 - No people"  = "No people", 
                 "01 - Majority white" = "White", 
                 "02 - No distinct minority" ="No distinct minority", 
                 "03 - Black" = "Black", 
                 "04 - Hispanic" = "Hispanic / Latino",
                 "05 - Asian" = "Asian" ,
                 "06 - Native American" = "Native American")



```





```{r pick_lenders}

area_loans <- read_csv(paste0("data/metro_csvs/", params$root_name, ".csv"), 
                       col_types= cols (.default= col_guess(), 
                                       zcta = "c", 
                                       zip = "c", 
                                       naics_code = "c"))



lender_ranks <- 
  area_loans %>%
  group_by (lender) %>%
  summarise ( lender_loans = n()) %>%
  mutate ( rank = min_rank ( -lender_loans), 
           lender_group = if_else ( lender_loans >= 50, rank, as.integer(999)), 
           lender_id = ordered( paste( sprintf ("%03d", lender_group), 
                                      if_else (lender_group == 999, "Small lenders", lender),
                                      sep=" - ")))



lender_ranks %>%  
  arrange (-lender_loans) %>%
  group_by (bin = cut (lender_loans, breaks=c (0, 50, 100, 500, Inf), include.lowest=FALSE, right=FALSE, labels=c("0-49", "50-99", "100-499", "500 & up") )) %>%
  summarise  ("# of lenders" = n(), total = sum(lender_loans))  %>%
  arrange ( -as.numeric(bin)) %>%
  mutate ("Cumulative %" = cumsum(total)/sum(total)) %>%
  rename ("Range" = bin, "# of loans" = total) %>%
  datatable ( options = list(dom="t", width="80%", ordering=FALSE), 
              rownames = FALSE, 
              width="50%") %>%
  formatRound (c(2), digits=2, interval=3, mark=",") %>%
  formatPercentage( c(3,4), digits=1) 

```

This table provides some metrics for each lender: 

* The market share 
* The percentage of loans by ethnicity of the zip codes for each lender, versus those in the market as a whole
* The percentage of loans by ethnicity of the zip codes for each lender versus the number of non-vacant business addresses in the market. 



```{r lender_comparisons}
# set up some more print-friendly column headings 
ethnic_key <- c( "00 - No people"  = "No people", 
                 "01 - Majority white" = "White", 
                 "02 - No distinct minority" ="No distinct minority", 
                 "03 - Black" = "Black", 
                 "04 - Hispanic" = "Hispanic / Latino",
                 "05 - Asian" = "Asian" ,
                 "06 - Native American" = "Native American")

varname_lookup <- c("lender_ethnic" = "% ",  "lender_mkt_rel" = "vs. market", "lender_bus_rel" = "vs. USPS businesses")


area_totals <- 
  metro_map_base %>%
  as.data.frame() %>%
  group_by (zcta_ethnic) %>%
  summarise (area_loans = sum (loans), 
             area_businesses = sum(bus_novacant)) %>%
  mutate ( sum_area_loans = sum(area_loans), 
           sum_area_businesses = sum(area_businesses))

# just check to make sure this is the same as the loans we just picked up. 
# sum(area_totals$area_loans, na.rm=TRUE) == nrow( area_loans)

#don't use lender_loans from the ranks data -- it will changen now that we have the 999's combined.

lender_totals <- 
  area_loans %>%
  inner_join (select (lender_ranks, lender, lender_id) , by=c("lender"="lender")) %>%
  group_by (lender_id, zcta_ethnic) %>% 
  summarise ( lender_loans = n() , .groups="drop_last") %>% 
  mutate ( sum_lender_loans = sum(lender_loans))

lender_raw_counts <- 
   area_totals %>%
   left_join ( lender_totals , by=c("zcta_ethnic" = "zcta_ethnic")) %>%
   ungroup() %>%
   arrange (lender_id, zcta_ethnic) %>%
   mutate ( mkt_share = sum_lender_loans/ sum_area_loans , 
            lender_ethnic = lender_loans / sum_lender_loans, 
            mkt_ethnic = area_loans / sum_area_loans, 
            bus_ethnic = area_businesses / sum_area_businesses, 
            lender_mkt_rel = lender_ethnic / mkt_ethnic, 
            lender_bus_rel = lender_ethnic / bus_ethnic, 
            # this renames the ethnicities so that when we flip the data frame it will 
            ethnic = recode (ordered(zcta_ethnic), !!!as.list(ethnic_key )) 
   )
            

num_ethnic_groups <- lender_raw_counts %>% filter( mkt_ethnic >= .05) %>% distinct( zcta_ethnic) %>% summarise(n()) %>% as.numeric()
            
## Flip this and keep the things we care about 
## This is the only way I know how to make the names work out correctly: 
pivoted_loans <- 
  lender_raw_counts %>%
  filter (mkt_ethnic >= .05) %>%
  select ( lender_id, zcta_ethnic, sum_lender_loans, 
           mkt_share , lender_ethnic, lender_mkt_rel, 
           lender_bus_rel) %>%
  pivot_longer (cols  = c(lender_ethnic: lender_bus_rel) , names_to = "vname", values_to= "pcts") %>%
  mutate (vname2 = stri_replace_all_fixed(vname, names(varname_lookup), unname(varname_lookup), vectorise_all = FALSE ), 
          ethnic = stri_replace_all_fixed( zcta_ethnic, names(ethnic_key), unname(ethnic_key), vectorise_all=FALSE), 
          new_vname = paste(ethnic, vname2, sep=" ")
) %>%
  select (-vname2, -ethnic) %>%
  pivot_wider ( id_cols=c(lender_id, mkt_share, sum_lender_loans), values_from=pcts, names_from=new_vname, values_fill=0 ) %>%
  select (Lender = lender_id, "Loans" = sum_lender_loans , `Market share`=mkt_share, ends_with("% "), ends_with("vs. market"), ends_with("vs. USPS businesses")) 

 
```

How to read it: 

For each lender, there is a percentage of the loans for each type of zip code by ethnicity. For example, if "White %" says "35%" for Chase, it means that 35 percent of Chase's loans in this market were to majority White zip codes. This is reapeated for any type of zip code with at least 5 percent of the market's loans. They won't add to 100% because you are not seeing the ones that were under 5% of the market.

Then there are two sets of comparisons for each group: The percent of loans compared with the average for that market, and a comparison to the distribution if it were the same as the underlying businesses from the USPS. In those columns, anything over 1.0 means that the group is over-represented. Anything under 1.0 means that it's under-represented. 

For example, if the number for Chase under "Hispanic / Latino" is 1.1, it means that Chase is 10 percent more likely to have loans in Hispanic / Latino areas than the rest of the market. If it's .9, it means that Chase is 10 percent less likely to have loans in that area. 

```{r include=FALSE}

area_loans %>%
  select ( rowid, lender, zcta_ethnic ) %>%
  filter ( str_detect (lender, "Citibank"))


```
```{r}
# build the table

col_ethnic_count <- num_ethnic_groups * 3
pctage_cols <- c(4:(3+num_ethnic_groups))
round_cols <- c( (max(pctage_cols) + 1):(max(pctage_cols) + (2*num_ethnic_groups) ))
max_col_num <- max(round_cols)

pivoted_loans %>%
   datatable( class="compact", 
            rownames=FALSE,
            extensions="Buttons", 
            options = list (
               autoWidth = TRUE, 
               scrollX = TRUE,
               columnDefs = list ( list ( width='150px', targets=c(0)) 
                                   #list ( visible=FALSE, targets=c(0,9, 11:20  ))
                                  ) , 
               order = list (list (0, "asc")) , 
               dom = "Bfrtip", 
               buttons = c ("copy", "csv", "excel")
            ),
           ) %>%
  # this is weird - the stuff in "options" is 0based and the stuff here is 1-based.
  formatRound ( round_cols , digits=2, interval=3, mark=",") %>%
  formatPercentage ( pctage_cols, digits=0) %>%
  formatPercentage ( c(3), digits=2) %>%
  formatRound (2, digits=0, interval=3, mark=",")





```

<div style="size:80%;color:gray;font-style:italic;">
Prepared by Sarah Cohen, `r format(Sys.Date(), "%B %d, %Y")`
</div>