In the first months of 2020, COVID-19 case numbers increased in California despite state and municipal actions meant to slow the spread of the virus. One prominent government response to the pandemic has been to modify economic activity through policies that reduce physical contact between people. Applying social-distancing guidelines uniformly across industries is difficult, so industries have handled these new policies in different ways. For example, while universities can switch to an online-only format to limit COVID-19 exposure, the farm industry relies on physical labor that must be performed in person.
There have been reports of essential workers feeling unsafe in their working conditions. An assessment of risk across industries could help policy makers, industry leaders, and members of the public (as employees or consumers of a given industry) decide whether additional precautions or guidelines are needed to conduct business safely and productively. By analyzing economic data from the California Employment Development Department and the State of California COVID-19 health data (March 2020 to September 2020), we seek to understand what effect, if any, the distribution of a county's labor force across industries has on its COVID-19 case rates. More specifically, we examine the effect that agricultural employment may have on COVID-19 case rates.
The chart above gives an overview of the data processing required for the analysis. The objective was to derive insights from the monthly changes in COVID-19 cases and in the proportion of jobs in each industry, at both the county and state level. We did this by checking for significant correlations between monthly COVID-19 cases and monthly industry jobs (i.e., whether counties with a larger share of jobs in a particular industry tend to have higher or lower COVID-19 case rates). All data were in CSV format and were parsed into dataframes using the pandas Python package. The data were then merged on common county names and dates. Fourteen of California's 58 counties did not report industry employment data in 2020, so they were excluded from the industry calculations. This does not apply to the unemployment data, which were reported for every county in the timeframe of interest.
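The merge step can be sketched in pandas as follows. This is a minimal illustration, not the project's code: the column names (`county`, `month`, `new_cases`, `industry`, `jobs`) and all values are hypothetical, and both tables are assumed to be already aggregated to one row per county and month (per industry, for the jobs table). An inner join on county and month keeps only counties present in both tables, which is one way to exclude counties that did not report industry data.

```python
import pandas as pd

# Toy monthly case totals for three hypothetical counties (not real data).
cases = pd.DataFrame({
    "county": ["County A", "County B", "County C"],
    "month": ["2020-07", "2020-07", "2020-07"],
    "new_cases": [120, 300, 15],
})

# Toy industry job counts; County C has no industry rows (non-reporting).
jobs = pd.DataFrame({
    "county": ["County A", "County A", "County B", "County B"],
    "month": ["2020-07", "2020-07", "2020-07", "2020-07"],
    "industry": ["Total Farm", "Retail Trade", "Total Farm", "Retail Trade"],
    "jobs": [500, 1500, 2000, 1000],
})

# Inner join on the shared keys: counties missing from either table are dropped.
merged = pd.merge(cases, jobs, on=["county", "month"], how="inner")
print(merged)
```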
Line charts, bar plots, scatter plots, and choropleth maps were then generated from the processed dataframes. All line, bar, and scatter charts below were made with Plotly. Choropleth maps were generated with Folium, which renders them as HTML using the Leaflet JavaScript library. A few tweaks to the Leaflet/HTML output were needed to set the proper display bounds and zoom parameters.
Case rates in California have been increasing. We want to examine the relationship between case rates and the proportion of jobs in each industry in each county.
We calculated industry proportions for the reporting counties from the industry job totals of each county (Figure 2, Left). With these data, we tested our hypothesis that COVID-19 cases would be higher in counties with larger proportions of certain industries. To do this, we calculated COVID-19-weighted industry proportions, which represent the share of total COVID-19 cases attributable to each industry when each county's cases are allocated according to its industry employment proportions. In essence, the key question is: if each county's COVID-19 cases were distributed across its industries and employment statuses in proportion to employment, what share of statewide cases would each one account for? The COVID-19-weighted proportions were calculated by multiplying each county's industry proportions by that county's case count, summing by industry, and dividing by the total number of cases (Figure 2, Center). To highlight changes between the two sets of proportions, we calculated and plotted their difference (Figure 2, Right).
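The weighting can be written compactly. With p(c, i) the share of county c's jobs in industry i and N(c) the county's case count, the COVID-weighted share of industry i is Σ_c p(c, i) × N(c) / Σ_c N(c). The pandas sketch below computes this for two hypothetical counties and two industries with made-up numbers. It assumes, as one reasonable reading of the text, that the unweighted baseline is the statewide share of jobs pooled across reporting counties.

```python
import pandas as pd

# Hypothetical job counts: rows are counties, columns are industry categories.
jobs = pd.DataFrame(
    {"Total Farm": [500, 2000], "Retail Trade": [1500, 1000]},
    index=["County A", "County B"],
)
# Hypothetical case counts for the same counties and month.
cases = pd.Series([120, 300], index=["County A", "County B"])

# Industry proportions within each county (each row sums to 1).
county_props = jobs.div(jobs.sum(axis=1), axis=0)

# Assumed baseline: statewide industry proportions pooled over reporting counties.
state_props = jobs.sum() / jobs.values.sum()

# COVID-weighted proportions: scale each county's industry mix by its cases,
# sum by industry, then divide by the total number of cases.
weighted_props = county_props.mul(cases, axis=0).sum() / cases.sum()

# Positive values: the industry is over-represented where cases are concentrated.
difference = weighted_props - state_props
print(difference)
```

In this toy example, County B has both the larger farm share and more cases, so Total Farm's weighted share rises above its statewide share by 20/420 (about 0.048) and Retail Trade falls by the same amount. Because both sets of proportions sum to 1, the differences always sum to 0.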
Looking at the changes in the COVID-weighted proportions, we noted that the differences for essential services (such as transportation and warehousing) stayed positive throughout this period, suggesting that counties relying on these industries contributed more than their share of new COVID-19 cases. Conversely, some industries, such as Professional & Business Services, showed a reduced case contribution after weighting, possibly due to the introduction of work-from-home arrangements. The largest increases in case contribution after weighting came from Total Farm and the unemployed. Total Farm began with a lower than expected share of COVID-19 cases and climbed over the following months, possibly due to the vulnerability of farm workers and a lack of protective coverings (e.g. masks), as mentioned in the Guardian article. Unexpectedly, unemployment also emerged as a large contributor to new COVID-19 cases. Unemployment increased as a consequence of lockdown measures, and unemployed people may have been more likely to make trips away from home, whether because of additional leisure time or to search for work.
Categories shown: Total Farm; Unemployed; Leisure & Hospitality; Construction; Durable Goods; Educational & Health Services; Financial Activities; Govt.; Information; Manufacturing; Natural Resources, Mining & Construction; Nondurable Goods; Other Services; Professional & Business Services; Retail Trade; Trade, Transp. & Utilities; Transp., Warehousing & Utilities; Wholesale Trade.
For each month, we calculated Pearson's correlation coefficient (r) between county COVID-19 case rates and each labor force category, using the scipy.stats Python module. After sorting by r, we identified the categories most strongly correlated with COVID-19 case rates, defined here as r greater than 0.5. Leisure & Hospitality exceeded this threshold in only one month of the period of interest and was not examined further. Note, however, that leisure and hospitality was the industry with the most job losses at the onset of the pandemic (Bureau of Labor Statistics). The sudden travel restrictions and layoffs in this industry may explain the initial moderate correlation and the weak correlations that followed. Below, we visualize the correlations for the only two categories with moderate correlations across multiple months: Total Farm and Unemployed.
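A per-month version of the correlation step might look like the sketch below. The column names (`month`, `farm_share`, `case_rate`) and values are hypothetical; `scipy.stats.pearsonr` returns the correlation coefficient and a two-sided p-value, and the 0.5 cutoff is the one used in the text.

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical county-level data: one row per county and month (not real data).
data = pd.DataFrame({
    "month": ["2020-07"] * 4 + ["2020-08"] * 4,
    "farm_share": [0.01, 0.05, 0.10, 0.20, 0.01, 0.05, 0.10, 0.20],
    "case_rate": [800, 950, 1400, 2100, 900, 700, 1000, 950],
})

THRESHOLD = 0.5  # cutoff used in the text (r > 0.5)

results = {}
for month, group in data.groupby("month"):
    # Correlate the category share with case rates across counties for this month.
    r, p_value = pearsonr(group["farm_share"], group["case_rate"])
    results[month] = (float(r), float(p_value), bool(r > THRESHOLD))

for month, (r, p_value, strong) in results.items():
    print(f"{month}: r={r:.2f}, p={p_value:.3f}, above threshold: {strong}")
```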
We identified a moderate correlation between the Total Farm labor force and case rates in California counties from July to September. This is consistent with our hypothesis and, anecdotally, with the Guardian article linked in the introduction, which reported on the agricultural sector's risk of COVID-19 infection.
There is a moderate correlation between county unemployment and case rates from May to July. Causality may run in both directions here: higher case rates may cause more business closures and therefore higher unemployment, or high unemployment may push more job-seekers to take odd jobs or gig work involving public interaction, or to leave home for public assistance, potentially exposing themselves or others to the virus.
"You can appear to contain the spread among middle-class workers but when it reaches those workers who are furthest on the margins, who are most disadvantaged, the virus is going to spread" - Edward Flores, sociology professor at the University of California, Merced
Agricultural workers and the unemployed may be at greater risk of contracting COVID-19 than people employed in other industries. This information could help decision makers, industry leaders, and the general public implement measures to prevent the spread of COVID-19 and prioritize resources for those most at risk of infection. Further research could determine whether industry employment remains significant after accounting for other important factors, such as age or the population density of an area. Case rates specifically among California's homeless population may also interest future researchers.
View the source code on GitHub.
Up-to-date COVID-19 data were freely available from the California Department of Public Health on the California Open Data Portal (ca.gov). These data contained the number of new COVID-19 cases and deaths reported in each county on each day since 18 Mar 2020. The data comprised about 10,000 rows in CSV format, with the following columns:
| Column description |
|---|
| Unique identifier of row data |
| Total count of confirmed COVID-19 cases to date |
| New count of confirmed COVID-19 deaths on report date |
| Total count of confirmed COVID-19 deaths to date |
| Reporting county |
| New count of confirmed COVID-19 cases on report date |
| Date of report |
The reporting contained inaccuracies: some counties entered negative counts of new COVID-19 cases, possibly to correct errant totals from previous reports. Examining monthly totals makes such discrepancies less of an issue, because a negative correction offsets the over-count it corrects whenever both fall in the same month.
Other available datasets covered hospitalizations, testing, and PPE logistics, but none described the origin of the cases, so we could not use them.
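Monthly aggregation can be sketched with pandas as below, using hypothetical daily rows for a single county in which a negative entry corrects an earlier over-count. The column names are assumptions, since the original header names are not listed above. Grouping by county and calendar month (`dt.to_period("M")`) sums the daily counts, so the correction nets out within July.

```python
import pandas as pd

# Hypothetical daily reports for one county; -4 mimics a correction of an
# earlier over-count (illustrative values, not real data).
daily = pd.DataFrame({
    "county": ["County A"] * 5,
    "date": pd.to_datetime(["2020-07-01", "2020-07-02", "2020-07-03",
                            "2020-08-01", "2020-08-02"]),
    "new_cases": [10, 14, -4, 20, 25],
})

# Sum daily counts into calendar months per county.
monthly = (
    daily.groupby(["county", daily["date"].dt.to_period("M")])["new_cases"]
    .sum()
    .reset_index()
)
print(monthly)
```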
Industry employment data for each California county were obtained from the State of California Employment Development Department (edd.ca.gov). These data contain the total number of jobs in each industry, reported monthly since Jan 1990. The data comprised about 760,000 rows in CSV format, with the following headers:
| Header | Description |
|---|---|
| Area Type | State/County/Metropolitan |
| Area Name | Name of area |
| Year | Calendar year |
| Month | Calendar month |
| Series Code | Code for the specific industry |
| Industry Title | Official industry name |
| Seasonally Adjusted | True/False: whether seasonal adjustment was applied |
| Current Employment | Number of jobs |
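Reshaping the long-format EDD table into one row per county and month, with one column per industry, can be sketched as below. The column names come from the table above, but the cell values (for example `"County"` as an `Area Type` value, month names in `Month`, and booleans in `Seasonally Adjusted`) are assumptions, and keeping the non-seasonally-adjusted series is only one possible choice; the text does not say which setting was used.

```python
import pandas as pd

# Toy rows in the EDD long format (hypothetical values; the exact strings
# used in the real file are assumptions).
edd = pd.DataFrame({
    "Area Type": ["County", "County", "County", "State"],
    "Area Name": ["County A", "County A", "County A", "California"],
    "Year": [2020, 2020, 2020, 2020],
    "Month": ["July", "July", "July", "July"],
    "Industry Title": ["Total Farm", "Retail Trade", "Total Farm", "Total Farm"],
    "Seasonally Adjusted": [False, False, True, False],
    "Current Employment": [500, 1500, 480, 400000],
})

# Keep county-level, non-seasonally-adjusted rows for 2020.
mask = (
    (edd["Area Type"] == "County")
    & (edd["Year"] == 2020)
    & (~edd["Seasonally Adjusted"])
)

# One row per (county, month), one column per industry.
wide = edd[mask].pivot_table(
    index=["Area Name", "Month"],
    columns="Industry Title",
    values="Current Employment",
    aggfunc="sum",
)
print(wide)
```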
The previous dataset counts the jobs in each county but does not cover the unemployed. Unemployment data for each county were obtained from the State of California Employment Development Department (edd.ca.gov). These data contain the total number of unemployed people in each county, reported monthly since Jan 1990. The data comprised about 160,000 rows in CSV format, with the following headers:
| Header | Description |
|---|---|
| Area Type | State/County/Metropolitan |
| Area Name | Name of area |
| Periodyear | Calendar year |
| Period | Calendar month |
| Adjusted | True/False: whether seasonal adjustment was applied |
| Laborforce | Civilian labor force |
| Employment | Number of employed civilians |
| Unemployment | Civilians without jobs who made specific efforts to find a job |
There were discrepancies in the reported numbers: not all counties reported their employment data in 2020, and the total number of jobs does not necessarily match the total workforce, since some people hold more than one job.
To map county data, we used the 2016 TIGER/Line Shapefiles, available on the California Open Data Portal (ca.gov).
The 2019 United States Census Bureau (US Census) estimates provided the total population of each county. These data were used to estimate the COVID-19 infection rate in each county. Although population estimates are produced only annually, they remain a reasonable denominator, since there were no known large population changes during the period of interest.
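As a sketch of the rate calculation, the snippet below divides monthly cases by the 2019 population estimate and scales the result to cases per 100,000 residents. The per-100,000 scale, the county names, and all numbers are illustrative assumptions; the text states only that population was used to estimate infection rates.

```python
import pandas as pd

# Hypothetical inputs: monthly new cases and 2019 population estimates.
monthly_cases = pd.Series({"County A": 450, "County B": 90})
population_2019 = pd.Series({"County A": 1_500_000, "County B": 60_000})

# Cases per 100,000 residents; index alignment matches counties by name.
case_rate = monthly_cases / population_2019 * 100_000
print(case_rate.round(1))
```

Normalizing this way keeps large counties from dominating the comparison: County B has a fifth of County A's cases but a five times higher rate.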