COVID-19 case numbers have increased in California despite action from state and municipal governments meant to slow the spread of the virus. New governmental policies often involve modifying the activity of economic participants in a way that attempts to reduce physical contact between people. There are challenges to the uniform implementation of social-distancing guidelines across industries that have led to varying approaches to handling these new policies. For example, while it is possible for universities to switch to an online-only format to limit COVID-19 exposure, the farm industry relies on physical labor that must be performed irrespective of whether workers are safe or healthy.
There have been reports of some essential workers feeling unsafe in their working conditions. An assessment of risk across industries could inform policy makers, industry leaders, and members of the general public who may be employees or consumers of a specific industry, on whether additional precautions or guidelines may be necessary to conduct business in a safe and productive manner. By analyzing economic data from the CA Employment Development Department and State of California COVID-19 Health Data (March 2020 - September 2020), we seek to understand what effect, if any, county labor force distribution may have on COVID-19 case rates. More specifically, we seek to understand the effect that agricultural employment may have on COVID-19 case rates.
View the source code on GitHub.
Up-to-date COVID-19 data were freely available from the California Department of Public Health on the California Open Data Portal (ca.gov). These data contained the number of new COVID-19 cases and deaths reported in each county on each day since 18 Mar 2020. There were about 10,000 rows stored in CSV format, with the following headers:
|Unique identifier of row data||Total count of confirmed COVID-19 cases to date||New count of confirmed COVID-19 deaths on report date||Total count of confirmed COVID-19 deaths to date||Reporting County||New count of confirmed COVID-19 cases on report date||Date of report|
The reporting contained inaccuracies: some counties entered negative new counts of COVID-19 cases, possibly to compensate for errant totals in previous reports. We examined monthly totals instead of daily counts, which makes such discrepancies less likely to be an issue.
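The monthly aggregation described above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the column names (`county`, `date`, `new_cases`) and all values are made up for the example and are not the portal's real headers or data.

```python
import pandas as pd

# Made-up daily records; column names and values are illustrative assumptions.
daily = pd.DataFrame({
    "county": ["Alpha", "Alpha", "Alpha", "Alpha"],
    "date": pd.to_datetime(["2020-07-01", "2020-07-02", "2020-07-03", "2020-08-01"]),
    "new_cases": [120, -15, 90, 60],  # -15 mimics a correction to an earlier overcount
})

# Label each row with its calendar month, then sum new cases per county and month.
monthly = (
    daily
    .assign(month=daily["date"].dt.strftime("%Y-%m"))
    .groupby(["county", "month"], as_index=False)["new_cases"]
    .sum()
)
```

Within a month, a negative correction offsets the earlier overcount it was meant to fix, so the monthly sum is closer to the true count than either daily value alone. A correction that lands in a later month than the error it fixes would still distort both months.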
Other available datasets included hospitalization, testing, and PPE logistics, but none were able to describe the origin of the cases. Hence, we could not use these data.
Industry employment data for each California county were obtained from the State of California Employment Development Department (edd.gov.ca). These data contain the total number of jobs in each industry reported each month, since Jan 1990. There were a total of about 760,000 rows of entries stored in the csv format, with the following headers:
|Area Type||Area Name||Year||Month||Series Code||Industry Title||Seasonally Adjusted||Current Employment|
|State/County/Metropolitan||Name of area||Calendar year||Calendar month||Code for the specific industry||Official industry name||True/False Seasonal changes applied||Number of jobs|
The previous dataset accounts for the number of employed people in each county, but not the unemployed. Unemployment data from each county were obtained from the State of California Employment Development Department (edd.gov.ca). These data contain the total number of unemployed in each county reported each month, since Jan 1990. There were a total of about 160,000 rows of entries stored in the csv format, with the following headers:
|Area Type||Area Name||Periodyear||Period||Adjusted||Laborforce||Employment||Unemployment|
|State/County/Metropolitan||Name of area||Calendar year||Calendar month||True/False Seasonal changes applied||Civilian labor force||Proportion of the population employed||Civilians without jobs and making specific effort to find a job|
We noted discrepancies in the numbers reported by each county: not all counties reported their employment data in 2020, and the total number of jobs does not necessarily tally with the total workforce, since some people hold more than one job.
To produce mapped visualizations of county data, we utilized 2016 TIGER/Line Shapefiles, accessible on the California Open Data Portal (ca.gov).
2019 United States Census Bureau (US Census) Estimates provided data on the total population of each county. These data were used to estimate the COVID-19 infection rate per county. Although the population data were only recorded annually, they remain a relevant estimate since there were no known large population changes during the period of interest.
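As a small worked example of the per-capita rate used throughout the figures, the function below converts a case count and a population estimate into cases per 100,000 residents. The function name and the input numbers are hypothetical and chosen only for illustration.

```python
def cases_per_100k(cases: int, population: int) -> float:
    """Return the number of cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so whole-number results stay exact.
    return cases * 100_000 / population


# Hypothetical county: 500 cases in a month, 250,000 residents.
rate = cases_per_100k(500, 250_000)
```

Because the population estimate is annual, the same denominator would be reused for every month in the period of interest.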
The above chart depicts an overview of the data processing required for analysis. The analysis objective was to derive insights from the monthly changes in COVID-19 cases and proportion of jobs both at the county and state level. This was achieved by examining if there were any significant correlations between monthly COVID-19 cases and monthly industry jobs (i.e. if COVID-19 cases may be lower or higher than average in a particular industry). All data were in the csv format and parsed into dataframes using the Pandas Python package, with the exception of the COVID-19 data which were parsed into a dataframe from an API. The data were then merged according to common county names and dates. Fourteen counties (California has 58) did not report data in 2020, so they were excluded from industry data calculations. This does not apply to unemployment data, however, which were reported for every county in the timeframe of interest.
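The merge step above might look like the following pandas sketch. It assumes, without confirmation from the source, that the employment data name areas as "<Name> County" while the case data use the bare county name, and that months are stored as integers. The employment columns mirror the headers listed earlier; the case-data columns, the placeholder counties "Alpha" and "Beta", and all values are made up.

```python
import pandas as pd

# Made-up monthly case totals; "Alpha" and "Beta" are placeholder counties.
cases = pd.DataFrame({
    "county": ["Alpha", "Alpha", "Beta"],
    "year": [2020, 2020, 2020],
    "month": [7, 8, 7],
    "new_cases": [400, 350, 20],
})

# Made-up employment rows; "Beta" is absent to mimic a non-reporting county.
jobs = pd.DataFrame({
    "Area Name": ["Alpha County", "Alpha County"],
    "Year": [2020, 2020],
    "Month": [7, 8],
    "Industry Title": ["Total Farm", "Total Farm"],
    "Current Employment": [5000, 4800],
})

# Align key names, and strip the assumed " County" suffix so county names match.
jobs = jobs.rename(columns={"Area Name": "county", "Year": "year", "Month": "month"})
jobs["county"] = jobs["county"].str.replace(" County", "", regex=False).str.strip()

# An inner join keeps only county-months present in both datasets.
merged = cases.merge(jobs, on=["county", "year", "month"], how="inner")
```

An inner join drops counties that did not report employment data, which matches the exclusion described above. A left join would instead keep them with missing employment values.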
Scatter plots, bar plots and choropleth maps were subsequently generated using the processed dataframes.
- Scatter plots were chosen for analyzing the correlation between COVID-19 cases and % of jobs in a particular sector across the various counties. A strong correlation would indicate a possible relationship between industry activity and COVID-19 infection rate.
- Bar plots were chosen for analysing the distribution of workforce across the various industries, which would help to quantify differences in jobs/COVID-19 cases between different industries.
- Choropleth maps helped to display geospatial distribution of the workforce and COVID-19 cases. These maps can be used to identify any clusters where COVID-19 cases or certain industries may be concentrated.
All the bar, line and scatter charts below were generated using Plotly to enable mouse-over queries and sliders to switch between different months and charts. Choropleth maps were generated using Folium and displayed in html using Leaflet. A few HTML tweaks were required to calibrate the proper display bounds and zooming parameters that would enable the smoothest user experience.
Figure 1: COVID-19 Cases per 100k in California (CA)
Case rates in California have been increasing. We want to examine the relationship between the case rates and the proportion of industry in each county.
Figure 2: Proportion of Industries in CA, Weighted by CA COVID-19 Cases (slide right)
We calculated industry proportions for the reporting counties using the industry job totals of each county (Figure 2, Left). Given these data, we wanted to test our hypothesis that COVID-19 cases would be higher in counties containing larger proportions of certain industries. To do this, we calculated COVID-19 weighted industry proportions. These proportions represent the percentage of total COVID-19 cases that can be accounted for by the industry employment proportion of each county. In essence, the key question is: if all COVID-19 cases could be evenly distributed across industry employment and employment status, given the varying industry employment proportions and COVID-19 case rates of counties, how many cases can be attributed to each industry or employment status? COVID-19 weighted proportions were calculated by taking the industry proportions of each county, multiplying them by the case numbers of each county, and then summing by industry before creating new proportions (Figure 2, Center). To highlight any changes between the two proportions, we calculated and plotted the difference of the two proportions (Figure 2, Right).
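The weighting procedure described above can be sketched with pandas. This is an illustration under stated assumptions, not the project's code: the two counties, two industries, and all numbers are made up, and the unweighted baseline is assumed to be each industry's share of total jobs across the reporting counties.

```python
import pandas as pd

# Made-up job counts: rows are counties, columns are industries.
jobs = pd.DataFrame(
    {"Total Farm": [30, 10], "Retail Trade": [70, 90]},
    index=["County A", "County B"],
)
# Made-up case counts per county for one month.
cases = pd.Series({"County A": 300, "County B": 100})

# Unweighted proportions (assumed baseline): industry share of all jobs.
baseline = jobs.sum() / jobs.sum().sum()

# Industry proportions within each county (each row sums to 1).
county_props = jobs.div(jobs.sum(axis=1), axis=0)

# Multiply each county's proportions by its case count, then sum by industry.
attributed = county_props.mul(cases, axis=0).sum()

# Renormalize so the weighted values are proportions again.
weighted = attributed / attributed.sum()

# Positive values mean the industry is over-represented in high-case counties.
difference = weighted - baseline
```

In this toy example, farm jobs are 20% of all jobs but 25% after weighting, because the farm-heavy county also has more cases. The differences always sum to zero, so a gain for one category implies a loss for others.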
Figure 3: Change in Proportion of Industries After Weighted by COVID-19 Cases - Sorted by Mean
Looking at the changes in COVID-19-weighted industry proportions, we noted that essential services (such as transportation and warehousing) stayed positive throughout this period, indicating potentially increased contributions of new COVID-19 cases from counties that relied on such industries. In addition, some industries, such as Professional/Business services, showed a reduction in case contributions after weighting, possibly due to the introduction of work-from-home arrangements. The largest increases in case contribution after weighting came from Total Farm and the unemployed. The data suggest that Total Farm began with a low case contribution that climbed over the months, possibly due to the vulnerability of farm workers and a lack of protective coverings (e.g. masks), as mentioned in the Guardian article. The data unexpectedly showed unemployment as a significant contributor to new COVID-19 cases. Unemployment increased as a consequence of the lockdown measures imposed in response to COVID-19, and unemployed people may be more likely to make trips away from home, possibly because they have additional time or are under pressure to find income.
Figure 4: R Values (Pearson's Correlation) Correlated with COVID-19 Cases per 100k
|Total Farm||Unemployed||Leisure & Hospitality||Construction||Durable Goods||Educational & Health Services||Financial Activities||Govt.||Information|
|Manufacturing||Natural Resources, Mining & Construction||Nondurable Goods||Other Services||Professional & Business Services||Retail Trade||Trade, Transp. & Utilities||Transp., Warehousing & Utilities||Wholesale Trade|
We calculated the Pearson's correlation coefficient (r) between each county's COVID-19 case rate and labor force statistics using the scipy.stats Python module. After sorting by largest r values, we identified the categories with the strongest correlations to COVID-19 case rates, defined here to mean an r value greater than 0.5. Leisure & Hospitality had an r value greater than 0.5 for only one of the months in the period of interest and was not examined in further detail. It should be noted, however, that leisure and hospitality was the industry with the most job losses during the onset of the pandemic (Bureau of Labor Statistics). The sudden onset of travel restrictions and industry layoffs may be related to the initial moderate correlation and subsequent weak correlations calculated in this analysis. Below, we visualize correlations for the only two categories with moderate correlations across multiple months, Total Farm and Unemployed.
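A minimal sketch of the correlation step for a single category and month, using `scipy.stats.pearsonr`. The five data points are made up for illustration, and the variable names are hypothetical; only the 0.5 threshold comes from the text above.

```python
from scipy.stats import pearsonr

# Made-up values for five hypothetical counties in one month.
farm_share = [0.02, 0.05, 0.10, 0.15, 0.20]   # share of labor force in farm jobs
case_rate = [150, 210, 260, 400, 380]          # COVID-19 cases per 100k

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = pearsonr(farm_share, case_rate)

# Threshold used in this analysis for a correlation worth examining.
is_notable = r > 0.5
```

Repeating this for every category and month yields one r value per pair, which can then be sorted to find the categories that exceed the threshold in multiple months.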
Figure 5a: COVID-19 Case Rate and Total Farm Laborforce of CA Counties - Scatter Plot
We identified a moderate correlation between Total Farm labor force and case rates in California counties from July to September. This is consistent with our hypothesis and (anecdotally) consistent with the Guardian article linked in the introduction, which reported on the agricultural sector's risk of COVID-19 infection.
Figure 5b: COVID-19 Case Rate and Total Farm Laborforce of CA Counties, September 2020 - Choropleth
Figure 6a: COVID-19 Case Rate and Unemployment Rate of CA Counties - Scatter Plot
There is a moderate correlation between county unemployment and case rates from May to July. Causality may work both ways in this case: higher case rates may cause more business closures, leading to higher unemployment, or high unemployment may lead more job-seekers to take odd jobs or gig work that involve public interaction, or to leave home for public assistance.
Figure 6b: COVID-19 Case Rate and Unemployment Rate of CA Counties, July 2020 - Choropleth
“You can appear to contain the spread among middle-class workers but when it reaches those workers who are furthest on the margins, who are most disadvantaged, the virus is going to spread” - Edward Flores, sociology professor at the University of California, Merced
It may be the case that agricultural workers and the unemployed are at greater risk of contracting COVID-19. This information could prove informative to decision makers, industry leaders, and the general public alike for implementing measures to prevent the spread of COVID-19, and appropriately prioritizing resources to those most at-risk of infection. More research can be done to determine if industry employment is of significance when taking other important demographics into consideration such as age, or the population density of an area. Another area of interest to future researchers may be case rates specifically for California's homeless population.