Broadband Coverage Mapping of New York State
A Tool to Analyze Broadband Connectivity in New York State
Problem Statement
In light of increased remote activity during COVID-19, federal aid has been made available to help close the gap in broadband access, with President Biden pledging billions to improve broadband coverage options and affordability for all. However, allocating these resources is difficult because there is no single source of truth for measuring broadband coverage or for comparing coverage strength and options in one area with those available elsewhere.
Project Description
Our capstone project aggregates datasets at various spatial levels into a master dataset at the census tract level containing information on broadband coverage, with the aim of better informing future deployment efforts. We used a variety of supervised learning models to understand the relationship between demographics and broadband connectivity. We also built a proprietary scoring system to identify which census tracts are well-served, unserved and underserved.
The end product is a broadband coverage map that will be of use to:
- Policymakers in charge of broadband deployment efforts and the pursuit of grant funding available from the Federal Communications Commission (FCC), US Department of Agriculture and future state programs for universal broadband access
- Digital inclusion nonprofits that try to understand how broadband coverage disparities overlap with various demographics
- Everyday citizens who can use our open broadband map to determine their region's coverage
This map will provide a single source of truth on broadband coverage in New York State and help reach the eventual goal of closing the digital divide.
Methodology
- Data Engineering
We initially had 6 datasets at various spatial levels, which we brought to the census tract level through spatial interpolation and the creation of specialized crosswalk files. These datasets contained various measures of broadband speed, broadband coverage, demographic data and provider availability. All of these variables were needed to provide a holistic view of broadband coverage throughout the state.
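As a rough illustration of how a crosswalk file can bring a dataset to the census tract level, the sketch below allocates a count-type variable from a coarser or overlapping geography (here, hypothetical ZIP codes) to tracts. The crosswalk layout, the weights, the identifiers and the household counts are all made up for the example and are not taken from the project's files; the actual pipeline also used spatial interpolation, which is not shown.

```python
from collections import defaultdict


def counts_to_tracts(source_counts, crosswalk):
    """Allocate count-type values from source units to census tracts.

    source_counts: {source_id: count}
    crosswalk: iterable of (source_id, tract_id, weight) rows, where the
               weights for each source_id sum to 1 (assumed layout).
    """
    tract_totals = defaultdict(float)
    for source_id, tract_id, weight in crosswalk:
        if source_id in source_counts:
            # Each tract receives its weighted share of the source unit's count.
            tract_totals[tract_id] += source_counts[source_id] * weight
    return dict(tract_totals)


# Made-up crosswalk: ZIP 12345 is split 75/25 between two tracts,
# ZIP 12346 falls entirely inside the second tract.
crosswalk = [
    ("12345", "36001000100", 0.75),
    ("12345", "36001000200", 0.25),
    ("12346", "36001000200", 1.0),
]
households = {"12345": 400, "12346": 100}

tract_households = counts_to_tracts(households, crosswalk)
```

Because the weights for each source unit sum to 1, the statewide total is preserved after reallocation. Rate-type variables such as average speed would need a weighted average rather than a weighted sum.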
- Modeling
Using a Random Forest regressor with an R² score of 0.76, we found the variables most closely associated with broadband usage to be:
- Number of M-Lab speed tests conducted per census tract
- Minimum round trip time
- Population
- Average loss rate
- Fastest average broadband speed measured
- Percentage of census tract population that uses the Internet at broadband speeds (25 Mbps download speed / 3 Mbps upload speed)
- M-Lab broadband speed
A more detailed description of these variables is available on the GitHub repository linked below.
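For readers unfamiliar with the R² score quoted for the model, the sketch below computes it from scratch as one minus the ratio of residual to total sum of squares. The observed and predicted values are invented for illustration and are not the project's data; the function name is likewise just a label for this example.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up broadband usage percentages for four tracts.
observed = [10, 20, 30, 40]
predicted = [10, 20, 30, 50]

score = r2_score(observed, predicted)  # SS_res = 100, SS_tot = 500
```

A score of 1 means the predictions match the observations exactly, while a score of 0 means the model does no better than always predicting the mean.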
- Broadband Score
Using the above variables, we created a broadband score for each census tract ranging from 1 to 5. The score is intended to give policymakers actionable insight for pursuing grants and allocating investment in broadband infrastructure.
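The scoring method itself is not described here, so the following is only one plausible way to turn an indicator into a 1 to 5 score: ranking tracts and splitting the ranks into five equal-sized groups. The function name, the single combined indicator and the tract values are assumptions made for the example, not the project's actual scoring system.

```python
def quintile_scores(values):
    """Map each tract's indicator value to a score from 1 (lowest) to 5 (highest).

    values: {tract_id: indicator value}; higher is assumed to mean better coverage.
    Tracts are ranked and the ranks are cut into five equal-sized groups.
    Tied values may land in different groups, depending on input order.
    """
    n = len(values)
    ranked = sorted(values, key=values.get)  # tract ids, lowest value first
    return {tract: (rank * 5) // n + 1 for rank, tract in enumerate(ranked)}


# Made-up indicator values for five hypothetical tracts.
indicator = {"A": 0.10, "B": 0.55, "C": 0.30, "D": 0.90, "E": 0.75}

scores = quintile_scores(indicator)
```

Rank-based bins guarantee that every score level is populated, at the cost of ignoring how far apart the underlying values are. Fixed thresholds would be the alternative if absolute service levels matter more than relative standing.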
- Data Visualization
We created an interface that conveys our insights to users and lets them intuitively explore our curated datasets on their own. The visualization is accessible through the image below.
Sponsors
Thank you to our sponsors who provided guidance and support throughout this project.
Schmidt Futures
US Ignite