Abstract
Local policymaking is difficult and expensive to study at scale owing to a lack of centralized data, despite the fundamental role American local governments play in providing services in sectors such as education and public health. This article presents LOCALVIEW, the largest existing dataset of real-time local government public meetings, the core policymaking process in local government. In all, the dataset comprises 139,616 videos and their associated textual and audio transcripts of local government meetings posted to YouTube, the world's largest public video-sharing platform, between 2006 and 2022 by 1,012 places and 2,861 distinct governments across the United States. The data are collected, downloaded, cleaned, and shared publicly (at localview.net) for cross-place and longitudinal analysis. We validate the dataset in several ways and demonstrate how it can be used to map local governments' attention to policy topics of interest. Finally, we discuss how journalists, researchers, and other users might use LOCALVIEW to understand how local communities deliberate on pressing policy issues such as climate change, public health, and immigration.
Context and Summary
In the United States, local governments play a significant role in delivering services in policy areas such as education, climate change, housing, and public health. A lack of data is a major obstacle to a more systematic, scientific understanding of local politics and policymaking. The decentralized, federalist political structure of the United States produces (1) a huge number of local governments (nearly 100,000 at last count) and (2) comparatively few centralized data sources on their policies or operations. Although local governments account for the vast majority of elected officials, governing bodies, and political decisions in the United States, the absence of large pre-existing data sources has limited the study of local politics to the data that scholars have been able to collect themselves1,2,3,4. The few large-scale datasets that do exist on local governments, typically government releases such as the US Census of Governments for municipalities or the Common Core for school districts from the National Center for Education Statistics, contain structural and administrative characteristics but are insufficient for scholars interested in topics such as policymaking and deliberation. A recent surge of datasets in the social sciences has enabled unprecedented, large-scale studies of national and state-level U.S. politics, elections, and policymaking5,6,7,8,9,10. In contrast, most modern studies of local policymaking rely on case studies or small sets of particular places11,12, laboratory experiments13, or substantial (and costly) manual data collection14,15,16,17.
Local governments, such as city councils and school boards, conduct policymaking primarily through public meetings. Votes on most municipal policies must be held in public, and open meeting "sunshine laws" in all 50 states typically allow members of the public to address local leaders with comments and questions. Yet local government meetings, and hence local policymaking, are exceedingly difficult to examine at scale. In City Limits, a seminal study of local politics, political scientist Paul Peterson laments that "there is nothing like the Congressional Record"18 (a complete transcript of proceedings on the House floor) for local politics, making large-scale study of public meetings tedious or impossible for academics, journalists, and the general public. Hence, current efforts to systematically analyze local meetings require an intensive and expensive investment in data collection2,14,16,19.
Today, students of municipal policymaking must often gather individual meeting records by hand, which are typically published online as "minutes." The lack of a consistent format for published meeting records (some are direct transcripts, some are summaries; some contain the names of public commenters, others do not) forces difficult choices about how records should be classified and compared. See Appendix Table A4 for a comparison between summary meeting minutes and a video transcription. For their ground-breaking study of housing and land use politics in planning board meetings, Einstein et al. manually collected meeting minutes from 97 cities and towns in Massachusetts between 2015 and 201714. This was made possible, in part, by Massachusetts's exceptionally detailed open meeting law governing written meeting records. Even with sufficient funding and time, the lack of standardized data formats across places creates additional problems, prompting some researchers to crowdsource collection and cleaning tasks2 or to conduct lab experiments that simulate local government participation by exposing recruited participants to pre-recorded segments13.
This article introduces LOCALVIEW, a dataset of local government public meetings designed to facilitate research on local policymaking. LOCALVIEW is also distinctive as one of the largest public datasets containing examples of political communication between people and their government officials, a topic of great interest to political scientists20,21,22,23. Beyond the specifics of US local policymaking, we believe LOCALVIEW can be a useful resource for a variety of social science topics, such as the study of deliberative democracy24,25,26, interpersonal communication27, and intergroup dynamics along partisan28,29, racial30,31,32, geographic33,34, or other dimensions. The dataset can assist researchers, journalists, and other observers of municipal politics and governance in four significant ways. First, LOCALVIEW enables examination of local meetings at an unprecedented scale: with over one hundred thousand videos spanning 49 states, users can investigate phenomena of interest across a variety of localities and counties (see Fig. 1 for a map of current coverage). Second, LOCALVIEW is exceptional in its capacity to support analyses over time. As described in greater detail below, we find that once localities begin posting meeting videos, they post the vast majority of future meeting videos as well, facilitating analyses of data collected over an extended period. Third, the uniformity of meeting transcripts in LOCALVIEW enables cross-locality comparisons: unlike locally transcribed meeting minutes, LOCALVIEW records every word as it was spoken in the meeting. Lastly, our automated data gathering and processing pipeline makes LOCALVIEW a dynamic data source. This self-updating feature will expand our coverage both over time (as existing cities upload new videos) and across locations (as new cities begin posting meeting videos online). Moreover, these features make LOCALVIEW appealing to social science researchers who work with video13,35,36, audio37,38, and text39,40 data sources.
Methods
Figure 2 summarizes the creation of LOCALVIEW and illustrates some example usage scenarios. This section elaborates on each step of its creation.
Step 1 begins with a U.S. Census Bureau list of incorporated places. We limit this list to entries with a valid place or county subdivision FIPS code, which the Census Bureau and others use to identify US locations. We then search the YouTube Data API for each entry. Since we do not know the type of municipal government in each place, we separately query the place name followed by each possible municipal government type (e.g., "Jacksonville city council," "Jacksonville board of selectmen"); later, in Step 2, we identify the exact government type for each valid meeting video. YouTube returns almost 2 million videos and 2,000 channels.
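The following sketch illustrates this querying step, assuming the YouTube Data API v3 accessed via the google-api-python-client package; the API key and the list of government types are placeholders rather than the exact set we use.

```python
# A minimal sketch of the Step 1 search, assuming the YouTube Data API v3
# via google-api-python-client; YT_API_KEY and GOVERNMENT_TYPES are
# hypothetical placeholders.
from googleapiclient.discovery import build

YT_API_KEY = "YOUR_API_KEY"  # hypothetical credential
GOVERNMENT_TYPES = ["city council", "board of selectmen", "school board"]

youtube = build("youtube", "v3", developerKey=YT_API_KEY)

def search_place(place_name):
    """Query YouTube once per candidate government type for a Census place."""
    results = []
    for gov_type in GOVERNMENT_TYPES:
        response = youtube.search().list(
            q=f"{place_name} {gov_type}",  # e.g. "Jacksonville city council"
            part="snippet",
            type="video",
            maxResults=50,
        ).execute()
        results.extend(response.get("items", []))
    return results

videos = search_place("Jacksonville")
```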
Step 2 identifies local government channels and public meeting footage, proceeding in channel-level and then video-level stages. First, we carefully check each channel for local government content. We mostly discover municipal governments, that is, government entities whose jurisdiction is a place (not a county) and whose purpose is to serve as the locality's principal legislative body, but we also retain records from county boards, school boards, and other special committees. These government entities are kept in LOCALVIEW because their jurisdictions overlap with the places identified in Step 1.
We then manually remove invalid channels and videos. Invalid channels post fewer than five meetings, post sessions selectively, or post only meeting clips (rather than entire meeting proceedings). Given the enormous number of channels uploading single videos or snippets, we set this minimum threshold for inclusion in the sample; our exploratory investigation found such videos to be low-quality and prone to manipulation, so we excluded them. We only retain uploads that are public meetings held by a local government and that specify both (i) a government name and (ii) a date in their title or description.
Lastly, we identify each video's location, government entity, and channel type (e.g., government-hosted vs. media outlet). A query's search results may not match the FIPS code originally searched for: a channel's videos may match multiple municipalities from Step 1, or counties and metro regions not on our list (as when a local media outlet uploads city council meetings from multiple towns). We therefore map groups of videos to FIPS codes via text parsing and human inspection: videos from a channel that reference a municipality or county in their title, description, or content are matched to it. Where feasible, we manually searched Google and Wikipedia to link each meeting video to its government. We categorize channels as official government channels, media organizations, or public interest groups using string matching and manual checks. See Appendix Section A for details on sample composition, collection, and government and channel types.
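A hedged sketch of how such text parsing might look; the gazetteer, column names, and date pattern below are illustrative stand-ins, not our exact matching rules.

```python
# Illustrative parsing for Step 2: match a video's title/description against
# a (hypothetical) Census place gazetteer and extract a meeting date.
import re
import pandas as pd

places = pd.DataFrame({
    "place_name": ["Jacksonville", "Springfield"],
    "fips": ["1235000", "1767000"],
})

DATE_RE = re.compile(
    r"\b(?:\d{1,2}[/-]\d{1,2}[/-]\d{2,4}|"
    r"(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},?\s+\d{4})\b",
    re.IGNORECASE,
)

def match_video(title, description):
    """Return (fips, date string) if the video names a known place and a date."""
    text = f"{title} {description}"
    fips = None
    for _, row in places.iterrows():
        if row["place_name"].lower() in text.lower():
            fips = row["fips"]
            break
    date_match = DATE_RE.search(text)
    return fips, date_match.group(0) if date_match else None

print(match_video("Jacksonville City Council Meeting 4/12/2021", ""))
```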
After verification and identification (see Technical Validation), we obtain the video files, metadata (likes, dislikes, views), and transcribed video caption text (published by the channel or automatically generated by YouTube) for all valid videos; YouTube captions are available for roughly 90% of videos. Step 3 combines this with the parsed information from the previous step (government type, FIPS code), the meeting date, and several common place-level features. 80% of meetings are posted within three days of occurring, and 90% within two weeks. Technical Validation describes our manual audit of this procedure. The resulting LOCALVIEW dataset is complete and ready to use.
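As an illustration of the retrieval step, the sketch below uses the yt-dlp command-line tool (an assumption on our part; the pipeline's exact tooling may differ) to fetch a video's metadata and auto-generated captions without downloading the video file itself.

```python
# Fetch metadata and auto-generated English captions for one video,
# assuming yt-dlp is installed; a sketch, not LOCALVIEW's exact pipeline.
import subprocess

def fetch_captions(video_url, out_dir="captions"):
    """Save the video's info JSON and auto-generated captions, if any."""
    subprocess.run(
        [
            "yt-dlp",
            "--skip-download",    # metadata and captions only, no video file
            "--write-info-json",  # likes, views, description, etc.
            "--write-auto-subs",  # YouTube's automatic transcription
            "--sub-langs", "en",
            "-o", f"{out_dir}/%(id)s.%(ext)s",
            video_url,
        ],
        check=True,
    )

fetch_captions("https://www.youtube.com/watch?v=EXAMPLE_ID")
```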
Data Records
Harvard Dataverse hosts the complete LOCALVIEW database at https://doi.org/10.7910/DVN/NJTBEM41. The dataset is available in RDS, Parquet, .dta, .csv, and .json formats (for easy access from any programming language, such as R, Stata, or Python). Each observation is a meeting video.
Any statistical programming language that reads these formats can load LOCALVIEW as a dataset. LOCALVIEW does not require any extra software, though researchers may benefit from text analysis or mapping tools. Below is an example of an analysis a researcher might undertake with LOCALVIEW.
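For instance, a user could chart how often meetings mention a policy topic over time. The sketch assumes the Parquet release and hypothetical column names ("caption_text", "meeting_date"); consult Table 1 and the Dataverse codebook for the actual schema.

```python
# Illustrative analysis: share of meetings per year mentioning "climate
# change". Column names are hypothetical stand-ins for the real schema.
import pandas as pd

meetings = pd.read_parquet("localview.parquet")
meetings["meeting_date"] = pd.to_datetime(meetings["meeting_date"])

# Flag transcripts that mention the topic (missing transcripts count as no).
meetings["mentions_climate"] = meetings["caption_text"].str.contains(
    "climate change", case=False, na=False
)

# Average the flag within each meeting year to get a yearly mention rate.
trend = meetings.groupby(meetings["meeting_date"].dt.year)["mentions_climate"].mean()
print(trend)
```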
For features, we retain location information (city, state, and FIPS code); the meeting date; the posting date; the (approximate) date the video was scraped and ingested into the database; YouTube metadata including the URL, channel name, view count, video description, likes, dislikes, favorites, and comments; whether the video was livestreamed or posted; and the entire transcript, if available. Importantly, metadata values captured at scrape time may change afterwards (e.g., a video may gain likes over time). Future editions of the database will update timestamped information for time-varying fields. To standardize comparisons of variables like likes and dislikes, users should account for the recording time point relative to each video's posting. For simplicity, and to demonstrate Step 4 in Fig. 2, we link several political and geographic variables to our dataset by FIPS code, including federal and state election outcomes over time and population demographics. FIPS codes, a standardized geographic identifier used in datasets like the American Community Survey, allow users to merge LOCALVIEW with other datasets, as sketched below. Table 1 shows a selection of these variables with descriptions and database examples.
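A minimal sketch of such a merge, assuming an ACS extract with a string "fips" column; the "place_fips" column name on the LOCALVIEW side is likewise a hypothetical stand-in.

```python
# Join place-level ACS estimates onto meetings by FIPS code.
import pandas as pd

meetings = pd.read_parquet("localview.parquet")
acs = pd.read_csv("acs_place_estimates.csv", dtype={"fips": str})

# Left-join so meetings without a matching ACS record are retained.
merged = meetings.merge(acs, how="left", left_on="place_fips", right_on="fips")
```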
Technical Validation
We discuss our database's data linkage (relevant to Steps 1–3 of the data development pipeline in Fig. 2) and its sample validity (relevant to Steps 4–5, which matter for any particular analysis).
To test our data linkage, that is, how well individual meeting observations match other relevant data, we either manually audit a variable throughout the dataset or, where comprehensive human examination is impossible, audit a reliably large random sample. We manually examined all 927 channels to check and correct host type (where discernable) and FIPS codes for videos whose channel maps to only one place. A complete audit was impossible since most variables of interest in Table 1, obtained in Steps 2–3, are video-level. Instead, we conduct a randomized audit (n = 100 videos) and find accuracy rates of 93% for the parsed meeting date, 91% for government type, and 92% for municipal or county FIPS code. As a sanity check, we also compared keyword counts across classified government types. Compared to a municipal council (the most common government in our sample), the words "zoning" and "planning" are 34% more likely to be mentioned in a video explicitly identified as a planning/zoning board, "school" 22% more likely in a board of education video, and "county" 19% more likely in a county board video. The sketch following this paragraph illustrates such a check.
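A minimal version of this keyword comparison, again assuming hypothetical column names ("caption_text", "gov_type") and category labels.

```python
# Compare keyword mention rates across government types, relative to
# city councils; a sanity-check sketch, not the exact audit code.
import pandas as pd

meetings = pd.read_parquet("localview.parquet")
meetings["mentions_zoning"] = meetings["caption_text"].str.contains(
    "zoning", case=False, na=False
)

# Mention rate per government type, scaled by the city-council baseline.
rates = meetings.groupby("gov_type")["mentions_zoning"].mean()
print(rates / rates["city council"])
```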
Researchers may also focus on internal or external sample validity, that is, how well a channel, area, or government is represented in our sample. The checks above imply that the localities and counties in our sample are not routinely misidentified, and we find no abnormalities in channel, video, metadata record, or transcript counts. See the Supplemental Information for sample size metrics over time, which indicate that LOCALVIEW grows without sudden discontinuities attributable to YouTube or to author error. Our choice to limit the sample to channels that upload at least five identifiable meeting recordings protects against unauthorized actors uploading occasional, low-quality videos. Internal validity risks nonetheless persist. For one, channel hosts in our sample may still selectively upload videos (e.g., a government withholding a video because of an event at a meeting), but by removing invalid or presumably biased hosts in Step 2 of our pipeline, we believe this risk is limited. In supplemental analyses, we show that missing transcriptions are likely not correlated with geographic or population characteristics (roughly 90% of our meeting videos have captions). Recent evaluations of YouTube's transcription algorithm suggest that captioned meetings have low systematic error42.
LOCALVIEW's external validity depends on the user's analysis. As seen in Fig. 1, the places we cover are larger on average, and many of America's largest cities and smallest villages do not post their meetings on YouTube. Users can weight their analytic sample using statistical methods like raking or post-stratification, keyed to our identifiers such as the FIPS code, to better approximate the intended population. A simple raking procedure reduces sample skew on residential population size and key ethnic demographics, as shown in the supplemental materials (with some caveats) and sketched below. LOCALVIEW analyses need not always aim for external validity to the entire population of US local governments: the dataset can also serve mixed-methods investigations (for example, as a source of easily accessible meeting videos for one particular city government of interest). This article's database and methods should complement, not replace, smaller-n scholarship on local politics14,18,19.
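A bare-bones raking (iterative proportional fitting) sketch under assumed strata variables and population targets; dedicated survey-weighting packages offer more robust implementations.

```python
# Rake sample weights so weighted margins match population targets.
import numpy as np
import pandas as pd

def rake(df, targets, n_iter=50):
    """`targets` maps a column name to {category: population share}."""
    w = np.ones(len(df))
    for _ in range(n_iter):
        for col, shares in targets.items():
            for cat, share in shares.items():
                mask = (df[col] == cat).to_numpy()
                current = w[mask].sum() / w.sum()  # weighted sample share
                if current > 0:
                    w[mask] *= share / current     # pull toward the target
    return w / w.mean()  # normalize to mean weight of 1

# Hypothetical strata and targets for illustration only.
sample = pd.DataFrame({
    "size_bin": ["small", "large", "large", "small"],
    "region": ["south", "west", "south", "west"],
})
weights = rake(sample, {
    "size_bin": {"small": 0.7, "large": 0.3},
    "region": {"south": 0.5, "west": 0.5},
})
print(weights)
```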