Data Critique

Here is a breakdown of each of our four datasets and how they collectively illuminate the ways different sectors and governments responded to and recovered from the COVID-19 pandemic across economic, policy, fiscal, and health dimensions.

Source Overview

The Travel & Tourism Satellite Accounts (TTSA) dataset is produced by the U.S. Bureau of Economic Analysis (BEA). TTSA integrates administrative records, industry surveys, GDP accounts, and labor statistics to estimate the economic contribution of tourism within the United States. The dataset provides detailed annual measurements of tourism-related output, employment, value added, and industry-level performance using standardized national accounting methods.

What the Data Can Reveal

TTSA enables us to determine which segments of the U.S. tourism industry were most impacted by COVID-19 and how their recovery paths varied. Because the dataset divides tourism into industries, we can see which ones, such as hotels, recreation, and air travel, saw the biggest losses and which recovered more quickly. Changes in value added and employment in the tourism industry demonstrate the pandemic’s wider economic effects beyond foreign arrivals alone.

The dataset also reveals how tourism’s contribution to the U.S. economy shifted during and after the pandemic. For example, a sharp decline in value added or tourism employment in 2020 reflects both the collapse of travel demand and the shutdowns triggered by lockdowns. The gradual recovery from 2021 to 2023 communicates how different tourism industries responded to reopening policies, vaccine availability, domestic travel surges, and changing consumer behavior.

Overall, TTSA allows us to connect macroeconomic patterns, such as recovery speed and sectoral resilience, to the broader narrative of U.S. tourism recovery.

What the Data Cannot Reveal

Despite offering comprehensive economic metrics, TTSA is unable to explain the reasons for shifts in employment or tourism output. Travel limitations, border closures, public health issues, governmental decisions, and traveler conduct are not specifically covered. For instance, TTSA will report a significant drop in 2020, but it is unable to determine the extent to which lockdowns, airline capacity reductions, customer anxiety, or unemployment rates contributed to that loss.

The dataset also does not explain who is traveling. It does not contain demographic information, traveler origin, purpose of visit, trip length, or spending per visitor. Without this context, TTSA cannot identify which market segments drove recovery or which remained suppressed.

Limitations

The most significant limitation is the complete absence of TTSA data for China. Despite searching English-language and Chinese-language government portals, academic repositories, and tourism bureau records, no publicly available satellite account exists. As a result, our analysis of tourism industry composition and sector-specific recovery is inherently one-sided and cannot be replicated for China.

The dataset is also annual, which means it cannot capture monthly or seasonal dynamics, even though tourism fluctuates around holidays, reopening stages, outbreaks, and border policies. In addition, TTSA relies on modeled estimates that may be less accurate during periods of extreme disruption (such as 2020–2021), when industry relationships and spending patterns changed rapidly. To partly address this, we cleaned the dataset and divided each year into quarters so that the visualization is easier to read.

Source Overview

The Oxford COVID-19 Government Response Tracker (OxCGRT) is produced by a research team at the University of Oxford. It compiles daily policy responses from governments around the world using publicly available announcements, press releases, news sources, and health authority reports. OxCGRT codes 20+ policy indicators such as school closures, gathering limits, border controls, and travel restrictions, and aggregates nine of these into the well-known Stringency Index, which measures the strictness of a country’s COVID-19 policy environment on a 0–100 scale.

What the Data Can Reveal

OxCGRT allows us to track how COVID-19 policies changed over time and how strict each government’s response was during key phases of the pandemic. The Stringency Index makes it possible to identify periods of lockdown, reopening, and re-tightening. This is especially useful for explaining tourism trends: U.S. international arrivals begin recovering earlier than China’s largely because national-level restrictions eased sooner in the United States, while China maintained strict border controls until early 2023.

Because the dataset is daily, quarterly averages reveal broader patterns such as the sustained high stringency in China from 2020–2022 and the more variable U.S. policy response. When we pair these policy shifts with tourism indicators, we can directly show how government decisions created the conditions that shaped tourism collapse and recovery. 

OxCGRT itself is daily, but because we aggregated it into quarterly averages to make it compatible with our tourism timelines, short-term policy dynamics are no longer visible.
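The aggregation described above can be sketched in a few lines. This is a minimal illustration, not our actual cleaning script: the function name `quarterly_average` and the sample values are hypothetical, and we assume missing days are simply skipped rather than filled in.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def quarterly_average(daily):
    """Collapse (date, value) pairs into {(year, quarter): mean value}.

    Days whose value is None (missing) are skipped, which is an
    assumption made for this sketch.
    """
    buckets = defaultdict(list)
    for day, value in daily:
        if value is None:
            continue
        quarter = (day.month - 1) // 3 + 1  # Jan-Mar -> 1, Apr-Jun -> 2, ...
        buckets[(day.year, quarter)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Hypothetical daily stringency scores, not real OxCGRT values.
sample = [
    (date(2020, 3, 30), 40.0),
    (date(2020, 3, 31), 80.0),
    (date(2020, 4, 1), 70.0),
    (date(2020, 4, 2), None),  # missing day
]
result = quarterly_average(sample)
```

In this toy sample the jump from 40 to 80 within two days disappears into a single first-quarter average of 60, which is exactly the kind of short-term dynamic that quarterly aggregation hides.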

What the Data Cannot Reveal

Despite its detail, the dataset cannot explain how policies were enforced or experienced on the ground. OxCGRT records only the measures governments announced. It does not capture regional variation in enforcement, public compliance, differences in local government implementation, the social or economic motivations behind each policy, or how travelers responded to these policies.

The dataset provides policy-level information only, so it cannot reveal any traveler characteristics such as nationality, purpose of visit, length of stay, or spending, which limits our ability to connect policy shifts to specific visitor groups or tourism behaviors.

Additionally, the dataset does not account for airline capacity, domestic travel patterns, border procedures (such as quarantine duration, visa requirements, PCR testing), or changes in passenger sentiment, all of which have a substantial impact on the recovery of tourism. Furthermore, OxCGRT gives a single national stringency number for China, which cannot account for the disparities in lockdown implementation between provinces. Similarly, in the case of the United States, the national index cannot adequately represent state-level restrictions, which differed significantly by region.

Limitations

Because each sub-indicator is coded uniformly regardless of severity, scale, or enforcement rigor, the dataset cannot convey how differently these policies were implemented across regions or how impactful they actually were in practice.

One major limitation is the structure of the Stringency Index itself. It collapses nine different policy categories, ranging from school closures to international travel restrictions, into a single number. This makes the dataset easy to compare across countries, but it also oversimplifies policy environments by treating all categories as equally influential. For tourism analysis, this is significant because only some of these categories (e.g., border controls) directly affect international travel.
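To make the equal-weighting problem concrete, here is a simplified sketch of a composite index computed as the plain mean of nine sub-indices, each already on a 0–100 scale. The indicator names and all values are hypothetical, and the step that converts OxCGRT's raw ordinal codes into 0–100 sub-indices is omitted.

```python
from statistics import mean

# Hypothetical labels for nine policy categories.
CATEGORIES = [
    "school_closing", "workplace_closing", "public_events",
    "gatherings", "public_transport", "stay_at_home",
    "internal_movement", "international_travel", "public_info",
]


def composite_index(sub_indices):
    """Equal-weight mean of nine sub-indices, each on a 0-100 scale."""
    missing = set(CATEGORIES) - set(sub_indices)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return mean(sub_indices[name] for name in CATEGORIES)


# Two invented policy mixes: same overall score, opposite border policy.
baseline = {name: 50.0 for name in CATEGORIES}
closed_border = {**baseline, "international_travel": 100.0, "school_closing": 0.0}
open_border = {**baseline, "international_travel": 0.0, "school_closing": 100.0}

score_closed = composite_index(closed_border)
score_open = composite_index(open_border)
```

Both invented countries score 50, yet one has fully closed its borders and the other has none of the travel controls. For a tourism analysis, reading the international travel sub-indicator alongside the composite score avoids this ambiguity.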

Additionally, missing values, inconsistent indicator coding, and changing definitions are present in the raw OxCGRT dataset, all of which necessitate extensive data cleaning prior to analysis. Our cleaned quarterly dataset addresses these problems, but the temporal specificity of the original daily data is lost in the process.

Lastly, OxCGRT provides only one national-level value per country, which obscures significant internal heterogeneity. While the United States’ national score averages out the stark differences between states, China’s centralized reporting obscures provincial differences. Because of these discrepancies, it is difficult to treat the two nations’ stringency scores as directly comparable.

Source Overview

The UNWTO dataset is produced by the United Nations World Tourism Organization through collaboration with national tourism boards and government statistical agencies. The organization collects information using border control records, national visitor surveys, and official economic reporting systems.

For our project, we used four types of indicators from the UNWTO database. These include total inbound arrivals, total inbound tourism expenditure, tourism value added in billions of U.S. dollars, and total GDP. We cleaned and reorganized the dataset for both China and the United States from 2018 to 2023 so the two countries could be directly compared across a consistent timeline.

What the Data Can Reveal

The dataset shows how international tourism changed during the years surrounding COVID-19. Total inbound arrivals and expenditures help us identify the scale of the collapse in 2020 and the speed of the rebound after borders reopened. Because the indicators come from standardized sources, they allow year-to-year comparisons within each country.

The macroeconomic indicators reveal another layer of the recovery. Tourism value added shows how much tourism contributed to national economic output, while total GDP allowed us to calculate tourism’s percentage share of each economy, showing its relative importance. These values help us distinguish between a rise in tourism activity and broader economic changes that may have influenced the tourism sector.

By comparing China and the United States, we can observe different recovery paths. China reopened its borders later, while the United States allowed international travel earlier. These policy differences appear clearly in both the tourism indicators and the macroeconomic measures.

What the Data Cannot Reveal

The dataset cannot explain the specific reasons behind increases or decreases in arrivals, spending, or value added. It does not include information about travel restrictions, visa rules, airline capacity, or domestic health policies.

The dataset also does not provide details about traveler groups. It does not include nationality, purpose of travel, length of stay, or per-tourist spending. Without this information, it is not possible to identify which types of visitors contributed most to the recovery.

The macroeconomic indicators cannot isolate the effect of tourism alone. A change in tourism value added may reflect shifts in the wider economy rather than changes in tourism activity. For example, if a country’s total GDP decreases during a recession, tourism may appear to have a larger share even if the sector did not grow.
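The share calculation and the recession effect described above can be shown with a short worked example. The figures are invented for illustration and are not taken from the UNWTO data; `tourism_share` is a hypothetical helper name.

```python
def tourism_share(value_added, gdp):
    """Tourism value added as a percentage of total GDP.

    Both arguments must be in the same currency unit (e.g. billions of USD).
    """
    if gdp <= 0:
        raise ValueError("GDP must be positive")
    return 100.0 * value_added / gdp


# Invented figures: tourism value added is unchanged while GDP contracts.
share_before = tourism_share(value_added=100.0, gdp=2000.0)
share_during_recession = tourism_share(value_added=100.0, gdp=1600.0)
```

Here the share rises from 5 percent to 6.25 percent even though tourism value added did not change at all, which is why we read the share together with the absolute value added figure rather than on its own.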

Limitations

The dataset reports annual totals. Monthly or seasonal changes are not visible, even though tourism often fluctuates during holidays or major events.

The reporting methods differ across countries. Each national agency uses its own estimation models and survey techniques, which may affect cross-country comparisons. Some categories, such as tourism value added, rely on economic modeling. These models became less stable during the pandemic years because travel patterns changed quickly. The dataset also does not show inflation or purchasing power. Expenditure values may rise even when real economic activity has not changed.
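One common way to handle the inflation problem is to deflate nominal expenditure with a price index. The sketch below is illustrative only: we did not apply this adjustment in our cleaned dataset, the helper name `to_real` is hypothetical, and both the expenditure figures and the price index values are invented.

```python
def to_real(nominal, price_index, base=100.0):
    """Convert a nominal amount to base-year prices.

    price_index is the price level of the reporting year relative to
    a base year whose index equals `base`.
    """
    if price_index <= 0:
        raise ValueError("price index must be positive")
    return nominal * base / price_index


# Invented figures: nominal spending grows 10%, and so do prices.
nominal_by_year = {2019: 200.0, 2023: 220.0}
price_index_by_year = {2019: 100.0, 2023: 110.0}

real_by_year = {
    year: to_real(nominal_by_year[year], price_index_by_year[year])
    for year in nominal_by_year
}
```

In this example the nominal series suggests a 10 percent recovery, while the deflated series is flat, so the apparent growth is entirely a price effect.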

There are gaps and revisions in several countries’ pandemic-year data, which introduces uncertainty into the timeline.

Source Overview

The Google Health COVID-19 Open Data Repository is a global, publicly accessible database that aggregates COVID-19–related information from more than 20,000 locations worldwide. It compiles data from government health authorities, international organizations, research institutions, and official public health reporting systems. The goal of the repository is to provide a standardized, centralized resource that helps researchers, public health professionals, journalists, and policymakers understand the spread and severity of COVID-19. 

For our project, we focused specifically on the epidemiology dataset, which contains daily, location-based records on confirmed cases, deaths, recoveries, and testing volumes from January 2020 through September 2022. Because our analysis compares two countries across tourism and policy datasets, we restricted our use of this dataset to the United States and China.

What the Data Can Reveal

The epidemiology dataset allows us to identify broad patterns in COVID-19 transmission and severity over time. Daily case and death counts show waves of outbreaks, periods of containment, and the timing of major surges. Because the data are high-frequency and tied to specific locations, they can be aligned with known policy changes, border restrictions, and tourism indicators drawn from our other datasets. 

In this sense, the Google dataset provides the epidemiological backdrop against which travel decisions, governmental restrictions, and tourism-sector performance unfolded. It establishes the baseline narrative of the pandemic’s progression that guides the rest of our analysis.

What the Data Cannot Reveal

The dataset reports numerical outcomes only, and as such, it cannot explain why cases rise or fall. For example, it is unable to capture differences in local testing availability, the emergence of new variants, behavioral changes, mobility patterns, or vaccination rollout and uptake. In addition, the epidemiology dataset contains no demographic detail. There is no information on age, ethnicity, socioeconomic status, occupation, or travel behavior of infected individuals. All numbers are population-level aggregates. Without this contextual information, we cannot identify which groups were disproportionately affected or how different communities experienced the pandemic.

Limitations

The most significant limitation of the dataset is the variation in national reporting systems. Countries measure and report COVID-19 differently, leading to substantial differences in testing availability, case definitions, reporting rules, and death certification standards. Because the repository standardizes these numbers without adjusting for these differences, cross-country comparisons must be interpreted with caution.

This issue is particularly important for our two countries, the United States and China. For the United States, data may fluctuate due to the highly decentralized reporting system. States and counties report data independently, apply different testing strategies, and often experience reporting delays or backlogs, potentially producing inconsistencies or artificial spikes in the data. For China, by contrast, reported numbers appear smooth and consistent, but they may be incomplete due to strict testing protocols, centralized reporting, and concerns about transparency. Case counts remained extremely low even during periods when large outbreaks were reported anecdotally in several cities, suggesting the possibility of underreporting or political influence.

Another major limitation is the time coverage of the dataset. Because Google ended data collection in September 2022, several key events and later trends are missing. These include China’s end of the Zero-COVID policy (December 2022), the reopening of China’s borders (January 2023), and the acceleration of global tourism recovery in 2023–2024. Because these developments lie outside the dataset’s range, the epidemiological context for late-pandemic and post-pandemic tourism trends is incomplete.

What organizations funded the creation of these datasets?

The TTSA is funded by the U.S. Department of Commerce through the Bureau of Economic Analysis, supported by the National Travel and Tourism Office. OxCGRT is financed by the UK Department for International Development (DFID), UK Research and Innovation (UKRI), and the University of Oxford. While no public statement is made on the funding and maintenance of the UNWTO database, the Tourism Statistics Database is internally funded by UN Tourism, meaning its upkeep is supported by the agency’s assessed contributions from UN Member States and its voluntary or trust-fund contributions. The Google Open Data Repository is maintained by Google as part of its Open Data initiative, with contributions from global academic and public health institutions. Each dataset is thus sustained by large, well-funded institutions with a shared goal of producing standardized, global data for research and policy analysis.

What is our dataset’s ontology?

Taken together, the ontology of these datasets reflects a technocratic, quantitative view of crisis, one that privileges measurable indicators over human experience. TTSA defines tourism as economic performance; OxCGRT defines governance as codified action; UNWTO presents the tourism industry’s economy with financial numerical data; and Google defines health as case data. If these were our only sources, the pandemic would appear as a series of data points, charts, and recovery rates, a logistical problem of GDP loss, debt management, and infection control. What would be lost are the personal stories of adaptation, grief, inequality, and resilience that numbers cannot capture. These datasets show how institutions understand crises: not through emotion or ethics, but through the language of efficiency, comparability, and recovery.
