Technical Description – Travel Through Data

Selecting our Sources

To build a foundation for our analysis of tourism recovery in China and the United States, we drew on both secondary literature and quantitative datasets. We identified relevant articles, reports, and policy summaries through UCLA Library’s online databases and the publications of credible international organizations. Using keywords such as “tourism recovery,” “COVID border policy,” “China reopening,” and “U.S. mobility restrictions,” we located academic studies, government publications, and professional analyses that helped us understand how travel behavior, public health conditions, and national responses shaped international mobility. We reviewed these sources, extracted the insights most relevant to our topic, and organized them by theme to support the project’s narrative.

In addition to literature, we selected four datasets that capture different dimensions of tourism and pandemic conditions. The UNWTO Tourism Statistics provide annual indicators of arrivals, spending, and tourism’s economic contribution. The TTSA dataset offers a detailed view of the U.S. tourism economy at the industry level. The OxCGRT policy index documents changes in government restrictions, and the COVID-19 epidemiology dataset supplies information about case counts throughout the pandemic. These sources were chosen because they are produced by authoritative institutions and because together they allow us to study tourism recovery through the combined lenses of economic activity, policy environment, and public health trends. By integrating literature and data, we developed a more complete understanding of how international tourism was disrupted and how recovery unfolded in both countries.

Processing our Data

Our project combines quantitative datasets and secondary research to examine how international tourism in China and the United States recovered after the COVID-19 pandemic. Each of the four datasets captures a different dimension of this topic: UNWTO tourism indicators for arrivals, spending, and economic contribution; TTSA data showing how tourism activity is distributed across industries in the United States; OxCGRT policy data reflecting government responses; and epidemiological data from Google Health’s COVID-19 Open Data repository, which helps contextualize changes in mobility. We also reviewed academic literature and credible reports to understand the broader relationship between public health conditions, border restrictions, and international travel, and we organized the relevant insights to support the interpretive framework of our project.

To prepare the quantitative data for analysis, we cleaned, filtered, and standardized each dataset so that all four could be compared on a common timeline.
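To illustrate what comparing sources on a common timeline involves, here is a minimal Python sketch that joins two annual series on a shared (country, year) key. The label map, function name, and toy values are hypothetical, not figures or code from our project; the join keeps only country-years present in both inputs.

```python
# Hypothetical label map: each source spells the two countries differently.
COUNTRY_ALIASES = {
    "China": "CHN", "CN": "CHN", "CHN": "CHN",
    "United States of America": "USA", "United States": "USA", "US": "USA", "USA": "USA",
}

def align_annual(tourism, policy, aliases=COUNTRY_ALIASES):
    """Join two {(country, year): value} series on a shared (ISO3, year) key.

    Country labels are normalized through `aliases` (unknown labels are dropped),
    and only keys present in both inputs are kept, i.e. an inner join.
    """
    def normalize(series):
        return {(aliases[c], int(y)): v for (c, y), v in series.items() if c in aliases}

    t, p = normalize(tourism), normalize(policy)
    return {key: (t[key], p[key]) for key in sorted(t.keys() & p.keys())}

# Toy inputs (not real UNWTO or OxCGRT values).
tourism = {("China", 2019): 100.0, ("United States of America", 2019): 80.0, ("China", 2023): 40.0}
policy = {("CHN", 2019): 0.0, ("USA", 2019): 0.0, ("USA", 2020): 1.4}
print(align_annual(tourism, policy))
# {('CHN', 2019): (100.0, 0.0), ('USA', 2019): (80.0, 0.0)}
```

An inner join is the conservative choice here: a year missing from either source simply drops out instead of being plotted against an empty value.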

The Google Health COVID-19 epidemiology data was largely pre-cleaned, so our preprocessing focused on modifications tailored to this project. We filtered on the location_key variable to keep only locations in the United States and China, selected the variables relevant to our analysis, aggregated daily records into monthly totals, and handled missing values. These steps reduced the dataset’s volume and made it compatible with our other sources for visualization. Note, however, that because of the dataset’s coverage limitations, this part of our analysis is restricted to the period 2020–2022.
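The filter-and-aggregate step can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions: the epidemiology table is read as CSV with date, location_key, and new_confirmed columns, and the country-level keys are "US" and "CN". The function name is hypothetical, and since our exact missing-value rule is not specified above, this version simply skips blank cells.

```python
import csv
import io
from collections import defaultdict

# Assumed country-level keys; sub-national keys such as "US_CA" are dropped here.
KEEP_KEYS = {"US", "CN"}

def monthly_totals(rows, column="new_confirmed", keep_keys=KEEP_KEYS):
    """Sum one daily count column into {(location_key, "YYYY-MM"): total}.

    Rows for other locations are dropped; blank cells are treated as missing
    and skipped rather than counted as zero.
    """
    totals = defaultdict(int)
    for row in rows:
        if row["location_key"] not in keep_keys:
            continue
        raw = row.get(column) or ""
        if not raw:
            continue
        month = row["date"][:7]  # ISO date "2020-03-15" -> "2020-03"
        totals[(row["location_key"], month)] += int(float(raw))
    return dict(sorted(totals.items()))

# Toy CSV (not real case counts) in the assumed column layout.
sample = io.StringIO(
    "date,location_key,new_confirmed\n"
    "2020-03-01,US,10\n"
    "2020-03-02,US,\n"
    "2020-03-02,US_CA,5\n"
    "2020-03-31,CN,7\n"
    "2020-04-01,US,3\n"
)
print(monthly_totals(csv.DictReader(sample)))
# {('CN', '2020-03'): 7, ('US', '2020-03'): 10, ('US', '2020-04'): 3}
```

Skipping blanks keeps the totals honest about what was reported, but it means a month with many missing days will understate activity, which is worth keeping in mind when reading monthly charts.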

For the UNWTO and TTSA datasets, we limited the data to China and the United States, aligned all variables to the years 2018 through 2023, and reorganized the tables into consistent formats. From the UNWTO collection, which contains seven thematic categories, we selected only the Inbound and Macroeconomic files because these indicators directly reflect international tourism flows and tourism’s contribution to national economies. The other five categories (Domestic, Outbound, Accommodation, Employment, and SDGs) were excluded because they either do not measure cross-border mobility or are not directly relevant to comparing recovery between China and the United States. Within TTSA, we focused on Tables 3, 4, 7, and 8, which best capture tourism demand, industry output, employment levels, and real tourism activity, and which together provide a clear picture of how the U.S. tourism economy contracted and recovered. The remaining TTSA tables were excluded because they center on supply-side commodity flows or on categories that do not directly reflect international tourism performance.

The annual UNWTO and TTSA tables were reformatted to follow a unified structure, while the daily policy and epidemiological datasets were aggregated into monthly and, where needed, annual averages or totals to match the level of detail in the tourism indicators. We removed inconsistencies, simplified category labels, normalized units, and converted each dataset into a structure that could be visualized in Tableau. These steps let us integrate multiple sources into a coherent foundation for identifying trends, comparing recovery trajectories, and understanding how policy environments shaped the return of international tourism.
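A minimal sketch of the country and year filter combined with a reshape into a unified long format (one row per country, indicator, and year), a shape that is generally convenient for building Tableau line charts. It assumes a hypothetical wide layout with country and indicator columns plus one column per year, and treats blank cells and ".." as missing; the real UNWTO and TTSA file layouts may differ. The unit_scale parameter is an illustrative stand-in for unit normalization, such as converting values reported in thousands.

```python
# Assumed spellings of the two country labels in the wide source tables.
COUNTRIES = {"China", "United States of America"}
YEARS = range(2018, 2024)  # 2018 through 2023 inclusive
MISSING = {"", ".."}  # assumed missing-value markers

def to_long(rows, countries=COUNTRIES, years=YEARS, unit_scale=1.0):
    """Reshape wide rows (one column per year) into long tuples.

    Each output tuple is (country, indicator, year, value). Rows for other
    countries, years outside the window, and missing cells are dropped;
    indicator labels are trimmed and lower-cased so label variants match.
    """
    records = []
    for row in rows:
        if row["country"] not in countries:
            continue
        label = row["indicator"].strip().lower()
        for year in years:
            raw = row.get(str(year), "")
            if raw in MISSING:
                continue
            records.append((row["country"], label, year, float(raw) * unit_scale))
    return records

# Toy rows (not real UNWTO values), with values reported in thousands.
rows = [
    {"country": "China", "indicator": " Inbound Arrivals ", "2017": "90", "2018": "100", "2019": "110", "2020": ".."},
    {"country": "France", "indicator": "Inbound Arrivals", "2018": "5"},
]
print(to_long(rows, unit_scale=1000))
# [('China', 'inbound arrivals', 2018, 100000.0), ('China', 'inbound arrivals', 2019, 110000.0)]
```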

The OxCGRT policy data required additional preprocessing to ensure consistency with the tourism indicators used in this analysis. The raw OxCGRT dataset is reported at a daily frequency and includes ordinal policy measures that vary across time and countries. To align these data with the annual and monthly tourism datasets, we aggregated daily policy indicators into monthly averages, which smooth short-term fluctuations while preserving changes in overall policy intensity.
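The monthly averaging can be sketched as follows. This is a minimal standard-library version under stated assumptions: daily records arrive as (country code, date, code) tuples with dates in YYYYMMDD form, codes are integers on the 0–2 ordinal scale used for the indicators in this project, and missing days are skipped rather than imputed. The function name and toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

MAX_CODE = 2  # top of the 0-2 ordinal scale used for the indicators in this project

def monthly_policy_means(daily_records):
    """Average daily ordinal codes into {(country, "YYYY-MM"): monthly mean}.

    daily_records yields (country_code, "YYYYMMDD", code) tuples; code is an
    int from 0 to MAX_CODE, or None for a missing day, which is skipped.
    """
    buckets = defaultdict(list)
    for country, date, code in daily_records:
        if code is None:
            continue
        if not 0 <= code <= MAX_CODE:
            raise ValueError(f"code {code} outside 0..{MAX_CODE} ({country}, {date})")
        buckets[(country, f"{date[:4]}-{date[4:6]}")].append(code)
    return {key: mean(codes) for key, codes in sorted(buckets.items())}

# Toy records (not real OxCGRT values): a measure tightens from 1 to 2 on 11 April.
records = [("USA", f"202004{day:02d}", 1 if day <= 10 else 2) for day in range(1, 31)]
records.append(("CHN", "20200415", None))  # a missing day contributes nothing
result = monthly_policy_means(records)
print(result)
# {('USA', '2020-04'): 1.6666666666666667}
```

Rejecting out-of-range codes before averaging guards against a mislabeled column silently inflating the monthly values.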

It is important to note that while the Oxford COVID-19 Government Response Tracker also reports a composite stringency index measured on a continuous 0–100 scale, the OxCGRT policy variables analyzed in this project are ordinal measures rather than normalized indices. These indicators are coded on a 0–2 scale, where 0 represents no policy measures, 1 represents partial or recommended measures, and 2 represents strict or mandatory enforcement. Accordingly, visualizations based on these indicators display y-axis values extending to 2 to reflect the full range of enforcement intensity present in the data, rather than a standardized 0–1 scale.

Because the plotted values are averages over time, intermediate values may appear between whole numbers. These values do not indicate fractional policies; instead, they capture shifts in enforcement severity within a given month or year. While this preprocessing approach supports cross-country comparison and integration with tourism outcomes, the resulting figures should be read as representations of relative policy intensity over time, not as precise legal thresholds or evidence of uniform enforcement.
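The intermediate values follow directly from the arithmetic of averaging. Writing p_{c,d} for the ordinal code of country c on day d and D_m for the number of observed days in month m (notation introduced here for illustration), the monthly value and a hypothetical 30-day example are:

```latex
\bar{p}_{c,m} = \frac{1}{D_m} \sum_{d \in m} p_{c,d}, \qquad p_{c,d} \in \{0, 1, 2\}

% Hypothetical month: 15 days at code 1, then 15 days at code 2
\bar{p}_{c,m} = \frac{15 \cdot 1 + 15 \cdot 2}{30} = \frac{45}{30} = 1.5
```

A monthly value of 1.5 therefore records a mid-month tightening from partial to strict measures, not a policy at "level 1.5."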

Presenting our Data

After completing our research and cleaning and processing our data, we compiled our work into a cohesive website hosted on UCLA’s Humspace WordPress platform through the Digital Humanities department. We aimed to create a user-friendly experience by combining outside research, our own visualizations, and a clear site structure. To support intuitive navigation and maintain a professional look, we selected a pre-built template that aligns with strong UI/UX principles.

Our design follows a balanced three-color palette chosen for clarity, readability, and consistency across all pages. White serves as the primary color, covering most of the background and providing an open, clean foundation. Deep blue functions as the secondary color, used in large blocks, featured sections, and areas requiring emphasis or visual weight. Teal-blue acts as the accent color, applied sparingly to interactive elements such as buttons and icons to guide user attention and highlight key actions. All data visualizations use this palette to maintain cohesion across multiple layers of information. To further enhance readability, titles and labels are presented in dark blue for strong contrast against lighter backgrounds. Customized HTML and CSS adjustments ensure smooth layout behavior, clear spacing, and an overall polished and consistent visual identity across the site.

For digital accessibility, we ensured that every image and visualization has supporting alternative text. The final website uses high-contrast colors and a colorblind-friendly palette, and it underwent multiple accessibility reviews, including keyboard (tab) navigation testing.
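One way to spot-check a text and background pairing is the WCAG 2.x contrast-ratio formula, sketched below in Python. This is an illustrative check rather than a record of our review process: the hex values are hypothetical stand-ins for the site’s white background and dark blue text, and 4.5:1 is the WCAG AA threshold for normal-size text.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a "#RRGGBB" color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)

def contrast_ratio(color_a: str, color_b: str) -> float:
    """Return (lighter + 0.05) / (darker + 0.05), ranging from 1.0 to 21.0."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)

WHITE = "#FFFFFF"      # hypothetical background value
DARK_BLUE = "#1A2B4C"  # hypothetical title/label color, not the site's actual hex

ratio = contrast_ratio(WHITE, DARK_BLUE)
print(f"{ratio:.2f}:1, passes AA for normal text: {ratio >= 4.5}")
```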

Lastly, our visualizations rely primarily on line charts and bar charts by design, a choice informed by both our data structure and the communicative goals of the project. The four datasets we used consist largely of time-series and categorical variables. Line graphs show change over time and bar charts allow clear comparisons across sectors or recovery levels, so these two formats represent our data most accurately and intuitively. More complex visual formats would not have improved interpretability and might have obscured the trends we aimed to highlight. Because we prioritized clarity, accessibility, and accurate communication of recovery trajectories, we judged these chart types the most effective and transparent way to present our findings.
