Factors predicting PhD affirmation and regret in PhD holders
Participant recruitment
The data for this study were drawn from the PhD Career Pathways Project, a large study of current PhD students and alumni from across the United States. The Council of Graduate Schools served as the coordinating body, with 59 US member institutions taking part in the project. Graduate school deans from PhD-granting institutions were recruited to disseminate the survey to their PhD alumni. The Council of Graduate Schools and each of the 59 participating institutions obtained Institutional Review Board approval to recruit participants. The alumni survey, administered in conjunction with participating institutions, aimed to ascertain the career pathways and decisions of early career (3 years), mid-career (8 years), and advanced stage (15 years) alumni.
In total, we recruited 17,783 participants from 59 institutions across the United States. Of the institutions represented, 50 are classified as R1 (highest research activity), 7 as R2 (high research activity), and the remaining two fall into other Carnegie categories. This institutional sample reflects the production of PhDs in the United States, which is concentrated at R1 institutions (National Center for Science and Engineering Statistics, 2023). All institutions were members of the Council of Graduate Schools and elected to take part in the study.
Data collection
Data for the present study were obtained through web-based Qualtrics surveys. Data were collected in Fall 2018 and Fall 2019, when partner institutions distributed the survey to alumni who had earned their PhD three, eight, or fifteen years earlier.
After removing cases with missing data on all key variables and individuals who were only three years beyond their PhD, the analytical sample comprised 10,970 alumni. A slight majority of participants were male (52.4%), and most were domestic alumni (85.4%). The largest share of alumni came from the biological and health sciences (20.2%). A majority (65.3%) were working in academia, and 75.8% of respondents reported working in a field closely related to their PhD. The mean salary was $108,884.
Measures
The following measures were collected.
Affirmation of one’s decision to pursue a PhD
The primary dependent variables measured how likely alumni would be, if they had to start again, to: 1) pursue a PhD in general, 2) pursue a PhD in the same field, and 3) pursue a PhD at the same institution. Responses were recorded on a 5-point Likert scale (1 = Definitely Would Not, 5 = Definitely Would).
Ethnoracial identity
Alumni indicated their ethnoracial identity by selecting one or more of the following: American Indian/Alaska Native, Hawaiian/Pacific Islander, Asian, Black/African American, White, or Hispanic. A Multiracial category was created for any individual who reported more than one race. Ethnoracial identity was dummy-coded with White as the reference group.
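The coding rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names `collapse_identity` and `dummy_code` are hypothetical, and the sketch assumes that any two or more selections (including Hispanic plus one other option) are treated as Multiracial, which the text does not state explicitly.

```python
# Categories as listed in the text, plus the derived Multiracial group.
CATEGORIES = [
    "American Indian/Alaska Native",
    "Hawaiian/Pacific Islander",
    "Asian",
    "Black/African American",
    "White",
    "Hispanic",
    "Multiracial",
]
REFERENCE = "White"  # reference group named in the text


def collapse_identity(selections):
    """Collapse a respondent's selections into a single category.

    Assumption: two or more distinct selections become "Multiracial".
    Returns None when nothing was selected (missing response).
    """
    unique = set(selections)
    if not unique:
        return None
    if len(unique) > 1:
        return "Multiracial"
    return next(iter(unique))


def dummy_code(category):
    """Return 0/1 indicators for every non-reference category.

    The reference group (White) is represented by all zeros.
    Missing responses return None so they are not mistaken for the reference.
    """
    if category is None:
        return None
    return {c: int(category == c) for c in CATEGORIES if c != REFERENCE}


# Example: a respondent who selected two options.
indicators = dummy_code(collapse_identity(["Asian", "White"]))
```

Leaving the reference group out of the indicator set is what makes each coefficient interpretable as a contrast with White respondents.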
Gender
Alumni indicated whether they identified as Male, Female, Gender non-binary, or Another gender not listed. Gender non-binary and Another gender not listed each made up less than 0.5 percent of responses and were excluded from the analyses.
Citizenship
Alumni indicated whether they were a U.S. citizen, permanent U.S. resident, or temporary resident (non-US citizen). These categories were dichotomized by combining the permanent U.S. resident and temporary resident categories (U.S. citizen/resident = 0, international = 1).
Perception of PhD programme’s preparation for current job
A primary independent variable was respondents’ perception of how well their PhD programme prepared them for their current employment role. Responses were recorded on a 5-point Likert scale (1 = Very Poorly, 5 = Extremely Well).
Field of study
Field of PhD study was dummy-coded into nine groups: Arts & Humanities, Biology & Health Sciences, Business, Education, Mathematics & Computer Science, Engineering, Physical Sciences, Social Sciences, and Other Fields. Engineering was selected as the reference group.
Job sector
Participants selected, from a list of eight sectors, the one that best described their employer. All academic institutions were collapsed into a single academia category, which served as the reference group. The other categories were government, non-profit, and industry.
Job relatedness
Respondents selected one of three options (Closely Related, Somewhat Related, Not at All Related) to indicate how related their current job was to their PhD study. The reference category was Closely Related.
Salary
Salary information was initially coded into 12 categories. To approximate a continuous measure, the first 11 categories were recoded to their midpoint dollar values. The value for the open-ended final category ($150,000 or more) was estimated at $210,035 using the Pareto approximation technique (Parker & Fenwick, 1983; Wolniak et al., 2008). This technique has been shown to yield better estimates for top-coded earnings in survey data than other approaches, such as assigning a fixed multiple of the highest coded data point (Armour et al., 2016).
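As a rough illustration of the Pareto approximation, the sketch below estimates the Pareto shape parameter from the two highest salary categories and then derives a representative value for the open-ended category. The function names are hypothetical, and the lower bound of the second-highest category and both category counts in the example are invented for illustration, since the text does not report them. The text also does not say whether the Pareto mean or median was used, so both are shown.

```python
import math


def pareto_shape(lower_prev, lower_top, n_prev, n_top):
    """Estimate the Pareto shape parameter v from the top two categories.

    lower_prev: lower bound of the category just below the open-ended one
    lower_top:  lower bound of the open-ended category
    n_prev:     number of respondents in the category just below
    n_top:      number of respondents in the open-ended category
    """
    numerator = math.log(n_prev + n_top) - math.log(n_top)
    denominator = math.log(lower_top) - math.log(lower_prev)
    return numerator / denominator


def pareto_top_mean(lower_top, v):
    """Mean of a Pareto tail above lower_top; defined only for v > 1."""
    if v <= 1:
        raise ValueError("Pareto mean is undefined for v <= 1")
    return lower_top * v / (v - 1)


def pareto_top_median(lower_top, v):
    """Median of a Pareto tail above lower_top."""
    return lower_top * 2 ** (1 / v)


# Hypothetical inputs: only the $150,000 lower bound comes from the text.
v = pareto_shape(lower_prev=125_000, lower_top=150_000, n_prev=1_000, n_top=1_000)
top_mean = pareto_top_mean(150_000, v)
top_median = pareto_top_median(150_000, v)
```

Both estimates always lie above the category's lower bound, and a heavier tail (smaller v) pushes them higher.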
Years since PhD
Alumni indicated the year they earned their PhD and were grouped according to their place in the Early Career (3 years), Mid-Career (8 years), or Advanced Stage (15 years) group. The reference category was Early Career.
Statistical analyses
In addition to computing descriptive statistics for the overall sample, we estimated an ordinal regression model in Stata 14 to assess the relationships between the key independent variables and the outcomes. A major assumption of most ordinal logistic regression models is proportional odds (assessed with the test of parallel lines), an assumption that is often not met in research studies (Cohen et al., 2003; Liu & Koirala, 2012; Williams, 2016). In the present study, the test of parallel lines was statistically significant, indicating that the assumption was violated. When this key assumption is violated, the researcher may elect to use a multinomial logistic regression, which treats the outcomes as categorical but sacrifices parsimony and interpretive power (Williams, 2016). Another option is the heterogeneous choice model, which provides appropriate estimates while relaxing the assumption only for parameters that violate it (Williams, 2010). Unlike alternatives such as the generalized ordered logit model (-gologit2- command in Stata), the heterogeneous choice model, fit with the -oglm- command in Stata, provides a single parameter estimate for each covariate rather than four estimates for any parameter that violates the proportional odds assumption. This makes the results easier to interpret in theoretical and practical settings without sacrificing model fit (Williams, 2010). As sensitivity analyses, we re-estimated the regression with the -gologit2-, -ologit-, and -oprobit- commands in Stata to ensure that the results did not depend on specific assumptions. Additional sensitivity analyses were performed with and without several key variables to ensure that the results were robust to model specification.
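To make the model concrete, the sketch below computes response-category probabilities under the location-scale form of the heterogeneous choice model with a logit link, P(y <= j) = F((kappa_j - x·beta) / exp(z·gamma)), following the general formulation in Williams (2010). It illustrates the probability formula only and is not the estimation routine; the function names and all numeric values are hypothetical. Setting gamma to zero fixes the scale at 1 and recovers the standard ordered logit model.

```python
import math


def logistic_cdf(t):
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def hetero_choice_probs(x, beta, z, gamma, cutpoints):
    """Category probabilities for one respondent.

    x, beta:   covariates and coefficients of the location (choice) equation
    z, gamma:  covariates and coefficients of the scale (variance) equation
    cutpoints: ascending thresholds; a 5-point outcome has four of them
    """
    location = sum(xi * bi for xi, bi in zip(x, beta))
    scale = math.exp(sum(zi * gi for zi, gi in zip(z, gamma)))
    # Cumulative probabilities P(y <= j); the last category closes at 1.
    cumulative = [logistic_cdf((k - location) / scale) for k in cutpoints] + [1.0]
    # Differences of adjacent cumulative probabilities give each category.
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical values for a 5-point outcome.
cuts = [-2.0, -0.5, 0.7, 2.1]
probs = hetero_choice_probs([1.0, 0.5], [0.8, -0.3], [1.0], [0.4], cuts)
# With gamma = 0 the scale is 1, which is the ordinary ordered logit.
probs_ologit = hetero_choice_probs([1.0, 0.5], [0.8, -0.3], [1.0], [0.0], cuts)
```

Each covariate in the location equation keeps a single coefficient across all thresholds, which is the interpretive advantage the text attributes to this model over the generalized ordered logit.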