Importance of intergroup contact in settings like neighborhoods, public transportation, and sports leagues. All of these studies ensured or inferred that intergroup contact actually took place. However, some researchers have studied the possibility that even imagined contact with a member of an outgroup can change people’s perception of that group. In this Data Exploration assignment we will explore two datasets derived from imagined intergroup contact experiments. In part one, you will look at data from a recent study conducted by Dong Wang, Iain Johnston (a professor in the Harvard Government Department) and Baoyu Wang. Wang et al. (2021) conducted an experiment on a group of Chinese students to determine if imagined social contact could reduce antipathy toward Japanese people. In part two, you will look at the results of our in-class survey, which tested whether imagined social contact with a member of one’s less-preferred political party would change attitudes toward members of that party. You can do either part first, but you will probably find the exercise most valuable if you do some of each part.
If you have a question about any part of this assignment, please ask! Note that the actionable part of each question is bolded.
Part One: Chinese Students and Perception of Japanese People
Data Details:
• File Name: ChinaJapanData.csv
• Source: These data are from (Wang et al. (2021))[https://drive.google.com/open?id=111pbDphCslbMXbmPBwKNQeQ authuser=renos%40g.harvard.edu&usp=drive_fs]. Please take some time to skim this paper in order to get a feel for the population they studied, their key hypotheses, and their experimental procedure. Subjects were asked to imagine a bus ride, either one in which they talked to a Japanese person (treatment) or just enjoyed the scenery (control). They were then asked a series of questions to assess their affective feelings toward Japanese and Chinese people, their perceptions of the characteristics of Japanese and Chinese identity, and demographic, policy, and psychological questions to serve as control variables.
Variable Name
Variable Description
subject
Anonymized identifier for each experimental subject
treated
Binary variable equal to TRUE if the subject was told to imagine a bus ride with a Japanese person (the treatment) and FALSE if the subject was told to imagine the scenery on a bus ride (control)
JapanPos
Affective feeling about Japanese people ranging from 1 (negative) to 7 (positive)
JapanWarm
Affective feeling about Japanese people ranging from 1 (cool) to 7 (warm)
JapanAdmire
Affective feeling about Japanese people ranging from 1 (loathing) to 7 (admiration)
JapanRespect
Affective feeling about Japanese people ranging from 1 (contempt) to 7 (respect)
ChinaPos
Affective feeling about Chinese people ranging from 1 (negative) to 7 (positive)
ChinaWarm
Affective feeling about Chinese people ranging from 1 (cool) to 7 (warm)
ChinaAdmire
Affective feeling about Chinese people ranging from 1 (loathing) to 7 (admiration)
ChinaRespect
Affective feeling about Chinese people ranging from 1 (contempt) to 7 (respect)
PosDiff
Difference between the Chinese and Japanese positivity score
WarmDiff
Difference between the Chinese and Japanese warmth score
AdmireDiff
Difference between the Chinese and Japanese admiration score
RespectDiff
Difference between the Chinese and Japanese respect score
JapanID_avg
Average of 30 ratings of Japanese people on identity trait pairs, coded from 1 to 7 where higher numbers are less favorable; see p. 12 of Wang et al. (2021) for details
ChinaID_avg
Average of the same 30 identity ratings of Chinese people
ID_diff_avg
Difference between ChinaID_avg and JapanID_avg
age
Age in years
gender
Gender, coded 1 for male and 0 for female
jpfriend
Indicator variable for if subject has a Japanese friend (1) or does not (0)
MediaInd
Attitude toward media independence from the government ranging from 1 (strongly oppose) to 5 (strongly support)
freetrade
Indicator variable for if subject supports free trade (1) or does not (0)
school_major
Categorical variable denoting major in school; 1 = social sciences, 2 = humanities, 3 = sciences and engineering, 4 = law
PrejControl
Motivation to Control Prejudice index; an average of 17 items rated from 1 to 7 in which higher scores denote a greater motivation to control the expression of prejudice
library(tidyverse) # read_csv(), %>%, mutate(), ggplot()
# load the data
ChinaJapan <- read_csv('ChinaJapanData.csv')
Question 1
Part a
When surveys ask a number of questions to try and measure the same underlying concept, it is common to make a summary index by taking the average of all these items. Create new variables for the average affective feeling toward Japanese people, the average affective feeling toward Chinese people, and the average difference between the two.
ChinaJapan <- ChinaJapan %>% mutate(avg_aff_Japanese = (JapanPos + JapanWarm + JapanAdmire + JapanRespect)/4, avg_aff_Chinese = (ChinaPos + ChinaWarm + ChinaAdmire + ChinaRespect)/4) %>%
mutate(avg_diff_aff = avg_aff_Chinese - avg_aff_Japanese)
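The same kind of index can also be built with rowMeans(), which makes the handling of missing items explicit. The sketch below is only an illustration: the `toy` data frame and its values are invented, and only the column names follow the codebook above.

```r
# Toy data standing in for ChinaJapanData.csv (values are invented for illustration)
toy <- data.frame(
  JapanPos     = c(4, 3, NA),
  JapanWarm    = c(5, 3, 4),
  JapanAdmire  = c(4, 2, 4),
  JapanRespect = c(5, 4, 4)
)

japan_items <- c("JapanPos", "JapanWarm", "JapanAdmire", "JapanRespect")

# Strict version: any missing item makes the index NA
# (same behavior as adding the four items and dividing by 4)
toy$avg_aff_Japanese <- rowMeans(toy[, japan_items])

# Lenient version: average whichever items the subject answered
toy$avg_aff_Japanese_partial <- rowMeans(toy[, japan_items], na.rm = TRUE)

print(toy)
```

Which version is appropriate depends on how much item-level missingness the real data contain; the strict version matches the arithmetic used in the answer above.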
Part b
For at least one of the individual affect items and all three affect averages you created in part a (China, Japan, and the difference between them), report mean values for the treatment and control groups, an estimate of the difference between those groups, and the results of a test for statistical significance. Did imagined social contact change subjects’ affect toward Japanese people? Chinese people? What about their affective polarization?
ChinaJapan %>% group_by(treated) %>%
summarize(mean(avg_aff_Chinese), mean(avg_aff_Japanese), mean(avg_diff_aff), mean(PosDiff))
## ‘summarise()‘ ungrouping output (override with ‘.groups‘ argument)
## # A tibble: 2 x 5
## treated ‘mean(avg_aff_Chi~ ‘mean(avg_aff_Jap~ ‘mean(avg_diff_~ ‘mean(PosDiff)‘
## <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 FALSE 5.00 3.72 1.28 1.18
## 2 TRUE 4.89 4.25 0.638 0.417
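# Note: with two vector arguments, t.test() treats `treated` (coerced to 0/1) as a second
# sample, so this call compares the overall mean of avg_diff_aff with the share of treated
# subjects. The formula form, t.test(avg_diff_aff ~ treated, data = ChinaJapan), compares
# the treatment and control groups instead.
t.test(ChinaJapan$avg_diff_aff, ChinaJapan$treated)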
##
## Welch Two Sample t-test
##
## data: ChinaJapan$avg_diff_aff and ChinaJapan$treated
## t = 3.0069, df = 142.43, p-value = 0.003121
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1570171 0.7596496
## sample estimates:
## mean of x mean of y
## 0.9583333 0.5000000
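For a comparison of treatment and control group means, the formula interface of t.test() splits the outcome by the grouping variable. The following is a minimal sketch on simulated data; the `sim` data frame and every number in it are invented, so the output says nothing about the real experiment.

```r
set.seed(1)

# Simulated stand-in for the experiment (invented values, not the real data):
# 60 control and 60 treated subjects
sim <- data.frame(treated = rep(c(FALSE, TRUE), each = 60))
sim$avg_diff_aff <- rnorm(120, mean = ifelse(sim$treated, 0.6, 1.3), sd = 1.5)

# Formula interface: outcome ~ grouping variable (Welch test by default)
tt <- t.test(avg_diff_aff ~ treated, data = sim)

# Group means, in the order of the grouping levels (FALSE, then TRUE)
tt$estimate

# Difference in means (control minus treated), its 95% CI, and the p-value
unname(tt$estimate[1] - tt$estimate[2])
tt$conf.int
tt$p.value
```

With this form the "sample estimates" are the two group means, which can be checked directly against a group_by() and summarize() table.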
Part c
Researchers often present the size of an experimental effect in terms of Cohen’s D, which calculates the ratio of the treatment effect to the standard deviation. This is a way to understand if a treatment effect is substantively large, in addition to statistically significant. A common “rule of thumb” is that a Cohen’s D score of 0.2 is small, 0.5 is medium, and 0.8 is large. Wang et al. (2021) present Cohen’s D scores in the tables throughout their article.
Here is a useful interactive visualization of Cohen’s D: https://rpsychologist.com/cohend/
Using the cohen.d() function, for each of the variables you used in Part b above, calculate a Cohen’s D and interpret whether it is small, medium, or large. Do these “rule of thumb” interpretations match your intuitive interpretation? Are any of the differences of means statistically significant, yet substantively small according to the Cohen’s D score?
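To make the definition concrete, Cohen's D can be computed by hand as the difference in group means divided by the pooled standard deviation. The sketch below does this on simulated data with invented values. `cohens_d_manual` is a hypothetical helper name, and the cohen.d() function mentioned above is assumed to come from the effsize package, whose sign convention and defaults may differ from this hand calculation.

```r
set.seed(2)

# Hypothetical helper: difference in means over the pooled standard deviation
cohens_d_manual <- function(y, group) {
  y1 <- y[group]    # treated
  y0 <- y[!group]   # control
  n1 <- length(y1)
  n0 <- length(y0)
  pooled_sd <- sqrt(((n1 - 1) * var(y1) + (n0 - 1) * var(y0)) / (n1 + n0 - 2))
  (mean(y1) - mean(y0)) / pooled_sd
}

# Simulated outcome (invented values)
treated <- rep(c(FALSE, TRUE), each = 60)
outcome <- rnorm(120, mean = ifelse(treated, 4.2, 3.7), sd = 1.2)

d <- cohens_d_manual(outcome, treated)
d

# Rule-of-thumb label using the 0.2 / 0.5 / 0.8 cutoffs quoted above;
# the "negligible" label for values under 0.2 is this sketch's own choice
cut(abs(d), breaks = c(0, 0.2, 0.5, 0.8, Inf),
    labels = c("negligible", "small", "medium", "large"), right = FALSE)
```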
Question 2
Part a
Wang et al. (2021) also investigate whether imagined social contact changes Chinese students’ perception of the characteristics associated with Japanese identity, as well as the difference in perception of the semantic content associated with Chinese and Japanese identity. They use 30 pairs of opposite phrases (like “frank/hypocritical,” “civilized/barbaric,” and “peace-loving/belligerent”) on scales from 1 to 7 to measure these identity traits (higher scores are less favorable). We’ve provided you with the averages of these thirty items for perception of Chinese and Japanese identities, as well as the average difference between them.
For all three of these indices, report average values for the treatment and control groups, an estimate of the difference between those groups, and the results of a test for statistical significance. How do the experimental effects of imagined contact on identity traits compare to the effects on affect?
Part b
A useful way to visualize these effects of the experimental treatment is with a treatment effects plot. Below, we provide some sample code for how to make a treatment effects plot based on the t-tests you conducted in part a of this question. Please work through the code to make sure you understand it and interpret the findings depicted in the resulting plot. See p. 2 of Mousa (2020) from this week’s readings for a good example of plots like this.
library(estimatr) # provides difference_in_means()
# First, store the t-tests as new objects in R
JapanIDdiff <- difference_in_means(JapanID_avg ~ treated, data = ChinaJapan)
ChinaIDdiff <- difference_in_means(ChinaID_avg ~ treated, data = ChinaJapan)
Diff_in_ID_diff <- difference_in_means(ID_diff_avg ~ treated, data = ChinaJapan)
# These objects are called lists; they store values like point estimates and confidence intervals of sta
# Next, we extract information from these lists and save them as vectors that you will use to make the d
outcomes <- c('Japanese Identity Rating', 'Chinese Identity Rating', 'Identity Difference')
pointests <- c(JapanIDdiff$coefficients, ChinaIDdiff$coefficients, Diff_in_ID_diff$coefficients)
lowbounds <- c(JapanIDdiff$conf.low, ChinaIDdiff$conf.low, Diff_in_ID_diff$conf.low)
upbounds <- c(JapanIDdiff$conf.high, ChinaIDdiff$conf.high, Diff_in_ID_diff$conf.high)
# Combine the vectors into a data frame
treatment_effect_plot_data <- tibble(outcomes, pointests, lowbounds, upbounds)
# Make the plot
# (The call is cut off after "ymin=" in the source; ymax, the geom, and the zero line are an assumed completion.)
ggplot(treatment_effect_plot_data,
       mapping = aes(x = factor(outcomes, levels = outcomes), y = pointests,
                     ymin = lowbounds, ymax = upbounds)) +
  geom_pointrange() +
  geom_hline(yintercept = 0, linetype = 2)
# Why did we use the factor() function for defining the x-axis? If you take the factor() function with t
Part c
Make your own treatment effects plot to visualize the effect of treatment on all four items associated with affect toward Japanese people and the index averaging those items. Interpret your results.
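One possible way to approach this is to loop over the outcome names instead of storing each test by hand. The sketch below runs on simulated data (the `sim` data frame and its values are invented; only the column names follow the codebook) and uses base R's t.test() rather than difference_in_means(), so it is an alternative technique, not the code provided above. The plot is drawn only if ggplot2 is installed.

```r
set.seed(3)

# Simulated stand-in data (invented values); column names follow the codebook
n <- 120
items <- c("JapanPos", "JapanWarm", "JapanAdmire", "JapanRespect")
sim <- data.frame(treated = rep(c(FALSE, TRUE), each = n / 2))
for (v in items) {
  sim[[v]] <- rnorm(n, mean = ifelse(sim$treated, 4.2, 3.7), sd = 1.2)
}
sim$avg_aff_Japanese <- rowMeans(sim[, items])

outcomes <- c(items, "avg_aff_Japanese")

# One Welch t-test per outcome; keep the treated-minus-control estimate and 95% CI
effects <- do.call(rbind, lapply(outcomes, function(v) {
  # Put TRUE first so the estimated difference is treated minus control
  d  <- data.frame(y = sim[[v]], g = factor(sim$treated, levels = c(TRUE, FALSE)))
  tt <- t.test(y ~ g, data = d)
  data.frame(
    outcome  = v,
    pointest = unname(tt$estimate[1] - tt$estimate[2]),
    lowbound = tt$conf.int[1],
    upbound  = tt$conf.int[2]
  )
}))
effects

# Treatment effects plot with a dashed reference line at zero
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(effects, aes(x = factor(outcome, levels = outcome),
                           y = pointest, ymin = lowbound, ymax = upbound)) +
    geom_pointrange() +
    geom_hline(yintercept = 0, linetype = 2) +
    labs(x = NULL, y = "Treated minus control")
  print(p)
}
```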
Question 3: Data Science Question
Part a
Pick one of the two average difference variables (AffectDiff_avg or ID_diff_avg) to use as a dependent variable for this question. Then pick at least two of the control variables and write hypotheses about how they would affect your chosen dependent variable. Do this before you do part b of this question.
ANSWER: For my dependent variable, I’ll use the average overall difference in affect toward Chinese and Japanese people. For the first independent variable, I’ll use freetrade; I would posit that this would negatively correlate with the dependent variable, meaning those who are pro-free trade would be more likely to see Japanese and Chinese people on the same standing. For the second independent variable, I’ll use school_major, though I’ll modify it such that 1 corresponds with “studied humanities or social sciences” and 0 indicates that they studied something else. I would expect the same relationship as with free trade, given these majors are correlated with lower SDO.
Part b
Use multiple regression to test your hypotheses. Be sure to include your selected control variables and the treatment variable at a minimum, but the exact form of the model is up to you. Report and interpret the regression coefficients in the context of your hypotheses.
mod = lm(avg_diff_aff ~ freetrade + school_major + treated,
         data = ChinaJapan %>% mutate(school_major = ifelse(school_major %in% c(1, 2), 1, 0)))
stargazer(mod, type = "text")
##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                            avg_diff_aff
## -----------------------------------------------
## freetrade                     -0.179
##                               (0.260)
##
## school_major                  -0.145
##                               (0.292)
##
## treated                      -0.624**
##                               (0.289)
##
## Constant                     1.511***
##                               (0.339)
##
## -----------------------------------------------
## Observations                    120
## R2                             0.047
## Adjusted R2                    0.023
## Residual Std. Error      1.574 (df = 116)
## F Statistic           1.920 (df = 3; 116)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
mod = lm(avg_diff_aff ~ gender + jpfriend + treated, data = ChinaJapan)
stargazer(mod, type = "text")
##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                            avg_diff_aff
## -----------------------------------------------
## gender                        -0.051
##                               (0.288)
##
## jpfriend                       0.188
##                               (0.298)
##
## treated                      -0.649**
##                               (0.288)
##
## Constant                     1.239***
##                               (0.273)
##
## -----------------------------------------------
## Observations                    120
## R2                             0.045
## Adjusted R2                    0.020
## Residual Std. Error      1.577 (df = 116)
## F Statistic           1.803 (df = 3; 116)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
ANSWER: The treatment variable in my first model had a coefficient of -0.624 and was statistically significant at the p < 0.05 level. Holding the other variables constant, treatment is associated with an estimated 0.624-point reduction in the average affect difference between the in- and outgroup, and we can reject the null hypothesis of no effect at the 5% level. The other variables I chose, freetrade and the modified school_major, were indeed negatively correlated with the outcome but not statistically significant. Interestingly, I also tried variables that might be more strongly correlated, like having a Japanese friend (jpfriend) and gender, but these were also not significant. It seems that only the treatment variable had a statistically detectable effect.
Part c
Coefficient plots can be a good way to display main results from a regression in a way that is visually easier to interpret than a table. They are very similar to the treatment effect plots we made in the previous question.
Make a coefficient plot with the point estimates and 95% confidence intervals for all of your regression coefficients other than the intercept. Be sure to include some kind of line at zero to aid interpretation of statistical significance. Take a look at p. 5 of Brown et al. (2021) from this week’s reading for an example of a coefficient plot (although note that that is plotting one regression coefficient across multiple regression specifications; we’re asking you to plot multiple regression coefficients from one regression model).
mod = lm(avg_diff_aff ~ freetrade + school_major + treated,
         data = ChinaJapan %>% mutate(school_major = ifelse(school_major %in% c(1, 2), 1, 0)))
dwplot(mod, vline = geom_vline(
  xintercept = 0, colour = "grey60",
  linetype = 2
))
ANSWER: I used the dotwhisker package to do this.
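For readers who want to see what a coefficient plot is made of, the same information can be assembled from coef() and confint() and drawn with base graphics. This is a sketch of an alternative to dotwhisker, run on simulated data: the `sim` data frame and its values are invented, and only the variable names follow the codebook.

```r
set.seed(4)

# Simulated stand-in data (invented values); variable names follow the codebook
n <- 120
sim <- data.frame(
  treated   = rep(c(FALSE, TRUE), each = n / 2),
  freetrade = rbinom(n, 1, 0.5),
  jpfriend  = rbinom(n, 1, 0.3)
)
sim$avg_diff_aff <- 1.3 - 0.6 * sim$treated + rnorm(n, sd = 1.5)

mod <- lm(avg_diff_aff ~ freetrade + jpfriend + treated, data = sim)

# Point estimates and 95% confidence intervals, dropping the intercept
ci <- confint(mod, level = 0.95)
coef_table <- data.frame(
  term      = names(coef(mod)),
  estimate  = unname(coef(mod)),
  low       = ci[, 1],
  high      = ci[, 2],
  row.names = NULL
)
coef_table <- coef_table[coef_table$term != "(Intercept)", ]
coef_table

# Base-graphics coefficient plot with a dashed reference line at zero
k <- nrow(coef_table)
plot(coef_table$estimate, seq_len(k), pch = 19,
     xlim = range(c(coef_table$low, coef_table$high, 0)),
     yaxt = "n", xlab = "Coefficient estimate (95% CI)", ylab = "")
segments(coef_table$low, seq_len(k), coef_table$high, seq_len(k))
axis(2, at = seq_len(k), labels = coef_table$term, las = 1)
abline(v = 0, lty = 2, col = "grey60")
```

An interval that crosses the dashed line corresponds to a coefficient that is not statistically distinguishable from zero at the 5% level.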
Part Two: Harvard Students and Perception of Members of the Other Party
Data Details:
• Source: These data are from the Qualtrics survey you all took last week. About half of you were in the control condition and were simply asked to imagine a bus ride with beautiful scenery; the other half were in the treatment condition and were asked to imagine a bus ride next to a member of your nonpreferred US political party. All students were then asked a series of questions to assess their affective feelings toward Democrats and Republicans and their perceptions of the characteristics of Democratic and Republican identity. Additional control variables were merged from the class background survey.
Variable Name
Variable Description
Treated
Binary variable equal to TRUE if the subject was told to imagine a bus ride with a member of the opposing political party (the treatment) and FALSE if the subject was told to imagine the scenery on a bus ride (control)
ClosestParty
Which of the two major US political parties the subject feels closest to
strongPARTISAN
Binary variable coded as TRUE if subject self-identifies as a strong partisan and FALSE otherwise
ControlScenario_1
First text-based reflection of the imagined bus ride for those in the control condition
ControlScenario_2
Second text-based reflection of the imagined bus ride for those in the control condition
ControlScenario_3
Third text-based reflection of the imagined bus ride for those in the control condition
TreatmentScenario_1
First text-based reflection of the imagined bus ride for those in the treatment condition
TreatmentScenario_2
Second text-based reflection of the imagined bus ride for those in the treatment condition
TreatmentScenario_3
Third text-based reflection of the imagined bus ride for those in the treatment condition
RepublicanAffect_1
Affective feeling about Republicans ranging from 1 (negative) to 7 (positive)
RepublicanAffect_2
Affective feeling about Republicans ranging from 1 (cool) to 7 (warm)
RepublicanAffect_3
Affective feeling about Republicans ranging from 1 (loathing) to 7 (admiration)
RepublicanAffect_4
Affective feeling about Republicans ranging from 1 (contempt) to 7 (respect)
DemocraticAffect_1
Affective feeling about Democrats ranging from 1 (negative) to 7 (positive)
DemocraticAffect_2
Affective feeling about Democrats ranging from 1 (cool) to 7 (warm)
DemocraticAffect_3
Affective feeling about Democrats ranging from 1 (loathing) to 7 (admiration)
DemocraticAffect_4
Affective feeling about Democrats ranging from 1 (contempt) to 7 (respect)
RepublicanIdentity_1
Rating of Republican identity trait from 1 (obstinate) to 7 (open-minded) (note that identity favorability is associated with the higher number, unlike in the data from Part One)
RepublicanIdentity_2
Rating of Republican identity trait from 1 (evil) to 7 (moral)
RepublicanIdentity_3
Rating of Republican identity trait from 1 (arrogant) to 7 (humble)
RepublicanIdentity_4
Rating of Republican identity trait from 1 (cruel) to 7 (kind)
DemocraticIdentity_1
Rating of Democratic identity trait from 1 (obstinate) to 7 (open-minded)
DemocraticIdentity_2
Rating of Democratic identity trait from 1 (evil) to 7 (moral)
DemocraticIdentity_3
Rating of Democratic identity trait from 1 (arrogant) to 7 (humble)
DemocraticIdentity_4
Rating of Democratic identity trait from 1 (cruel) to 7 (kind)
gender
Character variable reflecting self-identified gender
college_stats
Binary variable coded as TRUE if subject self-identifies as having taken college-level statistics and FALSE otherwise
year
Year in college from 1 to 4
US
Binary variable coded as TRUE if subject self-identifies as having been born in the United States and FALSE otherwise
InPartyAffect_(1-4)
Affective feelings about the in-party using the same numbering scheme as above
InPartyAffect_avg
Average affective feelings about the in-party
OutPartyAffect_(1-4)
Affective feelings about the out-party using the same numbering scheme as above
OutPartyAffect_avg
Average affective feelings about the out-party
InPartyIdentity_(1-4)
Identity ratings about the in-party using the same numbering scheme as above
InPartyIdentity_avg
Average identity ratings about the in-party
OutPartyIdentity_(1-4)
Identity ratings about the out-party using the same numbering scheme as above
OutPartyIdentity_avg
Average identity ratings about the out-party
AffectDiff_(1-4)
Difference in affective feelings between in-party and out-party using the same numbering scheme as above
AffectDiff_avg
Average difference in affective feelings between in-party and out-party
IdentityDiff_(1-4)
Difference in identity ratings between in-party and out-party using the same numbering scheme as above
IdentityDiff_avg
Average difference in identity ratings between in-party and out-party
# load the data
ClassExperiment <- read_csv('Oct28ClassData.csv')
Question 4
Part a
For at least one of the individual affect items, the in-party and out-party affect averages, and the average affect difference, report average values for the treatment and control groups, an estimate of the difference between those groups, and the results of a test for statistical significance. Did imagined social contact change your classmates’ affect toward members of the opposing party? Their own party? What about their affective polarization?
ClassExperiment <- ClassExperiment %>% mutate(avg_aff_in_party = ifelse(ClosestParty == "the Republican party",
(RepublicanAffect_1 + RepublicanAffect_2 +
RepublicanAffect_3 + RepublicanAffect_4)/4,
(DemocraticAffect_1 + DemocraticAffect_2 +
DemocraticAffect_3 + DemocraticAffect_4)/4),
avg_aff_out_party = ifelse(ClosestParty == "the Republican party",
(DemocraticAffect_1 + DemocraticAffect_2 +
DemocraticAffect_3 + DemocraticAffect_4)/4,
(RepublicanAffect_1 + RepublicanAffect_2 + RepublicanAffect_3 + RepublicanAffect_4)/4)) %>%
filter(!is.na(avg_aff_in_party), !is.na(avg_aff_out_party)) %>% mutate(avg_diff_aff = avg_aff_in_party - avg_aff_out_party)
ClassExperiment %>% group_by(Treated) %>%
summarize(mean(avg_aff_in_party), mean(avg_aff_out_party), mean(avg_diff_aff))
## # A tibble: 2 x 4
## Treated ‘mean(avg_aff_in_party)‘ ‘mean(avg_aff_out_party)‘ ‘mean(avg_diff_aff~
## <lgl> <dbl> <dbl> <dbl>
## 1 FALSE 5.05 3.11 1.94
## 2 TRUE 4.84 2.98 1.86
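# Note: with two vector arguments, t.test() treats `Treated` (coerced to 0/1) as a second
# sample rather than as a grouping variable. The formula form,
# t.test(avg_diff_aff ~ Treated, data = ClassExperiment), compares the two conditions.
t.test(ClassExperiment$avg_diff_aff, ClassExperiment$Treated)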
##
## Welch Two Sample t-test
##
## data: ClassExperiment$avg_diff_aff and ClassExperiment$Treated
## t = 6.4763, df = 83.945, p-value = 6.041e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.9516018 1.7949735
## sample estimates:
## mean of x mean of y
## 1.8938356 0.5205479
Part b
For at least one individual identity trait, the in-party and out-party identity averages, and the average identity difference, report mean values for the treatment and control groups, an estimate of the difference between those groups, and the results of a test for statistical significance. How do the experimental effects of imagined contact on identity traits compare to the effects on affect in the class sample?
Question 5
Compare the results from the class experiment to the results from Wang et al. (2021). What do you hypothesize accounts for similarities or differences in the results?
Question 6: Data Science Question
We have not yet asked you to use the free response data, which takes the form of unstructured text. With this question, we challenge you to find a creative way to use this data. To structure your work, first suggest a hypothesis that could be investigated using the text data and the other data from the experiment. Second, implement a method to use the text data to test this hypothesis. Your method can involve automated or manual processing of the text. You might consider using a function like nchar to characterize the length of response or a package like (stm)[https://cran.r-project.org/web/packages/stm/index.html] to do more sophisticated content analysis.
library(tidytext)
library(janeaustenr)
austen_books()
## # A tibble: 73,422 x 2
## text book
## * <chr> <fct>
## 1 "SENSE AND SENSIBILITY" Sense & Sensibility
## 2 "" Sense & Sensibility
## 3 "by Jane Austen" Sense & Sensibility
## 4 "" Sense & Sensibility
## 5 "(1811)" Sense & Sensibility
## 6 "" Sense & Sensibility
## 7 "" Sense & Sensibility
## 8 "" Sense & Sensibility
## 9 "" Sense & Sensibility
## 10 "CHAPTER 1" Sense & Sensibility
## # ... with 73,412 more rows
nrc_positive <- get_sentiments("nrc") %>% filter(sentiment == "positive") %>% mutate(positive = 1)
nrc_negative <- get_sentiments("nrc") %>% filter(sentiment == "negative") %>% mutate(positive = 0)
sentiments <- nrc_positive %>% rbind(nrc_negative)
ClassExperiment.sentiments <- ClassExperiment %>%
  mutate(participant = row_number()) %>%
  unite("text", ControlScenario_1:`TreatmentScenario _3`, sep = " ", na.rm = TRUE) %>%
  unnest_tokens(word, text) %>%
  inner_join(sentiments, by = "word") %>%
  group_by(participant) %>%
  summarize(treated = mean(Treated),
            avg_aff_in_party = mean(avg_aff_in_party),
            avg_aff_out_party = mean(avg_aff_out_party),
            avg_diff_aff = mean(avg_diff_aff),
            avg_sentiment = mean(positive)) %>%
  ungroup()
mod = lm(avg_diff_aff ~ avg_sentiment + treated, data = ClassExperiment.sentiments)
stargazer(mod, type = "text")
##
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                            avg_diff_aff
## -----------------------------------------------
## avg_sentiment                  0.816
##                               (0.742)
##
## treated                       -0.245
##                               (0.576)
##
## Constant                      1.395**
##                               (0.542)
##
## -----------------------------------------------
## Observations                    47
## R2                             0.027
## Adjusted R2                   -0.017
## Residual Std. Error       1.844 (df = 44)
## F Statistic            0.605 (df = 2; 44)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
ANSWER: I decided to conduct sentiment analysis on the responses. First, I combined the responses into one cell and split them by word, creating a large table. I then joined the table to one I created with all positive and negative words from the nrc index. This obviously eliminated plenty of words and participants, but I continued. Then I summarized average sentiment by participant, keeping the treated and average difference variables around. Finally, I conducted a regression using the treatment variable and average sentiment, though both were insignificant, likely due to the fact that in the process of analyzing the text, I eliminated many participants as the prompt didn’t necessarily call for using descriptive phrases and adjectives.
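Because the sentiment join drops every participant whose text contains no lexicon words, a response-length measure based on nchar (which the question suggests) is one way to keep everyone who wrote anything. The sketch below is an illustration on a toy data frame: the responses are invented, only the column names follow the codebook, and it assumes each subject answered only the prompts for their own condition.

```r
# Toy stand-in for the class data (invented responses); column names follow the codebook
toy <- data.frame(
  Treated = c(FALSE, FALSE, TRUE, TRUE),
  ControlScenario_1   = c("Rolling hills and a river.", "Sunny fields.", NA, NA),
  TreatmentScenario_1 = c(NA, NA, "We talked about our hometowns for a while.",
                          "A short, polite chat."),
  stringsAsFactors = FALSE
)

# Assumption: each subject has text only in the columns for their own condition,
# so pick the treatment response for treated subjects and the control response otherwise
toy$response <- ifelse(toy$Treated, toy$TreatmentScenario_1, toy$ControlScenario_1)

# Response length in characters
toy$response_length <- nchar(toy$response)

# Mean response length by condition
aggregate(response_length ~ Treated, data = toy, FUN = mean)
```

The resulting length variable could then enter a regression alongside the treatment indicator, in the same way avg_sentiment does above.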
Question 7
Can you glean any additional insights by using the control variables included with the class experiment data? For example, are the results different if you subset to strong partisans or people born in the United States? Be creative.
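Two common ways to look for different effects across subgroups are to re-run the comparison within a subset and to add an interaction term to a regression. The sketch below shows both on simulated data: the `sim` data frame and its values are invented, only the variable names follow the codebook, and the simulated outcome has no built-in treatment effect.

```r
set.seed(5)

# Simulated stand-in for the class data (invented values); names follow the codebook
n <- 80
sim <- data.frame(
  Treated        = rep(c(FALSE, TRUE), each = n / 2),
  strongPARTISAN = rep(c(FALSE, TRUE), times = n / 2)
)
sim$AffectDiff_avg <- 1.9 + 0.8 * sim$strongPARTISAN + rnorm(n, sd = 1.5)

# Option 1: estimate the treatment effect within the subgroup only
strong_only <- subset(sim, strongPARTISAN)
t.test(AffectDiff_avg ~ Treated, data = strong_only)

# Option 2: one model with an interaction; the Treated:strongPARTISAN term
# estimates how the treatment effect differs for strong partisans
mod_int <- lm(AffectDiff_avg ~ Treated * strongPARTISAN, data = sim)
summary(mod_int)$coefficients
```

Subsetting shrinks the sample, so with a class-sized dataset the interaction model is often easier to interpret than several small separate tests.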