# Homework

INSTRUCTIONS FOR ASSIGNMENT

FOR SPSS:

Go to: Analyze>Compare Means>One-Way ANOVA

To include the Tukey test, click Post Hoc in the dialog box that appears when you open One-Way ANOVA. After you enter the variables (one dependent variable and one factor), check “Tukey,” click “Continue,” and then click OK.

For PSPP

Read PSPP Chapter 6 for step-by-step help (Mac only).

For Windows (PSPP 1.53)

Go to: Analyze>Compare Means>One-Way ANOVA

To include the Tukey test, click Post Hoc in the dialog box that appears when you open One-Way ANOVA. After you enter the variables (one dependent variable and one factor), check “Tukey,” click “Continue,” and then click OK.

Test whether there is an association between a person’s educational attainment and how much television they watch. Use the GSS2008 data set to perform an ANOVA on respondents’ highest educational degree (DEGREE) and the hours per day they watch television (TVHOURS).

FOR PSPP ON MAC ONLY: Now perform the analysis with PSPP [USE THE SYNTAX BELOW TO DO THE ENTIRE ANALYSIS] and find:

WE WILL USE THIS TO DO THE FOLLOWING QUESTIONS: One-way ANOVA

Go to File > New > Syntax.

The syntax editor will open; then type the following [do not type the text in brackets]:

Oneway [enter]
/Variables = var1 BY var2 [enter]
/Statistics = descriptives [enter]
/Posthoc = tukey [then go to the menu bar “File Edit Run Windows Help” and click RUN and then ALL. Check your output; you should have a nice TABLE.] Var1 and var2 stand for the variables being used for the tables. Just type in the variable names as written in the assignment.

Do not type var1 and var2 literally. They are placeholders that do not stand for any variable, and you will get an error message.

Hint for the above: TVHOURS (dependent variable) BY DEGREE (factor).
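The placeholder rule above is easy to get wrong, so here is a small Python sketch of a hypothetical helper (not part of PSPP) that fills the same ONEWAY template with real variable names and refuses the literal placeholders. It assumes you paste the printed text into the PSPP syntax editor. The sketch indents the subcommand lines and ends the command with a period, the standard PSPP command terminator.

```python
def oneway_syntax(dependent: str, factor: str) -> str:
    """Build PSPP ONEWAY syntax: dependent variable BY factor, with Tukey post hoc.

    Hypothetical helper for illustration; PSPP itself only sees the returned text.
    """
    for name in (dependent, factor):
        # var1/var2 are placeholders in the instructions, not real GSS variables.
        if name.strip().lower() in ("var1", "var2"):
            raise ValueError(f"{name!r} is a placeholder; use the real variable name")
    return (
        "ONEWAY\n"
        f"  /VARIABLES = {dependent} BY {factor}\n"
        "  /STATISTICS = DESCRIPTIVES\n"
        "  /POSTHOC = TUKEY.\n"
    )


# Exercise 2: TVHOURS is the dependent variable, DEGREE is the factor.
print(oneway_syntax("TVHOURS", "DEGREE"))
```

For the other exercises, only the two arguments change, for example `oneway_syntax("PRESTG10", "RACE")`.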

NOTE CONCERNING SPSS/PSPP DISCUSSION AND ASSIGNMENT EXERCISES: For anyone wondering, I have borrowed most of the exercises from various textbooks that I have used to teach this course and similar courses over the years, with some modifications where necessary, and these questions often challenge students’ perceptions. Part of learning to analyze and interpret data is setting aside one’s own biases and perceptions when examining the results.

Read the PSPP ANOVA chapter for step-by-step help.

1)  Astrologers assert that our birth dates influence our success (or lack thereof) in life. Test this assumption with the GSS2018 data by analyzing the relationship between the respondent’s astrological sign (ZODIAK) and the respondent’s socioeconomic index (SEI10). The SEI10 is an indicator of economic and socioeconomic attainment: the higher the SEI score, the more successful the respondent.

Make a prediction:

Can a person’s astrological sign predict their socioeconomic success?  Yes       No

If so, which zodiac sign is the most successful?                                             ____________

Now perform the analysis using the GSS2018 data

Mean SEI10 of Pisces                                                                                        ______________

Mean SEI10 of Taurus                                                                                      ______________

Mean SEI10 of your astrological sign                                                              ______________

ANOVA significance level                                                                              ______________

Is the relationship statistically significant?                                                Yes      No

On the basis of these data, would you say that astrologers are correct in their assertion of the power of the stars?  Remember to explain this relationship based on what the test measures.   Why?

FOR THE REMAINING EXERCISES, SEE ABOVE FOR THE SYNTAX SEQUENCE (IN THE WALK-THROUGH), BUT MAKE SURE YOU HAVE THE CORRECT VARIABLE NAMES IN THE PROPER ORDER.

2)  Test whether there is an association between a person’s educational attainment and how much television they watch. Use the GSS2018 data set to perform an ANOVA on respondents’ highest educational degree (DEGREE) and the hours per day they watch television (TVHOURS). Compare with 2008: are the results the same? If not, how were they different?

Make a prediction:

Predict the number of hours each group watches television in a given day:

Less than                    High School                Some               College            Graduate

__________              __________               __________           __________           __________

Now perform the analysis with PSPP and find:

Mean hours for people with less than a high school degree       ___________________

Mean hours for people with a high school degree                      ___________________

Mean hours for people with a junior college degree                    ___________________

Mean hours for people with a bachelor’s degree                         ___________________

Mean hours for people with a graduate degree                           ___________________

ANOVA significance level                                                          ___________________

Is the relationship statistically significant                                      Yes                  No

According to the Tukey Test, which categories have means that are significantly different from the means of those with a high school degree?

3)  Test whether there is an association between a person’s race and the prestige of their occupation. Use the GSS2018 data set to perform an ANOVA on RACE and PRESTG10. You will also run a Tukey test for this one, so it is easiest to rerun the syntax from Exercise 2 and change only the relevant variables.

Mean prestige score for Whites                                                      ___________________

Mean prestige score for African Americans                                   ___________________

Mean prestige score for other races                                               ___________________

ANOVA significance level                                                                  ___________________

Is the relationship statistically significant                                      Yes                  No

According to the Tukey Test, which categories have means that are significantly different from the means of Whites?

On the basis of these data, would you say that race is associated with occupational prestige?  Remember to explain this relationship based on what the test measures.  What could explain this relationship?

4)  Test whether there is an association between a person’s degree and the years at their current job. Use the GSS2018 data set to perform an ANOVA on DEGREE and YEARSJOB. You will also run a Tukey test for this one, so it is easiest to rerun the syntax from Exercise 2 and change only the relevant variables.

Make a prediction:

Predict the mean number of years at current job for each group:

Less than                    High School                Some               College            Graduate

__________              __________               __________           __________           __________

Now perform the analysis with PSPP and find:

Mean years for people with less than a high school degree       ___________________

Mean years for people with a high school degree                      ___________________

Mean years for people with a junior college degree                    ___________________

Mean years for people with a bachelor’s degree                         ___________________

Mean years for people with a graduate degree                           ___________________

ANOVA significance level                                                          ___________________

Is the relationship statistically significant                                      Yes                  No

According to the Tukey Test, which categories have means that are significantly different from the means of those with a high school degree?

5) Test whether there is an association between a person’s race and the HOURS WORKED LAST WEEK (HRS1). Use the GSS2018 data set to perform an ANOVA on RACE and HRS1. You will also run a Tukey test for this one, so it is easiest to rerun the syntax from Exercise 2 and change only the relevant variables.

Mean HOURS WORKED for Whites                                                      ___________________

Mean HOURS WORKED for African Americans                                   ___________________

Mean HOURS WORKED for other races                                               ___________________

ANOVA significance level                                                                  ___________________

Is the relationship statistically significant                                      Yes                  No

According to the Tukey Test, which categories have means that are significantly different from the means of Whites?

On the basis of these data, would you say that race is associated with hours worked last week?  Remember to explain this relationship based on what the test measures.  What could explain this relationship?

6)  Test whether there is an association between a person’s race and the YEARS AT CURRENT JOB. Use the GSS2018 data set to perform an ANOVA on RACE and YEARSJOB. You will also run a Tukey test for this one, so it is easiest to rerun the syntax from Exercise 2 and change only the relevant variables.

Mean years for Whites                                                      ___________________

Mean years for African Americans                                   ___________________

Mean years for other races                                               ___________________

ANOVA significance level                                                                  ___________________

Is the relationship statistically significant                                      Yes                  No

According to the Tukey Test, which categories have means that are significantly different from the means of Whites?

On the basis of these data, would you say that race is associated with years at a current job?  Remember to explain this relationship based on what the test measures.  What could explain this relationship?

Chapter 6: Analysis of Variance (ANOVA)

One-way ANOVA: One-way analysis of variance (ANOVA) tests the significance of differences among two or more group means by analyzing the variation between groups and the variation within each group. ANOVA is appropriate when the independent variable (IV) has two or more categories and the dependent variable (DV) is quantitative. Because ANOVA determines only whether the group differences are significant overall, and does not identify which groups differ from each other, post hoc tests are usually conducted in conjunction with ANOVA.
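The between-group and within-group variation described above can be made concrete with a short Python sketch (standard library only). It computes the F ratio that appears in the ANOVA table, assuming raw scores are available for each group; the three tiny groups in the usage line are made-up numbers, not GSS data. The software then converts F and its two degrees of freedom into the Sig. value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: how far each score is from its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    # F = mean square between / mean square within.
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Made-up example: three groups of three scores each.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

A large F means the group means are spread out relative to the spread of scores inside each group; identical groups give F = 0.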

The univariate case of ANOVA is a hypothesis-testing procedure that simultaneously evaluates the significance of mean differences on a dependent variable (DV) among two or more treatment conditions or groups (Agresti & Finlay, 1997). The treatment conditions or groups are defined by the levels of the independent variable (IV).

For example, astrologers assert that our birth dates influence our success (or lack thereof) in life. To test this assumption, we can use data from the GSS 2000 data set to analyze the relationship between the respondent’s astrological sign (ZODIAK) and the respondent’s socioeconomic index (SEI). The SEI is an indicator of economic and socioeconomic attainment: the higher the SEI score, the more successful the respondent.

ANOVA allows us to compare these group means. First, open the GSS 2000 data set. If you want to follow along, the data are available from the GSS website (search for “GSS” to find the link to the site). Download the SPSS data for the year 2000 (individual-year data).

Open the data set and then you can perform the ANOVA procedure.

Go to Analyze > Compare Means > One-way ANOVA (Graphic 6.1).

Graphic 6.1

The dialog box shown in Graphic 6.2 will appear.

Graphic 6.2

Enter the dependent variable, the socioeconomic index (SEI10), into the Dependent Variable(s) box. The independent variable (in this case, the factor) is the respondent’s zodiac sign (Zodiac). In case it is not clear why, remember that “astrologers assert that our birth dates influence our success (or lack thereof) in life.” Zodiac sign is the independent variable because it is seen (at least by astrologers) as influencing success, and success, measured here by socioeconomic status, is the dependent variable. Make sure the “Descriptives” box is checked and then click “OK.”

Graphic 6.3

The output will contain two tables (Graphic 6.4). Let’s see what information they contain. The first table shows the descriptives for each group: the mean and standard deviation, along with the lower and upper bounds of the 95% confidence interval for the mean.

The mean SEI10 of Aquarius is 43.92, while the mean SEI10 of Aries is 45.10, but is there a significant difference in mean SEI by zodiac sign? The significance level is displayed in the second table under “Sig.” The ANOVA significance level is .188, which is not significant because it is not less than .05. On the basis of these data, astrologers do not appear to be correct that there is power in the stars.

Graphic 6.5

Next, we are going to test whether there is an association between a person’s highest educational degree (DEGREE) and the hours per day they watch television (TVHOURS). Again we will use ANOVA, but also a post hoc test: the Tukey test (watch out, as autocorrect wants to change it to the “Turkey test”!). Currently, the Tukey test is available in PSPP only by entering the proper syntax.

To open the syntax editor, go to File > New > Syntax.

Graphic 6.6

The syntax editor will open; then type the following [do not type anything in brackets; those are instructions, not commands]:
Oneway [enter]
/Variables = var1 BY var2 [enter; note that var1 is the DV and var2 is the factor, or IV]
/Statistics = descriptives [enter]
/Posthoc = tukey

Remember, do not actually type var1 and var2, but rather the names of the variables, in this case TVHOURS BY DEGREE. The syntax should appear as shown in Graphic 6.7.

Graphic 6.7

Next, on the menu bar (“File Edit Run Windows Help”), click Run and then All, as shown in Graphic 6.8.

Graphic 6.8

Check your output; you should have the tables shown in Graphics 6.9 and 6.10.

Graphic 6.9

Looking first at Graphic 6.9, we can list the mean hours for each group and check whether there is a significant difference between groups. For example:

Mean hours for people with less than a high school degree 4.11
Mean hours for people with a high school degree 3.04
Mean hours for people with a junior college degree 2.44
Mean hours for people with a bachelor’s degree 2.42
Mean hours for people with a graduate degree 1.64
ANOVA significance level .000

Graphic 6.10

Based on the ANOVA significance level, the mean hours people watch TV differ significantly by level of education (degree). But which degrees are significantly different from each other? The post hoc Tukey test shows whether there is a significant difference between each pair of groups. For example, according to the Tukey test, which categories have means that are significantly different from the mean of those with a junior college degree?

Check the “junior college” row in the Tukey test table (Multiple Comparisons): only the less than high school category differs significantly from junior college (sig. = .000). The significance levels for the other categories are .071 for high school, 1.000 for bachelor’s degree, and .079 for graduate, none of which is below .05. However, if you look at the row for less than high school, every other degree category has significantly different mean TV viewing hours (sig. = .000 for all).
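The “Mean Difference (I-J)” column of the Tukey table is each pair of group means subtracted. The Python sketch below computes those differences from the five means listed above, plus the Tukey-Kramer q statistic for one pair. The q function needs the within-group mean square and the group sizes, which are not reported in this chapter, so any inputs you give it here are placeholders. Deciding significance also requires a studentized range critical value, which PSPP looks up for you; the sketch does not compute it.

```python
from itertools import combinations
from math import sqrt

# Mean TV hours per day by degree, as listed above (Graphic 6.9).
MEANS = {
    "less than high school": 4.11,
    "high school": 3.04,
    "junior college": 2.44,
    "bachelor's": 2.42,
    "graduate": 1.64,
}


def pairwise_differences(group_means):
    """Mean difference (I - J) for every pair of groups, rounded to 2 decimals."""
    return {
        (i, j): round(group_means[i] - group_means[j], 2)
        for i, j in combinations(group_means, 2)
    }


def tukey_kramer_q(mean_i, mean_j, n_i, n_j, ms_within):
    """Studentized range statistic for one pair of groups (Tukey-Kramer form).

    The pair differs significantly if q exceeds the studentized range critical
    value for k groups and df_within degrees of freedom (looked up, not computed).
    """
    standard_error = sqrt(ms_within / 2 * (1 / n_i + 1 / n_j))
    return abs(mean_i - mean_j) / standard_error


for pair, diff in pairwise_differences(MEANS).items():
    print(pair, diff)
```

Five groups give ten pairs; the largest difference (less than high school vs. graduate, 2.47 hours) is the one you would expect Tukey to flag first.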

Looking at one other example, we will examine whether there is a significant difference in mean occupational prestige score (PRESTG10; some of the GSS data sets, depending on the year, list occupational prestige as PRESTG80) by race (RACE). Just as we did with the previous example, open the syntax editor (Graphic 6.11).

Graphic 6.11

The syntax editor will open; then type the following [do not type anything in brackets; those are instructions, not commands]:
Oneway [enter]
/Variables = var1 BY var2 [enter]
/Statistics = descriptives [enter]
/Posthoc = tukey

Remember, as mentioned above, do not type var1 and var2, but rather the names of the two variables, in this case PRESTG80 BY RACE, since we want to know whether there is a difference in mean prestige scores by race. It should look like what is shown in Graphic 6.12.

Graphic 6.12

As in the previous example, on the menu bar (“File Edit Run Windows Help”), click Run and then All (see Graphic 6.8 if necessary). Check your output; you should have a set of tables (Graphics 6.13 and 6.14).

Graphic 6.13

Looking first at Graphic 6.13, we can list the mean occupational prestige scores by race and the
ANOVA significance level.

Mean prestige score for Whites 43.72
Mean prestige score for African Americans 39.94
Mean prestige score for other races 42.93
ANOVA significance level .000

The ANOVA significance level indicates a significant difference in mean prestige scores, but among which groups? According to the Tukey test, which categories have occupational prestige means that are significantly different from the mean of Whites? Based on Graphic 6.14, the mean for Blacks is significantly different from that of Whites (.000), but the mean for other races is not (.738). Checking the row for Black, you will notice that other races also differ significantly from Blacks (.038).

Graphic 6.14

What could explain this difference? That is a good question for you!

PROGRAM EVALUATION


1)  Test whether there is an association between a person’s gender and the prestige of their occupation. Use the GSS2018 data set to perform an independent samples t-test on SEX and PRESTG10 (or PRESTG80, depending on the data set). Report the following:

Mean prestige score for men                                                          =44.70

Mean prestige score for women                                                      = 44.67

t-test equality of means significance level                                     = .954 (t = .058)

Is the relationship statistically significant                                         No

Levene’s test Sig. = .298 (equal variances assumed)

An independent samples t-test on SEX and PRESTG10 was performed using the GSS2018 data set. Levene’s test for equality of variances is not significant (Sig. = .298), so we read the “Equal variances assumed” row. The t-test significance level is .954, which is greater than the .05 significance level. We therefore fail to reject the null hypothesis that men and women have the same mean occupational prestige. The difference between the two means (0.033 points) is not statistically significant, so these data provide no evidence that occupational prestige differs by gender.

Group Statistics: R’s occupational prestige score (2010)

| Respondent’s sex | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| MALE | 1022 | 44.70 | 13.529 | .423 |
| FEMALE | 1226 | 44.67 | 13.746 | .393 |

Independent Samples Test: R’s occupational prestige score (2010)

| | Levene’s F | Levene’s Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 1.082 | .298 | .058 | 2246 | .954 | .033 | .578 | -1.100 | 1.167 |
| Equal variances not assumed | | | .058 | 2185.428 | .954 | .033 | .577 | -1.099 | 1.165 |
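As a check on the “Equal variances assumed” row, the pooled standard error and t can be recomputed from the Group Statistics table. This Python sketch assumes the textbook pooled-variance formula, which is what that row reports. Because the table’s means are rounded to two decimals, the recomputed t comes out slightly below the reported .058, while the standard error matches.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t, equal variances assumed. Returns (t, se, df)."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, se, df


# Men vs. women, PRESTG10, from the Group Statistics table above.
t, se, df = pooled_t(44.70, 13.529, 1022, 44.67, 13.746, 1226)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df}")
```

A t this close to zero is far from the roughly 1.96 needed for significance at .05 with a sample this large, which is consistent with the reported Sig. of .954.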

2) You will report the information listed below for the GSS 2018 data set: Perform an independent samples t-test to compare the mean socioeconomic index (SEI10) of those who have had a born-again experience with those who have not (REBORN).

Mean SEI of those not born again                                                 =48.766

Mean SEI of those born again                                                         =44.654

t-test equality of means significance level                                     = .000 (t = -4.196, equal variances not assumed)

Is the relationship statistically significant                                      Yes

An independent samples t-test was performed using the GSS 2018 data set to compare the mean socioeconomic index (SEI10) of those who have had a born-again experience with those who have not (REBORN). Step 1 is to determine whether the assumption of equal variances is met, because the result tells us which row of the t-test output to read. Levene’s test is significant at .001, so we conclude that the group variances are not equal. Step 2: based on Levene’s test, use the significance level for “Equal variances not assumed,” which is .000. This is less than the .05 significance level, so we reject the null hypothesis that the two groups have the same mean SEI. The difference between the two means is statistically significant, and the sample provides evidence that the two population means are not equal: those who have had a born-again experience have a lower mean SEI (44.654) than those who have not (48.766). I was expecting to find statistically equal means because I did not think religion could influence socioeconomic status, but the results show that the religious experience of being born again is associated with socioeconomic status.

Group Statistics: R’s socioeconomic index (2010)

| Has R ever had a ‘born again’ experience | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| YES | 927 | 44.654 | 22.1767 | .7284 |
| NO | 1282 | 48.766 | 23.4687 | .6555 |

Independent Samples Test: R’s socioeconomic index (2010)

| | Levene’s F | Levene’s Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 11.679 | .001 | -4.158 | 2207 | .000 | -4.1119 | .9888 | -6.0510 | -2.1728 |
| Equal variances not assumed | | | -4.196 | 2057.604 | .000 | -4.1119 | .9799 | -6.0336 | -2.1903 |
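The two-step procedure described above (check Levene’s test, then read the matching row) can be sketched in Python. The `welch_t` function implements the unequal-variances t-test with Welch-Satterthwaite degrees of freedom, the standard formula behind the “Equal variances not assumed” row; run on the Group Statistics values, it gives t ≈ -4.196 and df ≈ 2057.6, matching the table. The `row_to_read` helper is a hypothetical name used only for illustration.

```python
from math import sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t, equal variances NOT assumed. Returns (t, se, df)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2  # squared standard errors of each mean
    se = sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (mean1 - mean2) / se, se, df


def row_to_read(levene_sig, alpha=0.05):
    """Step 1: a significant Levene test means the group variances differ."""
    if levene_sig < alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"


# Born again (YES) vs. not (NO), SEI10, from the Group Statistics table above.
print(row_to_read(0.001))
t, se, df = welch_t(44.654, 22.1767, 927, 48.766, 23.4687, 1282)
print(f"t = {t:.3f}, SE = {se:.4f}, df = {df:.1f}")
```

Applied to Exercise 1, `row_to_read(0.298)` points to the “Equal variances assumed” row instead.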

3) Perform a paired t-test to compare the respondent’s mother’s occupational prestige score (MAPRES10) to the respondent’s father’s occupational prestige score (PAPRES10) using the GSS2018 data set.

Respondent’s mother’s occupational prestige score?             =42.57

Respondent’s father’s occupational prestige score?               =44.40

Significance for the Paired Samples Test?                                = .000 (t = -4.008)

Is the relationship statistically significant                                                      Yes

Is prestige related to generation?  What were you expecting to find, and did you find it?

A paired t-test was performed using the GSS2018 data set to compare the respondent’s mother’s occupational prestige score (MAPRES10) to the respondent’s father’s occupational prestige score (PAPRES10). A paired t-test is used to determine whether the means of two related measurements differ, such as the same people measured at two different times or, as here, the mother’s and father’s scores for the same respondent. A significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. The significance level in this test is 0.000065, so we reject the null hypothesis. The difference between the means is statistically significant: fathers’ mean prestige (44.40) is higher than mothers’ (42.57). A conclusion can be made that prestige is related to generation. I did not expect to find a difference between the means of the two occupational prestige scores.

Paired Samples Statistics

| Pair 1 | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| Mother’s occupational prestige score (2010) | 42.57 | 1226 | 13.073 | .373 |
| Father’s occupational prestige score (2010) | 44.40 | 1226 | 12.911 | .369 |

Paired Samples Correlations

| Pair 1 | N | Correlation | Sig. |
|---|---|---|---|
| Mother’s & Father’s occupational prestige score (2010) | 1226 | .236 | .000 |

Paired Samples Test (Paired Differences)

| Pair 1 | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Mother’s - Father’s occupational prestige score (2010) | -1.838 | 16.063 | .459 | -2.739 | -.938 | -4.008 | 1225 | .000 |
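A paired t-test is a one-sample t-test on the differences, so t is the mean difference divided by its standard error. The Python sketch below recomputes t from the Paired Samples Test row and, as a cross-check, shows how the standard deviation of the differences follows from the two standard deviations and their correlation (.236). Small discrepancies come from rounding in the printed tables.

```python
from math import sqrt


def paired_t(mean_diff, sd_diff, n):
    """Paired t from the mean and SD of the differences. Returns (t, se, df)."""
    se = sd_diff / sqrt(n)
    return mean_diff / se, se, n - 1


def sd_of_differences(sd1, sd2, r):
    """SD of (X1 - X2) given each SD and the correlation between X1 and X2."""
    return sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)


# Mother's minus father's prestige, from the tables above.
t, se, df = paired_t(-1.838, 16.063, 1226)
print(f"t = {t:.3f}, SE = {se:.3f}, df = {df}")
print(f"SD of differences from r: {sd_of_differences(13.073, 12.911, 0.236):.3f}")
```

The same function applied to Pair 1 of Exercise 6 (mean difference -4.634, SD 2.68842, N = 50) gives t ≈ -12.19, in line with the reported -12.188.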

USE STATES10 DATA FOR THE NEXT SET OF QUESTIONS

4. Perform a paired t-test to compare the median earnings of male full-time workers (EMS168) to the median earnings of female full-time workers (EMS169) using the STATES10 data set.

Mean earnings of men?                                     =45124.824

Mean earnings of women?                                =34407.157

Significance for the Paired Samples Test?             = .000 (t = 33.559)

Is the relationship statistically significant                                      Yes
A paired t-test was performed using the STATES10 data set to compare the median earnings of male full-time workers (EMS168) to the median earnings of female full-time workers (EMS169). The p-value in this case is .000, which is lower than the .05 significance level, so there is a significant difference between the two means. This shows that earnings are related to gender: across states, the median earnings of male full-time workers are substantially higher than those of female full-time workers (a mean difference of about $10,718). I expected to find a difference between the means for male and female full-time workers because of the gender discrimination that has persisted throughout the history of the United States (Callaway & Sant’Anna, 2021).

Paired Samples Statistics

| Pair 1 | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| Median Earnings of Male Full-Time Workers: 2008 | 45124.824 | 51 | 5352.2714 | 749.4679 |
| Median Earnings of Female Full-Time Workers: 2008 | 34407.157 | 51 | 4941.3078 | 691.9215 |

Paired Samples Correlations

| Pair 1 | N | Correlation | Sig. |
|---|---|---|---|
| Median Earnings of Male & Female Full-Time Workers: 2008 | 51 | .905 | .000 |

5. Using a variable called WAGEGAP that is the difference between median earnings of male full-time workers (EMS168) and the median earnings of female full-time workers (EMS169),

Create a histogram of WAGEGAP’s distribution

Describe the shape of the distribution.

In which state do women have earnings closest to men’s?     California Arizona

In which state do women have earnings most disparate from men’s? Arizona WY

What might account for the variation in wage gaps observed across states?

A histogram of WAGEGAP’s distribution shows the difference between the median earnings of male full-time workers (EMS168) and female full-time workers (EMS169) across the states in the STATES10 data set. Many factors contribute to the wage gap in the United States, including discriminatory practices, time away from employment, occupational clustering, and the time demands of various jobs (Callaway & Sant’Anna, 2021). These factors make it harder for women to have flexibility in employment, which contributes to men’s higher wages.
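The WAGEGAP steps can be sketched in Python as follows. This is a hypothetical illustration: the three states and their earnings are made-up numbers, not STATES10 values, and in PSPP you would instead create the variable with COMPUTE and then plot its histogram. The sketch defines the gap as men’s minus women’s median earnings, finds the states with the smallest and largest gaps, and prints a rough text histogram of the non-empty bins.

```python
from collections import Counter

# HYPOTHETICAL numbers for illustration only (not STATES10 data):
# state -> (median earnings of men, median earnings of women)
EARNINGS = {
    "State A": (50000, 40000),
    "State B": (42000, 38000),
    "State C": (47000, 33000),
}


def wage_gaps(data):
    """WAGEGAP = men's median earnings minus women's median earnings."""
    return {state: men - women for state, (men, women) in data.items()}


def histogram_counts(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


gaps = wage_gaps(EARNINGS)
closest = min(gaps, key=lambda s: abs(gaps[s]))  # women's earnings closest to men's
most_disparate = max(gaps, key=lambda s: abs(gaps[s]))  # largest gap
print("Closest:", closest, "| Most disparate:", most_disparate)
for lower_edge, count in histogram_counts(gaps.values(), 5000).items():
    print(f"{lower_edge:>6}-{lower_edge + 4999:<6} {'#' * count}")
```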

6.  From the STATES10 data set, perform a paired samples t-test and report the results using Overdose Deaths 1999 and 2005 (Pair 1) and Overdose Deaths 2005 and 2017 (Pair 2).

Pair 1

Mean (1999) = 5.6780

Mean (2005) = 10.3120

Significance for the Paired Samples Test?             =0.0000

Is the relationship statistically significant              Yes

Pair 2

Mean (2005) = 10.3120

Mean (2017) = 22.6440

Significance for the Paired Samples Test  = .000

Is the relationship statistically significant              Yes

What do the t-test results suggest about deaths by overdose over the two-decade period?

A paired samples t-test was performed on overdose death rates for 1999 and 2005 (Pair 1) and for 2005 and 2017 (Pair 2) using the STATES10 data set. For Pair 1, the difference between the mean overdose death rates in 1999 and 2005 is statistically significant: the mean death rate per 100,000 rose from 5.678 in 1999 to 10.312 in 2005. For Pair 2, the significance level is .000, far below .05, so this difference is also statistically significant: the mean death rate was higher in 2017 (22.644) than in 2005. Over the two-decade period, the overdose death rate has been increasing.

Reports from the CDC’s Injury Center on fatal and nonfatal overdoses divide opioid overdose deaths into four categories: natural opioids, methadone, synthetic opioids, and heroin. Deaths are highest for opioid overdoses, with 50,000 deaths in 2019, six times the number recorded in 1999 (Hedegaard & Spencer, 2019). The increase in the number of deaths is associated with an increase in drug submissions.

Paired Samples Statistics: Death Rate per 100,000 by Drug Overdose

| Pair | Year | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | 1999 | 5.6780 | 50 | 2.87049 | .40595 |
| Pair 1 | 2005 | 10.3120 | 50 | 3.85855 | .54568 |
| Pair 2 | 2005 | 10.3120 | 50 | 3.85855 | .54568 |
| Pair 2 | 2017 | 22.6440 | 50 | 10.73155 | 1.51767 |

Paired Samples Correlations

| Pair | Years | N | Correlation | Sig. |
|---|---|---|---|---|
| Pair 1 | 1999 & 2005 | 50 | .718 | .000 |
| Pair 2 | 2005 & 2017 | 50 | .390 | .005 |

Paired Samples Test (Paired Differences)

| Pair | Difference | Mean | Std. Deviation | Std. Error Mean | 95% CI Lower | 95% CI Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|---|
| Pair 1 | 1999 - 2005 | -4.63400 | 2.68842 | .38020 | -5.39804 | -3.86996 | -12.188 | 49 | .000 |
| Pair 2 | 2005 - 2017 | -12.33200 | 9.88579 | 1.39806 | -15.14151 | -9.52249 | -8.821 | 49 | .000 |
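The 95% confidence intervals in the Paired Samples Test table are the mean difference plus or minus a critical t value times the standard error. This Python sketch reproduces them. Since the standard library has no t distribution, it hard-codes the two-tailed 95% critical value for df = 49 (about 2.0096, from a standard t table); that constant is an assumption you would replace for other sample sizes.

```python
# Two-tailed 95% critical value of t for df = 49 (from a t table; assumption).
T_CRIT_DF49 = 2.0096


def paired_ci95(mean_diff, se, t_crit=T_CRIT_DF49):
    """95% confidence interval for a paired mean difference."""
    margin = t_crit * se
    return mean_diff - margin, mean_diff + margin


# Mean differences and standard errors from the Paired Samples Test table above.
pairs = {
    "Pair 1 (1999 - 2005)": (-4.63400, 0.38020),
    "Pair 2 (2005 - 2017)": (-12.33200, 1.39806),
}
for label, (mean_diff, se) in pairs.items():
    lower, upper = paired_ci95(mean_diff, se)
    print(f"{label}: t = {mean_diff / se:.3f}, 95% CI [{lower:.5f}, {upper:.5f}]")
```

Neither interval contains zero, which is another way of seeing that both increases are statistically significant at the .05 level.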
