Your challenge is to take what we have done in class and apply it to datasets of your choosing. 

Your task is to find data for the following three analyses. You have already conducted each of these analyses in class, following detailed instructions.

Now, the challenge is to carry out these analyses in a creative way. You can use data posted on this website or data found online, but your analyses must be original. (In other words, don't repeat an analysis we have already done in class.)

1. Cross-tabulations with chi-square test

For this analysis, find a dataset that contains two nominal-level variables that might plausibly be related: one nominal-level variable that could possibly cause (or influence) the other. Remember that the cause is the independent variable and the effect is the dependent variable. (Example: gender might influence someone's attitudes about marijuana.)

For hints on how to create cross-tabulations, look back at what we did in class with the professor-ratings data. We produced three sets of identical tables, examining ratings of professors on helpfulness by clarity and on helpfulness by easiness. I would like you to create tables similar to these, except that this time you will choose your own variables.

In a cross-tabulation, your independent variable should go in the columns, and your dependent variable should go in the rows. Remember this when creating the PivotTable.

Your output here should include three tables: (1) counts of the dependent variable by the independent variable, (2) column percentages of the dependent variable by the independent variable, and (3) expected counts. (The expected count for a cell is its row total times its column total, divided by the grand total.) Then (4) use the =CHITEST function to obtain the p-value of the chi-square test.
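If you want to double-check the PivotTable arithmetic outside Excel, here is a minimal Python sketch (not part of the required Excel workflow) that builds column percentages and expected counts from a table of counts. The counts are made up, and the function names `column_percentages` and `expected_counts` are hypothetical. Following the convention above, rows hold the dependent variable and columns hold the independent variable.

```python
# Hypothetical 2x2 table of counts (made-up numbers):
# rows = dependent-variable categories, columns = independent-variable categories.
counts = [
    [20, 30],  # e.g. "supports": group A, group B
    [30, 20],  # e.g. "opposes":  group A, group B
]

def column_percentages(table):
    """Percent of each column's total that falls in each row (each column sums to 100)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100.0 * cell / col_totals[j] for j, cell in enumerate(row)]
            for row in table]

def expected_counts(table):
    """Counts expected if there were no relationship: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

print(column_percentages(counts))  # [[40.0, 60.0], [60.0, 40.0]]
print(expected_counts(counts))     # [[25.0, 25.0], [25.0, 25.0]]
```

Column percentages let you compare groups of different sizes, which is why the percentages are taken within each column of the independent variable.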

The chi-square test statistic compares the ACTUAL counts in the table with the counts we would expect if there were no relationship between the variables.

=CHITEST(actual, expected)
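Note that =CHITEST returns a p-value rather than the chi-square statistic itself: the probability of seeing differences between actual and expected counts at least this large if there were no relationship. The Python sketch below, with made-up counts and hypothetical function names, computes the statistic as the sum of (actual - expected)^2 / expected over all cells, then converts it to a p-value using (rows - 1)(columns - 1) degrees of freedom. It assumes a table with at least two rows and two columns and applies no continuity correction.

```python
import math

def chi_square_statistic(actual, expected):
    """Sum over all cells of (actual - expected)^2 / expected."""
    return sum((a - e) ** 2 / e
               for a_row, e_row in zip(actual, expected)
               for a, e in zip(a_row, e_row))

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x).

    Sums the series for the regularized lower incomplete gamma function,
    P(a, z) = sum over n of z^(a+n) * e^(-z) / Gamma(a+n+1), with a = df/2 and
    z = x/2. Each term is computed in log space so it cannot overflow.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    lower, n = 0.0, 0
    while True:
        term = math.exp((a + n) * math.log(z) - z - math.lgamma(a + n + 1))
        lower += term
        n += 1
        if n > z and term < 1e-17:  # terms are now shrinking and negligible
            break
    return max(0.0, 1.0 - lower)

def chitest(actual, expected):
    """Sketch of a CHITEST-style result: the p-value, not the statistic."""
    df = (len(actual) - 1) * (len(actual[0]) - 1)
    return chi2_upper_tail(chi_square_statistic(actual, expected), df)

actual = [[20, 30], [30, 20]]
expected = [[25.0, 25.0], [25.0, 25.0]]
print(chi_square_statistic(actual, expected))  # 4.0
print(round(chitest(actual, expected), 4))     # 0.0455
```

A small p-value (conventionally below 0.05) suggests the actual counts differ from the expected counts by more than chance alone would produce.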

Finally, interpret your findings.  Is there a relationship between the variables or not?  Why or why not? (Speculate.)

2. Regression & correlation

For this analysis, find a dataset that contains two ordered (ordinal or interval-level) variables that might relate to each other.  You should have one variable that could be a cause or independent variable, and another variable that could be an effect or dependent variable. 

Example from class: students' ratings of professors' easiness (independent) and students' overall ratings of professors (dependent).

Your output for this analysis should include: 

(1) a scatterplot with the dependent variable on the Y-axis and the independent variable on the X-axis. Insert a trend (regression) line into the scatterplot. Then add an overall title above the chart, and label the X-axis and the Y-axis. Delete the legend on the side, as it is not informative in this case.

(2) the slope of the regression line (the slope is m in the equation y = mx + b). Interpret the slope. What does it mean in your data?

(3) the y-intercept of the regression line. The y-intercept is b in the equation y = mx + b. Interpret the y-intercept. What does it mean in this case?
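The slope and y-intercept of a linear trendline come from ordinary least squares: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x)^2, and the intercept places the line through the point (mean x, mean y). A minimal Python sketch of that calculation, with made-up easiness and overall ratings (the function name `slope_intercept` is hypothetical):

```python
def slope_intercept(xs, ys):
    """Least-squares slope m and intercept b for the line y = m*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, b

# Made-up data: x = easiness rating, y = overall rating
easiness = [1, 2, 3, 4, 5]
overall = [2, 4, 5, 4, 5]
m, b = slope_intercept(easiness, overall)
print(round(m, 2), round(b, 2))  # 0.6 2.2
```

In these made-up data, a slope of 0.6 would mean that each one-point increase in easiness goes with a 0.6-point increase in the predicted overall rating. The intercept of 2.2 is the predicted overall rating at an easiness of 0, which lies outside the 1-to-5 range of the data, so it should be interpreted cautiously.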

(4) predicted values for each case in the dataset.  Predicted values are obtained by multiplying the value of the independent variable by the slope and then adding the y-intercept.  (predicted y = mx+b.)

(5) residuals for each case in the dataset.  Calculate the residual, or the difference between the actual value of the dependent variable and the predicted value.  The residual = y - predicted y.  Then sort the data by the residuals.  Are there cases with very high or very low residuals?  What is going on with those cases with very high residuals and those cases with very low residuals?

(6) the correlation between Y and X. Use the =CORREL function. Interpret the correlation. There are two aspects of a correlation that require interpretation: (a) direction and (b) magnitude. Direction: is the correlation positive or negative, and what does this say about the relationship between the independent and dependent variables? Magnitude: correlations farther from zero are higher in magnitude. High correlations indicate that variables are strongly related to each other; low correlations indicate that variables are weakly related to each other. As a rough guide, correlations of about 0.1 or -0.1 are weak, correlations of about 0.3 or -0.3 are moderate, and correlations of 0.7 or above (or -0.7 or below) are strong.

3. Comparison of means between two groups, accompanied by a t-test

For this analysis, you need to find a dataset that contains (1) an independent variable that is nominal-level, and (2) a dependent variable that is interval-level. 
Example: gender (cause) and income (effect).  We would look at average incomes for men and women.

You should create a PivotTable to calculate averages, standard deviations, and counts of an interval-level variable by the categories of a nominal-level variable.
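The PivotTable summary can be reproduced in a few lines of Python as a check. This sketch uses made-up records and a hypothetical `summarize_by_group` helper. It reports the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes and which should match the PivotTable's StdDev summary rather than StdDevp; each group needs at least two values.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up records: (category of the nominal variable, value of the interval variable)
records = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("A", 5),
           ("B", 2), ("B", 3), ("B", 4), ("B", 5), ("B", 6)]

def summarize_by_group(records):
    """Average, sample standard deviation, and count of the values in each group."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)
    return {g: (mean(v), stdev(v), len(v)) for g, v in sorted(groups.items())}

for group, (avg, sd, n) in summarize_by_group(records).items():
    print(f"{group}: average={avg:.2f} sd={sd:.2f} n={n}")
```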

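Compare the averages and standard deviations across the groups. Use the =TTEST function to test whether the difference in averages between the two groups is statistically significant. The function takes the form =TTEST(array1, array2, tails, type), where tails is 1 or 2 and type is 1 (paired), 2 (two samples, equal variances), or 3 (two samples, unequal variances); it returns a p-value. (By statistically significant, I mean that the difference in means between the groups is unlikely to have occurred through chance alone.)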
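To see what a two-tailed, unequal-variance test (=TTEST(array1, array2, 2, 3)) is doing, here is a Python sketch of Welch's t-test using only the standard library. The p-value comes from Student's t distribution through a standard continued-fraction evaluation of the regularized incomplete beta function. The helper names are hypothetical and the data are made up; in practice a library routine such as `scipy.stats.ttest_ind(a, b, equal_var=False)` does the same job.

```python
import math
from statistics import mean, variance

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_two_tailed_p(t, df):
    """Two-tailed p-value for Student's t: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))

def welch_ttest(sample1, sample2):
    """Welch's two-sample t-test (unequal variances); returns (t, df, two-tailed p)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2  # squared standard errors
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_two_tailed_p(t, df)

t, df, p = welch_ttest([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.2f}, df = {df:.1f}, p = {p:.3f}")
```

Welch's version is shown because it does not assume the two groups have equal variances, which is the safer default when the standard deviations in the PivotTable look different.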


    Jacob Felson received his doctorate in sociology from Penn State University in December 2009. He received a master's in sociology from Penn State in 2004, and a bachelor's degree, also in sociology, from the University of Chicago in 2002.


    June 2010
    May 2010


