Make sure you have sent in your assignments according to the instructions in "
 
  • Finish the ratemyprofessor assignment
  • Be sure you have answered all the questions that were asked of you along the way.  Ask me if you are confused about any of them.
  • Be sure to read the explanations so you understand the point of the assignment.
  • Submit the assignment to me (
 
The General Social Survey is a massive undertaking, requiring millions of dollars and well-trained interviewers around the country.  In part because the survey is financed by your tax dollars, the data from the survey are available on the internet for anyone to examine.

I put together a subset of the GSS containing over 350 variables.  This dataset is available from this website (see the Data section of the website).  I passed around a list of these variables.  (If you lost the list, I have posted it below.)

 


In class, I showed you how you can look up variables on the list on this website.

In class, we looked at the association between gender (sex) and marital status (marital).  Then, I asked you to come up with hypotheses (essentially, guesses) about 5 other variables that you think would be related to gender. 

Your homework is to create a crosstabulation (crosstab for short) like the one I showed you in class for each pair of variables you selected.  Underneath the table, write down the results.  That is, interpret the table.  Did things turn out as you expected or not? 

So what I am asking you to do has three steps:

1. Pick a variable from the list that you think is related to gender.  Be sure to check the variable on the website at the link above.  You want to make sure you know the question associated with the variable.  Give a hypothesis (guess) about why this variable is related to gender.

2. Create a crosstabulation (crosstab).  Put gender in the columns, and the other variable in the rows.  Get column percentages (% of column). 

3. Report the results.  Did your hypothesis turn out to be correct or not?


I would like you to do this with five different variables.  We did the first step in class.  Steps 2 and 3 are homework.
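For anyone curious about the arithmetic behind step 2, here is a minimal Python sketch (the assignment itself is done in Excel). It assumes each respondent is recorded as a (sex, other variable) pair; the function name crosstab_column_percentages and the responses are made up for illustration and are not real GSS data. Each cell is the count for that combination divided by the total for its column, times 100, so every gender column adds up to 100%.

```python
from collections import Counter

def crosstab_column_percentages(rows):
    """Return {column: {row: percent}}, where each column sums to 100.

    `rows` is a list of (column_value, row_value) pairs, e.g. (sex, marital).
    """
    counts = Counter(rows)                        # count for each cell
    col_totals = Counter(col for col, _ in rows)  # total for each column
    table = {}
    for (col, row), n in counts.items():
        table.setdefault(col, {})[row] = 100 * n / col_totals[col]
    return table

# Hypothetical responses (not real GSS data): (sex, marital)
responses = [
    ("Male", "Married"), ("Male", "Married"), ("Male", "Never married"),
    ("Male", "Divorced"), ("Female", "Married"), ("Female", "Widowed"),
    ("Female", "Never married"), ("Female", "Widowed"),
]

table = crosstab_column_percentages(responses)
for sex, dist in table.items():
    for status, pct in sorted(dist.items()):
        print(f"{sex:7} {status:14} {pct:5.1f}%")
```

Reading down a column tells you how one gender is distributed across the categories; comparing columns row by row is how you judge whether the two variables are associated.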

2008 General Social Survey - selected variables with brief descriptions

 
In section 2, we started to talk about associations between two variables.  In particular, we were interested in the association between gender and academic college at William Paterson.  What do I mean by association?  Consider two variables, which we'll call X and Y.  If X is associated with Y, then knowing X will allow you to predict Y with better than chance accuracy.  If X is NOT associated with Y, then knowing X won't enable you to predict Y any better than you could have without knowing X.
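One way to make this definition concrete is to count how many cases you would predict correctly by guessing the most common category of Y, first ignoring X and then separately within each category of X. The minimal Python sketch below assumes a list of (X, Y) pairs; the students are hypothetical (not the William Paterson data) and the function names are made up for illustration. If knowing X helps, the second accuracy is higher than the first; if X is not associated with Y, the two come out essentially the same.

```python
from collections import Counter, defaultdict

def accuracy_without_x(pairs):
    """Guess the single most common Y for everyone; return share guessed correctly."""
    ys = [y for _, y in pairs]
    return Counter(ys).most_common(1)[0][1] / len(ys)

def accuracy_with_x(pairs):
    """Guess the most common Y within each X group; return share guessed correctly."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return correct / len(pairs)

# Hypothetical students: (gender, academic college)
students = (
    [("F", "Education")] * 3 + [("F", "Business")] * 1
    + [("M", "Business")] * 3 + [("M", "Education")] * 1
)

print(accuracy_without_x(students))  # guessing with no information about gender
print(accuracy_with_x(students))     # guessing once gender is known
```

Here half the students are in each college, so guessing blindly is right 50% of the time; knowing gender raises that to 75%, which is what "better than chance" means in this example.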
In the example in class, X was gender, and Y was academic college.  We had data on William Paterson students (including some of you).  Information that would allow students to be readily identified has been removed.  We built a table called a "crosstab" and examined the gender distribution of each college at William Paterson.

We saw that gender was in fact related to academic college.  We talked about this in class.

Your homework for Thursday is to build a table and a chart, just as we did in class, using the dataset on WP students.  Your challenge, however, will be to pick another variable or variables on which to build a table.  For example, you could find another variable that you think might be related to gender, and then create a table (as we did) showing the relationship between gender and this other variable.  Good luck.  We'll be looking at more associations like these on Thursday.
 
I have said that this class is about learning how to tell stories with data.  Many stories can be told with measures of central tendency like means, medians and modes.  But when we focus on central tendency, we summarize an entire variable with just a single number or a few numbers.  What about the rest of the numbers?  If we want to get a sense of the entire distribution of a variable, we can construct a frequency distribution.  (See pp. 83-84 for details.)  This is easy to do with a PivotTable.  Then we can create a bar chart based on the frequency distribution.  Bar charts that reflect frequency distributions are called histograms.

We have done this kind of thing before.  Now we're going to go into more detail.

For homework, I would like you to: 

1. Find some data online that meet the following specifications. 
a. Pick something on a topic that is interesting to you!
b. The data should contain at least one interval-level variable.  (In fact, you might  just find ONE variable, but that variable should be interval-level.)  
c. The dataset should have at least 30 cases. 

2. As usual, press CTRL-T to create a table with your data.

3. Summarize with PivotTable. 

4. Drag the variable into the Row Labels first.

5. Then drag the variable into the Values box.  Make sure you have chosen "Count of" your variable. 

6. Aim your mouse over the row labels in the PivotTable itself.  Right-click any one of the row labels and choose "Group..." to create class intervals.  Group the categories into something that you think is reasonable.  We're aiming for something that looks rather similar to the table on page 84 of the book.  It could use categories of 5, like the table on p. 84, but you might find another grouping more sensible.  It's up to you, as long as it makes sense.

7. Once you have created the table, add an additional column to the table containing column percentages.  To do this:

a. Drag your variable into the Values box a second time.  (You have already done this once to get counts.)
b. Then go to the drop-down menu and select Value Field Settings...  You want to "summarize by" Count, as before.  Then click on the "Show Values as..." tab and choose % of column.

8. Give the table a descriptive title. 

9. Make sure you save the Excel file containing the data and the table on your USB drive to bring to class. 
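Here is a minimal Python sketch of what steps 4 through 7 produce: a frequency distribution with class intervals, counts, and column percentages. It assumes whole-number values and intervals of width 5 starting at 0; the function name frequency_distribution and the ages are made up for illustration. The labels imitate the "15-19" style you see after grouping, but treat them as an approximation of Excel's output rather than an exact copy.

```python
from collections import Counter

def frequency_distribution(values, width=5, start=0):
    """Group whole-number values into class intervals of `width` and return
    (label, count, percent) rows, similar to a grouped PivotTable."""
    bins = Counter((v - start) // width for v in values)  # interval index per value
    total = len(values)
    rows = []
    for i in sorted(bins):
        low = start + i * width
        high = low + width - 1
        rows.append((f"{low}-{high}", bins[i], 100 * bins[i] / total))
    return rows

# Hypothetical ages of 10 respondents
ages = [18, 19, 19, 21, 22, 22, 23, 25, 27, 31]
rows = frequency_distribution(ages)
for label, count, pct in rows:
    print(f"{label:6} {count:3} {pct:5.1f}%")
```

Only intervals that contain at least one value appear here; changing `width` is the equivalent of choosing a different "By" value in the Group dialog.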
 
Graphs are capable of conveying a good deal of information in an efficient, effective manner.  Graphs are powerful weapons in the arsenal of data storytellers. 

In class, we created line graphs to tell a story about climate change.  Specifically, we graphed average temperatures in New Jersey over the last century.  In the previous class, we created bar charts to tell a story about the vast differences in the resources of baseball teams.

Your homework is to find a dataset on the internet, and create a line chart or bar chart based on that data.

As you complete the assignment, keep in mind the following:

  1. First, find some data and paste or import it into Excel.  Remember that for our purposes, a dataset is a rectangular matrix with cases in the rows and variables in the columns. Your dataset should contain at least 25 cases.
  2. Once the data are in Excel, press CTRL-T to create a table.
  3. Click "summarize with PivotTable."
  4. Set up the PivotTable.
  5. Make the chart - either line or bar chart.
  6. Use the group/un-group feature if necessary to simplify the data (as you did with the temperature data).  
  7. Make sure the chart is properly labeled.  There should be a descriptive title.  Label the axes as well.
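Step 6 can be sketched in Python as follows. This is a minimal illustration, assuming the data are yearly values and that you want to group the years by decade and show the average for each decade; the function name average_by_decade and the temperatures are hypothetical, not the New Jersey data we used in class. The decade averages are the values you would plot as the points of a line chart.

```python
from collections import defaultdict

def average_by_decade(series):
    """Collapse {year: value} into {decade: mean value}, similar to grouping
    years by 10 in a PivotTable with the values summarized by Average."""
    groups = defaultdict(list)
    for year, value in series.items():
        groups[year // 10 * 10].append(value)  # 1995 -> 1990, 2004 -> 2000
    return {decade: sum(v) / len(v) for decade, v in sorted(groups.items())}

# Hypothetical annual average temperatures (degrees F)
temps = {1990: 52.0, 1991: 53.0, 1995: 54.0, 2000: 53.5, 2004: 54.5, 2011: 55.0}
by_decade = average_by_decade(temps)
for decade, avg in by_decade.items():
    print(f"{decade}s: {avg:.1f}")
```

Grouping trades detail for clarity: a century of yearly points becomes ten decade points, which makes a long-run trend easier to see.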

  • Having trouble coming up with data?  Here are some examples.  One student found data on average (inflation-adjusted) teacher salaries in the US over several decades.  She made a chart displaying change over time in average teacher salaries.  Another student made a line chart displaying population change over time.

 
Your homework for next time is to find another dataset and follow the same steps that we followed in class. Specifically, you should:

1. Find a dataset that meets the criteria I outlined in class.  The dataset should be a rectangular matrix with cases in the rows and variables in the columns.  One of the variables should be interval, and another variable should be nominal. 

2. Import the dataset into Excel using the method that we went over in class.  You can go to the DATA tab in Excel, and then click on the FROM THE WEB button.  Then browse the web for a dataset that you can import.

3. While highlighting one of the cells in the table, press CTRL-T to create a table. 

4. Give the table an appropriate, descriptive name.

5. Then click "Summarize with PivotTable."  A new sheet will appear.  Now, create a PivotTable in the same manner as we did in class.  Grab the nominal variable and put it in the Row Labels box.  Grab the interval variable and put it in the Values box.  Obtain four different statistics: average (mean), minimum (min), maximum (max) and count. 

6. Format the table so that it looks good.  Use the decimal adjustment buttons. 

7. Now, write a paragraph about the table you have created.  Summarize what you found.  Write this paragraph under the table that you created. 

8. Make sure that you save everything on your flash drive, and bring it to class.
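The PivotTable in step 5 computes the same four statistics for each category of the nominal variable. A minimal Python sketch of that calculation is below, assuming the data are (category, value) pairs; the function name summarize_by_category and the payroll figures are made up for illustration. The `:.2f` formatting plays roughly the role of the decimal adjustment buttons in step 6.

```python
from collections import defaultdict

def summarize_by_category(records):
    """For (category, value) pairs, return {category: (mean, min, max, count)},
    mirroring a PivotTable with the nominal variable in Row Labels and the
    interval variable summarized four ways in Values."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {
        cat: (sum(vals) / len(vals), min(vals), max(vals), len(vals))
        for cat, vals in sorted(groups.items())
    }

# Hypothetical data: (league, team payroll in $ millions)
teams = [("AL", 120.0), ("AL", 80.0), ("AL", 100.0), ("NL", 150.0), ("NL", 90.0)]
summary = summarize_by_category(teams)
for cat, (mean, lo, hi, n) in summary.items():
    print(f"{cat}: mean={mean:.2f} min={lo:.2f} max={hi:.2f} count={n}")
```

Reporting the count next to the mean matters: an average based on two cases deserves less confidence than one based on two hundred.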
 
Your homework is to record some data about yourself and then summarize that data using statistics.  I distributed an example in class.  Try to follow the example.

What data should you choose?  Your data should be interval-level.  But other than that, the choice is up to you.  Here are some possibilities.
  • A set of purchases you made. 
  • If you exercise regularly, you could record the amount of time that you exercise each day. 
  • If you have an exercise routine that includes weight-training, you could record information about the number of reps and amount of weights you lifted.
  • Your time spent commuting. 
  • Time spent doing some other task that might be useful to keep track of.
Use functions to calculate: total, average, count, minimum, maximum, mode, median and standard deviation.  I demonstrated these functions in a worksheet that I passed out in class.  You can find more information about functions on pp. 19-35 of the book. 
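For comparison, Python's standard statistics module has counterparts to the Excel functions listed above. This sketch assumes a short, hypothetical log of daily exercise minutes. Note that statistics.stdev is the sample standard deviation (like Excel's STDEV.S), while statistics.pstdev is the population version (like STDEV.P); check which one the class worksheet uses before comparing numbers.

```python
import statistics

# Hypothetical log: minutes of exercise per day for one week
minutes = [30, 45, 30, 60, 20, 30, 45]

summary = {
    "total": sum(minutes),                 # like SUM
    "average": statistics.mean(minutes),   # like AVERAGE
    "count": len(minutes),                 # like COUNT
    "minimum": min(minutes),               # like MIN
    "maximum": max(minutes),               # like MAX
    "mode": statistics.mode(minutes),      # like MODE
    "median": statistics.median(minutes),  # like MEDIAN
    "std_dev": statistics.stdev(minutes),  # sample SD, like STDEV.S
}

for name, value in summary.items():
    print(f"{name:8} {value:.2f}")
```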

Calculate a "running total," similar to my column that contains "amount spent so far."  Enter the formula once.  Then click on the cell that contains the formula, and aim your cursor at the bottom right-hand corner of the cell.  The cursor should become a plus sign.  When the cursor is a plus sign, double-click the mouse.  The formula should be copied down the column.
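A running total is a cumulative sum: each row holds the sum of all the amounts up to and including that row. As a minimal Python sketch, assuming a hypothetical list of purchase amounts (not the figures in the example file), itertools.accumulate produces the same column you get by filling the formula down.

```python
from itertools import accumulate

# Hypothetical purchase amounts in dollars
purchases = [12.50, 4.25, 30.00, 8.75]

# Each entry = all purchases so far, like an "amount spent so far" column
amount_spent_so_far = list(accumulate(purchases))

for amount, so_far in zip(purchases, amount_spent_so_far):
    print(f"{amount:6.2f}  {so_far:7.2f}")
```

The last value in the running total always equals the overall total, which is a handy check on your spreadsheet.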
Example - record of purchases

 
In class, I showed you how to examine the averages of one (interval) variable for different categories of another (nominal) variable.  I would like you to do the same thing using variables from a dataset that you found on the internet.  Save the Excel file on your flash drive and bring it to class with you. 

If you forgot the steps that we went through in class to generate tables with statistics by category, go to the instructions link on the left and download the instructions provided.

Email me any questions that you have.
 
If you haven't already gotten the book, do so.

Bring a flash drive or other file storage device with you to each class.  Attach the device to your key chain if you can so you won't lose it.

Read Chapter 1 of the book.  Don't worry about all of the terms on page 35.  We will be using some but not all of them. 

Find a dataset on the internet that has at least one variable of each type: nominal, ordinal, and interval.  The dataset should have at least 25 cases (rows).  Import the dataset into Excel, save it to your flash drive, and bring it to class with you.