Chapter 2 Lab 2: Descriptive Statistics

Describing comic sensibility is near impossible. It’s sort of an abstract silliness, that sometimes the joke isn’t the star. —Dana Carvey

The purpose of this lab is to show you how to compute basic descriptive statistics, including measures of central tendency (mean, mode, median) and variation (range, variance, standard deviation).

2.1 General Goals

  1. Compute measures of central tendency using software
  2. Compute measures of variation using software
  3. Ask some questions of a data set using descriptive statistics

2.1.1 Important info

We will be using data from the gapminder project. You can download a small snippet of the data in .csv format from this link (note this dataset was copied from the gapminder library for R) gapminder.csv

2.2 Excel

In this lab you will explore a set of freely available data gathered by the GapMinder Project, that describes the wealth and life expectancies of several countries over the last ~60 years. Using descriptive statistics you will write a general account of this data, making evidence supported claims about life expectancy and wealth in several countries.

  • Calculating basic descriptive statistics.
  • Interpret these statistics and use them to make data based decisions.
  • Report and support your decisions in writing

In this lab we will attempt to answer the following questions:

In which North American country (Canada, Mexico or United states) does the typical citizen live the longest, over the years measured? In which North American country has life expectancy changed the most since 1952?

2.2.1 Statistics Covered

  • Measures of central tendency

    Name Formula
    Mean \(\bar{X}\) = \(\frac{\sum X}{N}\)
    Median The middle number
    Mode The most frequent number
  • Measures of variability

    Name Formula
    Maximum The largest number
    Minimum The smallest number
    Range Maximum - Minimum
    Sum of Squares \(SS_{X}\) = \(\sum X^2\) - \(\frac{(\sum X)^2}{N}\)
    Standard Deviation
    Population \(\sigma=\frac{SS_{X}}{N}\)
    Sample \(s=\frac{SS_{X}}{N-1}\)
    Variance
    Population \(\sigma^2\)
    Sample \(s^2\)

2.2.2 Load the data

  1. Download the data as a .csv file from above
  2. Open the downloaded file in excel
  3. Save the file as an Excel document

File → Save As…
filetype: .xlsx
name: personalize the file name so you will recognize this as your own copy of the data (maybe: Lab2_YourName)

2.2.3 Sort and Split

Because the questions ask about specific countries, we need to “split” the given data in order to easily calculate the required descriptive statistics.

You should now see the data organized into rows and columns in the open excel workbook.

  • Each column contains a single variable
  • The first row contains “headers” which label the variable in each column
  • Each row contains all the information gathered from a single measurement (a specific country in a specific year)

Looking at the bottom of the excel window, you should see that this workbook (another name for a excel data file) currently only contains one worksheet (a “page” within a workbook) named “GapMinder”

2.2.3.1 Splitting the Data by Country

  1. On the “sheet bar” at the bottom of the Excel window, click the “+” in order to add a new blank sheet to the workbook.
  2. Right click on the tab of the work sheet you just created, and name it after one of the countries you want to separate the data by (i.e. “Canada”).
  3. Repeat steps 1 and 2 until you have created a blank worksheet for each North American country.
  1. Click on the “GapMinder” worksheet.
  2. Press the control key and the letter “A” (Ctrl+A) twice, to select all data on the sheet.
  3. Click the “Data” button on the tools ribbon on the top of the screen, and then click the “sort” button.
  4. In the sort dialogue, make sure the “my data has headers” box is checked, and set “sort by” to “Country”. Then click “OK”
  1. Click on the “GapMinder” worksheet.
  2. click the row number next to the header row (1) to select that row
  3. Press ctrl+c to copy that row to your clip board
  4. go to each of the worksheets you created, and use ctrl+v to paste the header info into the first row of each sheet.
  1. Left click and hold on the row number of the row containing the first “Canada” measurement Continue to hold down the left mouse button, and drag downwards until you have selected all of the rows containing measurements from Canada.
  2. Press CTRL + C to copy these rows to your clipboard
  3. Go to the Canada worksheet you created by clicking its’ tab on the worksheet bar.
  4. Select cell A2
  5. Press CTRL + V to paste in that data you copied from the original sheet.
  6. Repeat steps 12-16 until you have copied each North American country’s data to the appropriately labeled worksheet.

Your data is now ready for analysis

2.2.4 Calculate Descriptive statistics

We will now calculate the mean, median, N, Max, Min, Range, and Standard Deviation of life expectancy and GDP for each country.

  1. Click the tab of the “Canada” worksheet.
  2. Scroll down to the final row of the data
  3. Skip one row, and in a blank cell in the column containing the “Life” variable, type:

=average(

  1. Now using your mouse, left click and hold on the bottom data point in the “LifeExp” column. Continue to hold down the left mouse button and drag upwards until you have selected all of the data on the Bronx work sheet in the fine amount column, and release the left mouse button
  2. In order to finish your function, type: > )

  3. Press enter and the cell you have been typing in should now display the life expectancy for Canada since 1952. (Hint: It should equal 74.9)

  1. Repeat steps 2-6, replacing “average” with: median, count (N), max, min, stdev.s (standard deviation of a sample).
  2. To calculate range, subtract the minimum value from the maximum value.
  1. Repeat steps 1-8 for the GDP variable, and each other country.

You have now calculated all of the descriptive statistics required to complete this assignment. For easier referencing of these values, I suggest you copy all the statistics you have calculated into a table, where each row is a borough, and each column is a different statistic. Remember to label all of your work and round appropriately!

Your result should look like this:

Statistic Canada Mexico United States

Mean | 74.9 | 65.4 | 73.5 Median | 75.0 | 66.2 | 74.0 Count | 12 | 12 | 12 Max | 80.7 | 76.2 | 78.2 Min | 68.8 | 50.8 | 68.4 Standard Deviation | 3.95 | 8.19 | 3.34 Range | 11.9 | 25.4 | 9.80

2.2.5 Interpert our results

In which North American country (Canada, Mexico or United states) does the typical citizen live the longest, over the years measured?

Since 1952, Canadian citizens have enjoyed a higher average life expectancy (74.9) than citizens of either Mexico (65.4), or the United States (73.5). This is pattern is also found in the median, minimum and maximum life expectancy in all three of these countries, as seen in the table above.

In which North American country has life expectancy changed the most since 1952?

Since 1952, Mexican citizens have experienced the most change in life expectancy (SD=8.19) than citizens of either Canada (3.95), or the United States (3.34). This is pattern is also found in the range of life expectancy in all three of these countries, as seen in the table above.

2.2.6 Conditional Function Method

Optional Bonus Learning!

There are many other ways to complete this assignment in excel, that are more complicated but much quicker. Advanced students may be interested in reading the Excel help file on the SUMIF and COUNTIF functions.

Additionally, the below function is a example of a truly general conditional function, where A was the column containing the “county”, or grouping variable, D1 contained the country of current interest, and B was the column containing the “LifeExp’, or the dependent variable.

=MODE(IF(\(A:\)A=D1,\(B:\)B))

2.2.7 On Your Own

Using the same strategy as above, answer these two questions and create a descriptive statistic table, using the GDP variable, which measures these countries health, instead of their life expectancy.

In which North American country (Canada, Mexico or United states) does the typical citizen have the most wealth, over the years measured? In which North American country has wealth changed the most since 1952?

Once you have answered these questions, write a short paragraph discussing how these numerical findings relate to your own hypothesizes, beliefs, and ideas about these 3 countries.