This year I finally got around to doing something I’ve wanted to do for a long time: a sampling demonstration with candy! It went really well, so I figured I’d write it up in case anyone else needs a bit of motivation.
Before describing the activity, I want to recognize the two “pushes” that finally encouraged me to try this. The first was a quick hallway conversation with my colleague, who does something similar in her Research Methods class. When she mentioned it, I was reminded that I’ve always wanted to try it out, especially because I usually cover sampling right around Halloween time, so candy is definitely on lots of minds. The second came when I went back to my office to do some Googling for advice or model activities and found this great blog post by Rick Wicklin about investigating M&M color distributions using SAS. Through this post, I learned that there are actually two “populations” of M&Ms, produced in two different factories (Hackettstown [HKP] and Cleveland [CLV]), resulting in two separate color distributions. I decided I could use this information to make a quick in-class activity. My class periods are only 50 minutes; if yours are longer, you might need to add to this plan!
First, I went to my local grocery store to buy the M&Ms. This turned out to be a little harder than I thought because, as I mentioned, it’s Halloween time and there are lots of special-edition M&Ms available. The plain, milk chocolate M&Ms were buried on the bottom shelf of the candy aisle, but I eventually found them. I grabbed four bags, making sure that the codes on all four bags were the same. My bags were all marked “HKP,” indicating that they came from the Hackettstown, NJ, plant and would, presumably, have that plant’s color distribution: 25% each of blue and orange, and 12.5% each of the other four colors. I also bought a box of “snack”-sized Ziploc bags.
At home, I used a food scale to divide my four large bags of M&Ms equally into 24 smaller baggies, being careful not to touch the candy, just for hygiene’s sake. This worked out to about 50 g per baggie, or about 55 M&Ms for each student. I also set up an Excel spreadsheet for students to use to track the color proportions of their samples. I inserted a graph for them and linked it to the cells so that as they entered their data, the graph would automatically update. If you have a longer class period than I do, you could have students create this spreadsheet for themselves. You can download my spreadsheet here.
In class, I gave each student one bag of M&Ms and had them download the spreadsheet from Blackboard. I started off the conversation by asking them if they had ever considered the color distribution of M&Ms — did they think there were equal proportions of colors? Why or why not? Could we consider the bags of M&Ms that I bought at the store to be truly “random” samples? Why or why not? What about the little bags they had — were these random samples? The students were a bit reluctant and shy at this point, but they were happy to have candy in front of them.
I then asked the students to count the number of M&Ms of each color in their sample, calculate the proportions, and enter the data into their spreadsheets. Next, they compared their bar graphs with their neighbors’ graphs and noted the differences, and then we compared bar graphs as a class. We talked about how each bar graph was different, but we could already see some similarities emerging: for example, blue was the most common color in all of the samples except one (which had more orange), and green and brown seemed to be less common. I asked them how confident they would feel about predicting the precise color distribution for the population of M&Ms based on their small samples. Then I asked how confident they would feel making a less precise statement, like “M&M colors are not evenly distributed” or “Blue and orange are the most common colors,” and they felt slightly more confident. When I asked what they could do to improve their predictions, they said, “Get more M&Ms.” We then talked about the added cost of increasing sample size: getting more M&Ms would cost more money and take more time to count, but it would give us more information.
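If you would rather have students script the counting step than use Excel, here is a minimal sketch in Python of the same calculation. The bag contents are made-up counts that add up to 55, not real data from my class, and the function name `color_proportions` is just something I picked for illustration.

```python
from collections import Counter

COLORS = ["red", "orange", "yellow", "green", "blue", "brown"]

def color_proportions(candies):
    """Return {color: proportion} for a list of color names."""
    if not candies:
        raise ValueError("the sample is empty")
    counts = Counter(candies)
    total = len(candies)
    # Counter returns 0 for colors that never appear in the sample.
    return {color: counts[color] / total for color in COLORS}

# Hypothetical baggie of 55 candies (illustrative counts only).
bag = (["red"] * 6 + ["orange"] * 13 + ["yellow"] * 8
       + ["green"] * 7 + ["blue"] * 15 + ["brown"] * 6)

props = color_proportions(bag)
for color in COLORS:
    print(f"{color:<7}{props[color]:.3f}")
```

The printed proportions are the same numbers students would type into the spreadsheet before looking at the bar graph.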
Students then combined their samples with those of two classmates (our classroom is set up in three-seat sections) and recalculated their proportions. We talked about how their larger sample compared to the individual small samples, and one student noted that her small sample had had “extreme” values of red M&Ms, but once she combined her sample with two others, the extreme values were balanced out.
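The “balancing out” that student noticed can also be shown with a quick simulation. This is only a sketch under some assumptions: it treats each baggie as 55 independent draws from the HKP proportions given earlier, which real bags may not match exactly, and the helper name `red_share` and the seed are arbitrary choices of mine.

```python
import random
import statistics

# Hackettstown (HKP) proportions as described in this post.
HKP = {"red": 0.125, "orange": 0.25, "yellow": 0.125,
       "green": 0.125, "blue": 0.25, "brown": 0.125}

def red_share(rng, n):
    """Draw n candies from the HKP distribution; return the share that are red."""
    draws = rng.choices(list(HKP), weights=list(HKP.values()), k=n)
    return draws.count("red") / n

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Simulate many single baggies (n=55) and many three-baggie pools (n=165).
single = [red_share(rng, 55) for _ in range(2000)]
pooled = [red_share(rng, 165) for _ in range(2000)]

spread_single = statistics.pstdev(single)
spread_pooled = statistics.pstdev(pooled)
print(f"spread of red share, n=55:  {spread_single:.3f}")
print(f"spread of red share, n=165: {spread_pooled:.3f}")
```

The spread of the red share should come out noticeably smaller for the pooled samples, which is the simulated version of extreme values washing out when three baggies are combined.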
At this point, I switched to Sheet 2 of the spreadsheet and introduced new information: There are really two populations of M&Ms and they have different color distributions. On Sheet 2 of the spreadsheet, students could see the proportions and bar charts for each distribution. I made the students an offer: If they, as a class, could correctly guess which plant our M&Ms came from, I would give them all a small amount of extra credit. The limitation was that they had to make the decision together, they could only give me a single answer, and it had to be right.
At first, the students tried to “reason it out.” They started debating with each other, pointing to the distributions of their smaller samples. I gently reminded them that they didn’t have to debate it — they could actually analyze some data here, hint hint.
They then jumped into action. One student ran up to the board and started writing out the M&M colors. Another student joined her and started going group by group, asking for the number of M&Ms of each color in their combined samples. The student at the board wrote out the totals and another student started entering the data into the spreadsheet. When they finally had the entire class sample (the total of the 24 small bags) tallied up, they looked at the bar graph and concluded unanimously that the bags were from the Hackettstown plant — and they were right!
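My students made the call by eyeballing the bar graph, but the same comparison can be put into numbers. The sketch below swaps in a chi-square goodness-of-fit statistic, which is not what we did in class: it measures how far the observed counts are from what each plant would be expected to produce, and the smaller value is the closer match. The class totals are hypothetical (made up to sum to 24 × 55 = 1,320), and the Cleveland proportions are the figures as I recall them from Rick Wicklin’s post, so treat them as an assumption and check them against the post or Sheet 2.

```python
# Hackettstown (HKP) proportions as described in this post.
HKP = {"red": 0.125, "orange": 0.25, "yellow": 0.125,
       "green": 0.125, "blue": 0.25, "brown": 0.125}

# ASSUMPTION: Cleveland (CLV) proportions recalled from the Wicklin post;
# verify before using them with real data.
CLV = {"red": 0.131, "orange": 0.205, "yellow": 0.135,
       "green": 0.198, "blue": 0.207, "brown": 0.124}

def chi_square(observed, expected_props):
    """Sum of (observed - expected)^2 / expected over all colors."""
    total = sum(observed.values())
    return sum((observed[color] - total * p) ** 2 / (total * p)
               for color, p in expected_props.items())

# Hypothetical whole-class totals (illustrative, not my students' data).
class_counts = {"red": 170, "orange": 320, "yellow": 160,
                "green": 175, "blue": 335, "brown": 160}

scores = {"HKP": chi_square(class_counts, HKP),
          "CLV": chi_square(class_counts, CLV)}
best = min(scores, key=scores.get)  # smaller statistic means a closer fit

for plant, score in scores.items():
    print(f"{plant}: chi-square = {score:.2f}")
print("closest match:", best)
```

With these made-up totals the HKP statistic is far smaller than the CLV one, mostly because the two plants differ so much in their shares of orange and green.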
At the end of class, we debriefed. I pointed out that they had just used the scientific method to help them make a more accurate guess: they had a question, they made observations, they analyzed data, they interpreted and discussed their results, and they made a decision based on that information. We talked about their original approach of debating it out and how actually making observations allowed them to be more confident in their decision. We also talked about how the entire-class sample still didn’t have identical proportions to the HKP proportions but that it was “close enough,” which brought us back to the questions of sample size and precision, and of when it might be worth investing more resources in a larger sample versus when “close enough” is “good enough.”
Overall, I think that the activity worked really well and was a very welcome break in the semester. It reminded me that I want every class to be like this — interesting, exciting, fun, loosely structured, engaging. I’ll definitely do this activity again next year.