How to Predict the Election Result With a Coin?

Jalil Alizadeh 13 Oct, 2022

10 min read

This article was published as a part of the Data Science Blogathon.

_{Photo by Clay Banks on Unsplash}

Introduction

Predicting events and happenings has been an interesting skill for humans since the dawn of civilization and remains so today. This ability has been so attractive to many that they have selected the career of “prophecy” for themselves, attracting many followers and revenue in the process. Some of those people’s names, such as “Nostradamus,” are still heard, and some people are still looking for the realization or non-realization of his predictions after centuries.

Photo by Jen Theodore on Unsplash

Forecasting what will happen in the future is considered a critical business skill. In most cases, a company’s level of insight and understanding about the future is directly linked to its level of validity, success, and the rate at which it moves through the stages of growth. So, a company makes plans based on what will happen in the future; due to this, most businesses today pay more attention to data science and spend a lot of money to make their businesses more data-driven. Many companies also employ many data scientists and set up departments for data science.

It is evident that today’s forecasts, developed with data science, are significantly different from those made centuries ago. Forecasting and getting insights with the help of data science is a scientific process based on the analysis and a close look at a data set.

Let’s leave the business world and move to the more yellow world of elections. Guess who will win an election is one of the most interesting things people do. Every year, before an election, several statistical and research agencies start collecting and analyzing data to determine who will win before voting. As a result, Politicians usually pay statistics organizations large sums of cash to produce real-time reports on how the public perceives them. This way, the candidate can use intelligence to lead his campaign.

Main Problem with Collecting Data

Most enterprises and institutions that collect data have a big problem when they try to predict elections: they have a lot of wrong information. This problem will lead to wrong conclusions in the end. This problem was one of the most obvious during the 2016 presidential election in the US. In that year, Donald Trump and Hillary Clinton were in the race. Most polls and reputable news sites at the time said Hillary would win the election easily.

Source: NYTimes

Most people believed this prediction because, on the one hand, there was Trump, who had no political experience and had campaigned for election by making very harsh and aggressive statements. On the other hand, Clinton has been involved in politics for more than 30 years and has served as Secretary of State. With all of these explanations, it didn’t seem so unusual that Clinton beat Trump. Many reliable news sources, like the New York Times, said that Clinton’s chances of beating Trump were 85 to 15.

The Whole World is Impressed!

The whole world was amazed as the votes were being counted. Even the most optimistic statistical groups didn’t think Trump had a chance, but he won the electoral votes of key states one by one and quickly got the 270 electoral votes he needed to become president.

At the end of the vote, Trump won with 306 votes to Clinton’s 232. This was one of the most surprising results in US electoral history. This failure shocked a lot of data scientists. Many papers have been written up to this point, and many different parts of this situation have been examined.

Photo by Library of Congress on Unsplash

One of the main reasons why many pre-election polls give different results is that people don’t say who their favorite candidate is for various reasons. So, they either didn’t vote or said they would vote for someone else. However, on election day, when they went to the polls, they voted for their ideal candidate, who wouldn’t say his name. If most people do this, it’s clear that the poll results will change, and a statistical disaster like what happened in 2016 will happen again.

A Summary of What the Problem is!

In this article, we are not going to find out why the predictions for the 2016 elections were completely mistaken. In the future, we might publish a more in-depth article about the statistical reasons and factors that led to this. In this article, we’ll learn about a game based on math and statistics.

As was already said, one of the most important factors in predicting mistakes is the amount of wrong information that gets into the survey. Now, if we can change how we collect data to make it much more accurate, our prediction of how the election will turn out will probably be right. We’ll look at examples of this type of data tracking in the following sections.

Imaginary Forest

In a faraway forest where 200 animals live, there is an election to choose the forest leader. “Ms. Tree” and “Mr. Pig” are the two candidates in this election. “Ms. Tree” has been teaching forest residents for a long time. It thinks everyone in the forest and neighboring forests should live peacefully and work together.

On the other hand, “Mr. Pig” is a violent person who loves fighting. He thinks that a lot of the resources should be put into the military and, if possible, used to attack neighboring forests and steal their resources. A secret group that works in the forest has given us the job of finding out who will win the election before it happens. Our lives are at risk if we are wrong about what will happen!

Photo by Sergei A on Unsplash

As a result, we decided to survey every forest member to predict the election’s true winner. When we ask people who their favorite candidate is, Based on the atmosphere in the forest, the following things will happen:

If a person wants to vote for “Ms. Tree,” he selects her as his preferred candidate.
If someone wants to vote for “Mr. Pig”, they probably will give us the name of the “Ms. Tree” (We don’t know how often this happens, but we do know that it does happen sometimes.)

As a result, the most important thing for us is to get accurate data from forest people. So, we should use a strategy that lets us find out people’s real favorite candidate without making them say it out loud. In other words, we need to find a middle ground between our language and theirs so that they can answer our questions in that middle-ground language and we can understand them. While we get accurate information from a person, his or her personal opinion is kept private and does not become public. But what should be done to solve this problem?

A Coin and an Endless Amount of Possibilities

All you have to do to solve this problem is use a coin as a guide. Each coin sample space can be either “heads” or “tails.”

_{Photo by Steve Smith on Unsplash}

As a consequence, it’s enough to go to each of the forest’s residents and, by giving him a coin, ask him to play the following game in his home (without our presence) and tell us of the outcome; the rules of the game are as follows:

If the coin lands on “heads,” mention the name of the candidate you want to vote for.
If the coin lands on “tails,” the coin is thrown, call out the name of the “Mr. Pig.”

As was already said, the biggest problem with tracking data about the election was that many people who were willing to vote for the “Mr. Pig” didn’t come out and say so for various reasons.

This strategy solves the problem since it is unknown if a person voted for “Mr. Pig” because of desire or because the coin “tails” appeared. People may safely tell us the outcome of the game since we don’t know the outcome of the tap or the line. The following was the outcome of the data collection:

Creating a language between the person asking the question and the person answering it so that the person answering can confidently choose the right answer.
Minimizing the area for judgment and providing a secure atmosphere in which the responder may choose the preferred alternative.

By taking the above steps, the quality of data tracking will greatly improve, making it more likely that conclusions and assessments based on this data are proper.

Please feel free to share your thoughts or reading tips in the comments.

Follow me on LinkedIn, and Twitter.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.