
# Basics of Probability for Data Science explained with examples in R


### 54 Comments

• Arcady Novosyolov says:

Challenge 1: 7.96%, challenge 2: 9.1%

• Dishashree Gupta says:

Good Job !! Can you share your approach ?

• Rahul Aggarwal says:

CHALLENGE 1 :
= ncr * P(Success)^50 * P(fail)^50, n=100, r=50
=100c50 * (.5)^100
where 100c50=100891344545564193334812497256
=0.07958924

Challenge 2:
Z score = (120-100)/15 = 1.33
The area under the curve up to this value is 0.9082, which is the probability of a value below 120, so the answer is 1 - 0.9082 = 0.0918

• Ankur says:

Should the answer to the second challenge be 91% instead of 9.1%?

• Ankur says:

Will the answer to the first question be 0.07?

• Ankur says:

Oh, I got it. Yes, it will be 9.1%.

• Dishashree Gupta says:

Can you explain your approach ?

• Shabna says:

Ans Challenge 1 : 0.5
Ans Challenge 2 : 9%

• Dishashree Gupta says:

You need to think again for Challenge 1. Challenge 2 answer is correct.

• Shahab Mahvi says:

Challenge #1: my answer is 7.96%
Challenge #2: my answer is 9.18%

• Dishashree Gupta says:

Perfecto ! Share your approach as well !

• shabna says:

ans challenge 1 : 8%

• sethu says:

Answer for challenge 2: is 9.18%

• shan says:

0.0918
0.079589237

• sethu says:

challenge 2 answer – 9.18%
Approach:
Population mean = 100, SD = 15, so z = (120-100)/15 = 1.33. In the z table, z = 1.33 corresponds to 0.9082, which is the area under the curve from -infinity to 120. But we need people with an IQ greater than 120, so 1 - 0.9082 = 0.0918 ==> 9.18%

Challenge 1, however, is a little unclear to me.
We can have 100H, 99H1T, 98H2T, ..., 50H50T, 49H51T, ..., 1H99T, 100T. So n = 101.
P(Head) = 0.5, P(Tail) = 0.5
The event 50H50T occurs once, so P(X=1) = 101C1 x 0.5^1 x 0.5^100 = 3.98e-29
Can you please clarify what is incorrect in my understanding?

• shan says:

P(X)= 100C50 x 0.5^50 x 0.5^50 (from binomial dist)

• Dishashree Gupta says:

For Challenge 1, think of it as 100 Bernoulli trials where the probability of getting heads is 0.5 and the probability of getting tails is also 0.5. Now you need 50 heads and 50 tails out of 100 trials. So select 50 successes out of 100 and apply the probabilities. The answer would be 100C50 (0.5)^50 (0.5)^50.

So here p, the probability of getting heads, is 0.5, and q, the probability of getting tails, is 0.5. We need k successes, which is 50, out of n trials, which is 100.

Try calculating now !!
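The formula above can be checked in R with a minimal sketch like the one below. It also shows why treating the 101 possible head counts as equally likely goes wrong: their probabilities differ, and they sum to 1. The variable names are illustrative assumptions, not taken from the article.

```r
n <- 100  # number of flips
k <- 50   # number of heads wanted
p <- 0.5  # probability of heads on a single flip

# Binomial formula written out: nCk * p^k * q^(n - k)
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same probability from R's built-in binomial density
by_dbinom <- dbinom(k, size = n, prob = p)

# Probabilities of all 101 possible head counts (0 to 100)
all_counts <- dbinom(0:n, size = n, prob = p)

print(by_formula)                 # about 0.0796
print(sum(all_counts))            # 1, up to floating-point rounding
print(which.max(all_counts) - 1)  # 50 is the most likely head count
```

Even the most likely outcome, exactly 50 heads, has a probability of only about 8%, because the remaining probability is spread over the other 100 head counts.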

• AMUTHARASI INDRA BUPATHY says:

challenge 1 :
P(X=k) = nCk * p^k * q^(n-k)
p=q=0.5
P(50)=100c50 * (0.5^50)* (0.5^50)
=1.00891E+29*8.881784e-16*8.881784e-16
=1.00891E+29*7.888609e-31
=0.07958897

challenge 2:
Observed value = µ+zσ
120=100+z*15
z=(120-100)/15
z=1.33 (value from z table=0.9082)
for >120,(total probability=1,upto 120=0.9082)
1-0.9082= 0.0918 ==>9.18%

• Dishashree Gupta says:

Correct !! 🙂

• laxminarayana says:

#1
7.96%
It’s a binomial distribution. n = 100, r = 50, p = q = 0.50.

#2
9.1%
z = -1.33.

• Dishashree Gupta says:

Correct !! 🙂
But z will be positive as the number is greater than the mean. 120 will fall to the right of 100.
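The sign of z can be checked with a short R sketch (variable names are assumptions for illustration). Because the normal curve is symmetric, the area below -z equals the area above +z, which may be why a negative z still led to 9.1% here.

```r
mu <- 100     # mean IQ
sigma <- 15   # standard deviation
x <- 120      # threshold of interest

# x lies above the mean, so the z-score is positive
z <- (x - mu) / sigma

# Area to the right of z: the share of people with IQ above 120
upper_tail <- pnorm(z, lower.tail = FALSE)

# By symmetry, the area to the left of -z is the same number
mirror_tail <- pnorm(-z)

print(z)            # about 1.33
print(upper_tail)   # about 0.0912
print(mirror_tail)  # same value as upper_tail
```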

• laxminarayana says:

Gotcha! Thanks for the wonderful article! Keep posting.

• Aveesh Singroha says:

First Answer-

~7.96 %

Simply-100c50 * 0.5^50 * 0.5^50

Second answer: about 9.1%

120 = 100 + 15z
z = 4/3 ≈ 1.33
The cumulative probability at z = 4/3 is about 0.9088 (a z table read at 1.33 gives 0.9082)
Area above z:
1 - 0.9088 = 0.0912 ≈ 9.1%

• Dishashree Gupta says:

Correct !!

• Sanjeev Kumar says:

Using excel

=BINOM.DIST(50,100,0.5,0)
=1-NORM.DIST(120,100,15,1)

• Dishashree Gupta says:

correct !

• Poonam Lata says:

Very well written post Dishashree!

• Dishashree Gupta says:

Thank you ! Hope it was helpful !

• PAVAN says:

Here I computed the solutions below for the two challenges above using Excel.
=BINOM.DIST(50,100,0.5,FALSE): 0.079589237 or 79.5%
=1-NORM.DIST(120,100,15,TRUE): 0.09121122 or 91.21%

• Dishashree Gupta says:

You’re multiplying the numbers by 1000 to calculate percentages! Why?

• Acton says:

Good work
Mir Muhammad Alikhan

• Deepika says:

This blog has implemented the dental works in a best ideas so i like your presence of mind, so update latest kind of information.

• Dishashree Gupta says:

Thank you !! Hope it was helpful !

• Girish says:

This article is really very helpful in understanding the concepts clearly.
Thank you so much for making it simpler to understand Probability.

• padhma says:

It is really a great and useful piece of info. I’m glad that you shared this helpful info with us. Please keep us informed like this. Thank you for sharing.

• Mitra Guturu says:

1. 100C50 * (0.5)^50 * (0.5)^50

2. 9% will have IQ more than 120.

• Sarthak Girdhar says:

I believe under the Binomial Distribution, the first graph is “skewed left” (and not ‘skewed right’). Pardon me if I am wrong.

• Deepika says:

I just stumbled upon your blog and wanted to say that I have really enjoyed reading your blog posts. Any way I’ll be subscribing to your feed and I hope you post again soon.

• sathya says:

This is really useful, and I got more ideas from it. Keep sharing more techniques. Eagerly waiting for your next blog post and more useful information.

• jeslin says:

Wow, really nice. It will be helpful for people preparing to crack interviews, and also as a reminder of the concepts they have already learned.

• Siva says:

I appreciate the logical flow maintained in the article. Thank you.

My answers matched the comments.

• Sreedhar says:

Challenge 1: Contrary to the popular expectation, try calculating the probability of getting 50 heads and 50 tails on 100 flips of fair coins? This expectation is known as the gambler’s fallacy! An approximate answer would suffice!
Solution 1 :
BINOM.DIST(number_success, trials, probability_success, cumulative)
Cumulative = FALSE, since we are calculating a point probability.
number_success = 50, number of trials = 100, probability of heads (or tails) on one flip = 1/2 = 0.5.
BINOM.DIST(50,100,0.5,FALSE) in Excel gives 0.0796 = 7.96%

Challenge 2: Try another one – In the United States, the average IQ is 100, with a standard deviation of 15. What percentage of the population would you expect to have an IQ more than 120?
Solution :
IQ is a continuous measurement, so we model it with a normal distribution.
Excel formula: NORM.DIST(x, mean, standard_dev, cumulative)
"More than 120" extends to infinity, which is awkward to compute directly. So P(X > 120) is computed as 1 - P(X <= 120).

So 1 - NORM.DIST(120,100,15,TRUE) = 1 - 0.9088 = 0.0912 = 9.12%
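Answers to Challenge 2 in this thread land on either 9.18% or 9.12%. A plausible explanation is rounding: a printed z table is read at z = 1.33, while software uses the unrounded z = 4/3. The R sketch below compares the two; the variable names are assumptions for illustration.

```r
mu <- 100
sigma <- 15
x <- 120

z_exact <- (x - mu) / sigma    # 1.3333...
z_table <- round(z_exact, 2)   # 1.33, as read from a printed z table

# Upper-tail probabilities for the unrounded and the rounded z-score
tail_exact <- pnorm(z_exact, lower.tail = FALSE)
tail_table <- pnorm(z_table, lower.tail = FALSE)

print(round(tail_exact, 4))  # 0.0912
print(round(tail_table, 4))  # 0.0918
```

Both results round to roughly 9%, so either is acceptable as an approximate answer.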

• Reetu says:

Could someone help me solve the following two questions?
We repeatedly toss a biased coin with probability 0.6 of landing heads and probability 0.4 of landing tails. A group is a maximal sequence of consecutive tosses that land on the same side. For instance, the groups of HTTTHTHTT are (H)(TTT)(H)(T)(H)(TT).

Then what is the expected number of groups after 10 tosses?

What is the probability of (strictly) exceeding 6 groups after 10 tosses?
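One possible way to approach both questions is brute-force enumeration in R, since 10 tosses give only 2^10 = 1024 sequences. This is a sketch under the assumption that tosses are independent; the variable names are illustrative. The number of groups equals 1 plus the number of adjacent tosses that differ.

```r
n <- 10        # number of tosses
p_head <- 0.6  # probability of heads

# All 2^10 sequences, one per row (1 = heads, 0 = tails)
seqs <- as.matrix(expand.grid(rep(list(0:1), n)))

# Probability of each sequence under independent tosses
heads <- rowSums(seqs)
prob <- p_head^heads * (1 - p_head)^(n - heads)

# Groups = 1 + number of positions where a toss differs from the previous one
switches <- rowSums(seqs[, -1, drop = FALSE] != seqs[, -n, drop = FALSE])
groups <- 1 + switches

expected_groups <- sum(groups * prob)
prob_more_than_6 <- sum(prob[groups > 6])

print(expected_groups)   # 5.32
print(prob_more_than_6)
```

The expected value can be cross-checked by hand: each of the 9 adjacent pairs differs with probability 2 x 0.6 x 0.4 = 0.48, so the expected number of groups is 1 + 9 x 0.48 = 5.32.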

• Abhishek Chasta says:

Good read.

Ans 1 (using R):
> dbinom(50,100,0.5,FALSE)
[1] 0.07958924

Ans 2 (using R):
> pnorm(120,mean=100,sd=15,lower.tail =FALSE)
[1] 0.09121122

• Ram Kasula says:

Using R
Answer – 1
choose(100,50) * 0.5 ^ 50 * 0.5 ^ 50 * 100  # in percent
[1] 7.958924

Answer – 2
(1 - pnorm(120, 100, 15)) * 100  # in percent
[1] 9.121122

• Bickshal says:

Excellent and very helpful post. Thank you!

• hocine says:

thank you very much

• Piush Vaish says:

Interesting article. Thanks for a simple explanation of a complex concept.
Piush
http://adataanalyst.com/

• Hiranand Acchra says:

Well, I’ve to appear for an interview of a Market Research Organisation in a couple of days from now.
Just wanted to brush up quite a few concepts in limited time. Your post is indeed very logical and has helped me a lot.
Thanks!

• L. Ansari says:

This site is my absolute favorite for everything Data Science/Analytics. Very concise yet comprehensive articles, and I especially appreciate this article for refreshing probability/stats!!

Thank you so much!! 🙂

• Priyabrata Acharya says:

Challenge 1: 100C50*(0.5)^50*(0.5)^50 ≈ 0.0796
Challenge 2: z=1.33 (120=100+z*15), from the ZTable P(x>120) = 1-P(x<120) = 1-0.9082 = 0.0918. Percentage: 9.18%

• Anirban Dutta says:

Another great article. It’s brilliant: the contributors to this website have created an archive of collectible data science articles. By far the best data science website, and not only in India.

• abhinav_jain says:

Another really informative article. Also, the language is easy to understand and uses appropriate technical jargon.

• Lucifer says:

Challenge 1: the probability is approximately 0.07958, which is equivalent to 7.96%
nCk * P(success)^k * P(fail)^(n-k)
100C50 * (0.5)^50 * (0.5)^50 = 0.07958 = 7.96%

Challenge 2: 9.1% (approx.)
z = 1.33
Probability up to 120 from the z table: 0.9082
Probability of more than 120: 1 - 0.9082 = 0.0918 (9.18%)