# An Introduction to the Powerful Bayes’ Theorem for Data Science Professionals

• Deepak Marwah says:

Great work. The notes are outstanding and very helpful.

• Kunal Mittal says:

This is an exhaustive article which includes all the elements of probability and related subjects required in data science. I especially like the way it covers the basic as well as the advanced topics in a comprehensive manner. Really helpful.

• Khyati Mahendru says:

Hi Kunal,

• Aaryan Mehta says:

Very useful notes, covers everything related to the topic with a great explanation along with pictures for a better understanding.

• Swapan says:

Hi.

We identified in Bayes’ Theorem that P(B) is the evidence, calculated as:
P(B) = P(B|A)*P(A) + P(B|~A)*P(~A)

However, later we deduced the evidence as P(E) = P(E|A)*P(A) + P(E|B)*P(B) + P(E|C)*P(C).

Can you describe how we deduced P(E) here?

• Khyati Mahendru says:

Hi Swapan,

In the example, A, B, and C are mutually exclusive (disjoint) and collectively exhaustive, so exactly one of them occurs. What this means is that if A does not occur, one of B or C has to occur. Thus, event ~A = B U C. You can see ~A as having two events as its components.

Here is the mathematical formulation:
P(E|~A).P(~A)
= P(E|B U C).P(B U C)
= P(E and (B U C)), from conditional probability
= P((E and B) U (E and C)), distributing "and" over the union
= P(E and B) + P(E and C), since B and C are disjoint
= P(E|B).P(B) + P(E|C).P(C), from conditional probability again
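
The derivation above can be checked numerically. The sketch below is only an illustration: the joint probabilities are made-up values (they are not from the article's example), chosen so that A, B, and C partition the outcomes. It computes the evidence P(E) both with the two-term formula (A versus ~A) and with the three-term formula (A, B, C) and confirms they agree.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities P(X and E) and P(X and not E).
# These numbers are assumptions for illustration; they sum to 1.
joint = {
    ("A", True): F(1, 10), ("A", False): F(2, 10),
    ("B", True): F(3, 20), ("B", False): F(3, 20),
    ("C", True): F(1, 5),  ("C", False): F(1, 5),
}

def p(x):
    """Prior P(X): marginalise the joint table over E."""
    return joint[(x, True)] + joint[(x, False)]

def p_e_given(x):
    """Likelihood P(E|X) = P(E and X) / P(X)."""
    return joint[(x, True)] / p(x)

# Three-term evidence: P(E|A)P(A) + P(E|B)P(B) + P(E|C)P(C)
three_term = sum(p_e_given(x) * p(x) for x in "ABC")

# Two-term evidence: P(E|A)P(A) + P(E|~A)P(~A), with ~A = B U C
p_not_a = p("B") + p("C")
p_e_given_not_a = (joint[("B", True)] + joint[("C", True)]) / p_not_a
two_term = p_e_given("A") * p("A") + p_e_given_not_a * p_not_a

print(three_term, two_term)  # prints "9/20 9/20"
```

Exact fractions are used instead of floats so that the equality check is not affected by rounding.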

Hope I have been able to clear your doubt.

• Ajit R. Jadhav says:

Dear Khyati,

Good effort, but some of the preliminary definitions perhaps can be improved. Let me give it a try. (I could go wrong, but guess it’s worth giving it a try…)

The term experiment here means a random experiment, i.e., one that encapsulates some random phenomenon. A trial (or run) of an experiment produces a result.

Any result of an experiment can be classified into various known classes called outcomes. Thus, outcomes may be viewed as attributes of the concrete happenstances that are results. The set of all possible outcomes (that any trial of an experiment can at all produce) is the outcomes set, aka outcome space. Thus, any result of an experiment always belongs to its outcome space.

The set of all possible subsets of the outcome space that can occur may be called, informally, the sample space. Thus, a sample space consists of not just the outcomes set, but also, informally, all the different groupings of all the different elements that can be drawn from its corresponding outcomes set. (This is the reason why compound events can at all be defined.)

An event is a subset of the sample space. An event thus is a set. When a result that can be classified as belonging to an event-set A occurs, we informally say that event A has occurred.

If many trials are conducted, we may define a ratio of the number of times that a given event A occurred to the total number of trials that were conducted. This ratio is called the relative frequency of the event A. The relative frequency does change from one set of trials to another. However, if the random phenomena underlying the random experiment remain stable, then it is possible to take a limit of the successive relative frequencies of any arbitrary event A as the number of trials approaches infinity. This limit is called the probability of event A. By the nature of its definition, the probability remains a real number that is limited to the interval [0,1], both endpoints included.
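
The idea of relative frequency settling down as the number of trials grows can be illustrated with a small simulation. This is a minimal sketch under assumed conditions: the experiment is a fair six-sided die, the event A is "an even face shows" (so its probability is 1/2), and the function name `relative_frequency` is made up for this example.

```python
import random

def relative_frequency(n_trials, rng):
    """Fraction of n_trials die rolls in which event A (an even face) occurred."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) % 2 == 0)
    return hits / n_trials

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    # Each line shows the relative frequency of A for a fresh set of n trials.
    print(n, relative_frequency(n, rng))
```

For small n the printed values typically scatter noticeably around 0.5, while for large n they tend to stay close to it. A finite simulation can only suggest the limit, not prove it.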

A probability measure is a function that maps events to probabilities (a random variable, by contrast, maps outcomes to numbers). The sum of probabilities over the outcome space is 1. Bayes’ theorem becomes possible (and is interesting) mainly because not all events belong to the outcome space; compound events also are possible.

Note that the formal theory of probability, for whatever reasons (best known to Kolmogorov, perhaps), skips over the definition of probability as the limit of relative frequencies, and instead chooses to take “the sum of probabilities over the outcomes space equals 1” as an axiom. This trick allows for a lot of philosophical debates to be introduced and conducted.

… Sorry, too long, but simply didn’t know where to stop writing. The phrase “philosophical debate,” however, did awaken me.

Best,

–Ajit

• Khyati Mahendru says:

Hi Ajit,
Your explanation is great and I am sure the readers will benefit from it. I welcome the feedback and agree that I might have used the terminology a bit loosely. There is definitely scope for improvement. At the same time, I hope that I have been able to convey what I intended with this article, in spite of this.

• Ajit R. Jadhav says:

Dear Khyati,

Thanks, but I guess there still is some haziness in what I wrote. It’s not clear enough. For instance, I should have said that the outcomes must be mutually exclusive and collectively exhaustive, but didn’t note it as such. But it’s a requirement, because only when it is fulfilled can we say that P(\Omega) = 1 (where \Omega is the outcomes set), and also allow for any event to be one of the subsets of \Omega.

Actually, looseness or haziness creeps into our (engineers’) writings mainly because there are no books that are both accessible to us and comprehensive/rigorous enough. … Given the kind of books that are prescribed in the typical UG curricula (whether by Indian or foreign authors), your write-up actually was good. It’s just that I find that these books themselves are not detailed or rigorous enough. … From what I read, I can say that Kreyszig is good for a quick overview, and Rohatgi and Saleh is great for a more rigorous treatment. But really speaking, it is statistics professors who should chime in and correct us all.

Anyway, keep up the good work, and bye for now.

Best,

–Ajit

• Paolino Madotto says:

Thank you for this clear and interesting article. Good Job!

• Khyati Mahendru says: