
PROBABILITY AND DATA

4 min read · Mar 12, 2021

Probability and statistics are core mathematical tools used in data analysis to help solve business problems. Understanding probability helps us draw insights from data and make informed decisions. This article introduces probability and its relationship with data.

First, let us start by defining probability and data. Then we show the interrelation between probability and data with a simple analogy.

What is Probability?

Probability is simply the likelihood that an event will occur. In other words, it is the chance of observing a certain outcome from a given experiment. For example, when we flip a fair coin, there are two possible outcomes, a head or a tail, and each is equally likely.

What is Data?

Data can be defined as facts and statistics collected for analysis.

From the example given above, our data will be generated by flipping a coin n times and counting how many heads or tails we get.

Let us say we flip the coin three times in each of eight separate experiments and get the following results:

{w_1, w_2, ..., w_8} = {HTH, TTT, TTH, HTT, HHH, THT, THH, HHT}, where H = heads and T = tails.

Then our data will be in the form:

X(w_1) =2,

X(w_2) =0,

X(w_3) =1,

X(w_4) =1,

X(w_5) =3,

X(w_6) =1,

X(w_7) =2,

X(w_8) =2,

where X(w_j), for j = 1, 2, ..., 8, is a random variable giving the number of heads obtained from the three coin flips in experiment j.
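The mapping from each result to its number of heads can be reproduced with a few lines of Python. This is only a minimal sketch using the standard library; the names `outcomes` and `heads_count` are illustrative, and the results are generated in a different order from the listing above.

```python
from itertools import product

# Enumerate every possible result of flipping a coin three times.
# `outcomes` and `heads_count` are illustrative names chosen for this sketch.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# The random variable X maps each result to its number of heads.
heads_count = {w: w.count("H") for w in outcomes}

for w, x in heads_count.items():
    print(f"X({w}) = {x}")
```

Each of the eight three-flip results appears exactly once, so the dictionary holds the same values as the list above.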

From the above data, we can find the probability of each outcome, i.e., the probability of getting a certain number of heads. Since the eight results are equally likely for a fair coin, each probability is the number of results with that many heads divided by 8.

P{x=0} = 1/8, the probability of getting no heads from the coin flips

P{x=1} = 3/8, the probability of getting 1 head from the coin flips

P{x=2} = 3/8, the probability of getting 2 heads from the coin flips

P{x=3} = 1/8, the probability of getting 3 heads from the coin flips
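The four probabilities above agree with the standard binomial formula, P{x=k} = C(n, k) p^k (1 - p)^(n - k), for n = 3 flips and p = 1/2. The short check below is a sketch using only the standard library; the helper name `binomial_pmf` is hypothetical and exact fractions are used so the results can be compared directly with the values above.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with head probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = Fraction(1, 2)  # fair coin
for k in range(4):
    print(f"P{{x={k}}} = {binomial_pmf(k, 3, p)}")
```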

Let us try simulating this with Python:

# importing libraries
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# specifying parameters
p = 0.5  # probability of getting a head in a fair coin flip
n = 3    # number of coin flips in each experiment
N = 8    # number of times the experiment is carried out

def coin_trial(n, p, N):
    # draws N samples from a binomial distribution with n flips per experiment
    outcome = np.random.binomial(n, p, N)
    return outcome

outcome = coin_trial(n, p, N)
print("Number of Heads in each Experiment:\n", outcome)

The code above returns the number of heads in each experiment. Because the samples are drawn at random, the output changes each time we run the code. One run returned the output below.

Number of Heads in each Experiment:
[1 3 2 1 2 0 3 2]
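From the output, we can estimate the probability of each number of heads by its relative frequency across the eight experiments. With so few experiments, the estimates differ from the exact values calculated earlier: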
P{x=0}=1/8 = 0.125
P{x=1}=2/8 = 0.25
P{x=2}=3/8 = 0.375
P{x=3}=2/8 = 0.25
Shape of the distribution of the coin flip sample data.
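The relative frequencies reported for the sample run can be tallied directly from the output array. The sketch below hard-codes that one run as a plain list, so it illustrates the counting step rather than a new simulation; the names `counts` and `frequencies` are illustrative.

```python
from collections import Counter

# Sample output copied from the run shown above; a new run will give different values.
outcome = [1, 3, 2, 1, 2, 0, 3, 2]

counts = Counter(outcome)
N = len(outcome)

# Relative frequency of each number of heads across the N experiments.
frequencies = {k: counts[k] / N for k in range(4)}
for k, freq in frequencies.items():
    print(f"P{{x={k}}} = {counts[k]}/{N} = {freq}")
```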

Probability Distribution Plots

Probability distribution plots are used to show how data is distributed. When we plot data, it may take the shape of one of the following probability distributions.

1. Binomial Distribution: A binomial distribution describes the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes, e.g., head or tail, true or false, pass or fail. In the coin flip example, the number of heads in three flips follows a binomial distribution.
A Binomial Distribution Curve

2. Uniform Distribution: In a uniform distribution, all outcomes of an experiment have an equal chance of occurring. A single fair coin flip is an example, since a head and a tail are equally likely. A uniform distribution takes the shape shown in the image below.

A Uniform Distribution Curve.

3. Exponential Distribution: An exponential distribution is used to model the time elapsed between two consecutive events that occur independently at a constant average rate. For example, if heads came up at random moments in continuous time at a steady average rate, the waiting time until the next head would be exponentially distributed. If we instead count the number of flips until we observe the first head, that count is discrete and follows a geometric distribution, which is the discrete counterpart of the exponential distribution.

An Exponential Distribution Curve

4. Bernoulli Distribution: Like the binomial distribution, a Bernoulli distribution has just two outcomes, but only one trial, i.e., the experiment is carried out once, e.g., a single coin flip. It is the special case of the binomial distribution with one trial.

A Bernoulli Distribution Curve

5. Poisson Distribution: A Poisson distribution is used when we are counting the occurrences of certain events in a fixed time interval. For example, counting the number of patients who visit a dental clinic from 2 pm to 3 pm.

A Poisson Distribution Curve
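To get a feel for the five distributions listed above, we can draw samples from each with NumPy and compare the sample means. This is only a sketch under stated assumptions: the seed, the sample size, the six-sided die used for the uniform case, the mean waiting time of 2 and the average of 4 arrivals per interval are all illustrative choices, not values from the examples above.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seeded so the sketch is reproducible
size = 100_000                        # samples per distribution (arbitrary choice)

samples = {
    # number of heads in 3 fair coin flips
    "binomial": rng.binomial(n=3, p=0.5, size=size),
    # a fair six-sided die (assumed example): integers 1..6, each equally likely
    "uniform": rng.integers(low=1, high=7, size=size),
    # waiting time between events, with an assumed mean of 2 time units
    "exponential": rng.exponential(scale=2.0, size=size),
    # a single fair coin flip: 1 = heads, 0 = tails
    "bernoulli": rng.binomial(n=1, p=0.5, size=size),
    # number of arrivals in one interval, with an assumed average of 4
    "poisson": rng.poisson(lam=4.0, size=size),
}

for name, values in samples.items():
    print(f"{name:12s} mean = {values.mean():.3f}")
```

With this many samples, the printed means should land close to the theoretical means of 1.5, 3.5, 2.0, 0.5 and 4.0 respectively. Passing any of the arrays to `plt.hist` would show the corresponding shape.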

Other probability distributions that data can follow include the geometric, gamma, normal, and beta distributions.

Thank you for reading.

Written by Ronke Akinmosin

Data Scientist || Data Storyteller || Data Analyst
