**PROBABILITY AND DATA**

Probability and statistics are key mathematical tools used in data analysis to help solve business problems. Understanding probability helps us draw insights from data and make informed decisions. This article introduces probability and its relationship with data.

First, we define probability and data. Then we show how the two are related with a simple coin-flip example.

**What is Probability?**

Probability is the likelihood that an event will occur, expressed as a number between 0 and 1. In other words, it is the chance of a particular outcome from a given experiment. For example, when we flip a fair coin, the outcome is either a head or a tail, each with probability 1/2.

**What is Data?**

Data can be defined as facts and statistics collected for analysis.

In the example above, our data is generated by flipping a coin n times and counting how many heads we get.

Suppose we flip a coin three times, repeat this experiment eight separate times, and get the following results:

$\Omega = \{HTH, TTT, TTH, HTT, HHH, THT, THH, HHT\}$, where H = heads and T = tails. We label these outcomes $w_1, w_2, \ldots, w_8$ in the order listed.

Counting the heads in each outcome gives our data:

$$X(w_1) = 2,\quad X(w_2) = 0,\quad X(w_3) = 1,\quad X(w_4) = 1,$$

$$X(w_5) = 3,\quad X(w_6) = 1,\quad X(w_7) = 2,\quad X(w_8) = 2.$$

Here, $X(w_j)$, for $j = 1, 2, \ldots, 8$, is the number of heads obtained in the $j$-th experiment. $X$ is a random variable: a rule that assigns a number (here, the count of heads) to each outcome.
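The eight outcomes listed above are exactly the $2^3 = 8$ possible sequences of three flips, each appearing once, so the values of $X$ can be reproduced by enumerating every sequence and counting its heads. A minimal sketch using only Python's standard library; the names `outcomes` and `heads` are our own:

```python
from itertools import product

# All 2**3 = 8 sequences of three flips, e.g. "HTH"
outcomes = ["".join(seq) for seq in product("HT", repeat=3)]

# X(w) = number of heads in outcome w
heads = {w: w.count("H") for w in outcomes}

for w, x in heads.items():
    print(w, x)
```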

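From the above data, we can find the probability of each outcome, i.e., the probability of getting a certain number of heads, by dividing the number of experiments that produced that many heads by the total number of experiments (8):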

$P(X=0) = 1/8$, the probability of getting no heads in three flips

$P(X=1) = 3/8$, the probability of getting exactly 1 head in three flips

$P(X=2) = 3/8$, the probability of getting exactly 2 heads in three flips

$P(X=3) = 1/8$, the probability of getting 3 heads in three flips
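These relative frequencies can be computed directly from the data. Because the eight experiments happen to contain every possible sequence exactly once, they also agree with the theoretical values $\binom{3}{k}/2^3$ for three fair flips. A small sketch with the standard library (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from math import comb

# Number of heads in each of the eight experiments, X(w_1), ..., X(w_8)
data = [2, 0, 1, 1, 3, 1, 2, 2]

counts = Counter(data)
# Relative frequency: (experiments with k heads) / (number of experiments)
pmf = {k: Fraction(counts[k], len(data)) for k in range(4)}

# Theoretical value for three fair flips: C(3, k) / 2**3
theory = {k: Fraction(comb(3, k), 2**3) for k in range(4)}

for k in range(4):
    print(f"P(X={k}) = {pmf[k]}  (theory: {theory[k]})")
```

Using `Fraction` keeps the results exact, so they print as `1/8`, `3/8`, and so on rather than as decimals.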

Let us try simulating this with Python.

```python
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt

# %matplotlib inline  (Jupyter-only magic; uncomment when running in a notebook)

# Specifying parameters
p = 0.5  # probability of getting a head in a fair coin flip
n = 3    # number of coin flips in each experiment
N = 8    # number of times the experiment is carried out


def coin_trial(n, p, N):
    """Return the number of heads in each of N experiments of n flips."""
    # Draw N samples from a Binomial(n, p) distribution
    outcome = np.random.binomial(n, p, N)
    return outcome


outcome = coin_trial(n, p, N)
print("Number of Heads in each Experiment:\n", outcome)
```

The code above returns the number of heads in each experiment. Because the samples are random, the output changes every time the code is run. One run returned the following:

```
Number of Heads in each Experiment:
 [1 3 2 1 2 0 3 2]
```

From this output, the estimated probability of each number of heads is:

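$P(X=0) = 1/8 = 0.125$

$P(X=1) = 2/8 = 0.25$

$P(X=2) = 3/8 = 0.375$

$P(X=3) = 2/8 = 0.25$

These are relative frequencies from a single run, so they differ from the theoretical values for three fair flips ($1/8$, $3/8$, $3/8$, $1/8$). With only eight experiments such differences are expected; as $N$ grows, the relative frequencies settle closer to the theoretical probabilities.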
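To see how the estimates behave with more data, the sketch below recomputes the relative frequencies for the run shown above, repeats the experiment 100,000 times, and plots both against the exact values with matplotlib. It assumes NumPy's `default_rng` generator rather than the legacy `np.random.binomial` used earlier; the seed, the sample size, and the output file name `heads_pmf.png` are arbitrary choices of ours:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np


def empirical_pmf(outcome, n):
    """Relative frequency of 0, 1, ..., n heads across the experiments in `outcome`."""
    return np.bincount(outcome, minlength=n + 1) / len(outcome)


# The run shown above: [0.125, 0.25, 0.375, 0.25]
small = empirical_pmf(np.array([1, 3, 2, 1, 2, 0, 3, 2]), 3)

# Many more experiments, seeded so the result is reproducible
rng = np.random.default_rng(0)
large = empirical_pmf(rng.binomial(3, 0.5, size=100_000), 3)
theory = np.array([1, 3, 3, 1]) / 8  # exact values for three fair flips

print(small)
print(large)  # close to [0.125, 0.375, 0.375, 0.125]

# Side-by-side bars: estimates from 8 and 100,000 experiments vs. exact values
k = np.arange(4)
fig, ax = plt.subplots()
ax.bar(k - 0.25, small, width=0.25, label="N = 8")
ax.bar(k, large, width=0.25, label="N = 100,000")
ax.bar(k + 0.25, theory, width=0.25, label="exact")
ax.set_xticks(k)
ax.set_xlabel("Number of heads in 3 flips")
ax.set_ylabel("Probability")
ax.legend()
fig.savefig("heads_pmf.png")  # hypothetical output file name
```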

**Probability Distribution Plots**

Probability distribution plots show how data is distributed. When we plot data, it often takes the shape of one of the following probability distributions.

**1. Binomial Distribution**: A Binomial distribution describes the number of successes in a fixed number of independent trials, where each trial has only two outcomes (e.g., head or tail, true or false, pass or fail) and the same probability of success. In the coin-flip example, the number of heads in three flips follows a Binomial distribution.

**2. Uniform Distribution**: In a uniform distribution, all outcomes of an experiment are equally likely. A single fair coin flip is an example, since a head and a tail are equally likely. A uniform distribution has the flat shape shown in the image below.

**3. Exponential Distribution**: An exponential distribution models the continuous time elapsed between two events that occur independently at a constant average rate, for example, the time between two patients arriving at a clinic. Its discrete counterpart is the Geometric distribution: if we flip a coin repeatedly until we observe the first head, the number of flips needed is geometrically distributed.

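**4. Bernoulli Distribution**: Like the Binomial distribution, a Bernoulli distribution has just two outcomes, but it describes a single trial, e.g., one coin flip. It is the special case of the Binomial distribution with one trial (n = 1): the outcome is a success with probability p and a failure with probability 1 - p.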

**5. Poisson Distribution**: A Poisson distribution models the number of times an event occurs in a fixed time interval when events happen independently at a constant average rate. For example, it can model the number of patients who visit a dental clinic between 2 pm and 3 pm.
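Each distribution above can be written as a short formula. The helper functions below are hypothetical names of our own, written in plain Python as a sketch of the standard textbook definitions. The exponential is represented by its density because it describes a continuous waiting time, and the Geometric distribution (the number of trials until the first success) is included as its discrete counterpart. The Poisson rate of 4 patients per hour is an assumed value for illustration:

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def uniform_pmf(k, outcomes):
    """Discrete uniform: every outcome in `outcomes` has probability 1/len(outcomes)."""
    return 1 / len(outcomes) if k in outcomes else 0.0


def exponential_pdf(t, rate):
    """Density of a waiting time t >= 0 when events occur at `rate` per unit time."""
    return rate * exp(-rate * t) if t >= 0 else 0.0


def geometric_pmf(k, p):
    """P(first success on trial k), k = 1, 2, ...: discrete counterpart of the exponential."""
    return (1 - p) ** (k - 1) * p


def bernoulli_pmf(k, p):
    """Single trial: a Binomial with n = 1, so P(1) = p and P(0) = 1 - p."""
    return binomial_pmf(k, 1, p)


def poisson_pmf(k, lam):
    """P(k events in an interval) when events occur at an average rate lam per interval."""
    return lam**k * exp(-lam) / factorial(k)


print([binomial_pmf(k, 3, 0.5) for k in range(4)])   # [0.125, 0.375, 0.375, 0.125]
print(uniform_pmf("H", ["H", "T"]))                  # 0.5
print(exponential_pdf(0.0, 0.5))                     # 0.5
print(geometric_pmf(2, 0.5))                         # 0.25: a tail, then a head
print(bernoulli_pmf(1, 0.5), bernoulli_pmf(0, 0.5))  # 0.5 0.5
print(round(poisson_pmf(0, 4.0), 4))                 # 0.0183: no patients when lam = 4
```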

Other probability distributions that data can follow include the Geometric, Gamma, Normal, and Beta distributions.
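NumPy can draw samples from these distributions through its `Generator` methods `geometric`, `gamma`, `normal`, and `beta`. The sketch below compares each sample mean with the theoretical mean; the parameter values and the seed are arbitrary illustrations. Note that NumPy's `geometric` counts the trials up to and including the first success, so its mean is $1/p$:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is reproducible
size = 50_000

# name: (sample, theoretical mean); parameter values are arbitrary illustrations
samples = {
    "Geometric (p=0.5)": (rng.geometric(0.5, size), 1 / 0.5),
    "Gamma (shape=2, scale=3)": (rng.gamma(2.0, 3.0, size), 2.0 * 3.0),
    "Normal (mean=0, sd=1)": (rng.normal(0.0, 1.0, size), 0.0),
    "Beta (a=2, b=5)": (rng.beta(2.0, 5.0, size), 2.0 / (2.0 + 5.0)),
}

for name, (x, mean) in samples.items():
    print(f"{name:26s} sample mean = {x.mean():.3f}, theoretical mean = {mean:.3f}")
```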

Thank you for reading.