What is the Normal Distribution in Statistics and What is the Empirical Formula?
In the Data Science field, statistics is very important, and you have to learn its key topics
In the previous articles, I have already talked about some of those topics
But there is one more term that you must know, which is the Normal Distribution
In this article, we are going to see what the Normal Distribution is and what the Empirical Formula in the Normal Distribution is
There are also a couple of related points about this distribution that are worth knowing
So, without wasting any more of your time, let’s get started
What is the Normal Distribution or Gaussian Distribution?
First, here is the technical definition of the Gaussian Distribution or the Normal Distribution
The Gaussian distribution, which is also known as the Normal Distribution, is a distribution whose curve is bell-shaped
Measured values that follow a Normal distribution are spread symmetrically around the mean, so about half of the measurements lie above the mean value and half lie below it
So, this is the technical, or we can say the common, definition of this type of distribution
I know that after reading this definition you may be confused or may not have got the complete idea behind it
So, don’t worry, as I will now expand on this and explain it simply so that you can understand it well
Let’s say we have a random variable called ‘X’
And suppose it takes continuous values, that is, any value within a range
If the values of ‘X’ form the bell shape described above, we say that ‘X’ follows the Gaussian Distribution with some value of mean and some value of standard deviation
And this we can represent as,
X ~ Gaussian Distribution (mean(Mu), Standard Deviation (Sigma))
After reading this you may ask, what is the mean and what is the standard deviation?
Put simply, the mean is the total sum of the data points divided by the total number of data points
So basically, the mean is just the average of all the data points we have
The variance is the average of the squared distances of the data points from the mean, and the standard deviation is the square root of the variance
Now here you may ask, what is the use of the standard deviation and variance, or what do they specify?
Basically, they specify how far the data points are spread out from the mean
That is, we can describe a data point as being 1 standard deviation away from the mean to the right or 1 standard deviation away to the left
And this goes on up to 3 standard deviations, that is, a data point can be 2 standard deviations or 3 standard deviations away from the mean on either side
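Here is a small Python sketch of the mean, variance, and standard deviation described above, using only the standard library
The values in `data` are made up for illustration, and the variance divides by the total number of data points, which assumes the list is the whole population rather than a sample:

```python
import math

# Hypothetical measurements, made up for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: total sum of the data points divided by the number of data points
mean = sum(data) / len(data)

# Variance: average squared distance of the data points from the mean
# (assumption: data is the whole population, so we divide by len(data))
variance = sum((x - mean) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # prints: 5.0 4.0 2.0
```

For this list, a value of 7 is 1 standard deviation to the right of the mean and a value of 3 would be 1 standard deviation to the left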
Now that you have got the idea about these terms, let’s continue with our main topic
As we discussed above, the random variable X follows the Gaussian distribution with some value of mean and some value of standard deviation
And when a random variable X satisfies this condition, its distribution follows the bell-shaped curve, like in the above figure
So, basically, the curve of a random variable that follows a Gaussian distribution is bell-shaped, which is why it is called a bell curve
On this bell curve, the distance of a data point from the mean is measured in standard deviations
That is, standard deviations to the right of the mean are counted as positive and those to the left as negative
If a point is 1 standard deviation away from the mean to the right, it is denoted as “Mu + Sigma”, and 1 standard deviation to the left is “Mu - Sigma”
In the same way, 2 standard deviations to the right of the mean is denoted as “Mu + 2 Sigma” and 3 standard deviations to the right is “Mu + 3 Sigma”
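To make the “Mu + Sigma” notation concrete, here is a rough Python sketch
The values of `mu` and `sigma` and the helper names `sigma_marks` and `z_score` are hypothetical, chosen only for this illustration:

```python
# Hypothetical mean and standard deviation, chosen only for illustration
mu = 50.0
sigma = 10.0

def sigma_marks(mu, sigma, k_max=3):
    """Return (k, Mu - k*Sigma, Mu + k*Sigma) for k = 1..k_max."""
    return [(k, mu - k * sigma, mu + k * sigma) for k in range(1, k_max + 1)]

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean (negative = left)."""
    return (x - mu) / sigma

for k, left, right in sigma_marks(mu, sigma):
    print(f"Mu - {k} Sigma = {left}, Mu + {k} Sigma = {right}")

print(z_score(70.0, mu, sigma))  # 2.0, the point sits at Mu + 2 Sigma
print(z_score(40.0, mu, sigma))  # -1.0, the point sits at Mu - Sigma
```

The sign of the result tells you the side of the mean, positive for the right and negative for the left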
Now we have seen the theory behind the normal distribution or Gaussian Distribution and some important points
But there is one more important term, which is the Empirical Formula, also commonly called the empirical rule or the 68-95-99.7 rule
So let’s see what the empirical formula says for the Gaussian distribution
Pr(Mu - Sigma <= x <= Mu + Sigma) ≈ 68%
Pr(Mu - 2 Sigma <= x <= Mu + 2 Sigma) ≈ 95%
Pr(Mu - 3 Sigma <= x <= Mu + 3 Sigma) ≈ 99.7%
Now after reading this Empirical formula you may ask, what does it mean?
So basically, the first empirical formula says that about 68% of the values x taken by our random variable X lie between 1 standard deviation to the left of the mean and 1 standard deviation to the right of the mean
This means about 68% of the values of the random variable are within 1 standard deviation of the mean
And if we talk about the 2nd empirical formula, then about 95% of the values are within 2 standard deviations of the mean
That is, about 95% of all the values lie between Mu - 2 Sigma and Mu + 2 Sigma
And if we talk about the last empirical formula, then about 99.7% of the values are within 3 standard deviations of the mean
This means about 99.7% of the values lie between Mu - 3 Sigma and Mu + 3 Sigma
So whatever we have seen above is known as the empirical formula for the normal distribution
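If you want to check these three percentages yourself, one possible way is Python’s built-in `statistics.NormalDist` class (available from Python 3.8)
The helper name `prob_within` is my own, and the sketch uses Mu = 0 and Sigma = 1, although the percentages come out the same for any Mu and Sigma:

```python
from statistics import NormalDist

# Standard normal distribution: Mu = 0, Sigma = 1
dist = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k, dist):
    """Pr(Mu - k*Sigma <= x <= Mu + k*Sigma) for the given distribution."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    # cdf(v) is the probability of a value <= v, so the difference
    # is the probability of landing between low and high
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    print(f"within {k} Sigma: {prob_within(k, dist):.4f}")
# within 1 Sigma: 0.6827
# within 2 Sigma: 0.9545
# within 3 Sigma: 0.9973
```

So the 68%, 95%, and 99.7% in the formula are rounded versions of these values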
So guys, always remember one thing: when we talk about the gaussian distribution or normal distribution, its curve is always bell-shaped
The center of the bell-shaped curve is the mean, and moving to the right you reach the first standard deviation, then the second, and finally the third, with the same marks mirrored on the left
The empirical formula specifies what percentage of the data lies within the first, second, and third standard deviations of the mean
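The same percentages also show up when you count actual data points
Below is a minimal simulation sketch, assuming made-up values for `mu` and `sigma` and a hypothetical helper called `share_within`, which draws random normally distributed values and counts how many land inside each band:

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical Mu and Sigma, chosen only for illustration
mu, sigma = 170.0, 8.0

# Draw 100,000 values from a normal distribution with this Mu and Sigma
samples = [random.gauss(mu, sigma) for _ in range(100_000)]

def share_within(values, mu, sigma, k):
    """Fraction of values lying between Mu - k*Sigma and Mu + k*Sigma."""
    inside = sum(1 for v in values if mu - k * sigma <= v <= mu + k * sigma)
    return inside / len(values)

for k in (1, 2, 3):
    print(f"within {k} Sigma: {share_within(samples, mu, sigma, k):.1%}")
```

The printed shares should come out close to 68%, 95%, and 99.7%, with small differences because the data is random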
So in this article, we have seen what the Gaussian Distribution or Normal Distribution is
And we have also studied what the empirical formula in the Normal distribution is
In short, you just have to remember that the normal distribution always forms a bell-shaped curve
It has the mean value at the center of the curve, and almost all of the data points fall within three standard deviations on either side of it
I hope that after reading this article you have got a complete idea about the normal distribution and the empirical formula
So thank you so much for giving your valuable time to read this article and have a great future ahead, bye