How to plot a multivariate Gaussian in Python

In the last two chapters we used Gaussians for a scalar (one dimensional) variable, expressed as $\mathcal{N}(\mu, \sigma^2)$. A more formal term for this is univariate normal, where univariate means 'one variable'. The probability distribution of the Gaussian is known as the univariate normal distribution.

What might a multivariate normal distribution be? Multivariate means multiple variables. Our goal is to be able to represent a normal distribution across multiple dimensions. I don't necessarily mean spatial dimensions - it could be position, velocity, and acceleration. Consider a two dimensional case. Let's say we believe that $x = 2$ and $y = 17$. This might be the x and y coordinates for the position of our dog, it might be the position and velocity of our dog on the x-axis, or the temperature and wind speed at our weather station. It doesn't really matter. We can see that for $N$ dimensions, we need $N$ means, which we will arrange in a column matrix (vector) like so:

$$\mu = \begin{bmatrix}\mu_1\\ \mu_2\\ \vdots \\ \mu_n\end{bmatrix}$$

Therefore for this example we would have

$$\mu = \begin{bmatrix}2\\17\end{bmatrix}$$

The next step is representing our variances. At first blush we might think we would also need N variances for N dimensions. We might want to say the variance for x is 10 and the variance for y is 4, like so.

$$\sigma^2 = \begin{bmatrix}10\\4\end{bmatrix}$$

This is incorrect because it does not consider the more general case. For example, suppose we were tracking house prices vs total $m^2$ of the floor plan. These numbers are correlated. It is not an exact correlation, but in general houses in the same neighborhood are more expensive if they have a larger floor plan. We want a way to express not only what we think the variance is in the price and the $m^2$, but also the degree to which they are correlated. The covariance describes how two variables are correlated. Covariance is short for correlated variances.

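We use a covariance matrix to denote covariances with multivariate normal distributions, and it looks like this:

$$\Sigma = \begin{bmatrix}\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n}\\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2\end{bmatrix}$$

The diagonal holds the variance of each variable, and the off-diagonal entry $\sigma_{ij}$ is the covariance between the $i$-th and $j$-th variables.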
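As a small sketch of how a mean vector and a covariance matrix fit together, the snippet below builds both for a two-variable case in NumPy. The numbers (floor area, price, and a correlation of 0.7) are hypothetical values chosen only for illustration; they do not come from the text.

```python
import numpy as np

# Hypothetical numbers for illustration only:
# variable 1 = floor area in m^2, variable 2 = price in thousands.
mu = np.array([120.0, 300.0])           # one mean per dimension
sigma_area, sigma_price = 20.0, 50.0    # assumed standard deviations
rho = 0.7                               # assumed correlation coefficient

# Covariance = correlation * product of the standard deviations
cov_xy = rho * sigma_area * sigma_price
Sigma = np.array([[sigma_area**2, cov_xy],
                  [cov_xy, sigma_price**2]])

# A valid covariance matrix is symmetric with non-negative eigenvalues
is_symmetric = np.allclose(Sigma, Sigma.T)
eigenvalues = np.linalg.eigvalsh(Sigma)

# Recover the correlation matrix from the covariance matrix
std = np.sqrt(np.diag(Sigma))
corr = Sigma / np.outer(std, std)

print(Sigma)
print(corr)
```

The variances sit on the diagonal, while the single off-diagonal value carries the information about how the two variables move together.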

The Gaussian distribution (or normal distribution) is one of the most fundamental probability distributions. From its occurrence in daily life to its applications in statistical learning techniques, it is one of the most widely used results in mathematics. This article moves on to the multi-dimensional case and builds an intuitive understanding of the bivariate normal distribution.

The benefit of covering the bivariate distribution is that we can visualize it with geometric plots. Moreover, the concepts learned from the bivariate distribution extend to any number of dimensions. We'll first briefly cover the theory of the distribution and then analyze its main components, such as the covariance matrix and the density function, in Python.

Probability Density Function (or density function, or PDF) of a Bivariate Gaussian distribution

The density function describes the relative likelihood of a random variable $X$ at a given sample. If the value is high around a given sample, the random variable is more likely to take values near that point when sampled at random. Responsible for its characteristic "bell shape", the density function of a given bivariate Gaussian random variable $X$ is mathematically defined as:

$$f_X(x) = \frac{1}{2\pi\sqrt{|\Sigma|}} \exp\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

Where $x$ is any input vector $x \in \mathbb{R}^2$, while the symbols $\mu$ and $\Sigma$ have their usual meaning: the mean vector and the covariance matrix.
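As a rough check, the standard multivariate normal density can be evaluated directly and compared with SciPy. This is a minimal sketch: the helper name `gaussian_pdf`, the test point, and the covariance values are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate normal density at a single point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    k = mu.shape[0]                      # number of dimensions (2 here)
    diff = x - mu
    # Solve Sigma z = diff instead of forming the inverse explicitly
    maha_sq = diff @ np.linalg.solve(Sigma, diff)
    norm_const = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * maha_sq) / norm_const

# Illustrative parameters (assumed, not prescribed by the text)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
point = np.array([0.5, -0.25])

manual = gaussian_pdf(point, mu, Sigma)
reference = multivariate_normal(mean=mu, cov=Sigma).pdf(point)
print(manual, reference)
```

For $k = 2$ the normalizing constant reduces to $2\pi\sqrt{|\Sigma|}$, so the two values should agree up to floating-point error.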

The main function used in this article is scipy.stats.multivariate_normal from SciPy, which represents a multivariate normal random variable.

Syntax: scipy.stats.multivariate_normal(mean=None, cov=1, allow_singular=False, seed=None)

Parameters (all optional):

  • mean: A Numpy array specifying the mean of the distribution
  • cov: A Numpy array specifying a symmetric positive definite covariance matrix
  • seed: A random seed for generating reproducible results

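Returns: A "frozen" multivariate normal random variable object with the given mean and covariance fixed (scipy.stats.multivariate_normal itself is an instance of scipy.stats._multivariate.multivariate_normal_gen). Some of the methods of the returned object which are useful for this article are as follows: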

  • pdf(x): Returns the density function value at the value ‘x’
  • rvs(size): Draws ‘size’ number of samples from the generated multivariate Gaussian distribution
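A minimal usage sketch of the two methods listed above follows. The mean, covariance, seed, and sample count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (assumed values)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# Freeze the distribution with a fixed seed for reproducibility
distr = multivariate_normal(mean=mean, cov=cov, seed=1000)

# pdf(x): density at a single 2D point
density_at_mean = distr.pdf(mean)

# rvs(size): draw samples; the result has one row per sample
samples = distr.rvs(size=5)
print(density_at_mean)
print(samples.shape)

# A fresh object with the same seed reproduces the same samples
again = multivariate_normal(mean=mean, cov=cov, seed=1000).rvs(size=5)
print(np.allclose(samples, again))
```

Passing the seed when the object is created keeps the sampling reproducible without touching NumPy's global random state.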

A “visual” view of the covariance matrix

The covariance matrix is perhaps one of the most informative components of a bivariate Gaussian distribution. Entry $(i, j)$ of the covariance matrix is the covariance between the $i$-th and $j$-th random variables. The covariance between two random variables $X_1$ and $X_2$ is mathematically defined as $\sigma(X_1, X_2) = E[(X_1 - E[X_1])(X_2 - E[X_2])]$, where $E[X]$ denotes the expected value of a given random variable $X$. Intuitively speaking, by observing the entries of the covariance matrix we can easily imagine the contour drawn out by the two Gaussian random variables in 2D. Here's how:

The off-diagonal values represent the joint covariance between the two components of the random variable. If the value is positive, there is positive covariance between the two random variables: if we go in a direction where $x_1$ increases, then $x_2$ tends to increase in that direction as well, and vice versa. Similarly, if the value is negative, $x_2$ tends to decrease in the direction of an increase in $x_1$.
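To make the sign of the covariance concrete, here is a small sketch that estimates it from data using the expectation-based definition and compares the result with `np.cov`. The synthetic data, the helper name `covariance`, and the coefficients 0.8 and 0.6 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic data (assumed): x2 is built to move with, or against, x1
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
x2_pos = 0.8 * x1 + 0.6 * noise    # tends to increase with x1
x2_neg = -0.8 * x1 + 0.6 * noise   # tends to decrease as x1 increases

def covariance(a, b):
    """Sample version of E[(A - E[A]) (B - E[B])]."""
    return np.mean((a - a.mean()) * (b - b.mean()))

cov_pos = covariance(x1, x2_pos)
cov_neg = covariance(x1, x2_neg)

# np.cov returns the full 2x2 matrix; bias=True divides by n, like above
cov_matrix = np.cov(x1, x2_pos, bias=True)

print(cov_pos, cov_neg)
print(cov_matrix)
```

The off-diagonal entry of `cov_matrix` matches the hand-computed covariance, and its sign flips when the relationship between the variables is reversed.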

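Below is an implementation that illustrates the effect of the covariance matrix:

In the following code snippets we'll be generating 3 different bivariate Gaussian distributions with the same mean $\mu = \begin{bmatrix}0\\0\end{bmatrix}$ but different covariance matrices:

  1. Covariance matrix with negative covariance = $\begin{bmatrix}1 & -0.8\\-0.8 & 1\end{bmatrix}$
  2. Covariance matrix with 0 covariance = $\begin{bmatrix}1 & 0\\0 & 1\end{bmatrix}$
  3. Covariance matrix with positive covariance = $\begin{bmatrix}1 & 0.8\\0.8 & 1\end{bmatrix}$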

Python

# Importing the necessary modules
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Figure size for a row of three plots (styling values are illustrative)
plt.rcParams['figure.figsize'] = 14, 6

# Initializing the random seed
random_seed = 1000

# List containing the covariance values to compare
cov_val = [-0.8, 0, 0.8]

# Setting the mean of the distribution to be at (0, 0)
mean = np.array([0, 0])

# Iterating over the different covariance values
for idx, val in enumerate(cov_val):
    plt.subplot(1, 3, idx + 1)

    # Initializing the covariance matrix (unit variances)
    cov = np.array([[1, val], [val, 1]])

    # Generating a bivariate Gaussian distribution
    # with the given mean and covariance matrix
    distr = multivariate_normal(cov=cov, mean=mean, seed=random_seed)

    # Drawing samples from the distribution
    # (the sample count is an illustrative choice)
    data = distr.rvs(size=5000)

    # Plotting the generated samples
    plt.plot(data[:, 0], data[:, 1], 'o', c='lime',
             markeredgewidth=0.5, markeredgecolor='black')
    plt.title(f'Covariance between x1 and x2 = {val}')
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.axis('equal')

plt.show()

Output:

Samples generated for different covariance matrices

We can see that the code's output agrees with the theory above: negative covariance tilts the cloud of samples downward, zero covariance leaves it round, and positive covariance tilts it upward. Note that the value 0.8 was chosen just for convenience. The reader can play around with different magnitudes of covariance and expect consistent results.

3D view of the probability density function:

Now we can move over to one of the most interesting and characteristic aspects of the bivariate Gaussian distribution, the density function! The density function is responsible for the characteristic bell shape of the distribution.

Python

# Importing the necessary modules
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Figure size for a row of three plots (styling values are illustrative)
plt.rcParams['figure.figsize'] = 14, 6
fig = plt.figure()

# Initializing the random seed
random_seed = 1000

# List containing the covariance values to compare
cov_val = [-0.8, 0, 0.8]

# Setting the mean of the distribution to be at (0, 0)
mean = np.array([0, 0])

# Storing density function values for further analysis
pdf_list = []

# Iterating over the different covariance values
for idx, val in enumerate(cov_val):

    # Initializing the covariance matrix (unit variances)
    cov = np.array([[1, val], [val, 1]])

    # Generating a bivariate Gaussian distribution
    # with the given mean and covariance matrix
    distr = multivariate_normal(cov=cov, mean=mean, seed=random_seed)

    # Generating a meshgrid that covers the 3-sigma boundary.
    # The standard deviation is the square root of the variance.
    mean_1, mean_2 = mean[0], mean[1]
    sigma_1, sigma_2 = np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1])

    x = np.linspace(mean_1 - 3 * sigma_1, mean_1 + 3 * sigma_1, num=100)
    y = np.linspace(mean_2 - 3 * sigma_2, mean_2 + 3 * sigma_2, num=100)
    X, Y = np.meshgrid(x, y)

    # Evaluating the density function at every point of the
    # meshgrid in a single vectorized call
    pdf = distr.pdf(np.dstack((X, Y)))

    # Plotting the density function values as a 3D surface
    ax = fig.add_subplot(1, 3, idx + 1, projection='3d')
    ax.plot_surface(X, Y, pdf, cmap='viridis')
    ax.set_xlabel('x1')
    ax.set_ylabel('x2')
    ax.set_title(f'Covariance between x1 and x2 = {val}')
    ax.set_zticks([])
    pdf_list.append(pdf)

plt.tight_layout()
plt.show()

# Plotting contour plots (X and Y are the same grid for all three cases)
for idx, val in enumerate(pdf_list):
    plt.subplot(1, 3, idx + 1)
    plt.contourf(X, Y, val, cmap='viridis')
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.title(f'Covariance between x1 and x2 = {cov_val[idx]}')

plt.tight_layout()
plt.show()

Output:

1) Plot of the density function

Density functions corresponding to different covariance matrices

2) Plot of contours

Contours of the density functions

As we can see, the contours of the density function closely match the samples drawn in the previous section. Note that the 3-sigma boundary (motivated by the 68-95-99.7 rule) covers about 99.7% of each variable's marginal distribution, so nearly all samples fall inside the plotted region. As mentioned earlier, the reader can play around with different boundaries and expect consistent results.
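The coverage of the 3-sigma boundary can be checked numerically. The sketch below counts how many samples fall within three standard deviations of the mean along each axis; the parameters, seed, and sample count are assumptions for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative parameters (assumed values)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
sigma = np.sqrt(np.diag(cov))   # per-axis standard deviations

samples = multivariate_normal(mean=mean, cov=cov, seed=1000).rvs(size=5000)

# True where a sample lies within 3 sigma of the mean on that axis
inside_each = np.abs(samples - mean) <= 3 * sigma

# Fraction covered along each axis separately (marginal coverage)
frac_per_axis = inside_each.mean(axis=0)

# Fraction of samples inside the 3-sigma box on both axes at once
frac_in_box = inside_each.all(axis=1).mean()

print(frac_per_axis, frac_in_box)
```

The joint fraction can never exceed the per-axis fractions, because a sample must satisfy both conditions to fall inside the box.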

Conclusion

We explored the bivariate Gaussian distribution through a series of plots and checked the theoretical results against practical findings in Python. The reader is encouraged to play around with the code snippets to gain a deeper intuition about this distribution.
