Continuous Bernoulli distribution

In probability theory, statistics, and machine learning, the continuous Bernoulli distribution[1][2][3] is a family of continuous probability distributions parameterized by a single shape parameter $\lambda \in (0,1)$ and defined on the unit interval $x\in [0,1]$ by:

$$p(x|\lambda )\propto \lambda ^{x}(1-\lambda )^{1-x}.$$

The continuous Bernoulli distribution arises in deep learning and computer vision, specifically in the context of variational autoencoders,[4][5] for modeling the pixel intensities of natural images. As such, it defines a proper probabilistic counterpart for the commonly used binary cross entropy loss, which is often applied to continuous, $[0,1]$-valued data.[6][7][8][9] This practice amounts to ignoring the normalizing constant of the continuous Bernoulli distribution, since the binary cross entropy loss only defines a true log-likelihood for discrete, $\{0,1\}$-valued data.
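To make the role of the normalizing constant concrete, the following is a minimal Python sketch using only the standard library. It assumes the closed form $C(\lambda )=2\tanh ^{-1}(1-2\lambda )/(1-2\lambda )$, with $C(1/2)=2$, for the normalizing constant. The function names and the cutoff near $\lambda =1/2$ are illustrative choices, not taken from any library.

```python
import math


def cb_log_norm_const(lam: float) -> float:
    """log C(lam), with C(lam) = 2*atanh(1 - 2*lam) / (1 - 2*lam) and C(1/2) = 2."""
    if abs(lam - 0.5) < 1e-6:
        # The closed form is 0/0 at lam = 1/2; use the limiting value C = 2.
        return math.log(2.0)
    return math.log(2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam))


def bce(x: float, lam: float) -> float:
    """Binary cross entropy between a target x in [0, 1] and a prediction lam."""
    return -(x * math.log(lam) + (1.0 - x) * math.log(1.0 - lam))


def cb_log_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli log-density: negative BCE plus the log normalizer."""
    return cb_log_norm_const(lam) - bce(x, lam)


def integrate_unit_interval(f, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    return sum(f((i + 0.5) / n) for i in range(n)) / n


if __name__ == "__main__":
    lam = 0.9
    # Unnormalized "likelihood" implied by the BCE loss alone.
    print(integrate_unit_interval(lambda x: math.exp(-bce(x, lam))))
    # Properly normalized continuous Bernoulli density.
    print(integrate_unit_interval(lambda x: math.exp(cb_log_pdf(x, lam))))
```

Under these assumptions, `exp(-bce(x, lam))` integrates to $1/C(\lambda )$, which is strictly less than 1, while `exp(cb_log_pdf(x, lam))` integrates to approximately 1.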

The continuous Bernoulli also defines an exponential family of distributions. Writing $\eta =\log \left(\lambda /(1-\lambda )\right)$ for the natural parameter, the density can be rewritten in canonical form: $p(x|\eta )\propto \exp(\eta x)$.
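As a worked step, the normalizing constant can be recovered from the canonical form by direct integration. The derivation below is a sketch under the definitions above and assumes $\eta \neq 0$ (that is, $\lambda \neq 1/2$); at $\eta =0$ the density is uniform on $[0,1]$.

```latex
\int_{0}^{1} e^{\eta x}\,dx = \frac{e^{\eta}-1}{\eta}
\quad\Longrightarrow\quad
p(x|\eta) = \frac{\eta}{e^{\eta}-1}\,e^{\eta x}, \qquad x\in[0,1].
```

Since $\lambda ^{x}(1-\lambda )^{1-x}=(1-\lambda )e^{\eta x}$ and $e^{\eta }-1=(2\lambda -1)/(1-\lambda )$, the constant multiplying $\lambda ^{x}(1-\lambda )^{1-x}$ is $\eta /(2\lambda -1)$. Using $\eta =-2\tanh ^{-1}(1-2\lambda )$, this equals $2\tanh ^{-1}(1-2\lambda )/(1-2\lambda )$.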

Related distributions

Bernoulli distribution

The continuous Bernoulli can be thought of as a continuous relaxation of the Bernoulli distribution, which is defined on the discrete set $\{0,1\}$ by the probability mass function:

$$p(x)=p^{x}(1-p)^{1-x},$$

where $p$ is a scalar parameter between 0 and 1. Applying this same functional form on the continuous interval $[0,1]$ results in the continuous Bernoulli probability density function, up to a normalizing constant.

Beta distribution

The Beta distribution has the density function:

$$p(x)\propto x^{\alpha -1}(1-x)^{\beta -1},$$

which, setting $x_{1}=x$, $x_{2}=1-x$, $\alpha _{1}=\alpha$ and $\alpha _{2}=\beta$, can be rewritten as:

$$p(x)\propto x_{1}^{\alpha _{1}-1}x_{2}^{\alpha _{2}-1},$$

where $\alpha _{1},\alpha _{2}$ are positive scalar parameters, and $(x_{1},x_{2})$ represents an arbitrary point inside the 1-simplex, $\Delta ^{1}=\{(x_{1},x_{2}):x_{1}>0,x_{2}>0,x_{1}+x_{2}=1\}$. Switching the role of the parameter and the argument in this density function, we obtain:

$$p(x)\propto \alpha _{1}^{x_{1}}\alpha _{2}^{x_{2}}.$$

Because $x_{1}+x_{2}=1$, multiplying both parameters by a common positive factor only rescales the density by a constant, so this family is only identifiable up to the linear constraint $\alpha _{1}+\alpha _{2}=1$, whence we obtain:

$$p(x)\propto \lambda ^{x_{1}}(1-\lambda )^{x_{2}},$$

corresponding exactly to the continuous Bernoulli density.

Exponential distribution

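An exponential distribution restricted (truncated) to the unit interval is a continuous Bernoulli distribution: an exponential density with rate $r>0$ is proportional to $\exp(-rx)$, which matches the canonical form with natural parameter $\eta =-r$, that is, $\lambda =1/(1+e^{r})$.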
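This equivalence can be checked numerically. The sketch below assumes a rate $r>0$ and uses the canonical form $p(x|\eta )\propto \exp(\eta x)$: a truncated exponential density is proportional to $\exp(-rx)$, so $\eta =-r$ and $\lambda =1/(1+e^{r})$. It also assumes the closed-form normalizing constant $C(\lambda )=2\tanh ^{-1}(1-2\lambda )/(1-2\lambda )$. All function names are hypothetical.

```python
import math


def truncated_exp_pdf(x: float, rate: float) -> float:
    """Density of an Exponential(rate) variable conditioned on 0 <= x <= 1."""
    return rate * math.exp(-rate * x) / (1.0 - math.exp(-rate))


def cb_pdf(x: float, lam: float) -> float:
    """Continuous Bernoulli density C(lam) * lam**x * (1 - lam)**(1 - x)."""
    if lam == 0.5:
        c = 2.0  # limiting value of the normalizing constant
    else:
        c = 2.0 * math.atanh(1.0 - 2.0 * lam) / (1.0 - 2.0 * lam)
    return c * lam ** x * (1.0 - lam) ** (1.0 - x)


def lam_from_rate(rate: float) -> float:
    """Map the rate r to lam via eta = log(lam / (1 - lam)) = -r."""
    return 1.0 / (1.0 + math.exp(rate))


if __name__ == "__main__":
    rate = 2.0
    lam = lam_from_rate(rate)
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        # The two columns should agree up to floating-point error.
        print(x, truncated_exp_pdf(x, rate), cb_pdf(x, lam))
```

Because $r>0$ gives $\lambda <1/2$, this mapping only reaches the decreasing half of the family. Values $\lambda >1/2$ correspond to the mirrored variable $1-x$.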

Continuous categorical distribution

The multivariate generalization of the continuous Bernoulli is called the continuous categorical.[10]

References

[1] Loaiza-Ganem, G., & Cunningham, J. P. (2019). The continuous Bernoulli: fixing a pervasive error in variational autoencoders. In Advances in Neural Information Processing Systems (pp. 13266-13276).
[2] PyTorch Distributions. https://pytorch.org/docs/stable/distributions.html#continuousbernoulli
[3] TensorFlow Probability. https://www.tensorflow.org/probability/api_docs/python/tfp/edward2/ContinuousBernoulli
[4] Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
[5] Kingma, D. P., & Welling, M. (2014, April). Stochastic gradient VB and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (Vol. 19).
[6] Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2016, June). Autoencoding beyond pixels using a learned similarity metric. In International conference on machine learning (pp. 1558-1566).
[7] Jiang, Z., Zheng, Y., Tan, H., Tang, B., & Zhou, H. (2017, August). Variational deep embedding: an unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1965-1972).
[8] PyTorch VAE tutorial: https://github.com/pytorch/examples/tree/master/vae.
[9] Keras VAE tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.
[10] Gordon-Rodriguez, E., Loaiza-Ganem, G., & Cunningham, J. P. (2020). The continuous categorical: a novel simplex-valued exponential family. In 36th International Conference on Machine Learning, ICML 2020. International Machine Learning Society (IMLS).

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

Continuous Bernoulli distribution
Probability density function
Notation	${\mathcal {CB}}(\lambda )$
Parameters	$\lambda \in (0,1)$
Support	$x\in [0,1]$
PDF	$C(\lambda )\lambda ^{x}(1-\lambda )^{1-x}\!$ where $C(\lambda )={\begin{cases}{\frac {2\tanh ^{-1}(1-2\lambda )}{1-2\lambda }}&{\text{ if }}\lambda \neq {\frac {1}{2}}\\2&{\text{ otherwise}}\end{cases}}$
CDF	${\begin{cases}{\frac {\lambda ^{x}(1-\lambda )^{1-x}+\lambda -1}{2\lambda -1}}&{\text{ if }}\lambda \neq {\frac {1}{2}}\\x&{\text{ otherwise}}\end{cases}}\!$
Mean	$\operatorname {E} [X]={\begin{cases}{\frac {\lambda }{2\lambda -1}}+{\frac {1}{2\tanh ^{-1}(1-2\lambda )}}&{\text{ if }}\lambda \neq {\frac {1}{2}}\\{\frac {1}{2}}&{\text{ otherwise}}\end{cases}}\!$
Variance	$\operatorname {var} [X]={\begin{cases}{\frac {(\lambda -1)\lambda }{(1-2\lambda )^{2}}}+{\frac {1}{(2\tanh ^{-1}(1-2\lambda ))^{2}}}&{\text{ if }}\lambda \neq {\frac {1}{2}}\\{\frac {1}{12}}&{\text{ otherwise}}\end{cases}}\!$
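
The CDF in the table can be inverted in closed form, which gives a simple way to draw samples and to sanity-check the moments. The following is a minimal standard-library sketch, not a reference implementation. The function names and the cutoff around $\lambda =1/2$ (where the closed forms become 0/0 and a Taylor expansion in $\eta =\log(\lambda /(1-\lambda ))$ is used instead) are illustrative assumptions. The variance is written as the second derivative of the log-normalizer $\log((e^{\eta }-1)/\eta )$, which gives the first term a negative sign: $(\lambda -1)\lambda /(1-2\lambda )^{2}$.

```python
import math
import random

TAYLOR_CUTOFF = 1e-3  # illustrative cutoff; closed forms are 0/0 at lam = 1/2


def natural_param(lam: float) -> float:
    """eta = log(lam / (1 - lam))."""
    return math.log(lam) - math.log1p(-lam)


def cb_mean(lam: float) -> float:
    if abs(lam - 0.5) < TAYLOR_CUTOFF:
        # Taylor expansion around eta = 0: E[X] = 1/2 + eta/12 + O(eta^3).
        return 0.5 + natural_param(lam) / 12.0
    return lam / (2.0 * lam - 1.0) + 1.0 / (2.0 * math.atanh(1.0 - 2.0 * lam))


def cb_variance(lam: float) -> float:
    if abs(lam - 0.5) < TAYLOR_CUTOFF:
        # Taylor expansion around eta = 0: var[X] = 1/12 - eta^2/240 + O(eta^4).
        return 1.0 / 12.0 - natural_param(lam) ** 2 / 240.0
    return (lam * (lam - 1.0) / (1.0 - 2.0 * lam) ** 2
            + 1.0 / (2.0 * math.atanh(1.0 - 2.0 * lam)) ** 2)


def cb_icdf(u: float, lam: float) -> float:
    """Inverse of the CDF: solve F(x) = u for x, with 0 <= u <= 1."""
    if lam == 0.5:
        return u  # the distribution is uniform at lam = 1/2
    num = math.log1p(-lam + u * (2.0 * lam - 1.0)) - math.log1p(-lam)
    return num / natural_param(lam)


def cb_sample(lam: float, n: int, seed: int = 0) -> list:
    """Draw n samples by inverse-transform sampling."""
    rng = random.Random(seed)
    return [cb_icdf(rng.random(), lam) for _ in range(n)]


if __name__ == "__main__":
    lam = 0.9
    xs = cb_sample(lam, 100_000)
    sample_mean = sum(xs) / len(xs)
    sample_var = sum((x - sample_mean) ** 2 for x in xs) / len(xs)
    # Sample moments next to the closed-form values.
    print(sample_mean, cb_mean(lam))
    print(sample_var, cb_variance(lam))
```

With enough samples, the empirical mean and variance should land close to the closed-form values. Note that any variable supported on $[0,1]$ has variance at most $1/4$, which is a quick plausibility check on the variance formula.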