| Title: | Generation of Artificial Binary Data |
|---|---|
| Description: | Generation of correlated artificial binary data. |
| Authors: | Friedrich Leisch [aut] (ORCID: <https://orcid.org/0000-0001-7278-1983>, maintainer up to 2024), Andreas Weingessel [aut], Kurt Hornik [aut, cre] (ORCID: <https://orcid.org/0000-0003-4198-9911>) |
| Maintainer: | Kurt Hornik <[email protected]> |
| License: | GPL-2 |
| Version: | 0.9-24 |
| Built: | 2026-05-25 06:16:08 UTC |
| Source: | https://github.com/cran/bindata |
Compute a matrix of common probabilities for a binary random vector from given marginal probabilities and correlations.
bincorr2commonprob(margprob, bincorr)
margprob |
vector of marginal probabilities. |
bincorr |
matrix of binary correlations. |
The matrix of common probabilities. Element $(i,i)$ holds the
probability that variable $i$ equals 1, and element $(i,j)$ holds the
joint probability that variables $i$ and $j$ both equal 1
(if $i \ne j$).
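For a single pair, the link between a binary correlation and a joint probability follows from the definition of the Pearson correlation of two Bernoulli variables. The base R sketch below applies that identity to a whole matrix. The helper name `corr_to_commonprob` is hypothetical and the code is not the package's implementation.

```r
# Hypothetical helper (not part of bindata): convert marginal probabilities
# and binary correlations into a matrix of common probabilities using
#   p_ij = r_ij * sqrt(p_i (1 - p_i) p_j (1 - p_j)) + p_i p_j
corr_to_commonprob <- function(margprob, bincorr) {
  s <- sqrt(margprob * (1 - margprob))   # Bernoulli standard deviations
  cp <- bincorr * outer(s, s) + outer(margprob, margprob)
  diag(cp) <- margprob                   # diagonal holds the marginals
  cp
}

p <- c(0.5, 0.8)
r <- matrix(c(1, 0.25, 0.25, 1), ncol = 2)
cp <- corr_to_commonprob(p, r)
cp
```

Not every correlation matrix is attainable for given marginals, so a result like `cp` would still need to pass a feasibility check before use.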
Friedrich Leisch
Leisch F, Weingessel A, Hornik K (1998). “On the Generation of Correlated Artificial Binary Data.” Working Paper 13, SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business. doi:10.57938/6884f809-93bc-4497-ab2b-fc1611198f5b.
commonprob2sigma,
simul.commonprob.
The main diagonal elements commonprob[i,i] are interpreted as
probabilities $p_i$ that a binary variable $X_i$
equals 1. The
off-diagonal elements commonprob[i,j] are the probabilities $p_{ij}$
that both $X_i$ and $X_j$ are 1.
This program checks some necessary conditions on these probabilities
which must be fulfilled for a joint distribution of the $X_i$
with the given probabilities to exist.
The conditions checked are
$0 \le p_i \le 1$
$\max(0, p_i + p_j - 1) \le p_{ij} \le \min(p_i, p_j)$, $i \ne j$
$p_i + p_j + p_k - p_{ij} - p_{ik} - p_{jk} \le 1$, $i \ne j$, $i \ne k$, $j \ne k$
check.commonprob(commonprob)
commonprob |
Matrix of pairwise probabilities. |
check.commonprob returns TRUE, if all conditions are
fulfilled. The attribute "message" of the return value contains
some information on the errors that were found.
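Necessary conditions of this kind can be sketched in base R. The function below is a hypothetical re-implementation, not the package's code: it tests the range of the marginals, the Fréchet bounds on each pairwise probability, and the inclusion-exclusion inequality on each triple. The name `necessary_conditions_ok` and the tolerance `tol` are assumptions made for illustration.

```r
# Hypothetical checker (not part of bindata) for a matrix of common
# probabilities 'cp' with marginals on the diagonal.
necessary_conditions_ok <- function(cp, tol = 1e-12) {
  p <- diag(cp)
  k <- length(p)
  # marginals must be probabilities
  if (any(p < 0 | p > 1)) return(FALSE)
  # pairwise bounds: max(0, p_i + p_j - 1) <= p_ij <= min(p_i, p_j)
  if (k >= 2) for (i in 1:(k - 1)) for (j in (i + 1):k) {
    lower <- max(0, p[i] + p[j] - 1)
    upper <- min(p[i], p[j])
    if (cp[i, j] < lower - tol || cp[i, j] > upper + tol) return(FALSE)
  }
  # triples: p_i + p_j + p_l - p_ij - p_il - p_jl <= 1
  if (k >= 3) for (idx in combn(k, 3, simplify = FALSE)) {
    i <- idx[1]; j <- idx[2]; l <- idx[3]
    total <- p[i] + p[j] + p[l] - cp[i, j] - cp[i, l] - cp[j, l]
    if (total > 1 + tol) return(FALSE)
  }
  TRUE
}

ok1 <- necessary_conditions_ok(cbind(c(0.5, 0.4), c(0.4, 0.8)))
ok2 <- necessary_conditions_ok(cbind(c(0.5, 0.25), c(0.25, 0.8)))
ok3 <- necessary_conditions_ok(cbind(c(0.5, 0, 0), c(0, 0.5, 0), c(0, 0, 0.5)))
```

Unlike check.commonprob, this sketch returns a bare logical and attaches no "message" attribute.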
Andreas Weingessel
Leisch F, Weingessel A, Hornik K (1998). “On the Generation of Correlated Artificial Binary Data.” Working Paper 13, SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business. doi:10.57938/6884f809-93bc-4497-ab2b-fc1611198f5b.
simul.commonprob,
commonprob2sigma
check.commonprob(cbind(c(0.5, 0.4), c(0.4, 0.8)))
check.commonprob(cbind(c(0.5, 0.25), c(0.25, 0.8)))
check.commonprob(cbind(c(0.5, 0, 0), c(0, 0.5, 0), c(0, 0, 0.5)))
Computes a covariance matrix for a normal distribution which
corresponds to a binary distribution with marginal probabilities given
by diag(commonprob) and pairwise probabilities given by
commonprob.
For the simulations the values of simulvals are used.
If the result is not a valid covariance matrix, the program stops with an error in the case of NA arguments and issues a warning message if the matrix is not positive definite.
commonprob2sigma(commonprob, simulvals)
commonprob |
matrix of pairwise probabilities. |
simulvals |
array returned by simul.commonprob. |
A covariance matrix is returned with the same dimensions as
commonprob.
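The idea of mapping a pairwise probability back to a normal correlation can be illustrated for a single pair. The sketch below is not how the package works (it relies on the simulvals array, as described above). Instead it swaps in direct numerical integration and root finding in base R. The helper names `orthant_prob` and `normal_corr_for` are hypothetical.

```r
# P(X > 0, Y > 0) for a bivariate normal with unit variances, means
# qnorm(p1) and qnorm(p2), and correlation rho (|rho| < 1), obtained by
# conditioning on X and integrating over its standardized value z.
orthant_prob <- function(p1, p2, rho) {
  m1 <- qnorm(p1)
  m2 <- qnorm(p2)
  integrand <- function(z) {
    dnorm(z) * pnorm((m2 + rho * z) / sqrt(1 - rho^2))
  }
  integrate(integrand, lower = -m1, upper = Inf)$value
}

# Find the normal correlation whose thresholded pair has joint
# probability p12 (hypothetical helper, one pair at a time).
normal_corr_for <- function(p1, p2, p12) {
  uniroot(function(rho) orthant_prob(p1, p2, rho) - p12,
          interval = c(-0.99, 0.99), tol = 1e-10)$root
}

rho <- normal_corr_for(0.5, 0.5, 1/5)
rho
```

For marginals of 0.5 the orthant probability has the closed form 1/4 + asin(rho)/(2*pi), which gives an independent check on the numerical result.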
Friedrich Leisch
Leisch F, Weingessel A, Hornik K (1998). “On the Generation of Correlated Artificial Binary Data.” Working Paper 13, SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business. doi:10.57938/6884f809-93bc-4497-ab2b-fc1611198f5b.
m <- cbind(c(1/2,1/5,1/6),c(1/5,1/2,1/6),c(1/6,1/6,1/2))
sigma <- commonprob2sigma(m)
Returns a matrix containing the conditional probabilities
$P(x_i = 1 \mid x_j = 1)$, where $x_i$ corresponds to the i-th
column of x.
condprob(x)
x |
matrix of binary data with rows corresponding to cases and columns corresponding to variables. |
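Empirical conditional probabilities of this kind can be computed from co-occurrence counts. The base R sketch below is a hypothetical equivalent, not the package's code. It assumes that entry [i, j] of the result holds the share of rows with x_i = 1 among the rows with x_j = 1; the orientation used by condprob itself may differ.

```r
# Hypothetical helper: entry [i, j] estimates P(x_i = 1 | x_j = 1).
cond_prob_matrix <- function(x) {
  joint <- crossprod(x)               # joint[i, j] = # rows with x_i = 1 and x_j = 1
  sweep(joint, 2, colSums(x), "/")    # divide column j by # rows with x_j = 1
}

x <- cbind(c(1, 1, 0, 0), c(1, 0, 1, 1))
cp <- cond_prob_matrix(x)
cp
```

A column of x that contains no 1 leads to division by zero, so the corresponding entries come out as NaN in this sketch.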
Friedrich Leisch
Converts all values of the real valued array x to binary values
by thresholding at 0.
ra2ba(x)
x |
array of arbitrary dimension |
Friedrich Leisch
x <- array(rnorm(10), dim=c(2,5))
ra2ba(x)
Creates correlated multivariate binary random variables by thresholding a normal distribution. The correlations of the components can be specified either as common probabilities, correlation matrix of the binary distribution, or covariance matrix of the normal distribution.
rmvbin(n, margprob, commonprob=diag(margprob), bincorr=diag(length(margprob)), sigma=diag(length(margprob)), colnames=NULL, simulvals=NULL)
n |
number of observations. |
margprob |
marginal probabilities that the components are 1. |
commonprob |
matrix of probabilities that components $i$ and $j$ are simultaneously 1. |
bincorr |
matrix of binary correlations. |
sigma |
covariance matrix for the normal distribution. |
colnames |
vector of column names for the resulting observation matrix. |
simulvals |
result from simul.commonprob. |
Only one of the arguments commonprob, bincorr and
sigma may be specified. The default is uncorrelated components.
n samples are drawn from a multivariate normal distribution whose mean
and variance are chosen to yield the desired marginal and common
probabilities. Negative values are converted to 0,
positive values to 1.
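The thresholding step can be sketched in base R for the case where a normal covariance matrix is already available. This is an illustration under stated assumptions rather than the package's implementation: `threshold_sample` is a hypothetical name, and `sigma` is assumed to have a unit diagonal so that shifting the means by qnorm(margprob) reproduces the requested marginals.

```r
set.seed(1)

# Hypothetical sampler: draw n rows from N(qnorm(margprob), sigma) and
# threshold at 0. Assumes sigma is positive definite with unit diagonal.
threshold_sample <- function(n, margprob, sigma) {
  k <- length(margprob)
  z <- matrix(rnorm(n * k), nrow = n) %*% chol(sigma)  # rows ~ N(0, sigma)
  z <- sweep(z, 2, qnorm(margprob), "+")               # shift the means
  (z > 0) * 1                                          # positive -> 1, else 0
}

sigma <- matrix(c(1, 0.5, 0.5, 1), ncol = 2)
x <- threshold_sample(20000, c(0.3, 0.9), sigma)
colMeans(x)
```

The column means of `x` should lie close to the requested marginal probabilities, up to sampling error.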
Friedrich Leisch
Leisch F, Weingessel A, Hornik K (1998). “On the Generation of Correlated Artificial Binary Data.” Working Paper 13, SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business. doi:10.57938/6884f809-93bc-4497-ab2b-fc1611198f5b.
commonprob2sigma,
check.commonprob,
simul.commonprob
## uncorrelated columns:
rmvbin(10, margprob=c(0.3,0.9))
## correlated columns
m <- cbind(c(1/2,1/5,1/6),c(1/5,1/2,1/6),c(1/6,1/6,1/2))
rmvbin(10, commonprob=m)
## same as the second example, but faster if the same probabilities are
## used repeatedly (commonprob2sigma rather slow)
sigma <- commonprob2sigma(m)
rmvbin(10, margprob = diag(m), sigma = sigma)
## The default 'simulvals' may not work for very small marginal
## probabilities. E.g., for
p1 <- 0.0206
p2 <- 0.0318
m <- matrix(c(1, 0.5, 0.5, 1), ncol = 2)
## rmvbin(1, margprob = c(p1,p2), bincorr = m) fails with 'Extrapolation
## occurred ... margprob and commonprob not compatible?'
## (reported by <[email protected]> in 2025-12).
## Using e.g.
s <- simul.commonprob(margprob = c(p1, p2), corr = seq(-1, 1, by = 0.05))
## makes things work:
rmvbin(10, margprob = c(p1,p2), bincorr = m, simulvals = s)
Compute common probabilities of binary random variates generated by thresholding normal variates at 0.
simul.commonprob(margprob, corr=0, method="integrate", n1=10^5, n2=10)
margprob |
vector of marginal probabilities. |
corr |
vector of correlation values for normal distribution. |
method |
either "integrate" or a Monte Carlo method (the example passes "mo"). |
n1 |
number of normal variates if method is Monte Carlo. |
n2 |
number of repetitions if method is Monte Carlo. |
The output of this function is used by rmvbin. For all
combinations of margprob[i], margprob[j] and
corr[k], the function computes the probability that both components
of a bivariate normal random variable with mean qnorm(margprob[c(i,j)])
and correlation corr[k] are larger than zero.
The probabilities are either computed by numerical integration of the multivariate normal density, or by Monte Carlo simulation.
For normal usage of rmvbin it is not necessary to use
this function; one simulation result is provided as the variable
SimulVals in this package and loaded by default.
simul.commonprob returns an array of dimension
c(length(margprob), length(margprob), length(corr)).
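The Monte Carlo variant of this computation can be sketched for one combination of marginals and correlation. The base R code below is an illustration, not the package's implementation. `mc_commonprob` is a hypothetical name, and treating `n1` as the number of variates per run and `n2` as the number of averaged runs is an assumption based on the argument descriptions.

```r
set.seed(42)

# Hypothetical Monte Carlo estimate of P(X > 0, Y > 0) for a bivariate
# normal with unit variances, means qnorm(p1), qnorm(p2), correlation rho.
mc_commonprob <- function(p1, p2, rho, n1 = 1e5, n2 = 10) {
  est <- replicate(n2, {
    z1 <- rnorm(n1)
    z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n1)   # corr(z1, z2) = rho
    mean(z1 + qnorm(p1) > 0 & z2 + qnorm(p2) > 0)
  })
  mean(est)   # average over the n2 repetitions
}

est <- mc_commonprob(0.5, 0.5, 0.5)
exact <- 1/4 + asin(0.5) / (2 * pi)   # closed form, valid for zero means
c(estimate = est, exact = exact)
```

The closed form is only available when both marginals are 0.5, so it serves here as a check on the estimator rather than as a general replacement.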
Friedrich Leisch
Leisch F, Weingessel A, Hornik K (1998). “On the Generation of Correlated Artificial Binary Data.” Working Paper 13, SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business. doi:10.57938/6884f809-93bc-4497-ab2b-fc1611198f5b.
simul.commonprob(seq(0,1,0.5), seq(-1,1,0.5), method="mo", n1=10^4)
data(SimulVals)
This variable provides a pre-fabricated result from
simul.commonprob, so that it is normally not necessary
to call this (time-consuming) function. It is used by
rmvbin.
SimulVals
Friedrich Leisch
Leisch F, Weingessel A, Hornik K (1998). “On the Generation of Correlated Artificial Binary Data.” Working Paper 13, SFB Adaptive Information Systems and Modelling in Economics and Management Science, WU Vienna University of Economics and Business. doi:10.57938/6884f809-93bc-4497-ab2b-fc1611198f5b.