INDUCING A TARGET ASSOCIATION BETWEEN ORDINAL VARIABLES BY USING A PARAMETRIC COPULA FAMILY

The need for building and generating statistically dependent random variables arises in various fields of study where simulation has proven to be a useful tool. In this work, we present an approach for constructing ordinal variables with arbitrary marginal distributions and association, expressed in terms of either Goodman and Kruskal’s gamma or Pearson’s linear correlation.


Introduction
The need for building and generating statistically dependent random variables arises in various fields of study where simulation has proven to be a useful tool. The ability to simulate data resembling observed data is fundamental to compare and investigate the behaviour of statistical procedures when analytical results are not derivable or are cumbersome to derive.
Many datasets, especially those arising in the social sciences, contain ordinal variables. Sometimes these are genuine ordered assessments (judgements, preferences, degree of liking, etc.), whereas in other circumstances they result from discretizing or categorizing a variable for convenience (e.g., age grouped into classes, or educational achievement). Several statistical models and techniques can be employed for handling multivariate ordinal data without trying to quantify their ordered categories; [1] gives a thorough treatment. Among them, correlation models and association models both study departures from independence in contingency tables and involve assigning scores to the categories of the row and column variables in order to maximize the relevant measure of relationship (the correlation coefficient in correlation models, or the measure of intrinsic association in association models [5]). Alternatively, one can code the ordered categories as integers (1, 2, ..., m); this amounts to assuming that the categories are evenly spaced.
In this work, we present an approach for constructing ordinal variables with arbitrary marginal distributions and association, expressed in terms of either Goodman and Kruskal's gamma or Pearson's linear correlation. Similar proposals have already been made by [7], for ordinal variables and Goodman and Kruskal's gamma, and by [2,8,4], for ordinal (and count) variables and Pearson's correlation.

Statement of the problem
We consider two ordinal random variables (rvs), $X$ and $Y$, with $h$ and $k$ ordered categories, respectively, and with marginal distributions $p_{i\cdot} = P(X = x_i)$, $i = 1, \dots, h$, and $p_{\cdot j} = P(Y = y_j)$, $j = 1, \dots, k$. We want to determine a joint probability distribution $p_{ij} = P(X = x_i, Y = y_j)$, $i = 1, \dots, h$, $j = 1, \dots, k$, whose margins are exactly $p_{i\cdot}$ and $p_{\cdot j}$ and which has an assigned level of association.
Since $X$ and $Y$ are ordinal variables, their association can be naturally expressed through Goodman and Kruskal's gamma [6]. Considering two independent realizations $(X_s, Y_s)$ and $(X_t, Y_t)$ of $(X, Y)$, Goodman and Kruskal's gamma is defined as
$$\gamma = \frac{\Pi_c - \Pi_d}{\Pi_c + \Pi_d},$$
where $\Pi_c$ is the probability of concordance,
$$\Pi_c = P\left[(X_s - X_t)(Y_s - Y_t) > 0\right],$$
and $\Pi_d$ is the probability of discordance,
$$\Pi_d = P\left[(X_s - X_t)(Y_s - Y_t) < 0\right].$$
Both $\Pi_c$ and $\Pi_d$ can be expressed in terms of the joint probabilities $p_{ij}$. $\gamma$ takes values in the interval $[-1, +1]$; in particular, the values $-1$, $0$, and $+1$ are attained when $\Pi_c = 0$, $\Pi_c = \Pi_d$, and $\Pi_d = 0$, respectively.
If we treat $X$ and $Y$ as point-scale discrete variables, by assigning the first $h$ and $k$ positive integers, respectively, to their ordered categories, then we can use Pearson's correlation coefficient as a measure of association:
$$\rho = \frac{E(XY) - E(X)E(Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}, \qquad E(XY) = \sum_{i=1}^{h} \sum_{j=1}^{k} i\, j\, p_{ij}.$$
Like $\gamma$, Pearson's correlation takes values in the interval $[-1, +1]$; however, given two marginal distributions and a value $\rho \in [-1, +1]$, it is not always possible to construct a joint distribution with those margins whose correlation equals the assigned $\rho$ [9]. In more detail, the attainable correlations form a closed interval $[\rho_{\min}, \rho_{\max}]$ with $\rho_{\min} < 0 < \rho_{\max}$. The minimum correlation $\rho = \rho_{\min}$ is attained if and only if $X$ and $Y$ are countermonotonic; the maximum correlation $\rho = \rho_{\max}$ is attained if and only if $X$ and $Y$ are comonotonic. Moreover, $\rho_{\min} = -1$ if and only if $X$ and $-Y$ are of the same type (that is, their distributions coincide up to a change of location and positive scale), and $\rho_{\max} = 1$ if and only if $X$ and $Y$ are of the same type. Given the two margins, a correlation $\rho$ is said to be "feasible" if it falls within $[\rho_{\min}, \rho_{\max}]$.
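As an illustration of the two measures and of the feasibility interval, the following minimal Python sketch computes Goodman and Kruskal's gamma and Pearson's correlation from a table of joint probabilities, and obtains the bounds on the correlation from the countermonotonic and comonotonic couplings of the two margins. The function names (`gamma_gk`, `pearson_rho`, `frechet_table`, `rho_bounds`) are hypothetical and introduced here only for illustration; the categories are scored with the integers 1, ..., h and 1, ..., k, as assumed in the text.

```python
import math
from itertools import accumulate


def gamma_gk(p):
    """Goodman and Kruskal's gamma for a joint probability table p[i][j]."""
    h, k = len(p), len(p[0])
    conc = disc = 0.0
    for i in range(h):
        for j in range(k):
            for s in range(i + 1, h):  # second observation lies in a higher row
                for t in range(k):
                    if t > j:
                        conc += p[i][j] * p[s][t]
                    elif t < j:
                        disc += p[i][j] * p[s][t]
    # Each unordered pair is counted once; the factor 2 cancels in the ratio.
    if conc + disc == 0.0:
        raise ValueError("gamma is undefined: no concordant or discordant pairs")
    return (conc - disc) / (conc + disc)


def pearson_rho(p):
    """Pearson's correlation with integer scores 1..h and 1..k."""
    h, k = len(p), len(p[0])
    px = [sum(row) for row in p]
    py = [sum(p[i][j] for i in range(h)) for j in range(k)]
    ex = sum((i + 1) * px[i] for i in range(h))
    ey = sum((j + 1) * py[j] for j in range(k))
    vx = sum((i + 1) ** 2 * px[i] for i in range(h)) - ex ** 2
    vy = sum((j + 1) ** 2 * py[j] for j in range(k)) - ey ** 2
    exy = sum((i + 1) * (j + 1) * p[i][j] for i in range(h) for j in range(k))
    return (exy - ex * ey) / math.sqrt(vx * vy)


def frechet_table(px, py, upper=True):
    """Joint table of the comonotonic (upper) or countermonotonic (lower) coupling."""
    fx = [0.0] + list(accumulate(px))
    fy = [0.0] + list(accumulate(py))
    if upper:
        bound = lambda u, v: min(u, v)
    else:
        bound = lambda u, v: max(u + v - 1.0, 0.0)
    return [[bound(fx[i + 1], fy[j + 1]) - bound(fx[i], fy[j + 1])
             - bound(fx[i + 1], fy[j]) + bound(fx[i], fy[j])
             for j in range(len(py))] for i in range(len(px))]


def rho_bounds(px, py):
    """Return (rho_min, rho_max) for the given margins."""
    return (pearson_rho(frechet_table(px, py, upper=False)),
            pearson_rho(frechet_table(px, py, upper=True)))


if __name__ == "__main__":
    print(rho_bounds([0.2, 0.8], [0.5, 0.5]))
```

A target correlation can then be checked for feasibility by comparing it with the pair returned by `rho_bounds` before attempting to construct the joint distribution.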

A solution to the problem employing copulas
Finding a joint probability distribution with assigned margins and a desired (feasible) value of association is equivalent to solving a system in the $h \times k$ unknowns $p_{ij}$, which belong to the standard simplex, subject to $h + k - 1$ constraints corresponding to the assigned margins and one further constraint dictated by the desired association. When $h$ or $k$ is greater than 2, this system has infinitely many solutions. They can be recovered more easily when Pearson's correlation is used as the measure of association, since, for fixed margins, it is a linear function of the $p_{ij}$.
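The non-uniqueness can be made concrete with a small numerical sketch. In a 2 × 3 table, adding a multiple of the pattern `DIRECTION` below leaves every row sum, every column sum, and E(XY) unchanged, so the margins and Pearson's correlation (with integer scores) are preserved while the joint probabilities change. The starting table, the pattern, and the function names are illustrative assumptions, not taken from the paper.

```python
import math


def margins(p):
    """Row and column sums of a joint probability table."""
    rows = [sum(row) for row in p]
    cols = [sum(row[j] for row in p) for j in range(len(p[0]))]
    return rows, cols


def pearson_rho(p):
    """Pearson's correlation with integer scores 1..h and 1..k."""
    h, k = len(p), len(p[0])
    px, py = margins(p)
    ex = sum((i + 1) * px[i] for i in range(h))
    ey = sum((j + 1) * py[j] for j in range(k))
    vx = sum((i + 1) ** 2 * px[i] for i in range(h)) - ex ** 2
    vy = sum((j + 1) ** 2 * py[j] for j in range(k)) - ey ** 2
    exy = sum((i + 1) * (j + 1) * p[i][j] for i in range(h) for j in range(k))
    return (exy - ex * ey) / math.sqrt(vx * vy)


# Illustrative pattern for a 2 x 3 table: rows and columns sum to zero,
# and sum_ij i * j * DIRECTION[i][j] = 0, so E(XY) does not change.
DIRECTION = [[1, -2, 1],
             [-1, 2, -1]]


def shift(p, eps):
    """Move probability mass along DIRECTION by an amount eps."""
    return [[p[i][j] + eps * DIRECTION[i][j] for j in range(3)] for i in range(2)]


if __name__ == "__main__":
    p = [[0.20, 0.20, 0.10],
         [0.10, 0.20, 0.20]]
    q = shift(p, 0.05)
    print(margins(p), pearson_rho(p))
    print(margins(q), pearson_rho(q))
```

Any value of `eps` that keeps all entries non-negative yields another valid solution, which is why a further criterion is needed to single out one joint distribution.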
Here we propose an approach that identifies just one solution, i.e., one joint distribution. The procedure relies on one-parameter bivariate copulas, which allow us to split the original problem into two sequential steps: first, identifying a class of joint distributions that respect the assigned margins; then, within this class, finding the joint distribution that matches the desired level of association.

Selecting a class of joint distributions having the prespecified margins
As for the first step, if $F_1$ and $F_2$ are the distribution functions of two rvs $X$ and $Y$, and $C(u, v; \theta)$ is a bivariate parametric copula family characterized by a scalar parameter $\theta$, then the function
$$F(x, y; \theta) = C(F_1(x), F_2(y); \theta) \qquad (1)$$
defines a valid joint distribution function whose margins are exactly $F_1$ and $F_2$. This result continues to hold if $X$ and $Y$ are discrete; in this case, the joint probabilities can be derived from (1) as
$$p_{ij} = C(F_1(x_i), F_2(y_j); \theta) - C(F_1(x_{i-1}), F_2(y_j); \theta) - C(F_1(x_i), F_2(y_{j-1}); \theta) + C(F_1(x_{i-1}), F_2(y_{j-1}); \theta)$$
for $i = 1, \dots, h$; $j = 1, \dots, k$, with the convention $F_1(x_0) = F_2(y_0) = 0$. In order to induce any feasible value of association between the two discrete margins, we further require that the copula $C(u, v; \theta)$ be able to encompass the entire range of dependence, from perfect negative dependence to perfect positive dependence.

Inducing the desired value of association
As for the second step, the association between $X$ and $Y$ now depends only on the copula parameter $\theta$; this relationship may be written in an analytical or numerical form, say $\gamma = f(\theta)$ or $\rho = g(\theta)$. Since the function $f$ (or $g$) is usually not analytically invertible, inducing a desired feasible value of association by setting an appropriate value of $\theta$ is a task that can generally be carried out only numerically, by finding the (unique) root of the equation $f(\theta) - \gamma = 0$ (or $g(\theta) - \rho = 0$). If $\gamma$ (or $\rho$) is a monotone increasing function of the copula parameter, which is often the case (e.g., for the Gauss, Frank, and Plackett copulas), one can implement an iterative procedure that is more efficient than the standard bisection method. For discrete random variables, several proposals have been made for matching a desired value of Pearson's correlation; see [2,8,4]. Simulating from the selected joint distribution is then straightforward: one can either first simulate from the copula or, more simply, apply a direct inversion algorithm [3,7].
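The second step and the final simulation can be sketched end to end as below. For simplicity the sketch uses plain bisection (the baseline method mentioned above, not the more efficient iterative procedure), the Frank copula as an assumed family, and a search interval of [-30, 30] for θ chosen only for illustration; all function names are hypothetical. Sampling is done by inverting the cumulative probabilities of the cells of the joint table.

```python
import bisect
import math
import random
from itertools import accumulate


def frank_copula(u, v, theta):
    if abs(theta) < 1e-10:
        return u * v
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta


def joint_pmf(px, py, theta):
    fx = [0.0] + list(accumulate(px))
    fy = [0.0] + list(accumulate(py))
    fx[-1] = fy[-1] = 1.0
    c = frank_copula
    return [[max(c(fx[i + 1], fy[j + 1], theta) - c(fx[i], fy[j + 1], theta)
                 - c(fx[i + 1], fy[j], theta) + c(fx[i], fy[j], theta), 0.0)
             for j in range(len(py))] for i in range(len(px))]


def gamma_gk(p):
    h, k = len(p), len(p[0])
    conc = disc = 0.0
    for i in range(h):
        for j in range(k):
            for s in range(i + 1, h):
                conc += p[i][j] * sum(p[s][j + 1:])
                disc += p[i][j] * sum(p[s][:j])
    return (conc - disc) / (conc + disc)


def solve_theta(px, py, target, lo=-30.0, hi=30.0, tol=1e-10):
    """Bisection on f(theta) - target, assuming f is increasing in theta."""
    f = lambda th: gamma_gk(joint_pmf(px, py, th)) - target
    if f(lo) > 0.0 or f(hi) < 0.0:
        raise ValueError("target not bracketed by the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample(p, n, rng):
    """Draw n pairs of category indices (1-based) by direct inversion."""
    cells = [(i + 1, j + 1) for i in range(len(p)) for j in range(len(p[0]))]
    cum = list(accumulate(x for row in p for x in row))
    draws = []
    for _ in range(n):
        idx = bisect.bisect_right(cum, rng.random() * cum[-1])
        draws.append(cells[min(idx, len(cells) - 1)])
    return draws


if __name__ == "__main__":
    px, py = [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]
    theta = solve_theta(px, py, target=0.5)
    table = joint_pmf(px, py, theta)
    print(theta, gamma_gk(table))
    print(sample(table, 5, random.Random(1)))
```

Replacing `gamma_gk` with a function that returns Pearson's correlation gives the analogous procedure for a target ρ, provided the target lies in the feasible interval for the given margins.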