Practicality of Some Variations of Ranked Set Sampling

Judgement ranking in ranked set sampling (RSS) and its variations depends on the ability of an observer to rank a set of objects according to the study variable without doing any actual measurement. In practice, and in some variations of RSS, it is hard to assign these ranks. In this paper, we discuss the practicality of ranking some extensions of RSS such as median RSS, double median RSS, and double RSS. The Hellinger distance is used as a measure of practicality. Although double median RSS is the most efficient approach among the RSS variations considered, it is shown in this paper that it is the least practical.


Introduction
Ranked set sampling (RSS), proposed by McIntyre (1952), is a data collection or sampling scheme. Owing to its importance for a variety of applications in statistics, the original paper was republished as McIntyre (2005). RSS was proposed to estimate the mean of Australian pasture yields. McIntyre (1952, 2005) claimed that the RSS mean is an unbiased estimator of the population mean and that its variance is smaller than that of the simple random sampling (SRS) mean based on the same number of measured elements. This sampling scheme is useful when it is difficult to measure a large number of elements but ranking some of them visually (without inspection) is easier. The scheme involves randomly selecting m sets, each of m elements, from the study population. The elements of each set are ordered with respect to the study variable, either visually or by any method of negligible cost, without measurement. Finally, the ith minimum from the ith set, i = 1, 2, . . . , m, is identified for measurement. The obtained sample is called a ranked set sample of set size m. Takahasi and Wakimoto (1968) provided the mathematical theory behind the claims of McIntyre (1952, 2005). As McIntyre (1952, 2005) claimed, it was later shown in the literature that estimators based on RSS are more efficient than their SRS counterparts. For example, Stokes and Sager (1988) showed that the empirical distribution function based on RSS is more efficient than its counterpart in SRS. Some authors estimate the parameters of a specific distribution using RSS; see, for example, Al-Saleh and Diab (2009) and Sarikavanij, Kasala, Sinha, and Tiensuwan (2014).
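The RSS procedure described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions: the population is taken to be standard normal, ranking is assumed perfect (sets are sorted on their true values), and the function names `ranked_set_sample` and `compare_rss_srs` are hypothetical, not taken from the cited works.

```python
import random
import statistics


def ranked_set_sample(draw, m, rng):
    """Return one ranked set sample of set size m.

    draw(rng) is a hypothetical callable returning one population value.
    Sorting on the true values stands in for perfect judgement ranking.
    """
    sample = []
    for i in range(m):
        ranked = sorted(draw(rng) for _ in range(m))  # rank the i-th set
        sample.append(ranked[i])  # measure only its (i+1)-th smallest element
    return sample


def compare_rss_srs(m=5, reps=4000, seed=1):
    """Monte Carlo variances of the RSS mean and the SRS mean (m measurements each)."""
    rng = random.Random(seed)

    def draw(r):
        return r.gauss(0.0, 1.0)  # assumed population: standard normal

    rss_means = [statistics.fmean(ranked_set_sample(draw, m, rng)) for _ in range(reps)]
    srs_means = [statistics.fmean([draw(rng) for _ in range(m)]) for _ in range(reps)]
    return statistics.variance(rss_means), statistics.variance(srs_means)


if __name__ == "__main__":
    var_rss, var_srs = compare_rss_srs()
    print(f"var(RSS mean) = {var_rss:.4f}, var(SRS mean) = {var_srs:.4f}")
```

Both estimators use m measured elements; the RSS sample additionally requires ranking m sets of m elements each, which is where the cost of judgement ranking enters.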
To further improve the efficiency of the estimators, several variations of RSS have been proposed. Al-Saleh and Al-Kadiri (2000) suggested double RSS (DRSS) as a method that improves the efficiency of the RSS estimators while keeping m fixed. They reported that the DRSS estimator is more efficient than the RSS estimator. Muttlak (1997) proposed median RSS (MRSS) as a modification of RSS to improve the efficiency of the estimators of the population mean for symmetric distributions and of the population median. The procedure of MRSS is similar to that of RSS, but instead of identifying the ith minimum from the ith set, only the median of each set is identified. For odd set size m, the $\frac{m+1}{2}$th smallest element is identified from each set for measurement. When m is even, the $\frac{m}{2}$th smallest element is identified from each of the first $\frac{m}{2}$ sets, and the $\left(\frac{m}{2}+1\right)$th smallest element is identified from each of the remaining $\frac{m}{2}$ sets. Samawi and Tawalbeh (2002) suggested double MRSS (DMRSS) as an alternative procedure to improve the efficiency of the sample mean. They compared DMRSS with SRS, RSS, DRSS, and some other sampling schemes and found that DMRSS is the most efficient scheme.
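The MRSS selection rule for odd and even m can be written down directly. The sketch below assumes the sets have already been drawn and that ranking is perfect; `mrss_ranks` and `median_ranked_set_sample` are hypothetical helper names introduced only for illustration.

```python
def mrss_ranks(m):
    """1-based rank of the element measured in each of the m sets under MRSS."""
    if m % 2 == 1:
        return [(m + 1) // 2] * m  # odd m: the median of every set
    half = m // 2
    # even m: rank m/2 in the first m/2 sets, rank m/2 + 1 in the rest
    return [half] * half + [half + 1] * half


def median_ranked_set_sample(sets):
    """Pick the MRSS element from each of m already-drawn sets of size m."""
    m = len(sets)
    return [sorted(s)[r - 1] for s, r in zip(sets, mrss_ranks(m))]


if __name__ == "__main__":
    print(mrss_ranks(3))
    print(mrss_ranks(4))
    print(median_ranked_set_sample([[3, 1, 2], [9, 7, 8], [5, 6, 4]]))
```

Unlike RSS, where the measured rank changes from set to set, the measured rank under MRSS is constant for odd m and takes only two values for even m.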
Recently, there has been work on multi-stage RSS (MSRSS); see, for example, Amro and Samuh (2017). Although recent works in the literature address the usefulness of MSRSS schemes, the current paper compares RSS schemes only up to double-stage sampling, as it is an initial study of the practicality of these schemes. Intuitive insights into MSRSS schemes can be drawn from this work, but extending the comparison to MSRSS schemes requires more elaborate studies and more space. In the interest of conserving space, the paper therefore focuses on RSS schemes up to double-stage sampling.
In DMRSS, the data points are identified from the data points of MRSS. For example, if m is odd, the data points of DMRSS are the medians of the MRSS data points; that is, they are medians of medians of the SRS. Identifying the median of medians is clearly a hard process, and this contradicts the nature of RSS schemes, which require visual comparison without inspection (a rationale originally mentioned by McIntyre (1952)). In DRSS, the data points are identified from the data points of RSS. For example, the first data point of DRSS is the minimum of the RSS data points, which is easy to identify visually without inspection. Al-Saleh and Al-Kadiri (2000) have shown, using the degree of distinguishability and the probability of perfect ranking, that ranking iid data points is harder than ranking ordered (but independent) data points. For odd m, the MRSS data points are iid, whereas the RSS data points are independent but ordered. Thus, ranking observations in DMRSS is harder than in DRSS; in other words, DRSS is more practical than DMRSS. In this paper, since observations that are closer to each other are more difficult to rank, we propose the Hellinger distance (defined in Eq. (4) later in this paper) as a measure of ranking practicality.
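The two-stage constructions discussed above can be sketched as follows. This is a minimal illustration, assuming perfect ranking at both stages and, for DMRSS, odd m only; `draw` is a hypothetical zero-argument callable returning one population value, and the function names are introduced here for illustration.

```python
import random


def rss(draw, m):
    """Stage 1 (RSS): the (i+1)-th smallest of the i-th set of m fresh draws."""
    return [sorted(draw() for _ in range(m))[i] for i in range(m)]


def drss(draw, m):
    """Stage 2 (DRSS): the (i+1)-th smallest of the i-th ranked set sample."""
    return [sorted(rss(draw, m))[i] for i in range(m)]


def mrss(draw, m):
    """Stage 1 (MRSS, odd m only): the median of each of m sets."""
    if m % 2 == 0:
        raise ValueError("this sketch covers odd m only")
    return [sorted(draw() for _ in range(m))[m // 2] for _ in range(m)]


def dmrss(draw, m):
    """Stage 2 (DMRSS, odd m only): the median of each of m MRSS samples."""
    return [sorted(mrss(draw, m))[m // 2] for _ in range(m)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(drss(rng.random, 3))
    print(dmrss(rng.random, 3))
```

Each call consumes m^3 population draws to produce m measured values. In the code the second-stage `sorted` call is free; in the field it is exactly the step whose difficulty the paper is concerned with.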
To our knowledge, the practicality of RSS schemes has not been compared in the literature with regard to the Hellinger distance. The rest of the paper is organized as follows. A general setup and some basic results are given in Sec. 2. The Hellinger distance is defined and applied to RSS schemes in Sec. 3. Finally, Sec. 4 concludes the paper.

Some basic properties of the sampling schemes
Let $X$ be a continuous random variable with cumulative distribution function (cdf) $F(x)$ and probability density function (pdf) $f(x)$.
Simple random sampling Let $X_1, X_2, \ldots, X_m$ denote a SRS from $f(x)$; then the $X_i$ are independent and identically distributed with pdf $f(x)$. Note that when the population is infinite, SRS and random sample are used synonymously.

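Ranked set sampling Let $Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_m^{(1)}$ be a RSS; that is, $Y_i^{(1)}$ is the $i$th order statistic of the random sample $X_1, X_2, \ldots, X_m$, where the superscript (1) represents stage 1. The cdf of $Y_i^{(1)}$ is
$$F_{Y_i^{(1)}}(y) = \sum_{j=i}^{m} \binom{m}{j} [F(y)]^{j} [1 - F(y)]^{m-j},$$
and its pdf is
$$f_{Y_i^{(1)}}(y) = \frac{m!}{(i-1)!\,(m-i)!} [F(y)]^{i-1} [1 - F(y)]^{m-i} f(y).$$
For stage 2 (DRSS), let $Y_i^{(2)}$ be the $i$th order statistic of a RSS $Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_m^{(1)}$ (which are independent but not identical random variables). Hence, the cdf of $Y_i^{(2)}$ is
$$F_{Y_i^{(2)}}(y) = \sum_{l=i}^{m} \sum_{S_l} \prod_{k=1}^{l} F_{Y_{j_k}^{(1)}}(y) \prod_{k=l+1}^{m} \left[1 - F_{Y_{j_k}^{(1)}}(y)\right],$$
where $S_l$ is the set of all permutations $(j_1, j_2, \ldots, j_m)$ of the integers $(1, 2, \ldots, m)$ for which $j_1 < j_2 < \cdots < j_l$ and $j_{l+1} < j_{l+2} < \cdots < j_m$ (David and Nagaraja (2003)). The pdf of $Y_i^{(2)}$ is the derivative of $F_{Y_i^{(2)}}(y)$.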
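The sum over $S_l$ can be evaluated numerically, because each permutation in $S_l$ corresponds to one choice of which $l$ of the $m$ variables fall at or below $y$. The sketch below assumes the standard order-statistic result for independent, non-identically distributed variables (the form given in David and Nagaraja (2003)) and perfect ranking; all function names are hypothetical, and cdfs are evaluated at a point where $F(y) = u$.

```python
from itertools import combinations
from math import comb, prod


def rss_cdf(i, m, u):
    """cdf of the i-th order statistic of m iid draws, at a point where F(y) = u."""
    return sum(comb(m, j) * u**j * (1 - u) ** (m - j) for j in range(i, m + 1))


def order_stat_cdf_inid(i, p):
    """cdf of the i-th order statistic of independent, non-identical variables.

    p[j] is the cdf of the j-th variable at the point of interest.
    """
    m = len(p)
    total = 0.0
    for l in range(i, m + 1):
        # each subset of size l plays the role of (j_1 < ... < j_l) in S_l
        for below in combinations(range(m), l):
            chosen = set(below)
            total += prod(p[j] if j in chosen else 1 - p[j] for j in range(m))
    return total


def drss_cdf(i, m, u):
    """cdf of the i-th DRSS element at a point where F(y) = u."""
    stage1 = [rss_cdf(j, m, u) for j in range(1, m + 1)]
    return order_stat_cdf_inid(i, stage1)


if __name__ == "__main__":
    for u in (0.25, 0.5, 0.75):
        print(u, [round(drss_cdf(i, 3, u), 4) for i in (1, 2, 3)])
```

The enumeration grows exponentially with m, which is acceptable for the small set sizes typical of RSS but would need a different approach for large m.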

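Median RSS Let $W_1^{(1)}, W_2^{(1)}, \ldots, W_m^{(1)}$ be a MRSS; that is, $W_i^{(1)}$ is the $r_i$th order statistic of the $i$th set, where $r_i = \frac{m+1}{2}$ for odd $m$ and, for even $m$, $r_i = \frac{m}{2}$ for $i \le \frac{m}{2}$ and $r_i = \frac{m}{2} + 1$ for $i > \frac{m}{2}$. The pdf of $W_i^{(1)}$ is
$$f_{W_i^{(1)}}(w) = \frac{m!}{(r_i-1)!\,(m-r_i)!} [F(w)]^{r_i-1} [1 - F(w)]^{m-r_i} f(w).$$
The cdf of $W_i^{(1)}$ is
$$F_{W_i^{(1)}}(w) = \sum_{j=r_i}^{m} \binom{m}{j} [F(w)]^{j} [1 - F(w)]^{m-j}.$$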

Hellinger distance
Suppose $Y$ and $X$ are two random variables with density functions $f_Y(x)$ and $f_X(x)$, respectively. The Hellinger distance (see, for example, Nikulin (2001)) between $Y$ and $X$ is defined by
Obviously, for independent and identically distributed random variables, $H(X, Y) = 0$. So the Hellinger distance between any two data points of the SRS $X_1, X_2, \ldots, X_m$ is zero. Therefore, identifying the ordered data points (to obtain either RSS or MRSS) from the SRS is difficult. Now, given the data points of the RSS $(Y_1^{(1)}, Y_2^{(1)}, \ldots, Y_m^{(1)})$, then for $k, l = 1, 2, \ldots, m$,
Let $F(y) = u$ and $du = f(y)\,dy$, then
The results for particular values of $m$, $k$, and $l$ are shown in the third column of Table 1. Note that the Hellinger distances in this case are not zero; that is, identifying the ordered data points of DRSS (stage 2) from the RSS data points (stage 1) is easier than identifying ordered data points from SRS data points. It is simple to verify that when $|k - l| = 2$,
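The Hellinger distance between two RSS data points can be computed in closed form, and the following sketch illustrates the calculation under a loudly stated assumption: the normalization $H^2(X, Y) = 1 - \int \sqrt{f_X(x) f_Y(x)}\,dx$ is used here, which may differ from the paper's Eq. (4) by a constant factor or by a square. With the substitution $u = F(y)$, the density of $Y_k^{(1)}$ becomes a Beta$(k, m-k+1)$ density, so the integral of $\sqrt{f_{Y_k^{(1)}} f_{Y_l^{(1)}}}$ reduces to $B\!\left(\frac{k+l}{2},\, m+1-\frac{k+l}{2}\right) / \sqrt{B(k, m-k+1)\,B(l, m-l+1)}$. The function names are hypothetical, and the output is not claimed to reproduce Table 1.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def affinity(k, l, m):
    """Integral of sqrt(f_k * f_l) for the k-th and l-th order statistics of m iid draws."""
    a = (k + l) / 2.0
    log_num = log_beta(a, m + 1 - a)
    log_den = 0.5 * (log_beta(k, m - k + 1) + log_beta(l, m - l + 1))
    return math.exp(log_num - log_den)


def hellinger(k, l, m):
    """Hellinger distance under the ASSUMED normalization H^2 = 1 - affinity."""
    return math.sqrt(max(0.0, 1.0 - affinity(k, l, m)))


if __name__ == "__main__":
    m = 3
    for k in range(1, m + 1):
        print([round(hellinger(k, l, m), 4) for l in range(1, m + 1)])
```

Because the substitution removes $F$ entirely, the result depends only on $m$, $k$, and $l$, not on the parent distribution. Under this normalization the distance is zero for $k = l$ and grows as the ranks move apart, which matches the idea that well-separated RSS data points are easier to rank.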

Conclusion
For single-stage sampling, MRSS and RSS have the same practicality, and since the literature shows that MRSS is more efficient than RSS, we recommend using MRSS. For second-stage sampling, although the literature shows that DMRSS is more efficient than DRSS, we recommend using DRSS because it is more practical than DMRSS. Using DRSS speeds up the visual ranking process and reduces ranking error, so the data points can be identified quickly.