Monitoring changes in the lower percentiles of the modulus of rupture and the modulus of elasticity is important in the forest industry. Lumber data are often clustered, and observations within the same cluster are potentially correlated. Such a correlation structure inflates the false alarm rates of the eight statistical tests investigated by Verrill et al. (2015). In forestry and many other industries, such clustered data are often collected independently from multiple populations. To take advantage of the similarity of the population distributions and to reduce the risk of model misspecification, Chen et al. (2016) recommended using a density ratio model and analyzing the data via composite empirical likelihood. They further proposed a cluster-based bootstrap method to account for the effect of the correlation structure in the data. The cluster-based bootstrap procedure makes it easy to construct confidence intervals and to test various hypotheses.
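The idea of resampling whole clusters rather than individual observations can be illustrated with a minimal sketch. It is not the procedure of Chen et al. (2016): for simplicity it uses the plain empirical quantile of a single population in place of the composite empirical likelihood quantile estimator, and a percentile bootstrap interval. The function name `cluster_bootstrap_quantile_ci`, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np

def cluster_bootstrap_quantile_ci(clusters, p=0.05, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the p-th quantile, resampling whole clusters.

    Assumption: the empirical quantile stands in for the composite EL
    quantile estimator, which is not implemented here.
    """
    rng = np.random.default_rng(seed)
    clusters = [np.asarray(c, dtype=float) for c in clusters]
    n = len(clusters)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw n cluster indices with replacement; each cluster stays intact,
        # so the within-cluster correlation is carried into the resample.
        idx = rng.integers(0, n, size=n)
        resample = np.concatenate([clusters[i] for i in idx])
        stats[b] = np.quantile(resample, p)
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1.0 - alpha / 2])
    return float(lo), float(hi)

# Simulated clustered data (illustrative only): a shared random effect per
# cluster makes observations within a cluster positively correlated.
rng = np.random.default_rng(1)
clusters = [rng.normal(loc=rng.normal(0.0, 0.5), scale=1.0, size=5)
            for _ in range(200)]
lo, hi = cluster_bootstrap_quantile_ci(clusters, p=0.05)
print(f"95% CI for the 5th percentile: ({lo:.3f}, {hi:.3f})")
```

Resampling individual observations instead of clusters would treat the correlated observations as independent and would typically give intervals that are too narrow.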
Chen et al. (2016) developed theory and carried out data experiments for clustered data from multiple populations in which all clusters have the same size. It is reasonable to expect that their data analysis methods remain effective, after some minor adjustments, when the data contain clusters of unequal sizes. In fact, if the majority of clusters have the same size, no changes seem necessary. When the data contain a variety of cluster sizes, however, the theory of Chen et al. (2016) must be re-examined, and their methods may have different asymptotic properties. If their methods are applied with only minor changes, the confidence intervals may have coverage probabilities that are too high or too low, and the hypothesis tests may have sizes that differ from the nominal level.
In this project, we re-examine the methods and theory of Chen et al. (2016) when the data contain clusters of different sizes. We conclude that some changes are indeed necessary to ensure the asymptotic validity of these inference methods. We propose first regrouping the original clustered data into new clusters of nearly equal size and then analyzing the regrouped data by the methods of Chen et al. (2016). We discuss the asymptotic properties of the new procedures. We establish the consistency of the new estimator and show that the maximum composite empirical likelihood (EL) quantile estimator still admits a Bahadur representation. We use simulation to obtain the average mean squared errors (AMSE) of the estimators and the coverage probabilities of two-sided 95% confidence intervals. The simulation results show that the AMSEs of the composite EL quantile estimators are smaller than those of the empirical quantiles, and that the coverage of the bootstrap confidence intervals is close to the nominal 95% level only when the sample sizes are large.
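The regrouping step can be pictured with a small sketch. The text above does not say how the new clusters are formed, so the rule used here is an assumption: whole original clusters are merged greedily, largest first, into the currently smallest new cluster. Merging whole clusters, rather than splitting them, keeps the new clusters mutually independent whenever the original clusters are. The function name `regroup_clusters` and the parameter `n_groups` are hypothetical.

```python
import heapq

def regroup_clusters(clusters, n_groups):
    """Merge whole clusters into n_groups new clusters of nearly equal size.

    Assumed rule (not taken from the source): greedy largest-first
    assignment to the currently smallest new cluster.
    """
    if not 1 <= n_groups <= len(clusters):
        raise ValueError("n_groups must be between 1 and the number of clusters")
    # Min-heap of (current size, group index): the smallest group pops first.
    heap = [(0, g) for g in range(n_groups)]
    heapq.heapify(heap)
    groups = [[] for _ in range(n_groups)]
    for cluster in sorted(clusters, key=len, reverse=True):
        size, g = heapq.heappop(heap)
        groups[g].extend(cluster)  # the original cluster is never split
        heapq.heappush(heap, (size + len(cluster), g))
    return groups

# Illustrative data: eight clusters of unequal sizes, labelled by cluster index.
sizes = [5, 3, 3, 2, 2, 1, 1, 1]
clusters = [[float(i)] * s for i, s in enumerate(sizes)]
new_clusters = regroup_clusters(clusters, n_groups=3)
print([len(g) for g in new_clusters])  # prints [6, 6, 6]
```

The regrouped clusters could then be passed to a cluster-level analysis in place of the original ones. How many new clusters to form is left open here.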