Abstract
Purpose
The purpose of this study is to extend the classical noncentral F-distribution under normal settings to the noncentral closed skew F-distribution for dealing with independent samples from multivariate skew normal (SN) distributions.
Design/methodology/approach
Based on the generalized Hotelling's T2 statistic, confidence regions are constructed for the difference between the location parameters of two independent multivariate SN distributions. Simulation studies show that the confidence regions based on the closed SN model outperform those based on the classical multivariate normal model when the vectors of skewness parameters are nonzero. A real data analysis is given to illustrate the effectiveness of the proposed methods.
Findings
This study’s approach is the first in the literature for inference on the difference of location parameters under multivariate SN settings. The real data analysis shows that the new approach is preferable to the classical method.
Research limitations/implications
For real data applications, outliers need to be removed before applying this approach.
Practical implications
This study’s approach may be applied to many multivariate skewed data sets by using SN fittings instead of classical normal fittings.
Originality/value
This paper is a research paper, and the authors’ new approach has many applications in the analysis of multivariate skewed data.
Citation
Ma, Z., Wang, T., Wei, Z. and Zhu, X. (2022), "Inferences on location parameters based on independent multivariate skew normal distributions", Asian Journal of Economics and Banking, Vol. 6 No. 2, pp. 270-281. https://doi.org/10.1108/AJEB-03-2022-0034
Publisher
Emerald Publishing Limited
Copyright © 2022, Ziwei Ma, Tonghui Wang, Zheng Wei and Xiaonan Zhu
License
Published in Asian Journal of Economics and Banking. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
1. Introduction
Although the normal distribution is a standard assumption for modeling observations in general, practitioners and researchers prefer more flexible models that account for non-normality when data are collected in finance and econometrics. The family of skew normal (SN) distributions, introduced by Azzalini (1985) for the univariate case, Azzalini and Valle (1996) for the multivariate case and Chen and Gupta (2005) for the matrix variate case, has become a popular parametric family for the statistical analysis of real data that exhibit asymmetry. There are several successful applications of SN models, such as modeling the skewness premium of a financial asset by Carmichael and Coën (2013) and addressing the “wrong skewness” problem in stochastic frontier models by Wei et al. (2021). These are only a few examples; an updated review was given by Adcock and Azzalini (2020).
Based on the definition given in Arellano-Valle et al. (2005), a p-dimensional random vector Y is said to be SN distributed with the location parameter vector
For the univariate SN family, the construction of plausibility regions for the skewness parameter was discussed by Zhu et al. (2017) using inferential models (IMs). Joint plausibility regions for the location and skewness parameters were studied by Ma et al. (2018) using IMs when the scale parameter is known, and joint plausibility regions for the location and scale parameters were constructed by Zhu et al. (2018) when the skewness parameter is given. For the multivariate SN model, confidence regions for the location parameter were obtained by Ma et al. (2019). In this work, we study the difference of location parameters based on independent multivariate SN distributions, using the generalized Hotelling's T2 and the noncentral closed skew F-distribution. Under the assumption of equal but unknown scale parameters, confidence regions for the difference of location parameters of the multivariate SN model are proposed. Simulation studies show that, for skewed data, the proposed confidence regions have higher relative coverage frequencies than those based on the classical normal model.
This paper is organized as follows. In Section 2, the definition of the matrix variate SN distribution is introduced, and some useful properties of the sampling distribution of the difference of sample means are derived. In Section 3, confidence regions for the difference of location parameters are proposed by the pivotal method when the scale parameters of the two populations are assumed to be equal but unknown. Simulation studies that illustrate the effectiveness of the proposed methods are given in Section 4, followed by a real data example in Section 5. The conclusion is given in Section 6.
2. Matrix variate SN distributions and sampling distributions
Let Mn×k be the set of all n × k matrices over the real field
Ye et al. (2014): The n × p random matrix Y is said to have a SN matrix distribution with location matrix M, scale matrix V ⊗Σ and skewness parameter matrix γ ⊗λ′, denoted by
Suppose that
Ma et al. (2019): Let
By Lemma 2.1, we have
The difference between two independent SN distributed random vectors follows a closed SN distribution, which is reviewed below.
(Gonz
(Gonz
For an arbitrary constant
,
For nonzero real number
,
Let
, for i = 1, 2, be independently distributed. Then,
In terms of CSN,
Let
By part (2) and (3) of Lemma 2.2, the desired result follows immediately. □
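The sampling setup of this section can be explored numerically. The following is a minimal sketch, not the authors' code: it assumes the common parameterization in which Y = μ + Σ^{1/2}Z and the standardized vector Z has density 2φ_p(z)Φ(λ′z), and it draws Z through the well-known additive representation Z = δ|U0| + (I − δδ′)^{1/2}U1 with δ = λ/√(1 + λ′λ). The function name `sample_sn` and all numerical values are illustrative assumptions.

```python
import numpy as np

def sample_sn(n, mu, sigma_sqrt, lam, rng):
    """Draw n rows of Y = mu + Sigma^{1/2} Z, where Z has density
    2 * phi_p(z) * Phi(lam' z) (assumed parameterization)."""
    lam = np.asarray(lam, dtype=float)
    p = lam.size
    delta = lam / np.sqrt(1.0 + lam @ lam)
    # Symmetric square root of I - delta delta' via eigen-decomposition.
    w, v = np.linalg.eigh(np.eye(p) - np.outer(delta, delta))
    root = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    u0 = np.abs(rng.standard_normal((n, 1)))   # half-normal component
    u1 = rng.standard_normal((n, p))           # symmetric component
    z = u0 * delta + u1 @ root
    return np.asarray(mu, dtype=float) + z @ sigma_sqrt.T

rng = np.random.default_rng(0)
lam = np.array([1.0, -2.0])                    # hypothetical skewness vector
z = sample_sn(200_000, np.zeros(2), np.eye(2), lam, rng)

delta = lam / np.sqrt(1.0 + lam @ lam)
expected_mean = np.sqrt(2.0 / np.pi) * delta   # E[Z] under this model
expected_cov = np.eye(2) - (2.0 / np.pi) * np.outer(delta, delta)

print("sample mean  :", z.mean(axis=0))
print("expected mean:", expected_mean)
```

Because E[Z] is nonzero whenever λ ≠ 0, the sample mean estimates μ + √(2/π)Σ^{1/2}δ rather than the location μ itself, which is why the difference of sample means needs its own sampling distribution under the SN model.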
If λ2 = 0, i.e. X2 follows a multivariate normal distribution with mean μ2 and covariance
Figure 1 presents the contours of the bivariate closed SN distribution for various combinations of the shape parameter D with different scale parameter
3. Inference on difference of location parameters
In this section, inference on the difference of location parameters is proposed when the scale parameters Σ1 and Σ2 are unknown but assumed to be equal, say Σ1 = Σ2 = Σ. The main result is based on the generalized Hotelling's T2 under the multivariate SN setting.
3.1 Some related distributions
At first, we consider the distribution of
Zhu et al. (2019): Let X ∼ CSNp,q(μ, Ip, D, Δq). The distribution of X′X, denoted by
Zhu et al. (2019): Let X ∼ CSNp,q(μ, Σ, D, Δ) and Q = X′WX with a nonnegative definite W ∈ Mp×p. If Σ1/2WΣ1/2 is idempotent of rank k, then
Based on Theorem 2.1 and Lemma 3.1, we obtain the following result.
Let
From part (i) of Lemma 2.2, we have
Comparing with one sample case, the distribution of quantity
3.2 Confidence region of μd
In this subsection, we will extend the Hotelling's T2 statistic from multivariate normal setting to the multivariate SN setting, called the generalized Hotelling's
Let
By Lemma 2.1, (ni − 1)Si ∼ Wp(ni − 1, Σ) for i = 1, 2 are independently distributed. Thus, the well-known properties of Wishart distribution for sums and scale transformation lead to the desired result. □
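The pooling step used in the proof above can be sketched numerically. The block below is an illustration under stated assumptions, not the paper's procedure: it computes the pooled covariance Sp = ((n1 − 1)S1 + (n2 − 1)S2)/(n1 + n2 − 2) and the classical two-sample Hotelling's T2, and it calibrates T2 with the ordinary F-distribution from normal theory rather than the closed skew F-distribution. The function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

def pooled_cov(x1, x2):
    """Pooled covariance from two samples with a common scale matrix."""
    n1, n2 = len(x1), len(x2)
    s1 = np.cov(x1, rowvar=False)   # unbiased estimate, divisor n1 - 1
    s2 = np.cov(x2, rowvar=False)   # unbiased estimate, divisor n2 - 1
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

def hotelling_t2(x1, x2, mu_d):
    """Classical two-sample T2 for a hypothesized difference mu_d."""
    n1, n2 = len(x1), len(x2)
    d = x1.mean(axis=0) - x2.mean(axis=0) - mu_d
    return n1 * n2 / (n1 + n2) * (d @ np.linalg.solve(pooled_cov(x1, x2), d))

def in_classical_region(x1, x2, mu_d, level=0.95):
    """True if mu_d lies in the normal-theory confidence region."""
    n1, n2, p = len(x1), len(x2), x1.shape[1]
    nu = n1 + n2 - 2
    # Under normality, (nu - p + 1) / (nu * p) * T2 ~ F(p, nu - p + 1).
    f_stat = (nu - p + 1) / (nu * p) * hotelling_t2(x1, x2, mu_d)
    return bool(f_stat <= stats.f.ppf(level, p, nu - p + 1))

rng = np.random.default_rng(1)
x1 = rng.standard_normal((20, 2)) + np.array([1.0, 0.0])
x2 = rng.standard_normal((25, 2))
sp = pooled_cov(x1, x2)
t2 = hotelling_t2(x1, x2, np.array([1.0, 0.0]))
print("pooled covariance:\n", sp)
print("T2 at the true difference:", t2)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of Sp, which is a common choice for numerical stability.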
To obtain the distribution of T2, we need the following well-known result (Lemma 3.2; Mardia et al. (1980), Theorem 3.4.7) and an extended version of the F-distribution, called the closed skew F-distribution (Definition 3.2), which was introduced by Zhu et al. (2019).
If
Zhu et al. (2019): Let
Based on the above definition, the pdf of the noncentral closed skew F-distribution can be obtained as follows.
Let
Let
By Lemma 3.2 and Definition 3.2, we obtain the distribution of T2 as follows.
For two independently distributed random matrices
Rewrite T2 as
Since Sp and
Based on the above results, we construct confidence regions for the difference of location parameters μd by using the generalized Hotelling's T2 as a pivotal statistic.
Assume two samples satisfying (2) with unknown Σ1 and Σ2 but Σ1 = Σ2 and known λ1 and λ2. Then, the
The following plots present the pdf of noncentral closed skew F-distribution (see Figure 2).
4. Simulation study
Simulations are conducted for evaluating the performance of the proposed confidence regions for μd under independent multivariate SN settings using the coverage relative frequency rates. Comparisons of proposed confidence regions with those in classical independent multivariate normal distributions are given.
4.1 Coverage frequencies
To evaluate the proposed confidence regions for the difference of location parameters under the multivariate SN setting, Monte Carlo simulation studies (each with M = 10,000 simulation runs) are conducted for combinations of various values of sample sizes (n1, n2) = (20, 25), (40, 50) and (80, 100), (ρ1, ρ2) = ({0.1, 0.5, 0.8}, {0.1, 0.5, 0.8}),
Table 1 shows that our method provides reliable inference about the difference of location parameters at the nominal confidence level (95%). To further illustrate the effectiveness of the proposed method, we present confidence intervals of the coverage probability in the following plot.
From Figure 3, we can see clearly that the pivotal quantity based on the closed skew F-distribution produces more robust confidence regions than the one based on the F-distribution. The coverage relative frequencies based on the SN model are consistently close to the nominal confidence level of 95% across the combinations of sample sizes, scale parameters and skewness parameters, whereas the coverage relative frequencies based on the normal model are lower than the nominal confidence level.
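The under-coverage of the normal-theory region can be reproduced in a small Monte Carlo sketch. The code below is an illustration only and does not implement the closed skew F calibration: it assumes the SN parameterization Y = μ + Σ^{1/2}Z with Z having density 2φ_p(z)Φ(λ′z), uses the classical F-based region, and counts how often the true location difference μd is covered. The function names, the skewness vectors, ρ = 0.5 and the reduced number of runs are all hypothetical choices, not the paper's settings.

```python
import numpy as np
from scipy import stats

def sample_sn(n, mu, sigma_sqrt, lam, rng):
    """Rows of Y = mu + Sigma^{1/2} Z, Z ~ 2 phi_p(z) Phi(lam' z) (assumed)."""
    p = lam.size
    delta = lam / np.sqrt(1.0 + lam @ lam)
    w, v = np.linalg.eigh(np.eye(p) - np.outer(delta, delta))
    root = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T
    z = np.abs(rng.standard_normal((n, 1))) * delta \
        + rng.standard_normal((n, p)) @ root
    return mu + z @ sigma_sqrt.T

def classical_coverage(n1, n2, mu1, mu2, sigma, lam1, lam2,
                       runs=2000, level=0.95, seed=0):
    """Relative frequency with which the normal-theory region covers mu1 - mu2."""
    rng = np.random.default_rng(seed)
    p = mu1.size
    w, v = np.linalg.eigh(sigma)
    sigma_sqrt = (v * np.sqrt(w)) @ v.T          # symmetric square root
    nu = n1 + n2 - 2
    # T2 <= crit is equivalent to the F-based region at the given level.
    crit = stats.f.ppf(level, p, nu - p + 1) * nu * p / (nu - p + 1)
    mu_d = mu1 - mu2
    hits = 0
    for _ in range(runs):
        x1 = sample_sn(n1, mu1, sigma_sqrt, lam1, rng)
        x2 = sample_sn(n2, mu2, sigma_sqrt, lam2, rng)
        sp = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / nu
        d = x1.mean(axis=0) - x2.mean(axis=0) - mu_d
        t2 = n1 * n2 / (n1 + n2) * (d @ np.linalg.solve(sp, d))
        hits += t2 <= crit
    return hits / runs

sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
mu1, mu2, zero = np.array([1.0, 1.0]), np.zeros(2), np.zeros(2)

cov_normal = classical_coverage(20, 25, mu1, mu2, sigma, zero, zero)
cov_skew = classical_coverage(20, 25, mu1, mu2, sigma,
                              np.array([1.0, 0.0]), zero)
print("coverage, no skewness  :", cov_normal)
print("coverage, with skewness:", cov_skew)
```

With zero skewness the data are normal and the region should cover at roughly the nominal rate. With nonzero skewness the difference of sample means is centered at the difference of means rather than at μd, so the classical region tends to miss the location difference more often.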
5. Real data example
In this section, we illustrate the effectiveness and applicability of the proposed methods by applying them to the Australian Institute of Sport (AIS) data (Cook and Weisberg, 2009). We explore the difference in body mass index (BMI) and lean body mass (LBM) between male and female athletes in the AIS data.
The point estimates of the parameters for AIS data are reported in Table 2.
In Figure 4, the scatter plots and contour plots of the fitted bivariate SN distributions are presented. Based on previous work (Azzalini and Valle, 1996), the multivariate SN model is preferred for this data set, so we also adopt the multivariate SN model to explore the difference of location parameters. Using the point estimates listed in Table 2 and applying Theorem 2.1, the difference of sample means has a closed SN distribution,
Then, we apply Theorem 3.2 to construct the confidence region for the difference of location parameters μd. Figure 5 shows the confidence regions for the difference of location parameters at the 95% confidence level.
6. Conclusion
In this work, the difference of location parameters between two independent samples under the multivariate SN setting is studied, and a procedure for constructing confidence regions is developed. The simulation studies show that, when the data are generated from a skewed distribution, the confidence region based on the SN model captures the true value more reliably than the one based on the normal model in terms of relative coverage frequencies.
Figures

Figure 3
The confidence intervals of coverage relative frequencies at confidence level α = 0.95, red ones based on SN model and blue ones based on normal model with sample size (n1, n2) = (20, 25), (40, 50), (80, 100) and (200, 250) (from left to right in each figure), ρ = 0.2, 0.5 and 0.8 (in each row), and D1 and D2 (from left to right), respectively
Table 1
Coverage relative frequencies of confidence regions at confidence level 95% for the difference of location parameters μd with various combinations of sample sizes, ρ1, ρ2 and D1, D2, using Hotelling's T2 as the pivotal quantity when Σ1 and Σ2 are equal but unknown
Correlation | F-distribution | CSF-distribution | F-distribution | CSF-distribution
---|---|---|---|---
ρ = 0.2 | 0.9268 | 0.9495 | 0.9224 | 0.9524
ρ = 0.5 | 0.9274 | 0.9511 | 0.9135 | 0.9502
ρ = 0.8 | 0.9206 | 0.9504 | 0.9227 | 0.9520
ρ = 0.2 | 0.9025 | 0.9479 | 0.9008 | 0.9534
ρ = 0.5 | 0.9118 | 0.9451 | 0.9095 | 0.9509
ρ = 0.8 | 0.9036 | 0.9521 | 0.9016 | 0.9524
ρ = 0.2 | 0.8932 | 0.9527 | 0.9080 | 0.9479
ρ = 0.5 | 0.8917 | 0.9463 | 0.8702 | 0.9470
ρ = 0.8 | 0.9035 | 0.9483 | 0.8933 | 0.9448
ρ = 0.2 | 0.8824 | 0.9503 | 0.8883 | 0.9461
ρ = 0.5 | 0.88920 | 0.9533 | 0.8956 | 0.9559
ρ = 0.8 | 0.8998 | 0.9492 | 0.8852 | 0.9546
Table 2
Point estimates of SN parameters for the male and female AIS data, respectively
Males | Females | |
---|---|---|
References
Adcock, C. and Azzalini, A. (2020), “A selective overview of skew-elliptical and related distributions and of their applications”, Symmetry, Vol. 12 No. 1, p. 118.
Arellano-Valle, R., Bolfarine, H. and Lachos, V. (2005), “Skew-normal linear mixed models”, Journal of Data Science, Vol. 3 No. 4, pp. 415-438.
Azzalini, A. (1985), “A class of distributions which includes the normal ones”, Scandinavian Journal of Statistics, Vol. 12 No. 2, pp. 171-178.
Azzalini, A. and Capitanio, A. (1999), “Statistical applications of the multivariate skew normal distribution”, Journal of the Royal Statistical Society. Series B (Statistical Methodology), Vol. 61 No. 3, pp. 579-602.
Azzalini, A. and Valle, A.D. (1996), “The multivariate skew-normal distribution”, Biometrika, Vol. 83 No. 4, pp. 715-726.
Carmichael, B. and Coën, A. (2013), “Asset pricing with skewed-normal return”, Finance Research Letters, Vol. 10 No. 2, pp. 50-57.
Chen, J.T. and Gupta, A.K. (2005), “Matrix variate skew normal distributions”, Statistics, Vol. 39 No. 3, pp. 247-253.
Cook, R.D. and Weisberg, S. (2009), An Introduction to Regression Graphics, John Wiley & Sons, New York, Vol. 405.
Gonzalez-Farias, G., Dominguez-Molina, A. and Gupta, A.K. (2004), “Additive properties of skew normal random vectors”, Journal of Statistical Planning and Inference, Vol. 126 No. 2, pp. 521-534.
Li, B., Tian, W. and Wang, T. (2018), “Remarks for the singular multivariate skew-normal distribution and its quadratic forms”, Statistics and Probability Letters, Vol. 137, pp. 105-112.
Ma, Z., Chen, Y.-J., Wang, T. and Peng, W. (2019), “The inference on the location parameters under multivariate skew normal settings”, in Kreinovich, V., Trung, N. and Thach, N. (Eds), Beyond Traditional Probabilistic Methods in Economics, Springer Nature, pp. 146-162.
Ma, Z., Zhu, X., Wang, T. and Autchariyapanitkul, K. (2018), “Joint plausibility regions for parameters of skew normal family”, in Kreinovich, V., Sriboonchitta, S. and Chakpitak, N. (Eds), Predictive Econometrics and Big Data, Springer-Verlag, New York, pp. 233-245.
Mardia, K.V., Kent, J.T. and Bibby, J.M. (1980), Multivariate Analysis (Probability and Mathematical Statistics).
Wang, T., Li, B. and Gupta, A.K. (2009), “Distribution of quadratic forms under skew normal settings”, Journal of Multivariate Analysis, Vol. 100 No. 3, pp. 533-545.
Wei, Z., Zhu, X. and Wang, T. (2021), “The extended skew-normal-based stochastic frontier model with a solution to ‘wrong skewness’ problem”, Statistics, pp. 1-20, doi: 10.1080/02331888.2021.2004142.
Ye, R., Wang, T. and Gupta, A.K. (2014), “Distribution of matrix quadratic forms under skew-normal settings”, Journal of Multivariate Analysis, Vol. 131, pp. 229-239.
Young, P.D., Harvill, J.L. and Young, D.M. (2016), “A derivation of the multivariate singular skew-normal density function”, Statistics and Probability Letters, Vol. 117, pp. 40-45.
Zhu, X., Li, B., Wu, M. and Wang, T. (2018), “Plausibility regions on parameters of the skew normal distribution based on inferential models”, in Kreinovich, V., Sriboonchitta, S. and Chakpitak, N. (Eds), Predictive Econometrics and Big Data, Springer-Verlag, New York, pp. 287-302.
Zhu, X., Li, B., Wang, T. and Gupta, A.K. (2019), “Sampling distributions of skew normal populations associated with closed skew normal distributions”, Random Operators and Stochastic Equations, Vol. 27 No. 2, pp. 75-87.
Zhu, X., Ma, Z., Wang, T. and Teetranont, T. (2017), “Plausibility regions on the skewness parameter of skew normal distributions based on inferential models”, in Kreinovich, V., Sriboonchitta, S. and Huynh, V. (Eds), Robustness in Econometrics, Springer-Verlag, New York, pp. 267-286.
Acknowledgements
The authors would like to thank reviewers for their valuable suggestions and comments, which improved the manuscript.