Suggestions for zero spike/left-skewed independent variable

1. Suggestions for zero spike/left-skewed independent variable

Archive User
Posted 10-29-2015 09:48
Dear colleagues,
I am facing an issue during some analysis and wonder if some of you have already faced it before and could suggest some references.
In a model that I want to test, one of the independent variables is a continuous variable, with positive values, but with high left skewness and a zero spike. This variable expresses the extent to which some benefits are offered, but many of my respondents have answered that they do not receive them, so 30-70% of the answers are zero values, depending on the benefit studied. I feel that in this case I have, in one variable, two types of information that should not be treated in the same way: one yes-no dichotomous variable and one continuous variable (extent of use). In fact, this case has been treated in other disciplines by entering the continuous variable along with a dummy variable, in order to capture both sources of variance (quantitative and qualitative). When I attempted that with my data, though, I ran into multicollinearity. I have also tried standardizing the variable, but am not sure whether this is acceptable.
Any suggestions and/or references would be greatly appreciated!
Eleanna

P.S. the reference for the simultaneous use of the dummy + continuous is from epidemiology:
Leffondré, K., Abrahamowicz, M., Siemiatycki, J., & Rachet, B. (2002). Modeling Smoking History: A Comparison of Different Approaches. American Journal of Epidemiology, 156(9), 813-823. doi: 10.1093/aje/kwf122
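
The dummy-plus-continuous coding described above can be sketched as follows. This is only an illustration on simulated data, not the procedure from the cited paper: the variable names (`extent`, `received`, `extent_centred`) and the simulated values are assumptions. It shows one common way to reduce the overlap between the two predictors, namely centring the continuous part on the mean of the non-zero answers while leaving the zeros at zero.

```python
import random
from math import sqrt

def pearson(a, b):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

random.seed(0)
# Hypothetical data: about half the respondents answer 0 (benefit not received),
# the rest report a positive extent of use.
extent = [0.0 if random.random() < 0.5 else random.uniform(1, 10)
          for _ in range(500)]

# Yes/no dummy: 1 if the benefit is received at all.
received = [1.0 if x > 0 else 0.0 for x in extent]

# Centre the extent among recipients only; non-recipients stay at 0.
users = [x for x in extent if x > 0]
mean_users = sum(users) / len(users)
extent_centred = [x - mean_users if x > 0 else 0.0 for x in extent]

r_raw = pearson(received, extent)
r_centred = pearson(received, extent_centred)
print(f"dummy vs raw extent:     r = {r_raw:.3f}")
print(f"dummy vs centred extent: r = {r_centred:.3f}")
```

With this coding the dummy carries the received/not-received contrast and the centred variable carries the variation among recipients, so the two are uncorrelated by construction. Whether that coding suits a given model is a substantive choice; the sketch only shows where the collinearity comes from.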

--

Dr. Eleanna Galanaki,

Assistant Professor in Organizational Behavior,

Athens University of Economics and Business,

School of Business, Department of Marketing and Communication

tel./fax: +30 210 8203473

url: http://www.aueb.gr/pages_en/faculty/faculty_en_short.php?facid=1311

Electronic copies of my papers are available from the SSRN eLibrary at: http://ssrn.com/author=567485
2. Suggestions for zero spike/left-skewed independent variable

Archive User
Posted 10-29-2015 10:12
Dear,

One way to approach these data might be to fit a zero-inflated Poisson model. The zero-inflated part would then reflect your dummy variable, and the Poisson part would reflect the extent of use. This approach seems to fit the data you have. You may want to look at Lambert (1992). Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing, Technometrics, 34(1), 1-14.

These kinds of models can be estimated in Mplus by using the COUNT IS VARNAME(i); statement before introducing your analysis and model. In the model you can estimate the dummy part by including a #1 after the name of the variable (VARNAME#1).
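
To make the two parts of a zero-inflated Poisson concrete, here is a minimal Python sketch of its probability function and log-likelihood. It is not Mplus syntax and does not estimate anything: the function names, the toy `counts`, and the values `pi=0.5` and `lam=3.8` are hand-picked assumptions for illustration. It also assumes the variable can be treated as a non-negative integer count that is being modelled as an outcome.

```python
from math import exp, factorial, log

def zip_pmf(k, pi, lam):
    """P(Y = k) under a zero-inflated Poisson.

    pi  : probability of a structural zero (the "dummy" part)
    lam : Poisson mean for everyone else (the "extent" part)
    """
    poisson = exp(-lam) * lam ** k / factorial(k)
    if k == 0:
        # A zero can come from either part of the mixture.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

def zip_loglik(counts, pi, lam):
    """Log-likelihood of a list of counts for given pi and lam."""
    return sum(log(zip_pmf(k, pi, lam)) for k in counts)

# Toy data with a zero spike (half of the answers are 0).
counts = [0, 0, 0, 0, 0, 0, 2, 3, 3, 4, 5, 6]

# Hand-picked ZIP parameters versus a plain Poisson (pi = 0) at the sample mean.
ll_zip = zip_loglik(counts, pi=0.5, lam=3.8)
ll_poisson = zip_loglik(counts, pi=0.0, lam=sum(counts) / len(counts))
print(f"ZIP log-likelihood:     {ll_zip:.2f}")
print(f"Poisson log-likelihood: {ll_poisson:.2f}")
```

On data like these the mixture gives the higher log-likelihood, because a plain Poisson cannot produce that many zeros together with the larger counts.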

Hope this helps.

Kind regards

Yannick Griep

PhD researcher
Faculty of Psychology and Educational Sciences
Work and Organizational Psychology
Vrije Universiteit Brussel

Download Yannick's full papers on ResearchGate.

And check out the Bi-Annual Psychological Contract Small Group Conference Dublin Ireland, 13th & 14th July 2016

yannick.griep@vub.ac.be
Pleinlaan 2, B- 1050 Brussels, Belgium
Tel. + 32 472 71 98 78
Website: www.vub.ac.be/yannick.griep

3. Suggestions for zero spike/left-skewed independent variable

Archive User
Posted 10-29-2015 13:31
Hi Eleanna,

Since it is the independent variable rather than the dependent variable that is highly skewed, it may not matter as long as the regression residual is normally distributed. But it may suggest a non-linear relationship so you may want to test that.
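
A quick way to see this point is to simulate it. The sketch below is an illustration on made-up data, not an analysis of Eleanna's variables: the predictor `x` has a zero spike and a long tail, the outcome `y` is linear in `x` with normal noise, and a simple least-squares fit is computed by hand. All names and parameter values are assumptions.

```python
import random

def skewness(v):
    """Moment-based sample skewness."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((a - m) ** 2 for a in v) / n
    m3 = sum((a - m) ** 3 for a in v) / n
    return m3 / m2 ** 1.5

random.seed(1)
n = 2000
# Predictor with a zero spike: half zeros, the rest exponential with mean 2.
x = [0.0 if random.random() < 0.5 else random.expovariate(0.5)
     for _ in range(n)]
# Outcome that is truly linear in x, with normally distributed errors.
y = [1.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

# Simple least-squares fit of y on x.
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"skewness of x:         {skewness(x):.2f}")
print(f"skewness of residuals: {skewness(resid):.2f}")
print(f"estimated slope:       {slope:.2f}")
```

The predictor is strongly skewed, yet the residuals are roughly symmetric and the slope is recovered, because the regression assumptions concern the errors rather than the predictor. If the true relationship were non-linear, the residuals would show it, which is why checking for non-linearity is still worthwhile.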

If you use Stata, there is an online discussion thread about this issue: http://www.stata.com/statalist/archive/2010-03/msg01034.html

Hope this helps!

Ping

Pingshu Li
PhD Candidate | Organizational Behavior & Human Resources
University of Kansas | School of Business
pingshu.li@ku.edu |(785) 864-7508

4. Suggestions for zero spike/left-skewed independent variable

Archive User
Posted 10-29-2015 16:39
Eleanna,

You may want to try running a zero-inflated negative binomial regression or a Poisson regression.

Best of luck,

Dr. Angus Duff

Assistant Professor, Human Resource Management

School of Business and Economics

Thompson Rivers University

International Building, Office 2018

900 McGill Road, Kamloops, BC V2C 0C8

Tel: 250.371.5903

aduff@tru.ca

www.tru.ca/business

5. Suggestions for zero spike/left-skewed independent variable

Archive User
Posted 10-30-2015 04:51
Hi Eleanna,

in addition to the excellent previous suggestions, I would recommend the following papers:

Atkins, D. C., & Gallop, R. J. (2007). Rethinking how family researchers model infrequent outcomes: a tutorial on count regression and zero-inflated models. Journal of Family Psychology, 21(4), 726.
Blevins, D. P., Tsang, E. W., & Spain, S. M. (2015). Count-Based Research in Management: Suggestions for Improvement. Organizational Research Methods, 18(1), 47-69.
Coxe, S., West, S. G., & Aiken, L. S. (2009). The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91(2), 121-136.

The Atkins & Gallop paper gives a good overview using a real example.
The Blevins et al. paper has a nice decision tree.
The Coxe et al. paper is more technical, but very thorough, and includes SPSS syntax.

Best,
Heiko
