  • 1.  Suggestions for zero spike/ left- skewed independent variable

    Posted 10-29-2015 09:48
    Dear colleagues,
    I have run into an issue in an analysis and wonder whether any of you have faced it before and could suggest some references.
    In a model that I want to test, one of the independent variables is continuous and non-negative, but it has high left skewness and a spike at zero. The variable expresses the extent to which certain benefits are offered, but many of my respondents answered that they do not receive them, so 30-70% of the answers are zeros, depending on the benefit studied. I feel that this one variable carries two types of information that should not be treated in the same way: a yes/no dichotomy (benefit received or not) and a continuous measure (extent of use). Other disciplines have handled this case by entering the continuous variable together with a dummy variable, in order to capture both sources of variance (quantitative and qualitative). When I attempted that with my data, though, I ran into multicollinearity. I have also tried standardizing the variable, but I am not sure whether this is acceptable.
    Any suggestions and/or references would be greatly appreciated!
    Eleanna

    P.S. The reference for the simultaneous use of a dummy plus the continuous variable comes from epidemiology:
    Leffondré, K., Abrahamowicz, M., Siemiatycki, J., & Rachet, B. (2002). Modeling Smoking History: A Comparison of Different Approaches. American Journal of Epidemiology, 156(9), 813-823. doi: 10.1093/aje/kwf122
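
    [Editor's sketch] The dummy-plus-continuous coding described above can be illustrated as follows. This is a minimal sketch and not the procedure from Leffondré et al.: it assumes one common variant in which the non-zero values are centered at their own mean, which makes the sample correlation between the dummy and the continuous term zero. The data and the names split_zero_spike and benefit are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def split_zero_spike(x):
    """Split a zero-spiked variable into two predictors.

    dummy    : 1 if the value is positive, else 0
    centered : value minus the mean of the positive values (0 for zeros)
    """
    positives = [v for v in x if v > 0]
    mean_pos = sum(positives) / len(positives)
    dummy = [1 if v > 0 else 0 for v in x]
    centered = [v - mean_pos if v > 0 else 0.0 for v in x]
    return dummy, centered


# Hypothetical data: half of the respondents report no benefit at all.
benefit = [0, 0, 0, 0, 0, 2, 3, 5, 8, 12]
dummy, centered = split_zero_spike(benefit)

# Correlation of the dummy with the raw variable vs. the centered term.
r_raw = pearson(dummy, benefit)
r_centered = pearson(dummy, centered)
print("dummy vs raw variable:  ", round(r_raw, 3))
print("dummy vs centered term: ", round(r_centered, 3))
```

    Entering dummy and centered together keeps the yes/no effect and the extent-of-use effect apart. The centering only removes the correlation between these two terms; it does nothing about collinearity with other predictors in the model.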


    --
    Dr. Eleanna Galanaki,
    Assistant Professor in Organizational Behavior,
    Athens University of Economics and Business,
    School of Business, Department of Marketing and Communication
    tel./fax: +30 210 8203473
    Electronic copies of my papers are available from the SSRN eLibrary at: http://ssrn.com/author=567485


  • 2.  Suggestions for zero spike/ left- skewed independent variable

    Posted 10-29-2015 10:12
    Dear Eleanna,

    One way to approach these data might be to fit a zero-inflated Poisson model. The zero-inflation part would then reflect your dummy variable, and the Poisson part would reflect the extent of use. This approach seems to fit the data you have. You may want to look at Lambert (1992), Zero-Inflated Poisson Regression, with an Application to Defects in Manufacturing, Technometrics, 34(1), 1-14.

    These kinds of models can be estimated in Mplus by adding the COUNT IS VARNAME(i); statement before specifying your analysis and model. In the model, you can estimate the dummy part by adding #1 after the name of the variable (VARNAME#1).
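
    [Editor's sketch] To make the two parts of the zero-inflated Poisson model concrete, here is a minimal Python sketch of its probability mass function. It is not Mplus syntax and fits nothing; it only shows how an extra zero class (probability pi_zero) is mixed with an ordinary Poisson count (mean lam). The function name and parameter values are hypothetical, and the model assumes a count variable.

```python
from math import exp, factorial


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson probability of observing the count k.

    pi_zero : probability of belonging to the 'always zero' class
    lam     : mean of the Poisson part
    """
    poisson = exp(-lam) * lam ** k / factorial(k)
    if k == 0:
        # A zero can come from the extra-zero class or from the Poisson part.
        return pi_zero + (1 - pi_zero) * poisson
    return (1 - pi_zero) * poisson


# Hypothetical values: 40% structural zeros, Poisson mean of 2.
p0 = zip_pmf(0, 0.4, 2.0)
total = sum(zip_pmf(k, 0.4, 2.0) for k in range(60))
print("P(Y = 0):", round(p0, 4))
print("sum of probabilities over 0..59:", round(total, 6))
```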

    Hope this helps. 

    Kind regards


    Yannick Griep

    PhD researcher 
    Faculty of Psychology and Educational Sciences
    Work and Organizational Psychology
    Vrije Universiteit Brussel

    Download Yannick's full papers on ResearchGate.

    And check out the Bi-Annual Psychological Contract Small Group Conference, Dublin, Ireland, 13th & 14th July 2016

    Pleinlaan 2, B-1050 Brussels, Belgium


  • 3.  Suggestions for zero spike/ left- skewed independent variable

    Posted 10-29-2015 13:31

    Hi Eleanna,

     

    Since it is the independent variable rather than the dependent variable that is highly skewed, the skew may not matter as long as the regression residuals are normally distributed. It may, however, point to a non-linear relationship, so you may want to test for that.
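
    [Editor's sketch] One simple way to test for a non-linear relationship is to add a squared term and compare the fit of the two nested models. The sketch below is a minimal pure-Python illustration on made-up data; the helper names ols and rss are hypothetical, and in practice a statistics package would do the fitting and report the p-value.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def rss(X, y, beta):
    """Residual sum of squares for coefficients beta."""
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


# Hypothetical data: zero-spiked predictor, outcome with some curvature.
x = [0, 0, 0, 0, 1, 2, 3, 5, 8, 12]
noise = [0.1, -0.1, 0.05, -0.05, 0.2, -0.2, 0.1, -0.1, 0.05, -0.05]
y = [1 + 0.8 * xi - 0.04 * xi ** 2 + e for xi, e in zip(x, noise)]

X_lin = [[1.0, xi] for xi in x]             # intercept + linear term
X_quad = [[1.0, xi, xi ** 2] for xi in x]   # adds a squared term

rss_lin = rss(X_lin, y, ols(X_lin, y))
rss_quad = rss(X_quad, y, ols(X_quad, y))

# F statistic for the single added term, on (1, n - 3) degrees of freedom.
n = len(y)
f_stat = (rss_lin - rss_quad) / (rss_quad / (n - 3))
print("RSS linear:", rss_lin, "RSS quadratic:", rss_quad, "F:", f_stat)
```

    A large F relative to the F(1, n - 3) reference distribution suggests that the squared term improves the fit, that is, that the relationship is not purely linear.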

     

    If you use Stata, there is a Statalist discussion thread on this issue: http://www.stata.com/statalist/archive/2010-03/msg01034.html

    Hope this helps!

    Ping

    Pingshu Li
    PhD Candidate | Organizational Behavior & Human Resources

    University of Kansas | School of Business
    pingshu.li@ku.edu |(785) 864-7508

     



  • 4.  Suggestions for zero spike/ left- skewed independent variable

    Posted 10-29-2015 16:39

    Eleanna,

     

    You may want to try running a zero-inflated negative binomial regression or a Poisson regression.
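
    [Editor's sketch] A quick descriptive check can hint at which of these count models is worth trying. Under a Poisson distribution the variance equals the mean and the share of zeros is exp(-mean), so a much larger variance points towards the negative binomial and a much larger share of zeros points towards zero inflation. The sketch below uses made-up counts and is not a formal test.

```python
from math import exp
from statistics import mean, variance

# Hypothetical count data with many zeros and a long right tail.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 9, 15]

m = mean(counts)
v = variance(counts)  # sample variance (n - 1 denominator)

# Ratio well above 1 indicates overdispersion relative to a Poisson.
dispersion = v / m

# Observed share of zeros vs. the share a Poisson with the same mean implies.
observed_zero_share = counts.count(0) / len(counts)
poisson_zero_share = exp(-m)

print("variance / mean:", round(dispersion, 2))
print("observed zeros:", round(observed_zero_share, 3),
      "Poisson-implied zeros:", round(poisson_zero_share, 3))
```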

     

    Best of luck,

     

    Dr. Angus Duff
    Assistant Professor, Human Resource Management
    School of Business and Economics
    Thompson Rivers University
    International Building, Office 2018
    900 McGill Road, Kamloops, BC  V2C 0C8
    Tel:  250.371.5903
    aduff@tru.ca
    www.tru.ca/business

     



  • 5.  Suggestions for zero spike/ left- skewed independent variable

    Posted 10-30-2015 04:51
    Hi Eleanna,

    In addition to the excellent previous suggestions, I would recommend the following papers:

    Atkins, D. C., & Gallop, R. J. (2007). Rethinking how family researchers model infrequent outcomes: a tutorial on count regression and zero-inflated models. Journal of Family Psychology, 21(4), 726.
    Blevins, D. P., Tsang, E. W., & Spain, S. M. (2015). Count-based research in management: Suggestions for improvement. Organizational Research Methods, 18(1), 47-69.
    Coxe, S., West, S. G., & Aiken, L. S. (2009). The analysis of count data: A gentle introduction to Poisson regression and its alternatives. Journal of Personality Assessment, 91(2), 121-136.

    The Atkins & Gallop paper gives a good overview using a real example.
    The Blevins et al. paper has a nice decision tree.
    The Coxe et al. paper is more technical, but very thorough, and includes SPSS syntax.

    Best,
    Heiko