problems with correlation & multiple regression

greenspun.com : LUSENET : History & Theory of Psychology : One Thread

Apologies, this is a very long question.

Research shows that life events and behaviour problems are associated in children, but only cross-sectional studies have been done, so no causal inferences have been made; i.e., we don't know whether life events increase the risk of developing behaviour problems or whether existing behaviour problems increase the risk of experiencing life events.

For my thesis, I am investigating whether life events and behaviour problems are associated in my sample of children (as suggested by previous research) and if they are associated, whether life events cause behaviour problems or behaviour problems cause life events.

At Time 1, parents filled in a life events checklist for the previous 12 months; the life events score is the number of life events the child experienced in that period. They also filled in the Behaviour Problems Inventory (BPI), which gives a Time 1 score for behaviour problems.

Twelve months later, at Time 2, they completed the same measures, giving us a Time 2 life events score and a Time 2 behaviour problems score.

There are two parts to my analysis: cross-sectional (are life events and behaviour problems related?) and longitudinal (do life events predict behaviour problems after previous life events and behaviour problems are controlled for, and vice versa?).

The first part of the analysis, the cross-sectional part, is just a correlation between the Time 1 life events score and the Time 1 behaviour problems score to see if the two are related.

The two-tailed test shows that they are not significantly related (r = .141, p = .099), but the one-tailed test shows that they are significantly related (r = .141, p = .05).
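The relationship between the two p-values can be checked with a rough sketch like the one below. It uses the Fisher z-transform, a normal approximation, rather than the exact t-test a statistics package would normally report, so the third decimal may differ. The sample size is not given in the post; n = 138 is purely a hypothetical value chosen for illustration, and the function name is made up.

```python
import math

def correlation_p_values(r, n):
    """Approximate p-values for a Pearson r via the Fisher z-transform.

    Normal approximation only; not the exact t-test most packages use.
    Returns (two-tailed p, one-tailed p for the hypothesis r > 0).
    """
    z = math.atanh(r) * math.sqrt(n - 3)          # approx. standard normal under H0
    p_two = math.erfc(abs(z) / math.sqrt(2))      # area in both tails
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))   # area in the upper tail only
    return p_two, p_upper

# ASSUMPTION: n = 138 is hypothetical; the post does not report the sample size.
p_two, p_one = correlation_p_values(0.141, 138)
print(f"two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```

When r falls in the predicted direction, the one-tailed p is exactly half the two-tailed p, which is why the same correlation can sit on either side of .05 depending on the test chosen.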

If I use the two-tailed test (they are not significantly related) can I still do a multiple regression to work out causality?

If not, can I use the one-tailed test, as long as my hypothesis is directional? I know that previous research has consistently found that life events and behaviour problems are significantly positively related.

I'll wait to see if anyone can help me with this before I post about the problems I'm having with my multiple regression!

If you've read this far, thank you!

-- Katy Thornton (pspe1a@bangor.ac.uk), September 17, 2004

Answers

Regarding your use of one-tailed vs. two-tailed tests, ideally, you should have decided on this *before* doing your analysis. Running a significance test one-tailed and then two-tailed to see which comes out significant is not sound methodology. Even if previous research has found a positive correlation, two-tailed tests are usually recommended (a reader of your thesis might ask you to justify the one-tailed test). Also, note that with a large enough sample, even a small correlation will reach significance. You might want to compare your correlation of .141 to the correlations reported in previous research in your area. Is your correlation comparable? If previous correlations in your area are, say, .5 to .6, then .14, whether significant or not, might not be very meaningful, and its statistical significance may simply be a function of a large sample size.
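To illustrate the sample-size point, here is a minimal sketch of the smallest correlation that would reach two-tailed significance at .05 for a few sample sizes. It again relies on the Fisher z approximation, and the sample sizes and the function name are hypothetical, not taken from the study.

```python
import math
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Smallest |r| reaching two-tailed significance at level alpha,
    using the Fisher z normal approximation (not the exact t-test)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    return math.tanh(z_crit / math.sqrt(n - 3))

# ASSUMPTION: these sample sizes are illustrative only.
for n in (50, 200, 1000):
    print(f"n = {n:4d}: |r| must exceed about {critical_r(n):.3f}")
```

The threshold shrinks as n grows, so a significant result says little about how strong the relationship is. For scale, r = .141 corresponds to r squared of about .02, i.e. roughly 2% of shared variance.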

Multiple regression cannot be used, by itself, to work out or support any causal claims. This is true whether you do one-tailed or two-tailed tests. The most the regression can tell you is that one or more variables (your predictors) "explain" or "account for" the variability in another (your response, or "dependent" variable). To make claims of causation, you should have an extremely well controlled experiment, and even then, you can usually only make tentative claims about a causal process. The regression model itself says nothing about causality.

My suggestion would be to avoid claims of causality in your analysis, and at most try to evaluate whether one or more variables predict the dependent variable better than if we were just guessing at its value. In other words, do your predictors help you reduce your uncertainty in predicting your response variable? And if so, by how much?
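One common way to put a number on "how much" is the change in R squared when a predictor is added to a model that already contains the control variables. The sketch below shows the idea on simulated data only; the variable names (bp1, le1, le2, bp2), the effect sizes, and the sample size are all hypothetical, and the small least-squares routine is written out just to keep the example self-contained. The result describes prediction, not causation.

```python
import random

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on the columns of X
    (an intercept is added). Solves the normal equations by Gauss-Jordan."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented normal equations: [A'A | A'y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(M[i][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for i in range(k):
            if i != c:
                f = M[i][c] / M[c][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# ASSUMPTION: simulated data with made-up effects, not the thesis data.
random.seed(0)
n = 200
bp1 = [random.gauss(0, 1) for _ in range(n)]            # Time 1 behaviour problems
le1 = [random.gauss(0, 1) for _ in range(n)]            # Time 1 life events
le2 = [0.5 * a + random.gauss(0, 1) for a in le1]       # Time 2 life events
bp2 = [0.6 * b + 0.2 * l + random.gauss(0, 1)           # Time 2 behaviour problems
       for b, l in zip(bp1, le2)]

r2_base = r_squared(list(zip(bp1, le1)), bp2)           # controls only
r2_full = r_squared(list(zip(bp1, le1, le2)), bp2)      # controls + Time 2 life events
print(f"R^2, controls only:           {r2_base:.3f}")
print(f"R^2, adding Time 2 life events: {r2_full:.3f}")
print(f"R^2 change:                   {r2_full - r2_base:.3f}")
```

Swapping the roles of the variables (predicting Time 2 life events from Time 1 scores plus behaviour problems) gives the "vice versa" comparison in the same way.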

You may also want to ensure that the instrument you used to measure behaviour problems (BPI) is sound in terms of reliability and validity, and that you are using it on a population for which it was standardized. If this isn't the case, then the foundation on which your statistical analysis rests may be weak, and the conclusions of your regression (or whatever statistical procedure you use) would be open to severe criticism.

Best of luck, Dan.

-- Daniel J. Denis (daniel.denis@umontana.edu), September 17, 2004.

