Correlation amidst estimable measurement error

Correlation attempts to measure the association between two measurable properties, like height and weight. However, all the fancy statistics that allow one to draw statistical inferences regarding correlation assume that you’ve obtained perfect measurement of the properties of interest. The real world is fuzzy, so perfect measurement rarely happens. When measuring properties of the mind, it is typical to observe a great deal of random variability from moment to moment. Typically, this variability is averaged out and subsequently ignored. While many researchers are becoming aware that this variability is important to understand as a property of the mind itself, little attention has been paid to the consequences this variability has for statistical inference. This is despite the fact that Spearman demonstrated over a hundred years ago that correlation coefficients obtained from error-prone measurements will systematically underestimate the true correlation between the properties being correlated.
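To make Spearman's point concrete, here is a minimal simulation sketch. It is not the code from the paper; the function names, the number of trials, and the noise level are all assumptions chosen for illustration. Each simulated subject has a true score on two properties, each observed score is the mean of several noisy trials, and the final step applies Spearman's classic correction (dividing the observed correlation by the square root of the product of the two reliabilities), using reliabilities that are known here only because the simulation sets them.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_attenuation(n_subjects=2000, n_trials=10, true_r=0.6,
                         trial_sd=3.0, seed=1):
    """Return (true r, observed r, disattenuated r) for one simulated sample.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    resid = math.sqrt(1 - true_r ** 2)
    true_x, true_y, obs_x, obs_y = [], [], [], []
    for _ in range(n_subjects):
        # True scores: unit variance, population correlation true_r.
        tx = rng.gauss(0, 1)
        ty = true_r * tx + resid * rng.gauss(0, 1)
        # Observed score: the mean of n_trials noisy measurements.
        ox = sum(tx + rng.gauss(0, trial_sd) for _ in range(n_trials)) / n_trials
        oy = sum(ty + rng.gauss(0, trial_sd) for _ in range(n_trials)) / n_trials
        true_x.append(tx)
        true_y.append(ty)
        obs_x.append(ox)
        obs_y.append(oy)
    # Reliability of a subject mean: true variance / (true + error variance).
    # It is the same for both properties in this sketch.
    reliability = 1.0 / (1.0 + trial_sd ** 2 / n_trials)
    r_true = pearson(true_x, true_y)
    r_obs = pearson(obs_x, obs_y)
    # Spearman's correction: r_obs / sqrt(rel_x * rel_y), with rel_x == rel_y.
    r_corrected = r_obs / reliability
    return r_true, r_obs, r_corrected


if __name__ == "__main__":
    r_true, r_obs, r_corrected = simulate_attenuation()
    print(f"true r = {r_true:.3f}, observed r = {r_obs:.3f}, "
          f"corrected r = {r_corrected:.3f}")
```

In real data the reliabilities are not known and have to be estimated from the trial-to-trial variability, which is where the inferential trouble starts.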

This page hosts a draft of a paper I’ve submitted to a couple of top-tier stats journals already (don’t worry, not at the same time!); both liked it but rejected it, suggesting resubmission after revision. JRSS:B wanted analytic proof of my approach; CSDA wanted robustness tests for assumption violations. I have some further ideas for improving the work below (e.g., I mistakenly approximate the true between-Ss variance from the observed between-Ss variance, when it could be more accurately approximated by other methods; I also suspect resampling might obviate parametric assumptions), but since this began as a side project and I have relatively little formal statistics training, I’m on the lookout for a co-author able to bring the manuscript to peer-reviewed publication. In the meantime…

Here’s the PDF.

To anyone who actually reads that, I apologize for the tabular results; I generally prefer visual presentation of data, but found this difficult with such a large parameter space to describe (a 5×3×3×3 space explored by 4 methods with 3 performance measures). I might try visualization again before the next revision.

The take-home message from this work is that traditional tests against a null hypothesis of zero correlation are unaffected by measurement error. However, traditional tests comparing two non-zero correlations (which, as I understand it, are common in fields like principal component analysis, etc.) will be affected, such that traditional statistics will be too liberal. An unexpected but neat finding is that increasing the number of participants actually exacerbates the problem, as if the statistics are becoming overconfident. My solution solves these problems… mostly. I think that I can achieve a complete solution if I use a better estimate of the true between-Ss variance, but I’ll have to re-run the simulations to test this theory.
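For a rough feel of why more participants can make things worse, here is a simplified sketch. It is not the simulation design from the paper: instead of comparing two correlations, it tests a single observed correlation against a non-zero null value that happens to equal the true correlation, using the standard Fisher z test. The function names, the error level, and the sample sizes are assumptions for illustration. Since the null is true for the error-free scores, any rejection is a false alarm.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def false_alarm_rate(n_subjects, error_sd, true_r=0.6, n_reps=300, seed=0):
    """Share of simulated studies rejecting H0: rho == true_r at the 5% level.

    error_sd is the SD of the measurement error added to each subject's
    score; true scores have unit variance. All values are illustrative.
    """
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided 5% critical value, standard normal
    resid = math.sqrt(1 - true_r ** 2)
    rejections = 0
    for _ in range(n_reps):
        xs, ys = [], []
        for _ in range(n_subjects):
            tx = rng.gauss(0, 1)
            ty = true_r * tx + resid * rng.gauss(0, 1)
            xs.append(tx + rng.gauss(0, error_sd))
            ys.append(ty + rng.gauss(0, error_sd))
        r = pearson(xs, ys)
        # Fisher z test of the observed r against the non-zero null value.
        z = (math.atanh(r) - math.atanh(true_r)) * math.sqrt(n_subjects - 3)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    for n in (20, 200):
        clean = false_alarm_rate(n, error_sd=0.0)
        noisy = false_alarm_rate(n, error_sd=1.0)
        print(f"N = {n}: no error {clean:.2f}, with error {noisy:.2f}")
```

Under these assumptions the attenuation bias stays the same size as N grows while the standard error shrinks, so the test grows ever more confident about a biased estimate.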

I’ll hopefully post the code for the simulations here, as soon as I make sure it’s presentable.

Addendum: I just read a paper (PDF) that describes the application of mixed effects modeling (MEM) to correlation amidst estimable measurement error (CAEME). It would be great if MEM provides a quick analytic solution, but I’m surprised that the MEM estimate of correlation can have a different sign than the raw correlation! Hopefully I’ll understand the mechanics of this transformation better after taking a course on MEM.
