Data "Snooping"

Add this phrase from economics to your vocabulary to describe the "other" studies, in which proxies with known hockey-stick (HS) shapes, like bristlecones and Yamal, are used time after time. Here’s a website with some links. They cite Sullivan, Timmermann and White (1999) and White (2000) for the following definition:

"Data-snooping occurs when a given set of data is used more than once for purposes of inference or model selection."

Hello??

The topic is being actively researched in econometrics, and the methods need to be applied to multiproxy studies.
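To make the definition concrete, here is a minimal Python sketch of what screening does to significance. It assumes pure AR(1) red noise as the candidate "proxies" and a straight-line trend as a hypothetical calibration target; the series length, autocorrelation and number of candidates are made-up illustration values, not taken from any study. Keeping the best-correlated of many noise series gives a correlation that routinely beats the 5% threshold appropriate for a single, pre-specified series. Judging the pick against the null distribution of the maximum correlation is, in spirit, what White's (2000) Reality Check does, though White bootstraps the actual data rather than simulating noise as this sketch does.

```python
import math
import random

# Hypothetical illustration: every name and parameter here is made up for the sketch.

def ar1_series(n, phi, rng):
    """Red noise: x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def quantile(values, q):
    """Empirical q-quantile (simple order-statistic version)."""
    s = sorted(values)
    return s[min(len(s) - 1, int(q * len(s)))]

def snooping_demo(k=20, n=100, phi=0.7, trials=200, seed=1):
    rng = random.Random(seed)
    target = [t / n for t in range(n)]  # hypothetical trend-shaped "instrumental" target
    # Null distribution of |r| for ONE noise series chosen in advance
    single = [abs(pearson(ar1_series(n, phi, rng), target)) for _ in range(trials)]
    # Null distribution of the BEST |r| after screening k noise series
    best = [max(abs(pearson(ar1_series(n, phi, rng), target)) for _ in range(k))
            for _ in range(trials)]
    crit_single = quantile(single, 0.95)  # honest 5% threshold for one series
    crit_best = quantile(best, 0.95)      # 5% threshold once screening is accounted for
    # Fraction of screened picks that look "significant" against the naive threshold
    false_positive_rate = sum(b > crit_single for b in best) / trials
    return crit_single, crit_best, false_positive_rate

crit_single, crit_best, rate = snooping_demo()
print(f"single-series 5% threshold |r| = {crit_single:.2f}")
print(f"best-of-20 5% threshold |r|    = {crit_best:.2f}")
print(f"naive false-positive rate      = {rate:.2f}")
```

Because the candidate series are independent of one another, the naive false-positive rate should land somewhere near 1 - 0.95^20, roughly 0.64, rather than the nominal 0.05. With k = 1 the two thresholds coincide up to simulation noise; the gap that opens as k grows is the price of screening.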

This entry was written by Stephen McIntyre, posted on Aug 30, 2006 at 5:28 PM, filed under General and tagged snooping.

15 Comments

George

Posted Aug 31, 2006 at 7:02 AM

Hmmm. “data snooping is a dangerous practice to be avoided, but in fact it is endemic.” Not just in econometrics!

“The main problem has been a lack of sufficiently simple practical methods capable of assessing the potential dangers of data snooping in a given situation.” Multiple choice question – What’s the ‘main problem’ in climate science? (1) the data sets give the desired results, so why switch data?; (2) researchers A, B, and W used those data sets, and published their results in refereed journals, so they must be correct; (3) new data which would provide alternate explanations are ignored; (4) none of the above; (5) all of the above.

Steve McIntyre

Posted Aug 31, 2006 at 7:31 AM

There are several articles in the Journal of Economic Methodology, 2000 on data snooping which have lots to say about multiproxy studies. I’m going to post up some excerpts when I have time.

bender

Posted Aug 31, 2006 at 8:08 AM

Data snooping is how you build hypotheses. It’s a valid practice at the start of the iterative circle that is the scientific method. The problem is that at some point these fanciful conjectures have to be confronted by truth-telling experiments … and climatology (probably because it’s so challenging) has failed to grow beyond the initial exploratory mode of model-building (and code testing, which is not the same thing as model testing). Along comes a desperate policy need, and like unripened fruit on a tree, whatever science that exists is snapped up and used, making for a very bitter cherry pie. In summary, the science is currently more limited than we wish it to be.
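bender's distinction between exploratory model-building and testing against data that played no part in the selection can be sketched with a calibration/verification split. This is a minimal Python sketch, assuming synthetic AR(1) noise for both the hypothetical "temperature" series and every candidate proxy, so the true skill is zero by construction; all parameters are illustrative. The proxy picked for its calibration-period correlation looks good in-sample, and on the held-out verification period its correlation falls back toward zero.

```python
import math
import random

# Hypothetical illustration: names and parameters are made up for the sketch.

def ar1(n, phi, rng):
    """Red noise: x[t] = phi * x[t-1] + e[t], with e[t] ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def holdout_check(k=30, n=120, split=60, phi=0.5, trials=100, seed=7):
    """Mean |r| of the screened 'best' proxy in calibration vs verification."""
    rng = random.Random(seed)
    cal, ver = slice(0, split), slice(split, n)
    cal_sum = ver_sum = 0.0
    for _ in range(trials):
        target = ar1(n, phi, rng)                       # hypothetical "temperature"
        proxies = [ar1(n, phi, rng) for _ in range(k)]  # hypothetical noise "proxies"
        # Exploratory step: choose the proxy using the calibration period only
        best = max(proxies, key=lambda p: abs(corr(p[cal], target[cal])))
        # Confirmatory step: score it on years that played no part in the choice
        cal_sum += abs(corr(best[cal], target[cal]))
        ver_sum += abs(corr(best[ver], target[ver]))
    return cal_sum / trials, ver_sum / trials

mean_cal, mean_ver = holdout_check()
print(f"mean |r| calibration: {mean_cal:.2f}, verification: {mean_ver:.2f}")
```

The split only helps if the verification years are genuinely untouched; re-using them to choose among proxies or settings turns them back into calibration data, which is exactly the re-use the definition in the post describes.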

kim

Posted Aug 31, 2006 at 8:25 AM

Can you bake a cherry pie, Charming Mike, Charming Mike,
Can you bake a cherry pie, Charming Mikey?
I can bake a cherry pie,
All my peers think it’s to die,
But the stats leave a lot of people hungry.
================================

KevinUK

Posted Aug 31, 2006 at 8:57 AM

#1, George

Clearly the answer is 5) all of the above.

I’ll leave others to fill in the appropriate HT members’ names against 1), 2), 3) and 4).

Kim, nice poem. Did you write it or Charming Mike (aka Mann)?

KevinUK

Peter Hearnden

Posted Aug 31, 2006 at 9:25 AM

Re #5, Ok if ‘5’: ‘new data which would provide alternate explanations are ignored’ then: why aren’t there several alternate ‘sceptic’ paleoclimate NH/SH/globe temperature reconstructions??? I’ve said, several times, I’d love to see it/them. I really would!

bender

Posted Aug 31, 2006 at 9:45 AM

Re #6: PH, your answer is in post #233 here

Mark T.

Posted Aug 31, 2006 at 10:51 AM

“I’ve said, several times, I’d love to see it/them. I really would!”

The onus is on those who produced the original, flawed reconstructions to revise their data and methods and prove it can be done.

Quite frankly, I do not think we can know with _any reasonable_ certainty what past temperatures were. There needs to be a magic bullet proxy that records temperature throughout the year AND, somewhere along the line the notion of “global mean temperature” needs to really be defined in a way that makes some sense.

Mark

UC

Posted Aug 31, 2006 at 10:57 AM

#6 Here is one. I bet it is ‘sceptic’ (consistent).

KevinUK

Posted Aug 31, 2006 at 1:22 PM

#9 UC,

What on earth do condoms have to do with global warming? Looks like another one to be added to the “Complete list of things caused by Global Warming” here.

KevinUK

Steve Sadlov

Posted Aug 31, 2006 at 1:39 PM

RE: #8 – Temperature is probably a fairly sucky characteristic to be attempting to measure / guess. My own personal favorite is power. I’d love to see global P(rms).

Hans Erren

Posted Aug 31, 2006 at 1:49 PM

My attention at http://data-snooping.martinsewell.com/ was immediately drawn to this weird reference:

AIR, R. and S.E.A. GRAVIMETER, DATA SNOOPING, CORRECTION AND REDUCTION OF THE AIRBORNE GRAVIMETRY DATA ACQUIRED BY A LACOSTE-. cnrm.meteo.fr. [not cited] (?/year)

Messrs R. Air and S.E.A. Gravimeter didn’t sound real, and indeed the full citation should read:

DATA SNOOPING, CORRECTION AND REDUCTION OF THE AIRBORNE GRAVIMETRY DATA ACQUIRED BY A LACOSTE-ROMBERG AIR/SEA GRAVIMETER

M. Abbasi, J.P. Barriot, J. Verdun and H. Duquenne
Bureau Gravimétrique International (BGI), UMR5562, Observatoire Midi-Pyrenées, 31400, Toulouse, France

And on reading the paper, it turns out not to be about data snooping at all; these French guys should have used the word “data acquisition” in their title.

UC

Posted Aug 31, 2006 at 10:19 PM

#10

Hey, that’s what happens to the error bars if you try to track AR1 and lose the measurements! You can blame math. I’m not saying that it is the best reconstruction ever, but it is alternate and sceptic, as requested.
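UC's remark about error bars can be illustrated with a scalar Kalman filter, on the assumption (not stated in the comment) that "tracking AR1" means filtering a state x[t] = phi * x[t-1] + w[t] from noisy measurements y[t] = x[t] + v[t]. The values of phi, q (process noise variance) and r (measurement noise variance) below are invented for illustration. The filter variance depends only on which years have measurements; once they stop, the predict step alone drives it back up toward the AR(1) stationary variance q / (1 - phi^2), which is why the error bars balloon.

```python
# Hypothetical illustration: function name, parameters and record layout are made up.

def ar1_filter_variance(phi, q, r, observed):
    """Kalman-filter variance for x[t] = phi*x[t-1] + w[t], w ~ N(0, q),
    measured as y[t] = x[t] + v[t], v ~ N(0, r).
    observed[t] is True if y[t] exists and False if the measurement is lost."""
    p = q / (1.0 - phi ** 2)  # start from the stationary variance (needs |phi| < 1)
    variances = []
    for seen in observed:
        p = phi ** 2 * p + q          # predict: uncertainty grows
        if seen:
            gain = p / (p + r)        # Kalman gain
            p = (1.0 - gain) * p      # update: a measurement shrinks it
        variances.append(p)
    return variances

# Hypothetical record: 20 measured years, then 30 years with no measurements
observed = [True] * 20 + [False] * 30
var = ar1_filter_variance(phi=0.9, q=1.0, r=0.5, observed=observed)
half_width = [2.0 * v ** 0.5 for v in var]  # roughly 95% error bars, +/- 2 sigma
print(f"error bar while measured:      +/-{half_width[19]:.2f}")
print(f"error bar after 30 lost years: +/-{half_width[-1]:.2f}")
```

Because |phi| < 1 the error bars stop growing at the unconditional spread of the process rather than diverging; for a random walk (phi = 1) they would grow without bound, and the stationary starting value used above would not exist.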

2dogs

Posted Sep 2, 2006 at 12:10 AM

As I understand it, “data snooping” occurs where the same observations are used for two or more of the following:

a. the initial observations that prompted the investigation;

b. observations used to calibrate models; and

c. observations used to test and validate models.

However, again as I understand it, it is relatively okay to re-use observations to test as many alternative hypotheses as you like – so you can have a null hypothesis and alternative hypotheses 1 to 10, all tested on the same data, with the best overall hypothesis selected. Is this correct?
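On 2dogs's question: the definition quoted in the post explicitly covers model selection, so testing ten alternatives on the same data and keeping the best is itself a form of data snooping unless the selection is accounted for. A minimal Python sketch, assuming the k test statistics are independent standard normals under the null (a simplification; statistics computed from the same data are usually correlated): picking the best of k = 10 at a nominal 5% level gives a false-positive rate near 1 - 0.95^10, about 40%. A Bonferroni threshold (alpha / k per test, a standard but conservative correction chosen here for illustration, not one proposed in the thread) brings it back under 5%.

```python
import random
from statistics import NormalDist

# Hypothetical illustration: function name and parameters are made up for the sketch.

def best_of_k_false_positive(k=10, alpha=0.05, trials=20000, seed=3):
    """Fraction of null trials in which the best of k two-sided z-tests 'succeeds'."""
    rng = random.Random(seed)
    z_naive = NormalDist().inv_cdf(1 - alpha / 2)       # per-test threshold, ignores selection
    z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * k))  # Bonferroni: alpha / k per test
    naive = bonf = 0
    for _ in range(trials):
        # All k alternatives are wrong (the null is true); statistics assumed independent
        best = max(abs(rng.gauss(0.0, 1.0)) for _ in range(k))
        naive += best > z_naive
        bonf += best > z_bonf
    return naive / trials, bonf / trials

naive_rate, bonf_rate = best_of_k_false_positive()
print(f"best of 10, naive 5% test: {naive_rate:.3f} (theory {1 - 0.95 ** 10:.3f})")
print(f"best of 10, Bonferroni:    {bonf_rate:.3f}")
```

With positively correlated statistics the naive rate is lower than the independent-case figure but typically still above alpha, and Bonferroni stays valid, just more conservative.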

Steve Sadlov

Posted Sep 5, 2006 at 2:12 PM

RE: #9 – That is proper thinking. A very realistic view.