The Texas Sharpshooter fallacy is a logical fallacy in which a man fires thirty shots at a barn, then circles the tightest cluster of bullet holes after the fact and calls that his target. It’s of particular concern in epidemiology.
Folks, you are never going to see a better example of the Texas Sharpshooter work itself out in real life than Caspar Ammann’s handling of Mann’s RE benchmark.
I introduce you to Caspar Ammann, the Texas Sharpshooter. Go get ’em, cowboy.
In Ammann’s replication of MBH, he reports a calibration RE (the Team re-brand of the despised calibration r2 statistic) of 0.39 and a verification RE of 0.48. So that’s his bulls’ eye.
In our original papers, we observed that combinations of high calibration RE and high verification RE statistics were not necessarily “99.99% significant” (whatever that means), but were thrown up quite frequently even by red noise handled in Mannian ways. So something that might look at first blush like sharpshooting, could happen by chance.
In my first post on this the other day, I observed that Ammann’s simulations, like ours, threw up a LOT of high RE values – EXACTLY as we had found. There are nuances of difference between his simulations and ours, but he got a 99th-percentile RE of 0.52, while we got 0.54 in MM2005c. Rather than disproving our results, at first blush, Ammann’s results confirmed them. Mann didn’t appear to be quite the sharpshooter that he proclaimed himself to be or that everyone thought. (This is something that should have been reported in their article, but, needless to say, they aren’t going to admit that we know the street that we live on.)
It’s not that the MBH RE value for this step isn’t in a high percentile – it is, something that we reported in our articles, though in a slightly lower percentile according to our calculations. For us, the problem was the failure of other statistics, which suggested to us that the seemingly high RE statistic (99.999% significant) was an illusion from inappropriate benchmarking – a form of analysis familiar in econometrics (especially the seminal Phillips 1986). The pattern of MBH statistics (high RE, negligible verification r2) was a characteristic pattern of our red noise simulations – something we reported and observed in our 2005 articles.
Obviously, it wasn’t enough for Ammann to show that the MBH RE value was in a high percentile – he wanted to show that it was “99% significant” as the maestro had claimed.
So he re-drew the bulls’ eye. A couple of days ago, I described the two steps whereby Ammann gets the MBH RE score (0.4817) into the 99% “bullseye”, but that was a first-cut analysis and did not tie the steps directly to the re-drawing of the bulls’ eye.
Ammann’s first step was to assign an RE value of -9999 to any result with a calibration RE under 0. That affected only 7 out of 1000 simulations and didn’t change the 99th percentile anyway. So this seemingly plausible screen had nothing to do with re-drawing the bulls’ eye, as noted previously.
The bulls’ eye was re-drawn in the next step – where Ammann proposed a “conservative” ratio of 0.75 between the calibration RE and verification RE statistics. Using this “conservative” ratio, he threw out 419 out of 1000 votes. The salient question is whether this “conservative” procedure has any validity or whether it’s more like throwing out black votes because they couldn’t pass a skill-testing question like naming the capital of a rural county in Tibet or identifying the 11th son of Ramesses II. I’ll provide some details below and you decide.
First, no one has ever heard of this “conservative” benchmark – and I mean no one. You can’t look up this “conservative” ratio in Draper and Smith or any other statistical text. The “conservative” benchmark is completely fabricated. So everyone’s statistical instincts should be on red alert (as Spence_UK’s and Ross’s have been, and as mine were).
So I thought – let’s look at the votes that didn’t count. What did the rejected votes actually look like? First of all, any simulation with a negative calibration RE failed the test and was re-assigned to -9999. OK, but those didn’t matter, because they were already to the left of the target; the 99% bulls’ eye wasn’t affected by this.
The only ones that mattered were the votes with RE scores higher than MBH which were thrown out on this new technicality. There were 13 votes thrown out on this pretext, which I list below in order of decreasing RE score (note once again how high both the calibration and verification RE scores are in these rejected votes). Most of the rejected votes had calibration RE values above 0.3, slightly lower than the calibration RE in the WA emulation of MBH (0.39), but the third one in the list had both a calibration RE and a verification RE higher than MBH. Nonetheless, the vote still got thrown out. The RE score was too “good”.
For a calibration RE of 0.3957, the maximum allowable verification RE to be eligible would be 0.528 (0.3957/0.75). Turn that over in your minds, folks. With a calibration RE of 0.3957, only a verification RE between 0.4817 (MBH) and 0.528 could count to the right of MBH; anything higher would be thrown out and placed to the left of MBH, and the bulls’ eye re-drawn. Redneck scrutineers would be proud.
# Cal_RE Ver_RE
647 0.3390 0.644
944 0.3485 0.620
113 0.3957 0.609
548 0.3016 0.599
374 0.3542 0.550
683 0.2479 0.542
153 0.3826 0.519
146 0.3112 0.514
299 0.3383 0.508
40 0.3176 0.508
194 0.1840 0.502
492 0.3552 0.491
656 0.3284 0.483
Once the above 13 votes were thrown out, MBH was declared the winner of the election with 99% of the votes – sort of like a paleoclimate Kim Jong Il.
Let’s think a little further about the “conservative” ratio of 0.75 between calibration RE and verification RE. The one that no one’s ever heard of. Where did it come from? As soon as he saw it, Spence_UK thought that it probably stunk and, needless to say, it does. Here’s how it works.
The MBH ratio in the AD1400 step is 0.813. So any ratio threshold higher than 0.813 would cause the MBH result itself to be thrown out. 0.75 is tucked in just under that value. That’s the first part. (Ammann’s code shows that he tested a variety of cases with values higher than 0.813, but the fact that these ratios would cause MBH rejection is never mentioned.)
On the other hand, if you go to a ratio of 0.5 (also a case shown in the code but not discussed), you don’t throw out enough votes. Only 2 votes would get thrown out with such a criterion and MBH would not win the election.
So 0.75 is pretty much the optimum value for throwing the maximum number of votes out without throwing MBH out. Perhaps this is what Ammann meant by a “conservative” ratio – he’s allying himself with redneck vote manipulation. Hardly what one expects in Boulder, Colorado, but life is strange.
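The threshold arithmetic above can be checked directly from the numbers in the post. Below is a minimal sketch (my own reconstruction, not Ammann’s actual code) that applies the ratio screen to the 13 rejected simulations listed earlier plus the MBH values:

```python
# Hypothetical reconstruction of the ratio screen, using the (cal RE, ver RE)
# pairs for the 13 discarded simulations listed in the post, plus MBH.
rejected = [
    (0.3390, 0.644), (0.3485, 0.620), (0.3957, 0.609), (0.3016, 0.599),
    (0.3542, 0.550), (0.2479, 0.542), (0.3826, 0.519), (0.3112, 0.514),
    (0.3383, 0.508), (0.3176, 0.508), (0.1840, 0.502), (0.3552, 0.491),
    (0.3284, 0.483),
]
mbh = (0.3916, 0.4817)  # ratio = 0.813

def survives(cal, ver, cutoff):
    """A simulation survives the screen iff cal_RE / ver_RE >= cutoff."""
    return cal / ver >= cutoff

for cutoff in (0.5, 0.75, 0.82):
    n_killed = sum(not survives(c, v, cutoff) for c, v in rejected)
    print(cutoff, n_killed, survives(*mbh, cutoff))
# -> 0.5 2 True      (only 2 votes thrown out; MBH survives)
# -> 0.75 13 True    (all 13 high-RE votes thrown out; MBH survives)
# -> 0.82 13 False   (cutoff above 0.813 throws MBH itself out)
```

The middle line is the “sweet spot” described in the post: at 0.5 too few votes are discarded, above 0.813 MBH goes out with them, and 0.75 discards all 13 while keeping MBH.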
Now there are other issues involved in all of this, such as whether bristlecones operate as a type of meteorological radar antenna measuring temperatures in Asia, Africa and Australia. And nothing in this particular dispute affects the “big picture”.
In the past, I’ve sometimes sarcastically referred to the Team as the gang who couldn’t shoot straight. You’d think that if they sent out a Texas sharpshooter gunning for Ross and me, they’d send a guy who wouldn’t shoot himself in the foot. Or draw the bulls’ eye with himself in the middle. But that’s the Team.
Who else could lose a Texas sharpshooting contest?
159 Comments
If the code shows that he tested a variety of cases, many of which didn’t meet the expected criterion, doesn’t that demonstrate intent?
Bruce
Steve: Let’s give the intent a rest for a while. We can all speculate on why. But without access to records, you can never really know for sure. I realize that I was pretty mad about the contents of this SI, but, in practical terms, we’ll never know Ammann’s intent without access to his records, which we’re not going to get. So let’s discuss what’s on the record.
At a certain point, it doesn’t matter anyway – if the analysis is comical, who cares what his intent was? Did he intend to appear in public with a bulbous red nose and funny shoes or is that what he wears around every day? I don’t care any more. Let’s just discuss the bulbous red nose and funny shoes, which are perhaps inappropriate costumes to wear to a PR Challenge.
Please try to show more tolerance for the mathematically challenged and those who have to exaggerate their sharpshooting skills. In my rural Texas high school, learning to dehorn and castrate was a major part of the curriculum, as compared to less important subjects such as math. Another deficiency was that I had to wait until college for formal training in sharpshooting. Perhaps Ammann was similarly deprived in his childhood?
Steve,
I can follow the basics but the advanced math is, and forever will be, over my head.
What is not over my head however is something seldom commented on; you are a wonderful linguistic stylist. There was a great career waiting for you in some area of writing, had you not gone the route you chose.
During my Grad school adventure, we didn’t refer to Statistics as “Sadistics” for nothing.
Lies, damn lies and Stats.
Go Team go.
How do you repair the damage that the funny nose has done, with no pea under any shell, with 99.99% certainty? snip – policy. Thank you for your work.
Steve
Thank you for the edit; sometimes the bigger picture invades my smaller brain. Thank you again for the fundamental work. It is shocking to see the level of selective detail that Ammann coded to get his results. With code that I am responsible for, I require commentary that highlights the intention of each section. If there is no commentary, much more review is required. Sometimes I have spent more than a year digging at a hex dump to reveal the problem. Typically money is involved; what makes your work so important is that it is possibly future money.
Somebody told me recently that statistics is a lot like a string bikini on a beautiful woman. What is revealed can be very interesting to look at… but what is concealed is often far more fascinating. I think Steve just gave Caspar a wardrobe malfunction… nice work Steve.
Yet another “novel” statistical method that isn’t tested before applied.
Can any stats whiz pull-up some cal RE/ver RE ratios in other publications so we can see how bunk it is?
Steve: Nope. It’s never been used anywhere. It was specially concocted for this particular Texas sharpshooting contest. But what a pathetic performance by Climatic Change. They knew that this whole issue of RE significance was a battleground. Wahl and Ammann had been held up for years because of the rejection of previous efforts to circumvent this problem. You’d think that someone would have asked him: where the hell did this criterion come from? Show me a reference. What does it do? But these folks are so consumed by the desire to vindicate themselves that they don’t notice that they’re wearing bulbous red noses.
There are a few lessons here. With the data and code in hand, what did it take to figure this out? A couple of days?
I put my code out there so that people could refute the results if they deserved refuting. And in fairness to Ammann: while it took 3 years for him to put this data and code online, and while he prudently waited until AR4 was safely out of the way, he did put the data and code online, so that I’m in a position to make definitive statements about this without getting into Mannian arguments about whether we made a wrong turn on the road to Podunk, when the map was wrong in the first place.
If people want to improve actual knowledge and actually resolve things in a definitive way, this is the way to go. So good for Ammann in finally complying. Although, since he raised money from NOAA for the PR Challenge promising open source, he was in a bit of an awkward spot in continuing to withhold his SI.
The problem with what appears to be a total evisceration of poor Caspar is that this is going to provide very little encouragement to Briffa or Esper or someone like that to show what they did. Their conclusion is not that this exercise is an excellent example of open source at work, but that Ammann was a damn fool for ever showing his data and methods.
Am I the only one who’s starting to feel like this is too good to be true? Why would this data be released at all if it shows such incompetence/snip ? Is there any chance that the data newly released is not for real? Were they forced to release this data? If not, I don’t get it.
Is it possible that the Team planted this data?
It just boggles my mind that they would release such incriminating data if they didn’t have to.
I have to agree with #3. I laughed myself to the floor at the satire. Great read, Mr. Steve. And I’m sorry you have to deal with such idiotic papers, instead of auditing material that is more interesting from a technical point of view.
Steve,
As a non-scientist, I thoroughly enjoy your site. I do have one reservation…
I understand that your primary objective is to get at the numbers to either validate or falsify, and that you do not wish to [snip] However, if you view this in the context of other occupations, say legal or medical, the actions taken by Team members (regardless of intent) would undoubtedly be met by censure, … [snip]
I say this with a sad heart, but it is truly a shame when the legal community does a better job of self-enforcement than portions of the scientific community. What exactly does this say about the state of science??
Keep up the good fight!
Yeah Patrick, what is going on? Is there someone behind the scenes here having a good long laugh?
Hell, here I thunk my sharpshooting prowess was second to none until Steve found me out… 😉
Please – no more angry posts. I ask people over and over not to be angry. I realize that I was angry with Ammann for a couple of days, but I’m back to seeing the humor in all of this. If you post something angry, be prepared for it to be removed.
Trying to understand this in simple terms that make sense to me. If I’m reading this correctly:
1) Using all the data does not validate MBH:
1a) Bullseye (MBH data analysis) is at (REcal,REver) = (.39, .48), but
1b) The needed 99th percentile value is REver = .52 (or .54 for mm)
2) Tossing outliers in a typical fashion doesn’t help (because tossing on both ends does not move the bullseye?)
3) This “unprecedented” ratio method selects samples to be removed in a such a way that “shots” on only one side of the bullseye are removed, thus in effect moving the bullseye.
3a) To get the 99th percentile value down to MBH range, we need to remove “shots” with higher REver than MBH.
3b) But we don’t want to SAY that
3c) So we use this nice table of ratios and see what value we can use that will exclude high-REver “shots” without excluding MBH itself:
Item REcal REver ratio
113 0.3957 0.609 0.650
647 0.3390 0.644 0.526
944 0.3485 0.620 0.562
548 0.3016 0.599 0.504
374 0.3542 0.550 0.644
683 0.2479 0.542 0.457
153 0.3826 0.519 0.737
146 0.3112 0.514 0.605
299 0.3383 0.508 0.666
40 0.3176 0.508 0.625
194 0.1840 0.502 0.367
492 0.3552 0.491 0.723
656 0.3284 0.483 0.680
MBH 0.3916 0.482 0.813
Examining the table shows that any ratio value between 0.74 and 0.812 will “work”… and 0.75 is a nice “round” number.
Do I have it right so far?
Here are my questions:
* If a table like this were created with ALL the votes, are there other high-RE samples that also have high ratios and thus were not excluded? If so, why is that not important? (My guess: all you need is enough rejections to move the bullseye.)
* Why doesn’t it matter that low-RE values would also be excluded? If an equal number were excluded on both sides, wouldn’t that leave the bullseye in the same place?
Probably just dense here 🙂
Re#8 (Steve), I apologize that my post wasn’t clear enough. I know the use of these ratios is new territory (hence my jab at “novel”).
I was asking if there were published studies (in any field) in which there was both a calibration RE and a verification RE calculated, so that posters could take a look at their resulting ratios and see where this magic 0.75 cutoff fits-in with those.
Just so everyone’s on the same track, there is no such thing in statistics as “99% significance.” So Ammann is searching for a fake talisman in the first place. But having committed himself to the fake talisman, he is going to great lengths to protect it.
Yes, there are results that pass the Ammann Texas sharpshooting benchmark with REs higher than MBH (and BTW this number concedes Mannian principal components, which isn’t really valid either), but not a lot. However, you couldn’t honestly say that a recon with an REcal of 0.39 and an REver of 0.48 was “99% proven” relative to, say, one with an REcal of 0.536 and an REver of 0.441. The important point in all of this is that the methodology throws up a LOT of values with both statistics in high ranges, simply from red noise processes.
It’s pretty cheeky to have this data in your SI and then say in your IPCC-approved article that an REver of 0.0 is “99% significant”.
REcal REver Ratio
657 0.459 0.577 0.795
505 0.417 0.542 0.769
221 0.533 0.539 0.987
440 0.452 0.521 0.868
60 0.492 0.520 0.945
854 0.484 0.504 0.960
752 0.440 0.499 0.881
1 0.452 0.499 0.908
549 0.629 0.495 1.271
795 0.583 0.495 1.179
357 0.472 0.484 0.977
815 0.391 0.475 0.825
486 0.421 0.473 0.889
862 0.417 0.470 0.887
527 0.414 0.464 0.892
159 0.426 0.462 0.923
58 0.434 0.458 0.948
318 0.370 0.457 0.810
878 0.472 0.452 1.044
544 0.494 0.452 1.093
649 0.465 0.443 1.051
971 0.536 0.441 1.213
148 0.413 0.440 0.938
50 0.464 0.439 1.056
232 0.444 0.436 1.017
966 0.332 0.433 0.765
733 0.510 0.431 1.182
764 0.454 0.431 1.054
763 0.329 0.428 0.770
613 0.414 0.427 0.970
403 0.357 0.427 0.837
561 0.404 0.427 0.946
4 0.348 0.426 0.816
696 0.423 0.423 1.000
686 0.502 0.422 1.191
278 0.348 0.418 0.834
699 0.490 0.414 1.183
138 0.436 0.413 1.058
701 0.504 0.412 1.224
271 0.386 0.411 0.938
233 0.401 0.408 0.982
599 0.504 0.406 1.241
127 0.427 0.405 1.054
7 0.345 0.404 0.853
393 0.458 0.403 1.137
350 0.409 0.401 1.019
460 0.375 0.400 0.937
340 0.338 0.395 0.857
398 0.538 0.394 1.365
251 0.382 0.393 0.970
792 0.445 0.393 1.134
59 0.378 0.391 0.966
11 0.453 0.388 1.168
260 0.490 0.384 1.276
75 0.391 0.383 1.021
179 0.387 0.381 1.017
707 0.363 0.381 0.952
322 0.455 0.378 1.204
425 0.456 0.374 1.220
69 0.434 0.373 1.162
20 0.508 0.370 1.371
351 0.436 0.370 1.179
685 0.407 0.367 1.110
478 0.288 0.367 0.786
210 0.469 0.366 1.283
64 0.466 0.366 1.274
916 0.381 0.364 1.045
80 0.436 0.363 1.200
936 0.414 0.363 1.140
383 0.474 0.363 1.305
261 0.506 0.362 1.398
270 0.273 0.362 0.755
617 0.357 0.361 0.988
577 0.485 0.361 1.344
450 0.311 0.361 0.861
438 0.336 0.359 0.937
510 0.423 0.356 1.189
227 0.403 0.355 1.133
408 0.289 0.355 0.814
130 0.267 0.355 0.753
188 0.410 0.354 1.158
354 0.291 0.353 0.822
39 0.318 0.346 0.918
228 0.505 0.345 1.464
391 0.338 0.345 0.981
215 0.396 0.345 1.150
857 0.274 0.344 0.796
458 0.322 0.343 0.940
198 0.361 0.343 1.052
982 0.349 0.340 1.027
319 0.432 0.338 1.277
709 0.279 0.338 0.824
895 0.392 0.337 1.162
443 0.325 0.337 0.964
951 0.509 0.336 1.517
576 0.523 0.336 1.558
861 0.367 0.334 1.099
504 0.520 0.333 1.561
572 0.310 0.333 0.933
29 0.283 0.332 0.852
129 0.251 0.330 0.759
173 0.338 0.330 1.023
91 0.286 0.330 0.866
807 0.543 0.330 1.646
94 0.305 0.330 0.926
448 0.461 0.330 1.399
506 0.329 0.329 1.001
281 0.512 0.328 1.559
293 0.317 0.327 0.972
364 0.410 0.324 1.264
483 0.289 0.323 0.895
336 0.277 0.322 0.859
829 0.375 0.322 1.165
780 0.339 0.322 1.053
239 0.286 0.320 0.895
636 0.428 0.319 1.341
217 0.364 0.318 1.143
349 0.474 0.317 1.494
727 0.327 0.315 1.038
212 0.325 0.314 1.034
520 0.343 0.313 1.095
625 0.355 0.311 1.142
143 0.266 0.311 0.858
107 0.457 0.310 1.475
841 0.563 0.310 1.817
677 0.273 0.308 0.886
133 0.420 0.305 1.374
102 0.318 0.304 1.048
131 0.374 0.303 1.236
446 0.443 0.303 1.464
53 0.405 0.303 1.339
571 0.272 0.303 0.900
642 0.317 0.302 1.048
688 0.421 0.302 1.393
974 0.544 0.302 1.801
213 0.351 0.301 1.163
74 0.375 0.301 1.244
830 0.419 0.301 1.391
431 0.352 0.301 1.169
What is the justification for having a verification-period RE greater than the calibration-period RE when you know the verification-period temperature data are of poorer quality?? That should have been a red flag, at least with the RE stat. Obviously, neither Mann nor Ammann considered this fact.
Steve: In a standard text, one ironically cited by Mann in his Nature reply (Wilks), they show that in a stationary process their Skill Score (RE) is necessarily less than the verification r2 (a result previously noted by Murphy 1988, cited in our GRL article). There are all sorts of warning flags all over the place; this is just one of many.
Steve, I read your Site every day. I love you being tedious in Your Work. I too am tedious being a HVAC Service Tech for many years. People do not appreciate this at times because I’m too slow and the Bill is getting bigger. But My “Repeats” are less than 2%. That is the way I look at it. The bottom Line. The Challenge is not letting People affect your Attitude and Decision Making. For if they do, it ALWAYS comes back to bite you where you don’t want it to.
We need more People like you. Truth is all WE are asking for, Right? TRUTH………
20 DJ
I’m slightly confused here, and I suspect I’m not alone. Could someone provide a layman’s definition of RE that says what it is trying to measure, what its range of scores is, whether it’s unit-less, etc.?
I get that it is a measure of skill, and that calculating it for calibration cases vs. verification cases should give distinct values. It also is clear from context that larger values are “better” and that it can be negative.
But since my experience (always risky when thinking about statistics, which so often has provable results that run counter to intuition) tells me that one should be really excited to have a model that shows more skill in verification than over the data against which it was calibrated, I have to wonder: why discard those cases at all?
Maybe I don’t understand “skill”, “calibration” or “verification” either… 😉
#22. There are a lot of posts here which start in the middle of the conversation. I’m afraid it’s the nature of this particular blog. I try to write clearly, but it’s hard to recite the history of each of these issues in a self-contained manner in every post. Look at the Wahl and Ammann category and work through it if you want the history.
As to being confused about the properties of the RE statistics, that would qualify you to be a climate scientist. There is no theoretical distribution.
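For the layman’s definition asked for in #22: the reduction of error (RE) statistic is usually computed as one minus the ratio of the model’s squared error to the squared error of the calibration-period mean used as a zero-skill benchmark. A minimal sketch (the function and variable names are mine, for illustration only):

```python
def reduction_of_error(obs, est, cal_mean):
    """RE = 1 - SSE(model) / SSE(calibration-mean benchmark).

    obs, est: observed and reconstructed series over the verification period;
    cal_mean: mean of the observed series over the *calibration* period.
    RE is unit-less and ranges from -inf to 1: 1 = perfect reconstruction,
    0 = no better than the calibration mean, < 0 = worse than the mean.
    """
    sse_model = sum((o - e) ** 2 for o, e in zip(obs, est))
    sse_bench = sum((o - cal_mean) ** 2 for o in obs)
    return 1.0 - sse_model / sse_bench

# Toy illustration (made-up numbers):
obs = [0.1, 0.3, -0.2, 0.4]
perfect = reduction_of_error(obs, obs, cal_mean=0.0)         # -> 1.0
no_skill = reduction_of_error(obs, [0.0] * 4, cal_mean=0.0)  # -> 0.0
print(perfect, no_skill)
```

The calibration RE and verification RE are this same formula computed over the two different periods, which is why neither is bounded below and why their ratio behaves so badly.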
Interesting. Sort by ratio and you find:
Out of all of those “high ratio” values, only two remain that have higher RE than MBH but a lower ratio (and thus their elimination would push the bullseye in the “right” direction):
Id REcal REvar Ratio
657 0.459 0.577 0.795
505 0.417 0.542 0.769
But you can’t use them, because to trash them you also trash eight with lower RE… thus pushing the bullseye in the “wrong” direction. (Update: or, maybe they aren’t needed because the bullseye is already moved enough?)
Not only that, but the next element with higher ratio than 0.75 is this one:
130 0.267 0.355 0.753
…which may also push the bullseye in the wrong direction. (Update: Whether or not that’s true, it seems clear those who set this up didn’t understand what they were doing any more than I understand this… as a non-stats guy, I would pick a value that accomplishes what I’m looking for and avoids eliminating more “votes” than needed.)
So the “best” ratio is in a very narrow range. Larger than 0.737 (see #16 above) and smaller than 0.753 — that’s how to maximize the bullseye “push”
So ’twas a very convenient selection of the “conservative” ratio as 0.75
PLEASE correct me if I’m confused. I may not be understanding this at all correctly!
[For example: are there more “wrong direction” values between 0.737 and 0.75? If so, then they gave up a little “significance” to get that nice round 0.75 number.]
As Esper et al 2003 said, without any referee or reader batting an eye:
#10 “Is there any chance that the data newly released is not for real?”
If you go back to the Dragging Cat thread and go down to this statement,
“Just to prove that this matches their results, here is code that reads their output summary, showing that they got exactly the above results. You can inspect these results at the link above.”…
and continue to follow the argument on down the thread, you may be able to satisfactorily answer your excellent question.
#25
So the cherry-picking advantage is NOT unique to dendroclimatology; Texas sharpshooters enjoy the same advantage. (Is Texas sharpshooting an Olympic event? Maybe the Team submitted a team?)
Wow… Just. Wow.
Whether you find merit in the extreme AGW hypothesis or not, statements like Esper’s, as well as the mess discussed in this thread, should always result in statements like Darren’s.
Mark
It’s an amazing quote, I admit! One should make a list of “Famous HS Team Punchlines” and make a post with them. It would be a memorable post.
It’s an old, old quote. Read the blog. It’s not the only one of this type.
They called it “conservative” because it restrained the significance to below 100%.
This use of “99.99% significant” is like the practice, in a regression problem, of finding an R2 of 0.1 that passes an F test at 0.01 and declaring it “99% significant”, even though it explains hardly anything; in fact you can easily get an R2 of 0.33 from random data (depending on sample size), as has been known for 30+ years.
One of the many weird things about Ammann’s novel tests is the use of a ratio of statistics (RE / RE), both of which include zero in their support. Try to picture the space of this ratio. RE is not well-defined to begin with, but put it into this ratio and it becomes a nonstationary monster.
It’s a bit like taking two Gaussian variables, x~N(0,1) and y~N(0,1). Each one is a nice, well-behaved stationary variable. Now try plotting z=x/y. It’s not normal, it’s Cauchy, and it’s not stationary. It doesn’t even have a finite mean. When dealing with statistics that have zero in their domain, you just can’t form ratios without introducing significant new complications. This is the basis for Gleser and Hwang’s theorem on the non-existence of finite confidence intervals in errors-in-variables models.
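A quick simulation makes the point (a sketch of my own, not tied to Ammann’s data): the ratio of two independent standard normals is heavy-tailed with no finite mean, so while the sample median is well-behaved, a sizeable fraction of draws land far out in the tails.

```python
import random
import statistics

random.seed(0)

# Ratio of two independent standard normals: Cauchy-distributed.
n = 100_000
z = [random.gauss(0, 1) / random.gauss(0, 1) for _ in range(n)]

med = statistics.median(z)              # well-behaved: close to 0
tail = sum(abs(v) > 10 for v in z) / n  # ~6% beyond +/-10; ~0 for a normal
print(round(med, 3), round(tail, 3))
```

For a standard normal, essentially no draws exceed ±10; for this ratio, the theoretical tail probability P(|Z| > 10) is about 0.063, which is why the sample mean never settles down.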
Which reminds me — way off topic I realize, but errors-in-variables is the proper term for the technique called “total least squares” used in signal detection regressions by the IPCC team.
bender,
Whether or not it is an old quote, Luis is right. A “best of” list of team quotes would be a howler worth reading.
#33 Exactly. A high significance level (low p value) is important for retaining a hypothesis; it doesn’t mean the hypothesized effect is all that powerful. For example, in a properly controlled experiment an R2 of 0.05 with a significance level of 0.01 means the effect is very weak, but definitely not negligible. In an uncontrolled natural “experiment”, the effect may not even exist, or may be attributable to other factors not studied.
#31
Yeah, good advice Bender. I guess I’m just going to my bedroom read all CA’s posts. See ya next spring then. Will someone bring me food and water? Thanks in advance!
Jacoby and D’Arrigo are the most quotable IMHO:
http://www.climateaudit.org/?p=570
http://www.climateaudit.org/?p=29
The method in the madness is as follows.
It makes sense to hypothesize that there are certain sites that are just right, climatically, to put them in the “sweet spot” of the tree’s response to growing season extremes. The problem is demarcating, in advance, which sites give you sweet vs. sour responses. There is no a priori theory for drawing that line, so they do it a posteriori, after the fact. Which is the exact definition of “texas sharpshooting”.
And they will continue to do this until there is a fully quantitative physiological theory of tree response to weather variations.
They are not the only sharpshooters, however. Demarcating atmos-ocean circulatory modes (based on a posteriori EOF analysis) and presuming them to be persistent is another kind of texas sharpshooting.
#23, Steve, I’ve been following along enough to have a sense of the history. I’m just trying to get a clear mental model for what this thing called RE is measuring, and what its range of values is. As Ross points out in #34, it seems disquieting that they are taking a ratio of values that in principle can be zero, and treating that as meaningful.
I’m sitting here imagining an analogous (I think, at least) process with a classroom full of students’ grades. Would anyone seriously claim that the ratio of homework score to test score (or vice versa) is meaningful even if it is well-defined?
I think the Texas Sharpshooter claim is right on.
Y’know, this is really a perfect image for so much of this dreck – I should have been using it a long time ago. At the NAS panel presentations, Mann was asked about the divergence problem. His answer: draw a bulls’ eye around the Yamal series and say: look, ma, no divergence problem.
#41
A search of CA will reveal several instances where I’ve previously drawn attention to the TSF in climate science. It is a powerful metaphor. And it is an accurate metaphor.
#34
Ross and others, there is a SIAM book out by Van Huffel and Vandewalle titled “The Total Least Squares Problem”, which addresses the relationship of TLS and EIV starting on page 228 plus a lot of other info on TLS. IMO, a useful book. Not sure if you were familiar with this book.
#34, 43. Ross, I’m with Phil B on this. The same method crops up in different contexts under different names. I don’t think that the econometric label “errors-in-variables” necessarily trumps other usages. However, as you’ve observed, the fact that Hegerl et al cited an 1886 publication as authority was not particularly reassuring on their familiarity with the relevant literature. The method of solving the problem, interestingly, turns out to be a principal component analysis (svd) in which the solution is the least eigenvalue.
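For the two-variable case, the svd/least-eigenvalue connection Steve mentions can be shown in a few lines. A minimal sketch (pure Python, my own construction; for real data you would use an SVD routine from a numerical library): the total-least-squares slope comes from the eigenvector of the 2×2 covariance matrix belonging to the largest eigenvalue, and the discarded smallest eigenvalue is exactly the minimized orthogonal error.

```python
import math

def tls_slope(x, y):
    """Slope of the orthogonal (total) least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Largest eigenvalue of the covariance matrix [[sxx, sxy], [sxy, syy]];
    # the line direction is its eigenvector, so slope = (lam_max - sxx) / sxy.
    lam_max = ((sxx + syy) + math.hypot(sxx - syy, 2 * sxy)) / 2
    return (lam_max - sxx) / sxy

# Noise-free check: y = 2x exactly, so the TLS (and OLS) slope is 2.
x = [0.0, 1.0, 2.0, 3.0]
y = [2 * v for v in x]
print(tls_slope(x, y))  # -> 2.0
```

Unlike ordinary least squares, this minimizes perpendicular distances to the line, which is why errors in the x variable matter to the answer.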
#44, I’ve become a big fan of svd, it is an incredible linear algebra tool for both practical and analytical results.
Uh Oh, RC site is down. I wonder if they are going to respond to your latest finding.
I wonder how long it will take before the Team’s statistics efforts become teaching points in university courses… lessons in how not to do stats will be their legacy.
I’ve found this interesting debate between skeptics (among whom Roy Spencer) and IPCC people about the Swindle program, on the ABC (Australian Broadcasting Corporation). Apart from the blatant bias against any questioning of the IPCC position (the moderator was clearly on “mission mode”), it’s interesting how in one instance the Hockey Stick is discussed and, further on, Roy Spencer blatantly accuses it of being a fraud. The opponent then refers to a study made “two years ago” demonstrating the HS not to be falsified at all, and the exchange ends there.
Quite interesting. One wonders what that exchange would be like if the debate were held tomorrow!
It’s on youtube, easy to find. Go see it.
I’m sorry, it wasn’t Roy Spencer (I heard bad), it was Ray Evans.
This “conservative benchmark” seems to be very similar to what we called Cook’s Variable Constant when I studied botany all those years ago.
44: Steve, in an econometrics text (not that I’ve read up much on this) the EIV solution is not simply to pick a rotation direction. You have to deal with the endogeneity problem or your coefficients will be inconsistent. The solution requires using instrumental variables to form a strictly exogenous estimator for the rhs variables. That’s as far as I got in my reading, which takes it up to the 1980s or so, which isn’t very recent, but at least is post-1886.
#49 (Luis)
Couldn’t find the YouTube video you were talking about. Any help?
“The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.”
Yeah, ’cause cherries grow on trees doncha know. That’s why it’s not called tomato picking!
#53
Go Here.
I found the quotation I was looking for. In the Wegman Report, page 14, Dr Wegman says (my emphasis):
Now what Ammann did completely violated this rule. He calculated what the result should be in order to let Mann off the hook, and called it a “conservative ratio”. That is like peeking at the answers and calling it a fair examination.
“No statistical integrity” would appear to be an accurate description.
I suppose Wegman’s assertion also applies to
How inconvenient.
#56 Tempering this critical perspective somewhat, it is important to underline that dendroclimatology is an immature, emerging science, where hypothesis generation (not testing) is a central activity. The discipline will not mature until it embraces experimental ecophysiological approaches to calibrating tree responses to T, P, soil, light, etc. Until that time it will continue to operate in hypothesis generation mode, where data are interpreted a posteriori. This is legitimate science. It is just not a sound basis for trillion-dollar global environmental-energy policy.
You have to walk before you can run and dendroclimatology is still at the crawling stage.
re: 58
bender,
I am compassionate to immature science… up to a point. I decided to look into your claim that dendroclimatology was an emerging science. Here are some of the references I found:
Articles
* Douglass, A.E. 1920. Evidence of climatic effects in the annual rings of trees. Ecology 1(1): 24-32.
* Wilson, A.T., Grinsted, M.J. 1927. The possibilities of deriving past climate information from stable isotope studies on tree rings. Bulletin
* Schulman, E. 1938. Nineteen centuries of rainfall history in the southwest. Bulletin of the American Meteorological Society 19(5): 211-216.
* Fritts, H.C. 1971. Dendroclimatology and dendroecology. Quaternary Research 1: 419-449.
Books
Tree Rings and Climate by Harold C. Fritts, 1976 Academic Press, New York, NY. 567 pp.
Some of these authors are probably dead by now. It appears to me the science has been around long enough to prove whether or not it has something. I would call dendroclimatology an immature pseudoscience.
People have asked for RE, R2 and CE step by step explanations.
There is a ten or 15 page summary starting here
http://books.nap.edu/openbook.php?record_id=11676&page=83
Hope this is helpful.
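As a rough companion to that summary, here is a minimal numerical sketch of the three verification statistics as they are usually defined (the series are synthetic; RE is benchmarked against the calibration-period mean, CE against the verification-period mean). It also illustrates the pattern discussed in the head post: a reconstruction can post a high RE merely by matching a mean shift while its year-to-year r2 is negligible.

```python
import numpy as np

def verification_stats(obs, recon, calib_mean):
    """Standard verification statistics for a reconstruction:
    RE = 1 - SSE(recon)/SSE(calibration-period mean as the null prediction)
    CE = 1 - SSE(recon)/SSE(verification-period mean as the null prediction)
    r2 = squared Pearson correlation (ignores bias and scale)."""
    obs, recon = np.asarray(obs, float), np.asarray(recon, float)
    sse = np.sum((obs - recon) ** 2)
    re = 1.0 - sse / np.sum((obs - calib_mean) ** 2)
    ce = 1.0 - sse / np.sum((obs - obs.mean()) ** 2)
    r = np.corrcoef(obs, recon)[0, 1]
    return re, ce, r ** 2

# A reconstruction that merely matches a mean shift away from the
# calibration mean scores a high RE with essentially no year-to-year skill:
rng = np.random.default_rng(2)
obs = 0.5 + rng.normal(scale=0.2, size=50)    # verification target, mean 0.5
recon = 0.5 + rng.normal(scale=0.2, size=50)  # uncorrelated noise, right mean
re, ce, r2 = verification_stats(obs, recon, calib_mean=0.0)
print(re, ce, r2)
```

Here RE comes out strongly positive while r2 is near zero and CE is negative, even though the "reconstruction" contains no signal at all.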
Re 39
Provided there is such a sweet spot (which is unproven), there is a slight problem with your suggested methodology. As soon as there is an appreciable climate change your selected tree will no longer be in that sweet spot. So what you are saying is in effect that dendroclimatology can determine climate change provided there isn’t any.
bender is well able to speak for himself, but there’s no need to paraphrase his remarks. Yes, climate change will alter the circumstances of an individual tree – something that traditional dendroclimatology ignores – and I’m sure that bender agrees with this point. But I don’t understand him to be committed even to the idea that dendros have the tools to measure temperature in the absence of climate change, simply because all kinds of things are going on with a tree besides temperature. There are layers and layers of problems – I realize that we may well be in agreement on this and speaking at cross-purposes.
#58 That’s a silly argument. I could cite Arrhenius. Would that mean the science of GHGs was settled more than a century ago? How many papers would you like me to cite to indicate that statistical and ecophysiological dendroclimatology is still evolving? 1000?
#61 Correct, and many a dendroclimatologist (and every ecophysiologist!) understands this – that the tree never sits in the sweet spot as long as climate is shifting. It means dynamic approaches need to be taken to estimate sensitivity. It’s just that this is not a simple statistical problem. If it were, it would be solved by now. Again, addressing #58.
#62 Yes on all counts. And it bothers me when folks argue at cross-purposes. My point was simply a counterpoint to John A’s dismissive over-generalization. – snip – They require context. Climate recons have “statistical integrity”. The question is how robust they are and whether that degree of robustness is sufficient to serve as a driver of global policy. That’s where skepticism is legitimate.
snip . When it is on the topic of dendroclimatology I will call him on it.
Steve – No need to backbite on other posters after you’ve made your point.
fred #60, Thanks, I’ve downloaded the entire book – the title seems relevant. In the meantime, I have refreshed my knowledge of the basics of R2 and found a description of RE as used by climatologists. Tentatively, those look pretty weak for dendrochronology applications.
Also, searching based on Ron’s #59, I found some dendrochronology introductions at Amazon — I ordered one.
I agree with Steve M’s #62; it seems that many, many things could impact tree ring data besides temperature. So I will be interested to dig deeper and see if dendrochronology genuinely accounts for those extra possible independent variables in an objective way. Regardless of how it turns out, dendrochronology looks like a fun topic to study.
#63 asks:
What “extra possible independent variables”? “Account” how? The answer can be yes or no depending on which studies, species, site conditions, etc. you include in your analysis.
The whole idea that Esper and Jacoby and D’Arrigo were trying to relay is that if you choose the right trees on the right sites you will get a strong, near-linear, univariate response. This is true – if and only if you know what “right” means. And if you can do this a priori then there is no need to include “extra independent variables”, as their contribution is so small that ignoring them does not substantially reduce variance explained. Including non-significant predictors needlessly erodes your ability to estimate with precision the effects of the known drivers.
Not to discourage you from researching the topic. Just to say: the scientific community already knows the answer to your question: choose sites where the influence of these other drivers is known (from theory, from facts) to be negligible.
I think some thought should be given to the future. Presumably, at some point when the climate goes back to 1970 temperatures, everyone will realise the errors that have been made and come looking for a scapegoat.
As it stands, the IPCC scientists will be able to say “We put our science out to peer review and had no dissent. Everyone agreed.”. They will be able to say that because Climate Audit is not a recognised scientific journal. It therefore seems important to me that each of these findings is written up and submitted to such a journal.
I imagine that this would involve a lot of work, and, given that the journals would probably behave like Nature and reject the article on spurious grounds, the work may be nugatory. But at least the effort should be made, and the rejections documented, so that the auditors of the future may have something to get their teeth into…
re: 63
bender,
It is not a silly argument at all. The dendros have had plenty of time to realize it is impossible to separate the confounding factors of temp, precip, fertilization, etc. when looking at the width of a tree ring. If the science were only 15 or 20 years old, one might be a little more understanding and compassionate – although I would not be among them.
The GHG science has a little more going for it in that we have had rising temps for part of the 20th century – not for all, but enough that people tend to overlook the divergence from 1945 to 1975. Of course, it is getting hard to ignore the divergence now. We have had lots more CO2 go into the air since 1998, but 1998 is still the warmest year on record.
If you want to talk about immature science that is still developing, I would point you to scientific forecasting. One of the leaders is J. Scott Armstrong. See http://en.wikipedia.org/wiki/J_Scott_Armstrong It may sound a little like scientific crystal ball reading, but they have been publishing their own specialized journal for about 25 years now. And now they are up to four different journals. I think this is a branch of science that will bear significant fruit in the years ahead – unlike dendroclimatology.
re: 65
bender,
Your phrase “the influence of these other drivers is known (from theory, from facts) to be negligible” is not supportable. Much of what is “known” in dendroclimatology is simply untrue. If dendroclimatology is ever able to reassert its credibility, it will have to begin by cleaning its own house and disavowing the flawed works of Briffa, Ammann, Wahl and others. Don’t hold your breath.
58 Ron Cram,
Immature may mean unsettled or a lack of consensus on the science. Here are a couple of links to a tree expert that doesn’t have a dog in the hunt.
Effects of Ozone and Climate on Ponderosa Pine
I haven’t read the whole article on this one but the abstract is interesting.
Variable Selection in Dendroclimatology:
(hope the links took)
Sorry I left out an interesting quote I wanted to include in the first link:
The second link deals with statistical problems encountered by dendros – except, of course, paleoclimatologists.
#63,
You say
From Statcom_08
Given the lack of data quality as has been presented over and over on this site as it relates to climate reconstructions, I would disagree with your assessment that they have “statistical integrity”.
Another reference from a different publication states
Again, given the “unsoundness” of some of the statistical methodologies applied by various climate scientists when dealing with proxy based reconstructions, I again would have to disagree with your assessment that they have statistical integrity.
Perhaps I am misunderstanding exactly where the statistical integrity you refer to originates from. Certainly not from the data, nor from at least some of the methodology(s).
And a follow-up…
National Statistical Service
Perhaps those from the CSIRO should have taken some of these to heart.
Bender:
I’m baffled as to what counterpoint you were making. I quoted Wegman on the statistical integrity of Mannian analysis as it pertains to a posteriori selection of statistical methodology to recover what they already believe to be in those tree ring records.
I make no claim specifically as to whether dendroclimatology as a whole is fatally flawed since it cannot separate climate variables (although others may be more certain than I), but I do know that that particular field has yet to deal with the statistical nuances that Steve has been talking about in relation to autocorrelation and the peculiar behavior of time series thereof.
I simply refer to the Hockey Team’s abuse of statistical methods and especially benchmarking which have no integrity.
I do not see people in the field of climatology stepping up to the plate and swinging at these transparently invalid methods, although a lot has changed in climate science since the blog started, so who knows what will happen.
That’s fair.
Bad methods tend to die a quiet death of underuse. If you’re looking for celebratory fireworks as a sign of revolutionary progress, you’re looking for the wrong signs. The fact that only Team members use and choose to defend these methods tells you something.
I keep telling you – and you keep ignoring the fact – that these problems are a serious challenge. They’re not a joke. You and Steve and I can point to the problems, such as autocorrelation, all we want. For some of these there are as yet no known solutions. snip
Some good things *could* grow out of the “PR” challenge – if the proponents let it happen. Steve may be shunned from that group, but his arguments are not being ignored.
Steve: bender, surely there are legitimate causes of complaint. Texas sharpshooting (and I acknowledge that you’ve used this phrase for some time) is pretty deeply ingrained among these folks in a variety of ways and that sort of stuff has nothing to do with legitimate statistical conundrums.
bender #65,
Thanks for the response. Actually, I presume “the answer is out there” — and I will find it if I look.
FWIW, I’m a scientific newcomer to the AGW field, a political independent, and a moderate environmentalist. What I have found (so far) is that both sides of the AGW debate have plausible arguments for their positions (at least on the surface). However, the pro-AGW-mitigation camp is asking us to spend trillions; thus, the burden of proof resides with them (the philosophy of Climate Audit, I believe). It sure would be great if I could simply read a book (or IPCC report) and get the facts in one place. Unfortunately, ideology seems to color many presentations. Hence, I feel I must determine the black-and-white facts for myself by digging beyond the surface of the highly publicized sources.
FWIW, among the Climate Change & AGW topics I am studying in parallel, I am looking into dendroclimatology and dendrochronology. In the end, we may agree regarding dendroclimatology and its application to AGW. Right now, since the burden of proof lies with the pro-AGW-is-bad camp, I view that camp’s dendroclimatological arguments with more than normal scientific “skepticism”. On the other hand, I view the anti-AGW-is-bad camp’s and “auditors'” arguments with normal scientific skepticism.
Steve: I have said over and over that, if I were a policy maker, I would defer to the advice of the academies etc. I don’t suggest that policy makers do nothing. However, I don’t want people to discuss policy here or this sort of issue.
The basic problem in dendroclimatology is that various assumptions are made but can rarely be tested. It is assumed that the response is linear, that conditions around the tree were constant, that the same limiting factors existed, etc. But you only have a short instrumental period (the calibration period). As soon as you go back beyond that, the validity of the conclusions is entirely dependent on the validity of the (untested) assumptions. In the few cases where data are available, it is rainfall (e.g. outflow of the Columbia River over 400 years is predicted by tree rings), not temperature, that seems to be validly reconstructed. When the studies were used to generate hypotheses, no problem; but now it is asserted that the temperatures 1000 years ago are validly predicted by these trees, with no testing of assumptions.
Just a few more general comments before we get back to statistical dreck, TSF, and process manipulation.
1. It seems reasonable to think of dendroclimatology as immature in terms of knowing what it’s about, rather than just chronological age. A good example is this week’s press release from AGU on a new study:
Oh. Not temperature?
2. Leaving aside the statistics and looking at the conclusions of the paper, how reasonable does it seem to paleoclimatologists that the global mean temperature over 900 years (1000-1900 AD) did not vary more than 0.15 °C plus or minus? Even if you think the 20th century is “contaminated”, it varies by more than 0.4 °C by the ’50s, so even before the heavy Carbon Age.
3. In that regard, another recent paper cited in the AGU press release states:
So current temperatures may be a bit higher than the MWP (without conceding the point) but fall well within the range of the past 10k years. Does that tell us anything?
See the press release and citations here.
re: 76
Craig,
I agree with you completely.
re: 73
John A,
I agree with your thought here. Science is supposed to be self-correcting and the history of science is full of stories of controversies and animosity among individual scientists as they argue for their own positions. We do not see that in dendroclimatology. bender’s comment that we should not expect to see this is completely off the mark, in my opinion. If I was a dendro and believed in my science, I would speak out against both incorrect methods and wrong conclusions that become wrong assumptions for the next researcher.
This epitaph does not help the science, but in an odd way it seems appropriate for final error bounds.
Here lies Lester Moore.
Three shots from a 44.
No Less. No more.
I’d just like to extend a warm welcome to Caspar, Eugene, Gavin, Mike, Ray, Rasmus, Ray, Stefan, David, Thibaut, William and all the RC ‘team’. Also, supporters Josh H, Tim L, Michael T, Tamino, Ray L, Hank R, Lee, Dano et al. Don’t be shy – we know you are watching! We welcome your contributions to this discussion. Come and join the fun. We are missing you!
With respect to bender (“Don’t expect celebration”) vs John A (“nobody is taking a swing at transparently invalid methods”)…
I’m reminded of a hard lesson I learned years ago, when I had to do battle with a provably-wrong and over the top building code inspector. I took my complaint to The Boss. Even being as diplomatic as I could be, he still defended his employee. However, he also was diplomatic in telling me that I was gonna have to come up with an airtight case to get him to go against his own employee.
It took a ridiculous amount of work (but this was my home and my sweetie’s new dream kitchen at stake 🙂 ), but I developed my case. The Boss saw the truth. The inspector went ballistic (sadly, in a public region-wide forum) and ultimately was let go.
Bottom line: it’s a lot harder to effect change from the outside, because insiders must assume their coworkers are probably right. Imagine how demoralizing to discover that teammates are doing poor work!
In that sense, this is why it is triply valuable when Steve is able to write things up for publication. The truth is no different when blogged or published, but the medicine tastes better swallowed from a GRL or Journal of Statistical Climatology spoon. (And yes, it can be a royal pain to get the prescription approved 😉 )
While we experimented in the early days with names for the Team – there were obvious suggestions like Flame and Heat – I sort of like Texas Sharpshooters. I think that their costumes should be more urban cowboy than rodeo, something along the line of the one below, the ersatz Bollywood interpretation capturing Ammann’s statistical style rather nicely, I think.
Steve,
I love your sense of humor! That’s hilarious!
#82 Yes. Compared to other fields dendro is a quiet science that does not have a culture of progressing by “taking swings” at invalid methods. To expect it now is silly.
#79 You misinterpret my remark; that’s why you disagree with it. There is more than one path to self-correction. The paths range from revolution to evolution. The bcp boondoggle was on its way to being debunked, with or without CA. [Yes, I’m sure you disagree.]
You agree with Loehle’s #76 as though there was something revelatory in that statement. All dendros know that their linear models are approximations. Yawn. So reconstructions have uncertainty in them. Does this mean a field is corrupt? Get a grip. And note Loehle’s careful choice of words. The fact is that sometimes assumptions of moisture or temperature limitation are tested. Sometimes they do experiments. Sometimes they do independent sampling. Not everyone is addicted to uncalibrated bcps.
#77 What part of “expect different responses on different sites” do you not understand? Treeline and desert are expected to experience temperature and moisture limitation respectively. Everyone knows this, but you pretend this is news. There’s probably a reason why people invent such straw men.
Before pretending to be an authority on a subject the least you can do is read the blog and see if what you’re trying to say has been said better before.
CA is at it best when it focuses on analysis and its worse when it devolves into wars of opinion.
My point – to bring this back to the thread title – is that generating hypotheses requires some “texas sharpshooting”, aka a posteriori analysis. The hope is that you eventually go beyond this, to hypothesis testing. Dendros need to do more of that, as #76 (and Wegman) argues. But think about what this means – testing these infernal assumptions. Who cares to grant me $1M and 5000y to grow ancient bristlecone pines under controlled greenhouse conditions? Thought so.
Does anyone here think about what it means to “test assumptions”? Or is the pile-on an involuntary, uncontrollable urge? There are reasons why these assumptions are often not tested.
Please try to be more thoughtful in your criticisms.
#83
man, looks like he’s “wingin’ it”.
#85. bender, for what it’s worth, I’ve found the thoughts of Greene on data mining in an econometric context useful in thinking about “a posteriori analysis”. Like you, Greene notes that you form hypotheses from looking at data. Then the conundrum comes in whether you can also apply the data used in forming the hypothesis to proving the hypothesis. Greene observed (and I’ve used this illustration in presentations) that one way of testing an economics hypothesis, if time isn’t important, is to wait 30 years and see if it holds.
I’ve observed that there are ideal circumstances to do this at very low cost in paleo, especially bristlecones. No need to wait 5000 years. Bring the Graybill chronologies up to date and see first if you can replicate them and second if they record recent global warming. We’ve proved the Starbucks Hypothesis – it is neither expensive nor time consuming to update the proxies.
Given that we’re pretty much on the same page, I’m not sure why I’m belaboring the point. I’d better go watch the Olympics.
As to the bcp boondoggle being on its way out – I don’t think that we can exclude the possibility that our criticisms may have ended up prolonging its life in the paleoclimate community. Mann’s PC1 has been used more by third parties AFTER the problems were identified than before; it’s as if the paleoclimate community is showing solidarity with Mann because he’s been criticized by outsiders. Sort of like tribal behavior all over the world, where cousins feud with cousins, tribes with tribes, but if a foreign invader appears, they forget their feuds. It’s understandable in human terms, but pretty pathetic when it’s endorsed by IPCC.
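Greene's conundrum above, forming a hypothesis from the data and then "testing" it on the same data, can be sketched numerically. In the toy example below (all series are pure noise, invented for illustration), picking the best of 200 candidate predictors in-sample manufactures an impressive-looking correlation that evaporates on fresh data, which is the statistical core of Texas sharpshooting.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 60, 200
y_in = rng.normal(size=n)                  # target series: pure noise
X_in = rng.normal(size=(k, n))             # 200 candidate "predictors": pure noise

# Sharpshooting step: draw the target around the best in-sample hit
cors = np.abs([np.corrcoef(x, y_in)[0, 1] for x in X_in])
best = int(np.argmax(cors))
r_in = cors[best]                          # looks impressive by construction

# Honest step: score the SAME selected candidate against fresh data
# (the "wait 30 years" test -- new observations of both series)
y_out = rng.normal(size=n)
x_out = rng.normal(size=n)
r_out = abs(np.corrcoef(x_out, y_out)[0, 1])

print(r_in, r_out)
```

The selected in-sample correlation is typically several times larger than the out-of-sample one, which falls back to the noise level.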
re: 85
bender,
I am sorry you feel like people are piling on, but I am not sure you understand my criticism yet. Your assumption that bad methods will die from underuse is fine. It may even be true once the Team have all retired. But that hardly solves the problems of dendroclimatology. Sloppy methods have led to bad conclusions. Bad conclusions are now “science” and are assumed to be true by every researcher coming after.
You say dendroclimatology is a “quiet science” and therefore not combative. That’s the problem! The dendros do not seem to understand this. The science will never be robust. It will never have integrity and command respect if they do not police themselves. If you want to make an omelet, will you use two good eggs and one spoiled egg?
Yes, I understand the differences between treeline and desert and the limitations of temp and moisture. But these are not the only confounding variables. When a dendro can point to a tree ring and say “Based on this ring, formed in 1657, we know the annual temperature for the region, the precipitation, the natural fertilization rate and the amount of sunlight the tree received,” they have got something!
According to the literature I’ve read, there is more than one way a narrow or wide ring is formed. A narrow ring could happen in a warm year with lots of precipitation, if it was a year the tree had to deal with insects or disease. All of the dendros know this, but they pretend it doesn’t matter. It does matter. Until someone challenges the assumptions, provides a public dunking to dendros doing poor work, and otherwise cleans up their own house, dendroclimatology will never be science.
It is time for the dendros to justify their existence.
Steve: Ron, let me referee this little food fight a bit. I have no problem with dendros collecting data even if we’re not sure right now exactly what it means. Maybe patterns will emerge. In the scheme of things, it’s very cheap data to collect. Having said that, the very difficulties in interpretation place all the more onus on the dendros to archive their data in case a later interpreter can find a pattern that they can’t. For example, let’s say that a chronology is “screwed up” as a temperature record because of recurrent attacks of spruce budworm. Under Jacoby rules, that data gets thrown out because it doesn’t contribute to the story. But it would be just what the doctor ordered if you were studying spruce budworm patterns – and, who knows, maybe that might contribute to disentangling other information. The dendro data sets are big complicated data sets. They are interesting statistically and it’s too bad that bright young statisticians work on far less interesting data sets. If I’d been organizing the Paleo Challenge workshop, I’d have invited a lot of statistics grad students and post-docs as well as the dendros and then asked the dendros to describe their data, all the problems and issues; so that maybe some young statisticians with a clean slate looking for interesting problems would get interested.
Re: #85
Bender, I thoughtfully submit that part of the problem and frustration is not calling a conjecture a conjecture. Conjecturing isn’t bad in itself, but it is when it is passed off as something else. Also, data mining for hypotheses to test complicates the subsequent statistical testing; further, the “texas sharpshooting” appears to me to be different from mining data. How many scientific papers are accepted where the author(s) admit to “texas sharpshooting” and conjecturing? And, even more frustrating, how often do reviewers label these processes as such when they are used but not labeled?
I went over to look at the link to the “Texas Sharpshooter Fallacy” at the top of the article, and it reminded me of another, similar fallacy, which may have a name, though I’m not aware of one. I call it the “Busy Store Fallacy.”
Say you get talking to the owner of a small store and she begins complaining about how bad business is. You retort, “Oh, come on! Almost every time I come here there are lots of customers.” The fact is that both of you can be right. There are two reasons you may think the store is busier than she does. First, you, like most people, are likely to come only at certain times of the day or week, and the store is going to be busier when you come. But even beyond this, there are going to be natural clumps of customers. If there are N people who come during a particular clump, then there are going to be N people who see a busy store. There may be another M people who come in by themselves (between clumps), but N/(N + M) of the people are going to report a busy store even though it was really only busy once. Further, there will be times, perhaps a large fraction of the time, when there are no customers at all, and this will skew the impressions of the store owner versus those of a typical customer even more.
It might be fun for someone to draw some graphs showing the % of people claiming a busy store vs the actual degree of busyness. Of course, a definition of just what constitutes “busy” would be necessary, as would assumptions about the distribution of visits.
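The "Busy Store Fallacy" is a form of what statisticians call length-biased sampling (also the inspection paradox): observations made per customer over-weight the busy periods. A toy simulation, with an invented clumpy arrival process, shows the gap between the owner's time-weighted view and the customers' view:

```python
import numpy as np

rng = np.random.default_rng(4)
hours = 10_000
# Invented clumpy arrival process: most hours near-empty, ~10% of hours a burst
clump = rng.random(hours) < 0.1
arrivals = np.where(clump, rng.poisson(8.0, hours), rng.poisson(0.3, hours))

busy = arrivals >= 4                       # call the store "busy" at 4+ customers
owner_view = busy.mean()                   # fraction of TIME the store is busy
customer_view = arrivals[busy].sum() / arrivals.sum()  # fraction of CUSTOMERS who see it busy

print(owner_view, customer_view)
```

Under these assumed parameters the store is busy for only a small fraction of hours, yet a large majority of customers experience it as busy, so both the owner and the customers report honestly and still disagree.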
Folks, as always, I urge people not to over-generalize. We have here a particularly egregious example of Texas Sharpshooting, one that interests me because I have a personal involvement in the dispute. And yes, Texas Sharpshooting is a problem with Team paleoclimate studies. But not everything in the world is Texas Sharpshooting. When people try to go a bridge too far, as readers often do, all it does is generate easy ripostes for critics of this site. They point to the exaggerated claim – one that I didn’t make – and then use that as an excuse not to consider the issue that prompted the post.
good contrast in helpful vs. unhelpful rhetoric:
#88
#89
#89 is a balanced assessment. #88 is over the top.
#87
Solidarity in support of a person is not the same thing as acceptance of an invalid method. The objective scientist will agree that the untenable PC1 demon (aka Chucky) needs to be exorcised. When the “young dendros rebel”, one of the things they rebel against is the unmitigated use of Chucky in support of policy. That the policy makers have made Chucky their favorite pet is not the young rebels’ fault; it is beyond their control.
#92, bender, you’re being a bit unfair to the policy makers here. They are working with what they are given. Houghton’s backdrop at the WG1 press conference was the HS. Using the HS as a crutch seems to be common practice among concerned scientists.
In my opinion, as I’ve said many times, I think that too many scientists under-estimate the public. If it were up to me (and I suggested this to IPCC AR4 scopers), they should go down the throat of all the technical issues – all the CO2 lines, all the water vapor lines, whatever. Put their best science in a place where people can look at it. Stop saying that it’s Met 101 or Atmospheric Radiation 101 or whatever. Or if there are text expositions that they endorse, they should state the endorsed expositions and discuss what a scientist from another field should look for in those texts.
By using the HS and HS-type arguments, scientists have taken a bit of an easy road, treating the public like pawns. Their argument, I suppose, is that no harm is done, because even if the story for the public isn’t exactly right, there’s a story known to the illuminati that is correct and gives the same answer. It’s a bit like saying it doesn’t matter if WMD was wrong, because there was another good reason. Maybe so. But that obviously doesn’t justify the original use of flawed “facts”, and the public eventually catches on to such things. The whole anti-MM thing feels exactly like that.
You’re right; I need to clarify. It’s not the policymakers per se who are at fault, but the science promoters who sit between the scientists and the policymakers. They are the ones who systematically suppress scientific uncertainty for fear of muddying the waters. I tend to call them “policymakers” although they are more like science-policy middlemen: senior science editors who don’t do science anymore and junior policy analysts. Yes, the promoters use what they were given in 1998 and conveniently don’t ask whether anything has changed in 10 years of research. Will Chucky resurface in, and survive beyond, the 5th assessment? If Chucky lives, it’s the middlemen who can be blamed. Is that unfair?
bender says:
“Will Chucky resurface in, and survive beyond, the 5th assessment? If Chucky lives, it’s the middlemen who can be blamed. Is that unfair?”
They have gotten away with it for 10 years, so Chucky will not die unless the entire AGW premise is discredited by a climate system addicted to chaos. Science self-corrects, but sometimes the old guard needs to die off first.
Raven, 95:
I think the matter will finally be settled by straight physics.
“The matter” is whether CWP is “unprecedented” compared to MWP and HTO and PETM. “Straight physics” is not going to settle what is an empirical (and paleoclimatological) problem. That is, unless your “straight physics” includes time machines. In which case, you could be right.
bender,
I know that there are comparisons of the current release rate of CO2 into the atmosphere with that of the PETM, but surely no one is saying current global or polar temperature now are in any way comparable to those during the PETM. There were boreal forests in Antarctica then vs miles of ice now. Comparing the CWP to the MWP, Roman WP, Holocene optimum and Eemian and other recent interglacial temperatures are reasonable things to do, if the data were reliable. Did I miss the sarcasm tag?
bender: Sorry I wasn’t clear. By “matter” I was only referring to the effects, or lack thereof, of the so-called GHgs.
#98 The most relevant comparison is MWP vs CWP because this is the one paleoclimatology has the greatest probability of resolving. MWP is what the sharpshooters said must be erased and has been erased.
#99 Which, as usual, has nothing to do with the topic, which is texas sharpshooting.
Steve,
re: your comment on my #88,
I should not have used the more general term “dendros” when I was thinking specifically of the dendroclimatologists. I assumed everyone would know my meaning from the context. Of course, I agree with you that there is no problem in collecting data. I consider this the work of dendrochronology, which is not the same thing. And of course, more bright statisticians should be involved.
My problem is with dendroclimatologists. They make claims for the science that are not supportable. This has all kinds of ramifications. I’m not just talking about global warming. Put yourself in the position of a young student who is convinced to get your Ph.D. in dendroclimatology. After getting your degree, you begin to evaluate all of the un-examined assumptions. You begin to see how bad methods have led to wrong conclusions and further research headed in the wrong direction. What do you do now? Some of these people may believe in the science and try to fight for it (but I haven’t seen one of these yet). Others will go along with the crowd because it is easy. Others will chuck it and quietly look for a new career. This is a terrible state of affairs.
Steve, I understand that you may not agree with my comments. But anyone who attributes my comments to you is simply looking for a reason to discount your site so they do not have to deal with the facts. If it were not my comments, it would be someone else’s.
Steve: I still think that you’re being too strident. Dendroclimatology originated out of the study of droughts in the Southwest and arguably they can accomplish something in that area. Ed Cook’s work on droughts seems sensible, for example. But they want to get to temperature and that’s where things get a lot hairier. Enter, in particular, Jacoby/D’arrigo and Briffa, who start down this road, path now being trod by D’Arrigo, Esper and Rob Wilson. Again, I encourage you to retain nuance.
re:92
bender,
I stand by my statement that it is time the dendroclimatologists justified their existence. In fact, if I could find the time, I would like to follow Pat Frank’s lead and write an article for Skeptic on this very topic. Who knows? Maybe Pat will even agree to team up with me on the article.
Re 97
I would say that “the matter” is very easily settled with respect to the HTO, not to mention the PETM. The CWP is not nearly as warm. With respect to the MWP, I think that the historical record suggests that the CWP is slightly cooler, but I’m not dogmatic about it.
#103 Can you cite a statistical analysis that includes correctly estimated uncertainties? Does IPCC? No need to be dogmatic about any scientific proposition. If it’s a known fact, it can be shown with a citation; rhetoric is unnecessary. If it’s not a fact, or can never be proven, then it’s propaganda. You see that sometimes.
I’ve added another example of Texas sharpshooting in my Replicating Ammann post, one that I just noticed. They do not calculate calibration RE and verification RE on the same series. When we had our Nature correspondence, Mann hyperventilated about us doing something VERY VERY WRONG in our calculation of verification statistics, by missing this strange and undocumented splicing procedure. Mann calculates the calibration RE on his “dense” network of 1000+ gridcells in the 1902-1980 period, but the verification RE on his “sparse” subset of 172 cells. What happens if you are consistent and, at least for the record, calculate calibration and verification statistics on the same thing – something that seems particularly desirable if, like Ammann, you believe that these two ratios have some meaning? Well, if you do this, the calibration RE falls to 0.177, failing Ammann’s redneck voter test.
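For anyone who wants the mechanics of “calibration and verification RE on the same series” spelled out, here is a toy sketch in Python. All of the data, the split point and the noise scale below are invented for illustration; this is not Mann’s or Ammann’s actual procedure, just the standard RE formula applied consistently to one obs/estimate pair.

```python
import numpy as np

def re_stat(obs, est, ref_mean):
    """Reduction of Error: 1 - SSE / (sum of squares about a reference mean).

    RE > 0 means the estimate beats a constant prediction at ref_mean;
    for verification RE the reference mean is the calibration-period mean.
    """
    sse = np.sum((obs - est) ** 2)
    ss_ref = np.sum((obs - ref_mean) ** 2)
    return 1.0 - sse / ss_ref

# Invented data: one "observed" series and one reconstruction, split into
# a verification segment and a calibration segment (analogous to
# pre-1902 verification and 1902-1980 calibration).
rng = np.random.default_rng(0)
obs = rng.normal(size=127)
est = obs + rng.normal(scale=1.5, size=127)     # noisy reconstruction
ver, cal = slice(0, 48), slice(48, 127)

cal_mean = obs[cal].mean()
re_cal = re_stat(obs[cal], est[cal], cal_mean)  # calibration RE
re_ver = re_stat(obs[ver], est[ver], cal_mean)  # verification RE, same series
```

The point is simply that both statistics refer to the same obs/estimate pair; switching networks between the two steps breaks that consistency.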
Another, older example of good old-fashioned rootin’ tootin’ sharpshootin’ cherrypickin’ pie:
A few good men
#105,
There are some sparse vs. dense issues in figure 2 in MBH99 as well,
http://www.climateaudit.org/?p=647#comment-103485
Seems to me that Mann’s sparse reconstruction,
http://www.ncdc.noaa.gov/paleo/ei/ei_data/nhem-sparse.dat
is done without TPC variance matching, but dense
ftp://ftp.ncdc.noaa.gov/pub/data/paleo/paleocean/by_contributor/mann1998/nhem-dense.dat
is with variance matching.
Variance matching affects REs, so we have even more choices 🙂
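To see why variance matching moves the RE around, here’s a small illustration (invented series, not Mann’s actual data): rescaling a reconstruction so its standard deviation matches the target’s changes the squared errors, and therefore the RE.

```python
import numpy as np

def re_stat(obs, est):
    """RE with the target's own mean as the reference prediction."""
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(1)
target = rng.normal(size=100)
recon = 0.4 * target + rng.normal(scale=0.5, size=100)  # attenuated reconstruction

# Variance matching: rescale the centered reconstruction so its standard
# deviation equals the target's, then restore the target mean.
matched = (recon - recon.mean()) * (target.std() / recon.std()) + target.mean()

re_plain = re_stat(target, recon)
re_matched = re_stat(target, matched)
# The two REs generally differ, so matched-vs-unmatched is one more
# analyst degree of freedom when quoting an RE.
```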
#100 Bender
The MWP has not been removed. It is, thankfully, a matter of very well documented fact, and so, despite the attempts of certain cherry-picking ‘sharpshooting’ Team members, it can never be erased. They can continue to try to discredit it by referring to it as having occurred only in Europe or the Northern Hemisphere, but it will nonetheless remain an historical fact. The purveyors of the new eco-religion can try to deny its existence as much as they want, but there is more than enough historical evidence that it was at least as warm (more likely significantly warmer) during this period of the millennial past than it may have been during the recent warming period at the end of the 20th century.
The attempts by the HT to remove it constitute the greatest example of ‘sharpshooting’ (or as we would say in the UK ‘moving the goal posts’) that has ever occurred to support an agenda in man’s history IMO.
KevinUK
Steve # 83
That cowboy sharpshooter picture will have to be erased. The man has his mouth open and that increases the probability of danger. He might be speaking.
Or, as one of my Aussie buddies once said: “He didn’t just shoot himself in the foot, he deepthroated a bazooka!”
My jaw dropped when I read this quote from Tamino:
IOW: “I know what signal I want to see, so anything that does not show it must be noise”. I always found it hard to believe that the Team was intentionally manipulating data, but I have often wondered how they managed to justify their actions in their own minds. I now know.
#11 open mind => addled fundamentalism
#111 — oh well. Tamino appears to have blocked my reply. I would love for him to explain how he knows what is “signal” vs “noise” in the raw data, when we’re dealing with unknown sources of “signal.”
There appear to be some questions that Must Not Be Asked.
(I’ve responded to dhogaza as well. I was saying precip and temp are not correlated, he shifted that to “not related”. Of course, there’s a complex relationship, as highlighted many times here. The hard part is how to describe growth as a function of temp without precip getting in the way as a separate variable… particularly in precip-limited places like the Nevada desert. Ah well.)
It’s really all the same problem, when you get down to it.
Goal: temp proxy.
Reality: a variety of known and unknown factors influence the phenomenon being measured.
Challenge: how to express the phenomenon as a function of temp and validly eliminate all other factors in the expression.
If you can’t do that, then variance in the other factors affects your signal.
AFAIK, what we’re seeing is people matching up their signal to a portion of modern temps, and assuming that means they’ve eliminated the other factors.
Kinda big assumption.
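That assumption can be illustrated with a two-factor toy model (everything below is invented, not real proxy data): a “proxy” driven by temperature plus an independent precipitation term, calibrated to temperature alone over the modern half, never recovers the true temperatures in the earlier half, because the precipitation-driven variance gets relabelled as temperature.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
temp = rng.normal(size=n)
precip = rng.normal(size=n)                     # independent of temp here
growth = 0.5 * temp + 0.8 * precip + 0.3 * rng.normal(size=n)

# "Calibrate" growth against temperature over the modern half only...
b, a = np.polyfit(growth[n // 2:], temp[n // 2:], 1)
temp_hat = a + b * growth[:n // 2]              # ...then reconstruct the rest.

# The precip-driven variance never goes away; correlation with the true
# temperatures stays well below 1 no matter how good the calibration fit.
r = np.corrcoef(temp_hat, temp[:n // 2])[0, 1]
```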
#113 – He refused to post my reply too. I must give the guy some credit for being a master of propaganda, because he recognizes when posters are bringing up issues that he cannot possibly reply to, so he accuses the poster of smearing ‘honest scientists’. This gives him the cover he needs to delete further posts on the topic. I am thinking of proposing a web blog award for the most ironic blog title – tammy would win hands down.
I have never posted on Climate Audit before, and I had not intended to either. I read this site fairly frequently, but I never had much to contribute. Considering people here are obviously interested in what I effectively caused, I thought I would post a comment here. I submitted a post a short while ago which is still in the moderation queue. Two of my posts before it disappeared, and I imagine it will too. (I am not particularly fond of shadow posting, but in this case I want some record.)
Unfortunately I did not make copies of the deleted posts. The best I can manage is a paraphrase of the second deleted post. The wording is different, but the content should effectively be the same.
I wholeheartedly support open and honest dialog. I attempted to have it at Open Mind, but it seems that will be impossible. I have now had several posts deleted without any explanation. These posts did not violate any rules of the website, nor contain inappropriate material. These deletions have effectively misrepresented me. I apologize for spending bandwidth on this as I am not sure it belongs here, but I thought it might be of some interest. And to be honest, I feel some urge to defend myself.
#115 – Here is the post I had deleted. As you can see, there is nothing rude about the tone, other than the fact that I stated quite strongly that I disagreed with his opinions.
I would modify your comparison to state:
Steve may need to correct me if I am wrong but I believe that is exactly what the Team is doing with their samples.
I dislike your changes Raven. They greatly modify the meaning. It may make the comparison more accurate for this thread, but not for anything that was being discussed where this originated.
In the irony stakes I think it is a close run thing between Open Mind, Real Climate and Fair and Balanced.
Tamino has selectively deleted my posts in order to misrepresent me. I quit posting there in frustration. That mind is open indeed, and the sight is not pretty.
==================================================================
To keep an open mind you have to sharpshoot denialist postings. Credit to the sharpshooter for recognizing devastating critique.
#118: Remember what “Pravda” means? “The truth”
Re #115
You have to be very careful posting to Tamino’s site, it seems to have a very aggressive Spam checker which consigns posts to some imaginary list, from which they never return, for containing the names of certain drugs!
I had the experience of having a post multiply rejected, detailed investigation revealed it was because I used the word ‘ambient’! What’s wrong with that you ask? It was rejected for containing the name of the drug ‘Ambien’!
Phil, three different posts of mine have vanished in less than 12 hours. Immediately after one post was accepted. The three posts which were deleted each showed Tamino to be wrong, while the accepted one barely said anything. There can be no doubt Tamino intentionally deleted posts that would be damaging to him.
snip
OP:
I have decided. This is the very definition of sharpshooting – drawing your target (i.e. benchmarking) after you have shot (ie. run your data through the analytical sausage-grinder and previewed the outcome). Some would call it “cherry-picking”. However these cherries weren’t hand-picked individually. They were selected algorithmicially, which helps sustain the illusion of “hands-off”.
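A stripped-down Monte Carlo makes the “draw the target first” point concrete. This is only a toy (a linear trend plus AR(1) noise standing in for temperature, AR(1) series standing in for proxies), not a replication of MM2005c’s or Ammann’s simulations, but the logic is the same: compute the RE distribution that pure red noise throws up, and only then decide what counts as a hit.

```python
import numpy as np

def ar1(n, rho, rng):
    """Red noise: AR(1) series with lag-one autocorrelation rho."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + rng.normal()
    return x

def verification_re(target, proxy, cal):
    """OLS-calibrate proxy to target over cal, score RE outside it."""
    b, a = np.polyfit(proxy[cal], target[cal], 1)
    est = a + b * proxy
    ver = np.setdiff1d(np.arange(len(target)), cal)
    ref = target[cal].mean()
    return 1.0 - (np.sum((target[ver] - est[ver]) ** 2)
                  / np.sum((target[ver] - ref) ** 2))

rng = np.random.default_rng(3)
n, cal = 150, np.arange(100, 150)
target = np.linspace(0, 1, n) + 0.3 * ar1(n, 0.5, rng)  # trending "temperature"

# RE values thrown up by proxies that are nothing but persistent red noise:
res = [verification_re(target, ar1(n, 0.9, rng), cal) for _ in range(1000)]
benchmark = np.quantile(res, 0.99)  # the honest target, drawn before shooting
```

Only after `benchmark` is fixed does an observed RE tell you anything; circling the observed value after the fact and declaring it significant is the sharpshooter move.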
Phil, Tamino’s site tells you if your post is waiting for the moderator. If it is, Tamino is the one deciding what stays, what goes. Here’s my current contribution, waiting for moderation. I have better things to do than try to poke through a one-sided moderation queue and do safety-postings here. Time to get back to the Real World.
2 questions, Steve —
First, what is the actual percentile for validation RE = 0, if not .99? I suppose this varies somewhat with the MBH “step” involved.
Second, how do WA get negative calibration RE’s? According to the NAS North report cited by Fred above in #60, “in the calibration period, RE, CE and r^2 are all equal.” But the calibration r^2 (aka the regression R^2) must be nonnegative, and can only be zero with probability zero. The adjusted R^2 can be negative, but its expectation under the null of no explanatory power is 0, so it’s negative about half the time, not 7/1000 of the time as here.
Is the discrepancy because WA (and MBH) are in fact computing RE in terms of temperature errors and mean temperatures, even though MBH in effect are regressing the proxies on temperature? Regressing proxies on temperature as in MBH is the correct way to go, even if they inefficiently ignored the covariance matrix across proxies and therefore could not compute correct confidence intervals. However, it would then be more appropriate to measure the ability of the model to fit the proxies in the validation period rather than its ability to back out temperature, as they apparently have done.
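Hu’s point about signs can be checked numerically (invented data below): when the calibration-period estimate is the OLS fit itself, the calibration RE reproduces the regression r², which is nonnegative; a negative “calibration RE” can only come from an estimate produced by some other procedure.

```python
import numpy as np

def re_stat(obs, est, ref_mean):
    return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - ref_mean) ** 2)

rng = np.random.default_rng(4)
x = rng.normal(size=80)
y = 0.3 * x + rng.normal(size=80)

# OLS fit in the calibration period: RE equals the regression r^2, hence >= 0.
b, a = np.polyfit(x, y, 1)
re_ols = re_stat(y, a + b * x, y.mean())
r2 = np.corrcoef(x, y)[0, 1] ** 2   # numerically equal to re_ols

# An estimate from some *other* procedure (here, deliberately the wrong
# slope sign) can easily yield a negative "calibration" RE.
re_other = re_stat(y, a - b * x, y.mean())
```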
What a long thread, thanks for the good work.
Please friends, stop debating dendro with bender. New ideas are not being added, just repeated.
If the hockey stick is phony due to low statistical significance, what is the next best chart? I mean, is there better data from ice cores or silt layers or something?
127 Kevin
How about Craig Loehle’s?
Just read “A 2000-YEAR GLOBAL TEMPERATURE RECONSTRUCTION
BASED ON NON-TREERING PROXIES”.
Thanks Pat, that’s very good. Lying is a strong word, but somebody seems to be on the borderline. Now I must go to some site with the opposite argument – one where the hockey stick is valid and McIntyre et al. are cast as paid oil thugs. Are there any good (science, not politics) ones out there?
#129
You could try the UK Met office web site, they have consulted their wet seaweed, and all seems to be OK with the hockey stick.
Mind you, their forecasting record is appalling; any UK Met Office forecast beyond tomorrow, you would be advised to treat with caution. Such is the nature of the UK climate.
This summer is appalling, we are meant to be in severe drought, that is what the climate change models predicted in 2006, and it has rained ever since that forecast was made.
Interesting the amount of data being pulled in at Tamino’s to suggest that other factors are not independent strip-bark BCP growth factors. Perhaps we can create a catalog one of these days. My latest response (it’s getting easier to be brief):
The related big issue with respect to bristlecones is reconciling the Ababneh and Graybill results. I don’t see how anyone can use the Graybill Sheep Mt results without showing why Ababneh’s results are wrong. And if Ababneh is right about Sheep Mt, then there’s something almost certainly wrong with Campito Mt and other related Graybill sites.
Graybill’s own analysis in Graybill and Idso shows a big difference between the strip bark and whole bark chronologies. You and I have a pretty good idea what’s wrong with the strip bark – but it’s crazy that the dendros haven’t analyzed exactly what’s going on with strip bark – even after the problem was highlighted by the NAS panel. You’d think that someone would do a technical report.
But nope, the same tired old re-hashing of Graybill chronologies, which, aside from any other consideration, should be regarded as unusable without re-confirmation simply due to age, missing records and inconsistency with Ababneh.
#132 Agree completely. Choosing Graybill over Ababneh is akin to selecting “a few good men”, i.e. cherry-picking.
#133. The “few good men” thing goes beyond cherry picking when it infects the archiving and they only archive “good results”.
Remember that they withheld the Gaspe updated results that didn’t show the HS, which have never been archived or published. I learned of their existence and, when I asked for them officially, Jacoby refused, saying that the earlier results showed the “signal” better. Once they go down the road of selective archiving, unfortunately it ends up eroding confidence in the representativeness of what they do archive.
Steve, you’ve read all these papers. When it is said that we’re selecting for a “good signal” have you seen any definition of what “signal” means? Trying to be as open as possible to whatever process is in place.
#134 True.
#135 “Signal” means correlation with the instrumental record of whatever is being proxied.
Except that they now purport to disdain “interannual” correlation – the despised r2 statistic. There’s a tendency to assume ex ante that the HS is the “signal”.
#137 Really? They disdain interannual correlation? What makes you say that? I hope they’re prepared to start dealing with monstrous levels of autocorrelation. It’s really “pick your poison”.
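The autocorrelation poison is easy to demonstrate (toy series, not real proxies): correlate pairs of independent AR(1) series and watch the spread of r blow up relative to white noise, so that apparently “significant” correlations appear from pure noise.

```python
import numpy as np

def ar1(n, rho, rng):
    """AR(1) red noise with lag-one autocorrelation rho."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = rho * x[i - 1] + rng.normal()
    return x

rng = np.random.default_rng(6)
n, trials = 100, 2000

# Correlations between pairs of *independent* series.
r_white = [np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1]
           for _ in range(trials)]
r_red = [np.corrcoef(ar1(n, 0.95, rng), ar1(n, 0.95, rng))[0, 1]
         for _ in range(trials)]

spread_white = np.std(r_white)
spread_red = np.std(r_red)   # much wider: big |r| shows up by chance
```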
Craig:
Isn’t that a usual tactic? Saying there’s no clear cause/effect relationship in the first place, much less one with directionality, gets turned into the person saying there’s no relationship at all.
We know carbon dioxide is related to temperature.
“You’re stupid, Arrhenius showed increasing carbon dioxide increases temperature in 1896!”
I wonder if some people have ever actually read “On the Influence of Carbonic Acid in the Air upon the Temperature of the Ground”.
Steve:
I wonder why people in the field seemingly aren’t interested in something helpful like this.
Because it is career suicide.
Mark
Don’t know where to put this, Unthreaded is closed, but I think it needs saying.
A longtime friend suicided by hanging the other day. His father had been an eminent medico, much decorated, and the son tried for recognition too. He chose a path of protest and was active at anti-uranium and human rights protests.
We shall not know the full story, but it is plausible that he ended it because of increasing mental agitation that not enough was being done to combat his acquired view of Global Warming.
In this difficult week, I have appreciated more than ever the leadership of Steve McIntyre and his balanced, interesting writing; and to name just one more of many, Steven Mosher for his delightful turn of phrase coupled with evident knowledge. You have all provided a refuge.
It was not sharpshooting, it was the rope. And it was a day or two after I wrote lightly about the epitaph of Lester Moore. There are strange, sometimes savage, twists and coincidences dished out by Mother Nature, not just climate-wise.
#141,
My condolences on the loss of your friend. May he rest in peace.
Thank you, Jonathan. Geoff.
I think you’re probably getting silent condolences from many of the posters. People often don’t know how to react when confronted with such events.
My condolences as well.
#141, Jeez has expressed my sentiment. It is hard to know what to write. Words on a blog are inadequate and seem vaguely inappropriate.
My condolences on the loss of your friend.
I have several good friends who are depressed by the thought of AGW = “man’s ruination of the planet”. Anyone “sounding the alarm” had better be aware of what they could be triggering. A false alarm could prove very costly. (Yes, I know – a failure to alarm in the case of a disaster would be costly too. A familiar message.)
Let’s just get the science right, and let’s talk openly about what the data really say. (The GCMs in particular, and the forcing estimation exercise especially. The GCMs aren’t the big problem; it’s the estimation exercise.) RC is too autocratic a forum to be trusted. They are under no obligation to answer any question, and are unaccountable for the answers they give. Their mandate is clear: subdue all questions that may threaten policy momentum. Dudes, it’s the science. Folks want answers. Some folks need them.
re 141
It’s says a lot about man when he reaches out to thank others in his time of grief. CA has also been a refuge for me.
My condolences on your loss.
Geoff,
My condolences as well. Words fail me.
Wow. Loss of hope is always tragic. Geoff, may you have opportunity to give an overwhelmingly good hug to the family.
Thank you, Steve, for allowing this space which can be closed off now, and to others for kind comments. I have had cause for some deep thought and have resolved never again to use the ad hom, unless it is clearly seen as light and humorous. Different people have different sensitivity to games. Some at RC might take note of such sensitivity.
I might be going mad but I’m sure I saw Steve Mc say something about poetry in a recent thread???
(I don’t know which thread so I’ll put my ridiculously poor effort in a couple of threads. Sorry!)
There was a doctor called Mann,
But his data got him into a jam,
He said the millennium was cool,
But he was shown as a fool,
……Then to the rescue came Caspar Ammann.
But McIntyre wouldn’t let it be,
He wanted to see the RE,
And when it eventually came,
The statistics were lame,
But ’twas too late for the IPCC.
McIntyre’s instinct was right,
Ammann should be feeling contrite,
But he’s probably not,
He’ll still say it’s hot,
And the Team will no doubt put up a fight.
So was it as warm in the past?
Do we really need to act fast?
Who knows the real truth?
But in their search for a proof,
The Team’s antics leave us all aghast.
The actions of just a few,
Might probably destroy peer review,
They don’t follow the rules,
Treat the curious like fools,
And make honest scientists cry “mon Dieu!!”
So who should we really believe
If it wasn’t for good blokes like Steve?
We should keep open minds,
And eventually find,
That climate science might be reprieved.
I’ll give you this poem for free,
And in time we’ll eventually see,
That when the science is sound
Steve will eventually have found,
How CO2 raises temperature by 3.
(Not worth a copyright, Terry B August 2008)
After a long period of (stunned?) silence, I note that supporters of the Team over at Tamino appear to think they have dismissed Steve’s demolition of W&A.
http://tamino.wordpress.com/2008/08/10/open-thread-5-2/#comment-21340
So does their argument really have any merit, or is it just the usual handwaving?
BTW – first time poster, long time lurker – love your blog Steve, even if I struggle to follow some of the maths!
Well, the NAS panel said, two issues were brought up “regarding the metrics used in the reconstruction exercise”:
1. “…the choice of ‘significance level’ for the reduction of error (RE) validation statistic is not appropriate.”
2. “…different statistics, specifically the coefficient of efficiency (CE) and the squared correlation (r2), should have been used….”
And said about them:
As I might put it plainly, inappropriate validation metrics were used, so the results of the reconstruction are far more uncertain than claimed. Or in other words, they didn’t prove what they said they did.
Interesting article on an open-science movement in which raw data are being posted on the Internet to speed research:
http://www.boston.com/news/local/massachusetts/articles/2008/08/21/out_in_the_open_some_scientists_sharing_results/
#152 gda
Eq (1): y = ax + b + noise
Eq (2): y = x + noise
Note that there is some confusion here from Gavin’s pussycat. Eq 1 = Eq 2 when the intercept is 0.0 and a is 1. One would assume that noise would meet the definition of noise. Equation 1 is the general form. Eq 2 is a specific form of Eq 1. R^2 still is good for both. Hard to see where Steve McI, according to pussycat,
by using a general form. Without showing proof that this specific form of Eq 2 somehow should not use R^2, the claim made is more than just suspect. Quoting from pussycat
Let’s see can’t pass a “lightweight” test that is used as a first cut at deciding. Yet the MBH claims “robustness” or some such.
Would like to see that proof of this claim in the acknowledged statisticians’ peer-reviewed literature; and of course how MBH fit the specific and not the general case. It would be interesting. Of course, this claim of ease is not mine nor MBH’s, as far as I know.
Not to doubt his claimed ability to cheat. Just that I was taught that this was one of those “lightweight” tests used to help make sure you could support the claim you were making.
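The general-vs-special-form point is easy to check numerically (invented data): r² is computed the same way for y = ax + b + noise as for the a = 1, b = 0 special case, and nothing about the special case invalidates it as a first-cut test.

```python
import numpy as np

def r_squared(x, y):
    """Squared Pearson correlation; the same computation for either form."""
    return np.corrcoef(x, y)[0, 1] ** 2

rng = np.random.default_rng(5)
x = rng.normal(size=200)

y_general = 2.0 * x + 1.0 + rng.normal(size=200)  # Eq (1): y = ax + b + noise
y_special = x + rng.normal(size=200)              # Eq (2): a = 1, b = 0

r2_general = r_squared(x, y_general)
r2_special = r_squared(x, y_special)
# Both are well defined; setting a = 1 and b = 0 changes nothing about
# whether r^2 is a legitimate "lightweight" test.
```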
Re #152 gda – Whether some of the posters at Tamino’s site have the expertise does not matter. Too many at that site speak from a biased point of view. I would stick to Steve M and Dr. Wegman.
I suppose it’s about time to resurrect this thread. 🙂 After having done some of the Steig analysis, I re-read AW & WA and figured I’d post my thoughts. Not necessarily connected to the discussions immediately above.
.
Steve,
.
My biggest issue with their 2 papers is not the methodology or even the statistics. It’s the whole-body-waving when it comes to the multivariate aspect of the analysis. I will quote a relevant passage from AW (my bold):
.
.
Without being melodramatic, I am seriously struggling to see how any peer reviewer would not demand that this entire paragraph be stricken prior to publication. Did MBH “calibrate” the regional precipitation/ENSO teleconnection (note that WA juxtaposed the precip and ENSO, rendering their statement meaningless) to global temperature? No. Has Mann et al., WA, or anyone else done this subsequently? No. Has anything been published anywhere that suggests that such a generic calibration is even remotely possible? No.
.
There is also an untrue statement (the first bolded statement) about the MBH methodology. The calibration step is not to some amorphous “large scale pattern of climate variability” (whatever that means); the calibration is to temperature. The MBH method does not use bristlecones as ENSO proxies (which is of doubtful plausibility in the first place), precipitation proxies, CO2 proxies (except by accident), or nitrogen proxies. It directly uses bristlecones as temperature proxies. It is not possible to calibrate a proxy to temperature and then claim that it is accurately representing ENSO via an unproven precipitation teleconnection – which, by the way, just happens to be an unquantified proxy for global temperature. Calculate the confidence intervals on that.
.
The middle statement, that this is exactly the kind of thing that MBH takes as axiomatic rather than supporting the MBH-type analysis, should be considered a devastating condemnation of the MBH analysis. None of the chain connecting local site variations to large-scale climatic patterns and then to global temperature has ever been quantified, and it is not clear that it is even possible to do so. In essence, this sentence admits that the proxies are not good temperature proxies. Instead, they are good proxies for whatever makes tree ring widths change in size, and this is then defined by MBH as representing a climatic signal with zero supporting evidence and no means to quantify this relationship in terms of temperature.
.
I’ll stop there for the moment. 🙂
Ryan, I so enjoyed your comments, that I felt I needed to “bump” them.
Someday, I hope it is in my lifetime, serious scientists are going to wake up and call MBHxx, WA, and AW what they really are and ask, “what were we doing? Why did we put up with this?”
What happens if you apply all the “rules” used (for throwing out sample data in order to get the “99%”) to the actual MBH proxies themselves ?
Would there be any proxies left in the reconstruction ?
Also, I’m guessing that running this 0.75 ratio cutoff over any non-strongly-rising part of the calibration to the temperature record would give fantastically different results. Why would changing the region of calibration affect the validation statistics to such a high degree?
Steve, whenever I read one of your critiques of The Team’s math, I become 99% certain that these – snip –
Robustly!
2 Trackbacks
[…] Caspar Ammann, Texas Sharpshooter By Steve McIntyre “The Texas Sharpshooter fallacy is a logical fallacy where a man shoots a barn thirty times then circles the bullet holes nearest each other after the fact calling that his target. It’s of particular concern in epidemiology. Folks, you are never going to see a better example of the Texas Sharpshooter work itself out in real life than Caspar Ammann’s handling of Mann’s RE benchmark. I introduce you to Caspar Ammann, the Texas Sharpshooter. Go get ‘em, cowboy.” […]
[…] We have a strong suspicion that this is the case, but, of course, no proof because we do not know *who* the reviewers of these papers have been. This was the charge made against those editors who published the articles the CRU gang produced. They refused to disclose the reviewers. The emails detail how they made sure “appropriate” reviewers were provided, knowing they would not be revealed. Perhaps now is the time to make this a direct accusation and request (or demand) that this information be made available. They don’t seem to realize this would expose their malfeasance. In order to properly defend the good science it is essential that the reasons for bad science appearing in the literature be investigated. Frightening comment, because only they know what is “good science” and you bully “bad science” by personal attacks. The lever here is that the Subcommittee on Oversight and Investigations of the House Committee on Energy and Commerce is suggesting that your papers are bad science and asking (their point 8e) for the identity of people who reviewed your work. The Committee is investigating the charge they were peer reviewing each other’s work, which was confirmed by the Wegman report. In response, it is completely fair and justifiable to point out that it is the papers that criticize your and related work that are bad science, and that, through the Subcommittee you can request the identities of the reviewers of all of these critical papers—starting with M&M. Amazing! When you respond, there are a number of items that require a direct response from you alone. There are also a number of scientific points where you could give a multi-authored response. Safety in numbers and whose names should appear as authors is a game documented in the emails. Multiple authors appear on many of their articles. There are many people who have expertise in this area and familiarity with the scientific issues who I am sure would be willing to join you (I would be happy to do so). 
At this stage, however, I would keep the group small. This appears to indicate an awareness of keeping control of the issue. A few others could be added to the original email list nevertheless. I took the liberty of copying your plea and the Subcommittee’s letter to Caspar Ammann, primarily because I think he can help with the scientific aspects better than most people. Amman later tried to ‘help’ but ended up right in McIntyre’s sights and likely regretted getting involved. […]