Caspar Ammann, Texas Sharpshooter

The Texas Sharpshooter fallacy is a logical fallacy in which a man shoots at a barn thirty times, then circles the tightest cluster of bullet holes after the fact and calls that his target. It’s of particular concern in epidemiology.

Folks, you are never going to see a better example of the Texas Sharpshooter fallacy working itself out in real life than Caspar Ammann’s handling of Mann’s RE benchmark.

I introduce you to Caspar Ammann, the Texas Sharpshooter. Go get ’em, cowboy.

In Ammann’s replication of MBH, he reports a calibration RE (the Team re-brand of the despised calibration r2 statistic) of 0.39 and a verification RE of 0.48. So that’s his bulls’ eye.

In our original papers, we observed that combinations of high calibration RE and high verification RE statistics were not necessarily “99.99% significant” (whatever that means), but were thrown up quite frequently even by red noise handled in Mannian ways. So something that might look at first blush like sharpshooting could happen by chance.

In my first post on this the other day, I observed that Ammann’s simulations, like ours, threw up a LOT of high RE values – EXACTLY as we had found. There are nuances of difference between our simulations, but he got a 99th RE percentile of 0.52, while we got 0.54 in MM2005c. Rather than disproving our results, at first blush Ammann’s results confirmed them. Mann didn’t appear to be quite the sharpshooter that he proclaimed himself to be or that everyone thought. (This is something that should have been reported in their article, but, needless to say, they aren’t going to admit that we know the street that we live on.)

It’s not that the MBH RE value for this step isn’t in a high percentile – it is, something that we reported in our articles, though in a slightly lower percentile according to our calculations. For us, the problem was the failure of other statistics, which suggested to us that the seemingly high RE statistic (99.999% significant) was an illusion from inappropriate benchmarking – a form of analysis familiar in econometrics (especially the seminal Phillips 1986). The pattern of MBH statistics (high RE, negligible verification r2) was a characteristic pattern of our red noise simulations – something we reported and observed in our 2005 articles.
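
To make the benchmarking idea concrete, here is a minimal sketch – placeholder numbers throughout, not a reproduction of MBH, Ammann’s or our simulations – of how a “99% significance” cutoff of this kind gets formed and used. The distribution used for the simulated verification RE scores below is simply made up for illustration.

```python
# Minimal sketch of the benchmarking idea (placeholder values throughout;
# this does not reproduce anyone's actual simulations).
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for verification RE scores from 1000 red-noise pseudo-reconstructions.
sim_ver_re = rng.normal(loc=0.0, scale=0.25, size=1000)

benchmark = np.percentile(sim_ver_re, 99)   # the 99th percentile is the "bulls' eye"
mbh_ver_re = 0.4817                         # the MBH AD1400 value quoted in this post

print(round(benchmark, 3), mbh_ver_re > benchmark)
```

The whole argument is over where that benchmark line gets drawn, not over the arithmetic of drawing it.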

Obviously, it wasn’t enough for Ammann to show that the MBH RE value was in a high percentile – he wanted to show that it was “99% significant” as the maestro had claimed.

So he re-drew the bulls’ eye. A couple of days ago, I described the two steps whereby Ammann gets the MBH RE score (0.4817) into the 99% “bullseye”, but that was my first-cut analysis and did not tie it directly to the re-drawing of the bulls’ eye.

Ammann’s first step was to assign an RE value of -9999 to any result with a calibration RE under 0. That only affected 7 out of 1000 and didn’t change the 99th percentile anyway. So this seemingly plausible argument had nothing to do with re-drawing the bulls’ eye, as noted previously.

The bulls’ eye was re-drawn in the next step – where Ammann proposed a “conservative” ratio of 0.75 between the calibration RE and verification RE statistics. Using this “conservative” ratio, he threw out 419 out of 1000 votes. The salient question is whether this “conservative” procedure has any validity or whether it’s more like throwing out black votes because they couldn’t answer a skill-testing question like naming the capital of a rural county in Tibet or identifying the 11th son of Ramesses II. I’ll provide some details below and you decide.

First, no one has ever heard of this “conservative” benchmark – and I mean, no one. You can’t look up this “conservative” ratio in Draper and Smith or any other statistical text. The “conservative” benchmark is completely fabricated. So everyone’s statistical instincts should be on red alert (as Spence_UK’s and Ross’s have been and as mine were).

So I thought – let’s look at the votes that didn’t count. What did the rejected votes actually look like? First of all, any simulation with a negative calibration RE failed the first test and was re-assigned to -9999. OK, but those didn’t matter, because they were already to the left of the target; the 99% bulls’ eye wasn’t affected by this.

The only ones that mattered were the votes with RE scores higher than MBH which were thrown out on this new technicality. There were 13 votes thrown out on this pretext, which I list below in order of decreasing RE score (note once again how high both the calibration and verification RE scores are in these rejected votes). Most of the rejected votes had calibration RE values above 0.3, slightly lower than the calibration RE in the WA emulation of MBH (0.39), but the third one in the list had both a calibration RE and a verification RE that were higher than MBH. Nonetheless, the vote still got thrown out. The RE score was too “good”.
For a calibration RE of 0.3957, the maximum allowable verification RE to be eligible would be 0.528 (0.3957/0.75)! Turn that over in your minds, folks. If the calibration RE was 0.3957, then unless the verification RE fell in the narrow window between 0.4817 (MBH) and 0.528, the score would be placed to the left of MBH and the bulls’ eye re-drawn. Redneck scrutineers would be proud.


# Cal_RE Ver_RE

647 0.3390 0.644
944 0.3485 0.620
113 0.3957 0.609
548 0.3016 0.599
374 0.3542 0.550
683 0.2479 0.542
153 0.3826 0.519
146 0.3112 0.514
299 0.3383 0.508
40 0.3176 0.508
194 0.1840 0.502
492 0.3552 0.491
656 0.3284 0.483
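
As a quick sanity check, here is a short script applying the 0.75 screen to the rows listed above (the id/RE values are copied from the table; the MBH figures are the ones quoted in this post). Every listed simulation fails the screen, while MBH, at a ratio of about 0.81, squeaks through.

```python
# Check the 0.75 ratio screen against the rejected rows listed above.
# A simulation is discarded when calibration RE / verification RE < 0.75.
rejected = [  # (simulation id, calibration RE, verification RE), from the table above
    (647, 0.3390, 0.644), (944, 0.3485, 0.620), (113, 0.3957, 0.609),
    (548, 0.3016, 0.599), (374, 0.3542, 0.550), (683, 0.2479, 0.542),
    (153, 0.3826, 0.519), (146, 0.3112, 0.514), (299, 0.3383, 0.508),
    (40,  0.3176, 0.508), (194, 0.1840, 0.502), (492, 0.3552, 0.491),
    (656, 0.3284, 0.483),
]
for sim_id, cal, ver in rejected:
    print(sim_id, round(cal / ver, 3), "rejected" if cal / ver < 0.75 else "kept")

mbh_cal, mbh_ver = 0.39, 0.4817
print("MBH", round(mbh_cal / mbh_ver, 3),
      "rejected" if mbh_cal / mbh_ver < 0.75 else "kept")
```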

Once the above 13 votes were thrown out, MBH was declared the winner of the election with 99% of the votes – sort of like a paleoclimate Kim Jong Il.

Let’s think a little further about the “conservative” ratio of 0.75 between calibration RE and verification RE. The one that no one’s ever heard of. Where did it come from? As soon as he saw it, Spence_UK thought that it probably stunk and, needless to say, it does. Here’s how it works.

The MBH ratio in the AD1400 step is 0.813. So any ratio higher than 0.813 would cause the MBH result to be thrown out. 0.75 is tucked in just under the value that would cause the MBH result to be thrown out. That’s the first part. (Ammann’s code shows that he tested a variety of cases with values higher than 0.813, but the fact that these ratios would cause MBH’s rejection is never mentioned.)

On the other hand, if you go to a ratio of 0.5 (also a case shown in the code but not discussed), you don’t throw out enough votes. Only 2 votes would get thrown out with such a criterion and MBH would not win the election.

So 0.75 is pretty much the optimum value for throwing the maximum number of votes out without throwing MBH out. Perhaps this is what Ammann meant by a “conservative” ratio – he’s allying himself with redneck vote manipulation. Hardly what one expects in Boulder, Colorado, but life is strange.
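
For readers who want the mechanics spelled out, here is a minimal sketch of the screening-and-rebenchmarking step and why the choice of ratio matters. This is not Ammann’s actual code; `cal` and `ver` stand for the 1000 simulated calibration/verification RE pairs from his SI, which are not reproduced here.

```python
# Sketch of the two-step screen described above (illustrative only).
import numpy as np

def screened_benchmark(cal, ver, ratio, fail_value=-9999.0):
    """Apply both screens, then return the 99th percentile of what survives."""
    cal = np.asarray(cal, dtype=float)
    out = np.asarray(ver, dtype=float).copy()
    out[cal < 0] = fail_value            # step 1: negative calibration RE fails
    out[cal < ratio * out] = fail_value  # step 2: cal RE / ver RE below the ratio fails
    return np.percentile(out, 99)

# Illustrative use once cal and ver are loaded from the SI:
#   for r in (0.5, 0.75, 0.82):
#       print(r, screened_benchmark(cal, ver, r))
# At 0.5 hardly anything above MBH is removed (only 2 votes, per the text above);
# at 0.75 enough high-verification-RE simulations are removed to pull the
# benchmark below MBH's 0.4817; above 0.813 the screen rejects MBH itself.
```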

Now there are other issues involved in all of this, such as whether bristlecones operate as a type of radar meteorological antenna measuring temperatures in Asia, Africa and Australia. And nothing in this particular dispute affects the “big picture”.

In the past, I’ve sometimes sarcastically referred to the Team as the gang who couldn’t shoot straight. You’d think that if they sent out a Texas sharpshooter gunning for Ross and me, they’d send out a guy who wouldn’t shoot himself in his own foot. Or draw the bulls’ eye with himself in the middle. But that’s the Team.

Who else could lose a Texas sharpshooting contest?

159 Comments

  1. BDAABAT
    Posted Aug 8, 2008 at 8:19 AM | Permalink

    If the code shows that he tested a variety of cases, many of which didn’t meet the expected criterion, doesn’t that demonstrate intent?

    Bruce

    Steve: Let’s give the intent a rest for a while. We can all speculate on why. But without access to records, you can never really know for sure. I realize that I was pretty mad about the contents of this SI, but, in practical terms, we’ll never know Ammann’s intent without access to his records, which we’re not going to get. So let’s discuss what’s on the record.

    At a certain point, it doesn’t matter anyway – if the analysis is comical, who cares what his intent was? Did he intend to appear in public with a bulbous red nose and funny shoes or is that what he wears around every day? I don’t care any more. Let’s just discuss the bulbous red nose and funny shoes, which are perhaps inappropriate costumes to wear to a PR Challenge.

  2. M. Jeff
    Posted Aug 8, 2008 at 8:47 AM | Permalink

    Please try to show more tolerance for the mathematically challenged and those who have to exaggerate their sharpshooting skills. In my rural Texas high school learning to dehorn and castrate was a major part of the curriculum as compared to less important subjects such as math. Another deficiency was that I had to wait until college to have formal training in sharpshooting. Perhaps Ammann was similarly deprived in his childhood?

  3. Barney Frank
    Posted Aug 8, 2008 at 8:54 AM | Permalink

    Steve,

    I can follow the basics but the advanced math is, and forever will be, over my head.
    What is not over my head however is something seldom commented on; you are a wonderful linguistic stylist. There was a great career waiting for you in some area of writing, had you not gone the route you chose.

  4. Fred
    Posted Aug 8, 2008 at 8:59 AM | Permalink

    During my Grad school adventure, we didn’t refer to Statistics as “Sadistics” for nothing.

    Lies, damn lies and Stats.

    Go Team go.

  5. TerryBixler
    Posted Aug 8, 2008 at 9:11 AM | Permalink

    How to repair the damage that the funny nose has done, with no pea under any shell, with 99.99% certainty. snip – policy. Thank you for your work.

  6. TerryBixler
    Posted Aug 8, 2008 at 10:21 AM | Permalink

    Steve
    Thank you for the edit; sometimes the bigger picture invades my smaller brain. Thank you again for the fundamental work. It is shocking to see the level of selective detail that Ammann coded to get his results. With code that I am responsible for, I require commentary that highlights the intention of a section. If there is no commentary, much more review is required. Sometimes I have spent more than a year digging at a hex dump to reveal the problem. Typically money is involved; what makes your work so important is that it is possibly future money.

  7. Bill F
    Posted Aug 8, 2008 at 10:56 AM | Permalink

    Somebody told me recently that statistics is a lot like a string bikini on a beautiful woman. What is revealed can be very interesting to look at…but what is concealed is often far more fascinating. I think Steve just gave Caspar a wardrobe malfunction…nice work Steve.

  8. Michael Jankowski
    Posted Aug 8, 2008 at 10:59 AM | Permalink

    Yet another “novel” statistical method that isn’t tested before being applied.

    Can any stats whiz pull up some cal RE/ver RE ratios in other publications so we can see how bunk it is?

    Steve: Nope. It’s never been used anywhere. It was specially concocted for this particular Texas sharpshooting contest. But what a pathetic performance by Climatic Change. They knew that this whole thing of RE significance was a battleground issue. Wahl and Ammann had been held up for years because of the rejection of previous efforts to circumvent this problem. You’d think that someone would have asked him – where the hell did this criterion come from? Show me a reference. What does it do? But these folks are so consumed by the desire to vindicate themselves that they don’t notice that they’re wearing bulbous red noses.

  9. Steve McIntyre
    Posted Aug 8, 2008 at 11:04 AM | Permalink

    There are a few lessons here. With the data and code in hand, what did it take to figure this out? A couple of days?

    I put my code out there so that people could refute the results if they deserved refuting. And in fairness to Ammann, while it took 3 years for him to put this data and code online, and while he prudently waited until AR4 was safely out of the way, he did put the data and code online, so that I’m in a position to make definitive statements about this without getting into Mannian arguments about whether we made a wrong turn on the road to Podunk, when the map was wrong in the first place.

    If people want to improve actual knowledge and actually resolve things in a definitive way, this is the way to go. So good for Ammann in finally complying. Although, since he raised money from NOAA for the PR Challenge promising open source, he was in a bit of an awkward spot in continuing to withhold his SI.

    The problem with what appears to be a total evisceration of poor Caspar is that this is going to provide very little encouragement to Briffa or Esper or someone like that to show what they did. Their conclusion is not that this exercise is an excellent example of open source at work, but that Ammann was a damn fool for ever showing his data and methods.

  10. Patrick M.
    Posted Aug 8, 2008 at 11:13 AM | Permalink

    Am I the only one who’s starting to feel like this is too good to be true? Why would this data be released at all if it shows such incompetence/snip ? Is there any chance that the data newly released is not for real? Were they forced to release this data? If not, I don’t get it.

    Is it possible that the Team planted this data?

    It just boggles my mind that they would release such incriminating data if they didn’t have to.

  11. Luis Dias
    Posted Aug 8, 2008 at 11:25 AM | Permalink

    I have to agree with #3. I laughed to the floor with the satire. Great read, Mr. Steve. And I’m sorry that you have to deal with such idiotic papers, instead of auditing stuff that is more interesting from the technical point of view.

  12. Lee W
    Posted Aug 8, 2008 at 11:28 AM | Permalink

    Steve,

    As a non-scientist, I thoroughly enjoy your site. I do have one reservation…

    I understand that your primary objective is to get at the numbers to either validate or falsify, and that you do not wish to [snip] However, if you view this in the context of other occupations, say legal or medical, the actions taken by Team members (regardless of intent) would undoubtedly be met by censure, … [snip]

    I say this with a sad heart, but it is truly a shame when the legal community does a better job of self-enforcement than portions of the scientific community. What exactly does this say about the state of science??

    Keep up the good fight!

  13. deadwood
    Posted Aug 8, 2008 at 11:31 AM | Permalink

    Yeah Patrick, what is going on? Is there someone behind the scenes here having a good long laugh?

  14. Richard deSousa
    Posted Aug 8, 2008 at 11:50 AM | Permalink

    Hell, here I thunk my sharpshooting prowess was second to none until Steve found me out… 😉

  15. Steve McIntyre
    Posted Aug 8, 2008 at 12:31 PM | Permalink

    Please – no more angry posts. I ask people over and over not to be angry. I realize that I was angry with Ammann for a couple of days, but I’m back to seeing the humor in all of this. If you post something angry, be prepared for it to be removed.

  16. MrPete
    Posted Aug 8, 2008 at 12:36 PM | Permalink

    Trying to understand this in simple terms that make sense to me. If I’m reading this correctly:

    1) Using all the data does not validate MBH:
    1a) Bullseye (MBH data analysis) is at (REcal,REver) = (.39, .48), but
    1b) The needed 99th percentile value is REver = .52 (or .54 for MM)

    2) Tossing outliers in a typical fashion doesn’t help (because tossing on both ends does not move the bullseye?)

    3) This “unprecedented” ratio method selects samples to be removed in such a way that “shots” on only one side of the bullseye are removed, thus in effect moving the bullseye.

    3a) To get the 99th percentile value down to MBH range, we need to remove “shots” with higher REver than MBH.
    3b) But we don’t want to SAY that
    3c) So we use this nice table of ratios and see what value we can use that will exclude high-REver “shots” without excluding MBH itself:

    Item REcal REver ratio
    113 0.3957 0.609 0.650
    647 0.3390 0.644 0.526
    944 0.3485 0.620 0.562
    548 0.3016 0.599 0.504
    374 0.3542 0.550 0.644
    683 0.2479 0.542 0.457
    153 0.3826 0.519 0.737
    146 0.3112 0.514 0.605
    299 0.3383 0.508 0.666
    40 0.3176 0.508 0.625
    194 0.1840 0.502 0.367
    492 0.3552 0.491 0.723
    656 0.3284 0.483 0.680
    MBH 0.3916 0.482 0.813

    Examining the table shows that any ratio value between 0.74 and 0.812 will “work”… and 0.75 is a nice “round” number.

    Do I have it right so far?

    Here are my questions:

    * If a table like this were created with ALL the votes, are there other high-RE samples that also have high ratios and thus were not excluded? If so, why is that not important? (My guess: all you need is enough rejections to move the bullseye.)
    * Why doesn’t it matter that low-RE values would also be excluded? If an equal number were excluded on both sides, wouldn’t that leave the bullseye in the same place?

    Probably just dense here 🙂
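
    A toy illustration of the second question, with made-up numbers (nothing here comes from the actual simulations): because failing scores are re-assigned to -9999 rather than dropped, knocking out low scores leaves the 99th percentile where it was, while knocking out high scores pulls it down.

    ```python
    # Toy demonstration with made-up numbers (not the actual simulation output).
    import numpy as np

    rng = np.random.default_rng(2)
    scores = rng.normal(0.0, 0.25, 1000)
    print(round(np.percentile(scores, 99), 3))              # original benchmark

    low_out = scores.copy()
    low_out[low_out < 0] = -9999                            # re-assign low scores
    print(round(np.percentile(low_out, 99), 3))             # benchmark unchanged

    high_out = scores.copy()
    high_out[high_out > np.percentile(scores, 98)] = -9999  # re-assign the top ~2%
    print(round(np.percentile(high_out, 99), 3))            # benchmark drops
    ```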

  17. Michael Jankowski
    Posted Aug 8, 2008 at 12:45 PM | Permalink

    Re#8 (Steve), I apologize that my post wasn’t clear enough. I know the use of these ratios is new territory (hence my jab at “novel”).

    I was asking if there were published studies (in any field) in which there was both a calibration RE and a verification RE calculated, so that posters could take a look at their resulting ratios and see where this magic 0.75 cutoff fits-in with those.

  18. Steve McIntyre
    Posted Aug 8, 2008 at 12:52 PM | Permalink

    Just so everyone’s on the same track, there is no such thing in statistics as “99% significance.” So Ammann is searching for a fake talisman in the first place. But having committed himself to the fake talisman, he is going to great lengths to protect it.

    Yes, there are results that pass that Ammann Texas sharpshooting benchmark with REs higher than MBH (and BTW this number concedes Mannian principal components, which isn’t really valid either), but not a lot. However, you couldn’t honestly say that a recon with REcal of 0.39 and REver of 0.48 was “99% proven” relative to, say, one with REcal of 0.536 and REver of 0.441. The important point in all of this is that the methodology throws up a LOT of values with both in high ranges, simply from red noise processes.

    It’s pretty cheeky to have this data in your SI and then say in your IPCC-approved article that an REver of 0.0 is “99% significant”.

    Id REcal REver Ratio
    657 0.459 0.577 0.795
    505 0.417 0.542 0.769
    221 0.533 0.539 0.987
    440 0.452 0.521 0.868
    60 0.492 0.520 0.945
    854 0.484 0.504 0.960
    752 0.440 0.499 0.881
    1 0.452 0.499 0.908
    549 0.629 0.495 1.271
    795 0.583 0.495 1.179
    357 0.472 0.484 0.977

    815 0.391 0.475 0.825
    486 0.421 0.473 0.889
    862 0.417 0.470 0.887
    527 0.414 0.464 0.892
    159 0.426 0.462 0.923
    58 0.434 0.458 0.948
    318 0.370 0.457 0.810
    878 0.472 0.452 1.044
    544 0.494 0.452 1.093
    649 0.465 0.443 1.051
    971 0.536 0.441 1.213
    148 0.413 0.440 0.938
    50 0.464 0.439 1.056
    232 0.444 0.436 1.017
    966 0.332 0.433 0.765
    733 0.510 0.431 1.182
    764 0.454 0.431 1.054
    763 0.329 0.428 0.770
    613 0.414 0.427 0.970
    403 0.357 0.427 0.837
    561 0.404 0.427 0.946
    4 0.348 0.426 0.816
    696 0.423 0.423 1.000
    686 0.502 0.422 1.191
    278 0.348 0.418 0.834
    699 0.490 0.414 1.183
    138 0.436 0.413 1.058
    701 0.504 0.412 1.224
    271 0.386 0.411 0.938
    233 0.401 0.408 0.982
    599 0.504 0.406 1.241
    127 0.427 0.405 1.054
    7 0.345 0.404 0.853
    393 0.458 0.403 1.137
    350 0.409 0.401 1.019
    460 0.375 0.400 0.937
    340 0.338 0.395 0.857
    398 0.538 0.394 1.365
    251 0.382 0.393 0.970
    792 0.445 0.393 1.134
    59 0.378 0.391 0.966
    11 0.453 0.388 1.168
    260 0.490 0.384 1.276
    75 0.391 0.383 1.021
    179 0.387 0.381 1.017
    707 0.363 0.381 0.952
    322 0.455 0.378 1.204
    425 0.456 0.374 1.220
    69 0.434 0.373 1.162
    20 0.508 0.370 1.371
    351 0.436 0.370 1.179
    685 0.407 0.367 1.110
    478 0.288 0.367 0.786
    210 0.469 0.366 1.283
    64 0.466 0.366 1.274
    916 0.381 0.364 1.045
    80 0.436 0.363 1.200
    936 0.414 0.363 1.140
    383 0.474 0.363 1.305
    261 0.506 0.362 1.398
    270 0.273 0.362 0.755
    617 0.357 0.361 0.988
    577 0.485 0.361 1.344
    450 0.311 0.361 0.861
    438 0.336 0.359 0.937
    510 0.423 0.356 1.189
    227 0.403 0.355 1.133
    408 0.289 0.355 0.814
    130 0.267 0.355 0.753
    188 0.410 0.354 1.158
    354 0.291 0.353 0.822
    39 0.318 0.346 0.918
    228 0.505 0.345 1.464
    391 0.338 0.345 0.981
    215 0.396 0.345 1.150
    857 0.274 0.344 0.796
    458 0.322 0.343 0.940
    198 0.361 0.343 1.052
    982 0.349 0.340 1.027
    319 0.432 0.338 1.277
    709 0.279 0.338 0.824
    895 0.392 0.337 1.162
    443 0.325 0.337 0.964
    951 0.509 0.336 1.517
    576 0.523 0.336 1.558
    861 0.367 0.334 1.099
    504 0.520 0.333 1.561
    572 0.310 0.333 0.933
    29 0.283 0.332 0.852
    129 0.251 0.330 0.759
    173 0.338 0.330 1.023
    91 0.286 0.330 0.866
    807 0.543 0.330 1.646
    94 0.305 0.330 0.926
    448 0.461 0.330 1.399
    506 0.329 0.329 1.001
    281 0.512 0.328 1.559
    293 0.317 0.327 0.972
    364 0.410 0.324 1.264
    483 0.289 0.323 0.895
    336 0.277 0.322 0.859
    829 0.375 0.322 1.165
    780 0.339 0.322 1.053
    239 0.286 0.320 0.895
    636 0.428 0.319 1.341
    217 0.364 0.318 1.143
    349 0.474 0.317 1.494
    727 0.327 0.315 1.038
    212 0.325 0.314 1.034
    520 0.343 0.313 1.095
    625 0.355 0.311 1.142
    143 0.266 0.311 0.858
    107 0.457 0.310 1.475
    841 0.563 0.310 1.817
    677 0.273 0.308 0.886
    133 0.420 0.305 1.374
    102 0.318 0.304 1.048
    131 0.374 0.303 1.236
    446 0.443 0.303 1.464
    53 0.405 0.303 1.339
    571 0.272 0.303 0.900
    642 0.317 0.302 1.048
    688 0.421 0.302 1.393
    974 0.544 0.302 1.801
    213 0.351 0.301 1.163
    74 0.375 0.301 1.244
    830 0.419 0.301 1.391
    431 0.352 0.301 1.169

  19. Phil B.
    Posted Aug 8, 2008 at 1:09 PM | Permalink

    What is the justification for having a verification period RE be greater than the calibration period RE when you know the verification period temperature data is of poorer quality? Should have been a red flag, at least with the RE stat. Obviously, neither Mann nor Ammann considered this fact.

    Steve: In a standard text, one ironically cited by Mann in his Nature reply (Wilks), they show that in a stationary process their Skill Score (RE) is necessarily less than the verification r2 (a result previously noted by Murphy 1988, cited in our GRL article). There are all sorts of warning flags all over the place; this is just one of many.

  20. DJ
    Posted Aug 8, 2008 at 1:21 PM | Permalink

    Steve, I read your Site every day. I love you being tedious in Your Work. I too am tedious being a HVAC Service Tech for many years. People do not appreciate this at times because I’m too slow and the Bill is getting bigger. But My “Repeats” are less than 2%. That is the way I look at it. The bottom Line. The Challenge is not letting People affect your Attitude and Decision Making. For if they do, it ALWAYS comes back to bite you where you don’t want it to.

    We need more People like you. Truth is all WE are asking for, Right? TRUTH………

  21. Pat Keating
    Posted Aug 8, 2008 at 1:30 PM | Permalink

    20 DJ

    You can’t handle the truth!

  22. Ross Berteig
    Posted Aug 8, 2008 at 2:28 PM | Permalink

    I’m slightly confused here, and I suspect I’m not alone. Could someone provide a layman’s definition of RE that says what it is trying to measure, what its range of scores is, whether it is unit-less, etc.?

    I get that it is a measure of skill, and that calculating it for calibration cases vs. verification cases should give distinct values. It also is clear from context that larger values are “better” and that it can be negative.

    But since my experience (always risky when thinking about statistics which so often has provable results that seem counter to intuition) tells me that one should be really excited to have a model that shows more skill in verification than over the data against which it was calibrated, I have to wonder why discard those cases at all?

    Maybe I don’t understand “skill”, “calibration” or “verification” either… 😉

  23. Steve McIntyre
    Posted Aug 8, 2008 at 2:45 PM | Permalink

    #22. There are a lot of posts here which start in the middle of the conversation. I’m afraid it’s the nature of this particular blog. I try to write clearly, but it’s hard to recite the history of each of these issues in a self-contained manner in every post. Look at the Wahl and Ammann category and work through it if you want a history.

    As to being confused about the properties of the RE statistics, that would qualify you to be a climate scientist. There is no theoretical distribution.

  24. MrPete
    Posted Aug 8, 2008 at 2:48 PM | Permalink

    Interesting. Sort by ratio and you find:

    Out of all of those “high ratio” values, only two remain that have higher RE than MBH but lower ratio (and thus their elimination pushes the bullseye in the “right” direction):

    Id REcal REver Ratio
    657 0.459 0.577 0.795
    505 0.417 0.542 0.769

    But you can’t use them, because to trash them you also trash eight with lower RE… thus pushing the bullseye in the “wrong” direction. (Update: or, maybe they aren’t needed because the bullseye is already moved enough?)

    Not only that, but the next element with higher ratio than 0.75 is this one:

    130 0.267 0.355 0.753

    …which may also push the bullseye in the wrong direction. (Update: Whether or not that’s true, it seems clear those who set this up didn’t understand what they were doing any more than I understand this… as a non-stats guy, I would pick a value that accomplishes what I’m looking for and avoids eliminating more “votes” than needed.)

    So the “best” ratio is in a very narrow range. Larger than 0.737 (see #16 above) and smaller than 0.753 — that’s how to maximize the bullseye “push”.

    So ’twas a very convenient selection of the “conservative” ratio as 0.75

    PLEASE correct me if I’m confused. I may not be understanding this at all correctly!

    [For example: are there more “wrong direction” values between 0.737 and 0.75? If so, then they gave up a little “significance” to get that nice round 0.75 number.]

  25. Steve McIntyre
    Posted Aug 8, 2008 at 3:08 PM | Permalink

    As Esper et al 2003 said, without any referee or reader batting an eye:

    this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.

  26. Barclay E. MacDonald
    Posted Aug 8, 2008 at 3:23 PM | Permalink

    #10 “Is there any chance that the data newly released is not for real?”

    If you go back to the Dragging Cat thread and go down to this statement,

    “Just to prove that this matches their results, here is code that reads their output summary, showing that they got exactly the above results. You can inspect these results at the link above.”…

    and continue to follow the argument on down the thread, you may be able to satisfactorily answer your excellent question.

  27. bender
    Posted Aug 8, 2008 at 3:37 PM | Permalink

    #25
    So the cherry-picking advantage is NOT unique to dendroclimatology; Texas sharpshooters enjoy the same advantage. (Is Texas sharpshooting an Olympic event? Maybe the Team submitted a team?)

  28. Darren
    Posted Aug 8, 2008 at 3:41 PM | Permalink

    The ability to pick and choose which samples to use is an advantage unique to dendroclimatology

    Wow… Just. Wow.

  29. Mark T.
    Posted Aug 8, 2008 at 4:01 PM | Permalink

    Whether you find merit in the extreme AGW hypothesis or not, statements like Esper’s, as well as the mess discussed in this thread, should always result in statements like Darren’s.

    Mark

  30. Luis Dias
    Posted Aug 8, 2008 at 4:04 PM | Permalink

    It’s an amazing quote, I admit! One should make a list of “Famous HS Team Punchlines” and make a post with them. It would be a memorable post.

  31. bender
    Posted Aug 8, 2008 at 4:15 PM | Permalink

    It’s an old, old quote. Read the blog. It’s not the only one of this type.

  32. Gary Luke
    Posted Aug 8, 2008 at 4:30 PM | Permalink

    They called it “conservative” because it restrained the significance to below 100%.

  33. Craig Loehle
    Posted Aug 8, 2008 at 4:31 PM | Permalink

    This use of “99.99% significant” is like the practice, in a regression problem, of finding an R2 of 0.1 that passes an F test at 0.01 and declaring it 99% significant, even though it explains hardly anything – and in fact you can easily get an R2 of 0.33 from random data (depending on sample size), as has been known for 30+ years.
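
    A quick numerical check of this point (the n, k and R2 here are made-up illustrative values, not from any particular paper): with 100 observations and one predictor, an R2 of 0.1 clears the 1% F-test comfortably while explaining almost nothing.

    ```python
    # Rough numerical check (assumed sample size and R^2, purely illustrative).
    from scipy.stats import f

    n, k, r2 = 100, 1, 0.10                  # 100 obs, one predictor, R^2 = 0.1
    F = (r2 / k) / ((1 - r2) / (n - k - 1))  # standard overall F statistic
    p = f.sf(F, k, n - k - 1)
    print(round(F, 2), round(p, 4))          # F ~ 10.9, p ~ 0.001: "99% significant"
    ```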

  34. Ross McKitrick
    Posted Aug 8, 2008 at 4:34 PM | Permalink

    One of the many weird things about Ammann’s novel tests is the use of a ratio of statistics (RE / RE), both of which include zero in their support. Try to picture the space of this ratio. RE is not well-defined to begin with, but put it into this ratio and it becomes a nonstationary monster.

    It’s a bit like taking two Gaussian normal variables, x~N(0,1) and y~N(0,1). Each one is a nice, well-behaved stationary variable. Now try plotting z=x/y. It’s not normal, it’s Cauchy, and it’s not stationary. It doesn’t even have a finite mean. When dealing with statistics that have zero in their domain, you just can’t form ratios without introducing significant new complications. This is the basis for Gleser & Hwang’s theorem on the non-existence of finite confidence intervals in errors-in-variables models.
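
    A minimal simulation of this point (sample sizes are arbitrary): the running mean of the ratio of two independent standard normals never settles down, exactly as a Cauchy variable should behave.

    ```python
    # The ratio of two independent standard normals is Cauchy: no finite mean.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(100_000)
    y = rng.standard_normal(100_000)
    z = x / y

    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(z[:n].mean(), 2))   # wanders instead of converging
    ```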

    Which reminds me — way off topic I realize, but errors-in-variables is the proper term for the technique called “total least squares” used in signal detection regressions by the IPCC team.

  35. jcspe
    Posted Aug 8, 2008 at 4:40 PM | Permalink

    bender,

    Whether or not it is an old quote, Luis is right. A “best of” list of team quotes would be a howler worth reading.

  36. bender
    Posted Aug 8, 2008 at 4:41 PM | Permalink

    #33 Exactly. A high significance level (low p value) is important for retaining a hypothesis; that doesn’t mean the hypothesis is all that powerful. For example, in a properly controlled experiment an R2 of 0.05 with a significance level of 0.01 means the effect is very weak, but definitely not negligible. In an uncontrolled natural “experiment”, the effect may not even exist, and may be attributable to other factors not studied.

  37. Luis Dias
    Posted Aug 8, 2008 at 4:52 PM | Permalink

    #31

    Yeah, good advice Bender. I guess I’m just going to my bedroom to read all CA’s posts. See ya next spring then. Will someone bring me food and water? Thanks in advance!

  38. Not sure
    Posted Aug 8, 2008 at 4:55 PM | Permalink

    Jacoby and D’Arrigo are the most quotable IMHO:

    http://www.climateaudit.org/?p=570
    http://www.climateaudit.org/?p=29

  39. bender
    Posted Aug 8, 2008 at 5:06 PM | Permalink

    The method in the madness is as follows.

    It makes sense to hypothesize that there are certain sites that are just right, climatically, to put them in the “sweet spot” of the tree’s response to growing season extremes. The problem is demarcating, in advance, which sites give you sweet vs. sour responses. There is no a priori theory for drawing that line, so they do it a posteriori, after the fact. Which is the exact definition of “texas sharpshooting”.

    And they will continue to do this until there is a fully quantitative physiological theory of tree response to weather variations.

    They are not the only sharpshooters, however. Demarcating atmos-ocean circulatory modes (based on a posteriori EOF analysis) and presuming them to be persistent is another kind of texas sharpshooting.

  40. Ross Berteig
    Posted Aug 8, 2008 at 5:20 PM | Permalink

    #23, Steve, I’ve been following along enough to have a sense of the history. I’m just trying to get a clear mental model for what this thing called RE is measuring, and what its range of values is. As Ross points out in #34, it seems disquieting that they are taking a ratio of values that in principle can be zero, and treating that as meaningful.

    I’m sitting here imagining an analogous (I think, at least) process with a classroom full of students’ grades. Would anyone seriously claim that the ratio of homework score to test score (or vice versa) is meaningful even if it is well-defined?

    I think the Texas Sharpshooter claim is right on.

  41. Steve McIntyre
    Posted Aug 8, 2008 at 5:50 PM | Permalink

    Y’know, this is really a perfect image for so much of this dreck – I should have been using it a long time ago. At the NAS panel presentations, Mann was asked about the divergence problem. His answer – draw a bulls’ eye around the Yamal series and say: look, ma, no divergence problem.

  42. bender
    Posted Aug 8, 2008 at 6:03 PM | Permalink

    #41
    A search of CA will reveal several instances where I’ve previously drawn attention to the TSF in climate science. It is a powerful metaphor. And it is an accurate metaphor.

  43. Phil B.
    Posted Aug 8, 2008 at 6:41 PM | Permalink

    #34

    Ross and others, there is a SIAM book out by Van Huffel and Vandewalle titled “The Total Least Squares Problem”, which addresses the relationship of TLS and EIV starting on page 228 plus a lot of other info on TLS. IMO, a useful book. Not sure if you were familiar with this book.

  44. Steve McIntyre
    Posted Aug 8, 2008 at 7:21 PM | Permalink

    #34, 43. Ross, I’m with Phil B on this. The same method crops up in different contexts under different names. I don’t think that the econometric label “errors-in-variables” necessarily trumps other usages. However, as you’ve observed, the fact that Hegerl et al cited an 1886 publication as authority was not particularly reassuring on their familiarity with the relevant literature. The method of solving the problem, interestingly, turns out to be a principal component analysis (svd) in which the solution is the least eigenvalue.

  45. Phil B.
    Posted Aug 8, 2008 at 7:45 PM | Permalink

    #44, I’ve become a big fan of svd, it is an incredible linear algebra tool for both practical and analytical results.

  46. Posted Aug 8, 2008 at 8:05 PM | Permalink

    Uh Oh, RC site is down. I wonder if they are going to respond to your latest finding.

  47. Fred
    Posted Aug 8, 2008 at 9:20 PM | Permalink

    wonder how long it will take before the Team’s Statistics efforts become teaching points in University courses . . . . lessons in how not to do Stats will be their legacy.

  48. Luis Dias
    Posted Aug 8, 2008 at 10:04 PM | Permalink

    I’ve found this interesting debate with skeptics, among whom Roy Spencer, and IPCC people about the Swindle program, on the ABC (Australian Broadcasting Corporation). Apart from the blatant bias against any questioning and skepticism toward the IPCC position (that moderator was clearly in “mission mode”), it’s interesting how in one instance the Hockey Stick is discussed, and further on, Roy Spencer blatantly accuses it of being a fraud. The opponent then refers to a study made “two years ago” demonstrating the HS to be not falsified at all, and the exchange ends there.

    Quite interesting. One wonders if this debate was tomorrow what would that exchange be like again!

    It’s on youtube, easy to find. Go see it.

  49. Luis Dias
    Posted Aug 8, 2008 at 10:14 PM | Permalink

    I’m sorry, it wasn’t Roy Spencer (I heard wrong), it was Ray Evans.

  50. jnicklin
    Posted Aug 9, 2008 at 12:18 AM | Permalink

    This “conservative benchmark” seems to be very similar to what we called Cook’s Variable Constant when I studied botany all those years ago.

  51. Ross McKitrick
    Posted Aug 9, 2008 at 12:38 AM | Permalink

    44: Steve, in an econometrics text (not that I’ve read up much on this) the EIV solution is not simply to pick a rotation direction. You have to deal with the endogeneity problem or your coefficients will be inconsistent. The solution requires using instrumental variables to form a strictly exogenous estimator for the rhs variables. That’s as far as I got in my reading, which takes it up to the 1980s or so, which isn’t very recent, but at least is post-1886.

  52. chopbox
    Posted Aug 9, 2008 at 1:07 AM | Permalink

    #49 (Luis)
    Couldn’t find the YouTube video you were talking about. Any help?

  53. Syl
    Posted Aug 9, 2008 at 2:35 AM | Permalink

    “The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.”

    Yeah, ’cause cherries grow on trees doncha know. That’s why it’s not called tomato picking!

  54. Luis Dias
    Posted Aug 9, 2008 at 3:51 AM | Permalink

    #53

    Go Here.

  55. Posted Aug 9, 2008 at 7:11 AM | Permalink

    I found the quotation I was looking for. In the Wegman Report, page 14, Dr Wegman says (my emphasis):

    A cardinal rule of statistical inference is that the method of analysis must be decided before looking at the data. The rules and strategy of analysis cannot be changed in order to obtain the desired result. Such a strategy carries no statistical integrity and cannot be used as a basis for drawing sound inferential conclusions.

    Now what Ammann did completely violated this rule. He calculated what the result should be in order to let Mann off the hook, and called it a “conservative ratio”. That is like peeking at the answers and calling it a fair examination.

    “No statistical integrity” would appear to be an accurate description.

  56. MrPete
    Posted Aug 9, 2008 at 8:40 AM | Permalink

    I suppose Wegman’s assertion also applies to

    The ability to pick and choose which samples to use is an advantage unique to dendroclimatology

    How inconvenient.

  57. bender
    Posted Aug 9, 2008 at 8:41 AM | Permalink

    #56 Tempering this critical perspective somewhat, it is important to underline that dendroclimatology is an immature, emerging science, where hypothesis generation (not testing) is a central activity. The discipline will not mature until it embraces experimental ecophysiological approaches to calibrating tree responses to T, P, soil, light, etc. Until that time it will continue to operate in hypothesis generation mode, where data are interpreted a posteriori. This is legitimate science. It is just not a sound basis for trillion-dollar global environmental-energy policy.

    You have to walk before you can run and dendroclimatology is still at the crawling stage.

  58. Ron Cram
    Posted Aug 9, 2008 at 9:10 AM | Permalink

    re: 58
    bender,

    I am compassionate to immature science… up to a point. I decided to look into your claim that dendroclimatology was an emerging science. Here are some of the references I found:

    Articles
    * Douglass, A.E. 1920. Evidence of climatic effects in the annual rings of trees. Ecology 1(1): 24-32.

    * Wilson, A.T., Grinsted, M.J. 1927. The possibilities of deriving past climate information from stable isotope studies on tree rings. Bulletin

    * Schulman, E. 1938. Nineteen centuries of rainfall history in the southwest. Bulletin of the American Meteorological Society 19(5): 211-216.

    * Fritts, H.C. 1971. Dendroclimatology and dendroecology. Quaternary Research 1: 419-449.

    Books
    Tree Rings and Climate by Harold C. Fritts, 1976 Academic Press, New York, NY. 567 pp.

    Some of these authors are probably dead by now. It appears to me the science has been around long enough to prove they have something or not. I would call dendroclimatology an immature pseudoscience.

  59. fred
    Posted Aug 9, 2008 at 12:49 PM | Permalink

    People have asked for RE, R2 and CE step by step explanations.

    There is a ten or 15 page summary starting here

    http://books.nap.edu/openbook.php?record_id=11676&page=83

    Hope this is helpful.
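
    For those who just want the bare formulas, here is a minimal sketch of the usual textbook forms of RE and CE (a sketch only – variable names are mine, and the linked chapter has the full discussion and caveats):

    ```python
    # Usual textbook forms of the RE and CE skill scores (illustrative sketch).
    import numpy as np

    def re_stat(obs, pred, cal_mean):
        """RE: skill vs. the calibration-period mean (1 = perfect, <0 = worse than the mean)."""
        return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - cal_mean) ** 2)

    def ce_stat(obs, pred):
        """CE: same idea, but measured against the verification-period mean (a tougher null)."""
        return 1 - np.sum((obs - pred) ** 2) / np.sum((obs - np.mean(obs)) ** 2)
    ```

    Both are unit-less and bounded above by 1; there is no lower bound, which is why strongly negative values turn up.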

  60. tty
    Posted Aug 9, 2008 at 12:52 PM | Permalink

    Re 39

    Provided there is such a sweet spot (which is unproven), there is a slight problem with your suggested methodology. As soon as there is an appreciable climate change your selected tree will no longer be in that sweet spot. So what you are saying is in effect that dendroclimatology can determine climate change provided there isn’t any.

  61. Steve McIntyre
    Posted Aug 9, 2008 at 1:11 PM | Permalink

    bender is well able to speak for himself. But there’s no need to paraphrase his remarks. Yes, climate change will alter the circumstances of an individual tree – something that traditional dendroclimatology ignores; and I’m sure that bender agrees with his point. But I don’t understand him as even being committed to the idea that dendros have the tools to measure temperature in the absence of climate change, simply because all kinds of things are going on with a tree as well as temperature. There are layers and layers of problems – I realize that we may well be in agreement on this and speaking at cross-purposes.

  62. bender
    Posted Aug 9, 2008 at 2:31 PM | Permalink

    #58 That’s a silly argument. I could cite Arrhenius. Would that mean the science of GHGs was settled more than a century ago? How many papers would you like me to cite to indicate that statistical and ecophysiological dendroclimatology is still evolving? 1000?

    #61 Correct, and many a dendroclimatologist (and every ecophysiologist!) understands this – that the tree never sits in the sweet spot as long as climate is shifting. It means dynamic approaches need to be taken to estimate sensitivity. It’s just that this is not a simple statistical problem. If it were, it would be solved by now. Again, addressing #58.

    #62 Yes on all counts. And it bothers me when folks argue at cross-purposes. My point was simply a counterpoint to John A’s dismissive over-generalization. – snip – They require context. Climate recons have “statistical integrity”. The question is how robust they are and whether that degree of robustness is sufficient to serve as a driver of global policy. That’s where skepticism is legitimate.

    snip . When it is on the topic of dendroclimatology I will call him on it.

    Steve – No need to backbite on other posters after you’ve made your point.

  63. Allen
    Posted Aug 9, 2008 at 2:50 PM | Permalink

    fred #60, Thanks, I’ve downloaded the entire book — the title seems relevant. In the meantime, I have refreshed my knowledge on the basics of R2 and found a description of RE as used by climatologists. Tentatively, those look pretty weak regarding dendrochronology applications.

    Also, searching based on Ron #59, found some dendrochronology introductions at Amazon — I ordered one.

    I agree with Steve M’s #62; it seems that many, many things could impact tree ring data besides temperature. So, I will be interested to dig deeper and see if dendrochronology genuinely accounts for those extra possible independent variables in an objective way. Regardless of how it turns out, dendrochronology looks like a fun topic to study.

  64. bender
    Posted Aug 9, 2008 at 3:08 PM | Permalink

    #63 asks:

    if dendrochronology genuinely accounts for those extra possible independent variables in an objective way

    What “extra possible independent variables”? “Account” how? The answer can be yes or no depending on which studies, species, site conditions, etc. you include in your analysis.

    The whole idea that Esper and Jacoby and D’Arrigo were trying to relay is that if you choose the right trees on the right sites you will get a strong, near-linear, univariate response. This is true. If and only if you know what “right” means. And if you can do this a priori then there is no need to include “extra independent variables”, as their contribution is so small that ignoring them does not substantially reduce variance explained. Including non-significant predictors needlessly erodes your ability to estimate with precision the effects of the known drivers.

    Not to discourage you from researching the topic. Just to say: the scientific community already knows the answer to your question: choose sites where the influence of these other drivers is known (from theory, from facts) to be negligible.

  65. Dodgy Geezer
    Posted Aug 9, 2008 at 3:11 PM | Permalink

    I think some thought should be given to the future. Presumably, at some point when the climate goes back to 1970 temperatures, everyone will realise the errors that have been made and come looking for a scapegoat.

    As it stands, the IPCC scientists will be able to say “We put our science out to peer review and had no dissent. Everyone agreed.”. They will be able to say that because Climate Audit is not a recognised scientific journal. It therefore seems important to me that each of these findings is written up and submitted to such a journal.

    I imagine that this would involve a lot of work, and, given that the journals would probably behave like Nature and reject the article on spurious grounds, the work may be nugatory. But at least the effort should be made, and the rejections documented, so that the auditors of the future may have something to get their teeth into…

  66. Ron Cram
    Posted Aug 9, 2008 at 3:23 PM | Permalink

    re: 63
    bender,

    It is not a silly argument at all. The dendros have had plenty of time to realize it is impossible to separate the confounding factors of temp, precip, fertilization, etc. when looking at the width of a tree ring. If the science was only 15 or 20 years old, one might be a little more understanding and compassionate – although I would not be among them.

    The GHG science has a little more going for it in that we have had rising temps for part of the 20th century – not for all, but enough that people tend to overlook the divergence from 1945 to 1975. Of course, it is getting hard to ignore the divergence now. We have had lots more CO2 go into the air since 1998, but 1998 is still the warmest year on record.

    If you want to talk about immature science that is still developing, I would point you to scientific forecasting. One of the leaders is J. Scott Armstrong. See http://en.wikipedia.org/wiki/J_Scott_Armstrong It may sound a little like scientific crystal ball reading, but they have been publishing their own specialized journal for about 25 years now. And now they are up to four different journals. I think this is a branch of science that will bear significant fruit in the years ahead – unlike dendroclimatology.

  67. Ron Cram
    Posted Aug 9, 2008 at 3:41 PM | Permalink

    re: 65
    bender,

    Your phrase “the influence of these other drivers is known (from theory, from facts) to be negligible” is not supportable. Much of what is “known” in dendroclimatology is simply untrue. If dendroclimatology is ever able to reassert its credibility, it will have to begin by cleaning its own house and disavowing the flawed works of Briffa, Ammann, Wahl and others. Don’t hold your breath.

  68. Posted Aug 9, 2008 at 3:49 PM | Permalink

    58 Ron Cram,

    Immature may mean unsettled or a lack of consensus on the science. Here are a couple of links to a tree expert that doesn’t have a dog in the hunt.

    Effects of Ozone and Climate on Ponderosa Pine

    I haven’t read the whole article on this one but the abstract is interesting.

    Variable Selection in Dendroclimatology:

    (hope the links took)

  69. Posted Aug 9, 2008 at 3:57 PM | Permalink

    Sorry I left out an interesting quote I wanted to include in the first link:

    Palmer hydrological drought index is highly correlated (positively) with growth during the summer months; total precipitation in spring is positively correlated with growth, and mean temperature in spring is negatively correlated with growth.

    The second link deals with statistical problems encountered by dendros, except of course paleoclimatologists.

  70. Jonathan Schafer
    Posted Aug 9, 2008 at 4:03 PM | Permalink

    #63,

    You say

    Climate recons have “statistical integrity”.

    From Statcom_08

    Integrity. There is also a double-edge to the goal of statistical integrity. We know that there is no integrity without quality, and that quality is expensive.

    Given the lack of data quality as has been presented over and over on this site as it relates to climate reconstructions, I would disagree with your assessment that they have “statistical integrity”.

    Another reference from a different publication states

    Integrity means “soundness”, which naturally implies validity and reliability. So “statistical integrity” refers to the soundness of statistical methodology, including experimental design, data gathering and analysis

    Again, given the “unsoundness” of some of the statistical methodologies applied by various climate scientists when dealing with proxy-based reconstructions, I would have to disagree with your assessment that they have statistical integrity.

    Perhaps I am misunderstanding exactly where the statistical integrity you refer to originates from. Certainly not from the data, nor from at least some of the methodology(s).

  71. Jonathan Schafer
    Posted Aug 9, 2008 at 4:07 PM | Permalink

    And a follow-up…

    National Statistical Service

    Perhaps those from the CSIRO should have taken some of these to heart.

  72. Posted Aug 9, 2008 at 4:25 PM | Permalink

    Bender:

    My point was simply a counterpoint to John A’s dismissive over-generalization….They require context. Climate recons have “statistical integrity”. The question is how robust they are and whether that degree of robustness is sufficient to serve as a driver of global policy.

    I’m baffled as to what counterpoint you were making. I quoted Wegman on the statistical integrity of Mannian analysis as it pertains to a posteriori selection of statistical methodology to recover what they already believe to be in those tree ring records.

    I make no claim specifically as to whether dendroclimatology as a whole is fatally flawed since it cannot separate climate variables (although others may be more certain than I), but I do know that that particular field has yet to deal with the statistical nuances that Steve has been talking about in relation to autocorrelation and the peculiar behavior of time series thereof.

    I simply refer to the Hockey Team’s abuse of statistical methods and especially benchmarking which have no integrity.

    I do not see people in the field of climatology stepping up to the plate and swinging at these transparently invalid methods, although a lot has changed in climate science since the blog started, so who knows what will happen.

  73. bender
    Posted Aug 9, 2008 at 4:42 PM | Permalink

    I simply refer to the Hockey Team’s abuse of statistical methods and especially benchmarking which have no integrity.

    That’s fair.

    I do not see people in the field of climatology stepping up to the plate and swinging at these transparently invalid methods, although a lot has changed in climate science since the blog started, so who knows what will happen.

    Bad methods tend to die a quiet death of underuse. If you’re looking for celebratory fireworks as a sign of revolutionary progress, you’re looking for the wrong signs. The fact that only Team members use and choose to defend these methods tells you something.

    I keep telling you – and you keep ignoring the fact – that these problems are a serious challenge. They’re not a joke. You and Steve and I can point to the problems, such as autocorrelation, all we want. For some of these there are as yet no known solutions. snip

    Some good things *could* grow out of the “PR” challenge – if the proponents let it happen. Steve may be shunned from that group, but his arguments are not being ignored.

    Steve: bender, surely there are legitimate causes of complaint. Texas sharpshooting (and I acknowledge that you’ve used this phrase for some time) is pretty deeply ingrained among these folks in a variety of ways and that sort of stuff has nothing to do with legitimate statistical conundrums.

  74. Allen
    Posted Aug 9, 2008 at 4:56 PM | Permalink

    bender #65,

    …Not to discourage you from researching the topic. Just to say: the scientific community already knows the answer to your question…

    Thanks for the response. Actually, I presume “the answer is out there” — and I will find it if I look.

    FWIW, I’m: a scientific newcomer to the AGW field, political independent, moderate environmentalist. What I have found (so far) is that both sides of the AGW debates have plausible arguments for their positions (at least on the surface). However, the pro-AGW-mitigation camp is asking us to spend trillions — thus, the burden of proof resides with them (the philosophy of Climate Audit, I believe). It sure would be great if I could simply read a book (or IPCC report), and get the facts in that one place. Unfortunately, ideology seems to color many presentations. Hence, I feel I must determine the black and white facts for myself by digging beyond the surface of the highly publicized sources.

    FWIW, among the Climate Change & AGW topics I am studying in parallel, I am looking into dendroclimatology and dendrochronology. In the end, we may agree regarding dendroclimatology and its application to AGW. Right now, since the burden of proof lies with the pro-AGW-is-bad camp, I view that camp’s dendroclimatiological arguments with more than normal scientific “skepticism”. On the other hand, I view the anti-AGW-is-bad camp’s and “auditors'” arguments with normal scientific skepticism.

    Steve: I have said over and over that, if I were a policy maker, I would defer to the advice of the academies etc. I don’t suggest that policy makers do nothing. However, I don’t want people to discuss policy here or this sort of issue.

  75. Craig Loehle
    Posted Aug 9, 2008 at 5:01 PM | Permalink

    The basic problem in dendroclimatology is that various assumptions are made but can rarely be tested. It is assumed that the response is linear, that conditions around the tree were constant, that the same limiting factors existed, etc. But you only have a short instrumental period (the calibration period). As soon as you go back beyond that, the validity of the conclusions is entirely dependent on the validity of the (untested) assumptions. In the few cases where data is available, it is rainfall (e.g. outflow of the Columbia River over 400 yrs is predicted by tree rings), not temperature, that seems to be valid. When the studies were used to generate hypotheses, no problem, but now it is asserted that the temperatures 1000 yrs ago are validly predicted by these trees, with no testing of assumptions.

  76. Geoff
    Posted Aug 9, 2008 at 9:08 PM | Permalink

    Just a few more general comments before we get back to statistical dreck, TSF, and process manipulation.

    1. It seems reasonable to think of dendroclimatology as immature in terms of knowing what it’s about, rather than just chronological age. A good example is this week’s press release from AGU on a new study:

    Human-induced climate change is projected to cause drier conditions in the midlatitudes. To assess whether the onset of drier conditions has already begun, Touchan et al. study newly developed multicentury tree ring records from Tunisia and Algeria for a longer-term perspective on drought in northwestern Africa. Using a new set of 13 chronologies from Atlas cedars (Cedrus atlantica) and Aleppo pines (Pinus halepensis), the authors analyze the widths of individual tree rings, following the basic principle that thinner bands indicate years when water was relatively scarce. Through this, they reconstruct the region’s Palmer Drought Severity Index, an index of dryness based on precipitation and temperature, for the years between 1456 and 2002. The reconstruction reveals the magnitude of droughts from the historic record, despite there having been no instruments to record these droughts. Interestingly, the most recent drought (1999?) appears to be the worst since at least the middle of the fifteenth century. This drought is consistent with early signatures of a transition to more arid midlatitude conditions, as projected by several climate models. (Bold added)

    Oh. Not temperature?

    2. Leaving aside the statistics and looking at the conclusions of the paper, how reasonable does it seem to paleoclimatologists that the global mean temperature over 900 years (1000-1900 AD) did not vary more than 0.15 °C plus or minus? Even if you think the 20th century is "contaminated", it varies by more than 0.4 °C by the '50s, so even before the heavy Carbon Age.

    3. In that regard, another recent paper cited in the AGU press release states:

    These reconstructions resolve the warming from the last glacial maximum, the occurrence of mid-Holocene warm period, a MWP and LIA, and the rapid warming of the 20th century, all occurring at times consistent with a broad array of paleoclimatic proxy data. The reconstructions show the temperatures of the mid-Holocene warm period some 1–2 K above the reference level, the maximum of the MWP at or slightly below the reference level, the minimum of the LIA about 1 K below the reference level, and end-of-20th century temperatures about 0.5 K above the reference level.

    So current temperatures may be a bit higher than the MWP (without conceding the point) but fall well within the range of the past 10k years. Does that tell us anything?

    See the press release and citations here.

  77. Ron Cram
    Posted Aug 9, 2008 at 10:13 PM | Permalink

    re: 76
    Craig,

    I agree with you completely.

  78. Ron Cram
    Posted Aug 9, 2008 at 10:31 PM | Permalink

    re: 73
    John A,

    I do not see people in the field of climatology stepping up to the plate and swinging at these transparently invalid methods, although a lot has changed in climate science since the blog started, so who knows what will happen.

    I agree with your thought here. Science is supposed to be self-correcting and the history of science is full of stories of controversies and animosity among individual scientists as they argue for their own positions. We do not see that in dendroclimatology. bender's comment that we should not expect to see this is completely off the mark, in my opinion. If I were a dendro and believed in my science, I would speak out against both incorrect methods and wrong conclusions that become wrong assumptions for the next researcher.

  79. Geoff Sherrington
    Posted Aug 10, 2008 at 4:02 AM | Permalink

    This epitaph does not help the science, but in an odd way it seems appropriate for final error bounds.

    Here lies Lester Moore.
    Three shots from a 44.
    No Less. No more.

  80. CA Fan
    Posted Aug 10, 2008 at 4:41 AM | Permalink

    I’d just like to extend a warm welcome to Caspar, Eugene, Gavin, Mike, Ray, Rasmus, Ray, Stefan, David, Thibaut, William and all the RC ‘team’. Also, supporters Josh H, Tim L, Michael T, Tamino, Ray L, Hank R, Lee, Dano et al. Don’t be shy – we know you are watching! We welcome your contributions to this discussion. Come and join the fun. We are missing you!

  81. MrPete
    Posted Aug 10, 2008 at 5:44 AM | Permalink

    With respect to bender (“Don’t expect celebration”) vs John A (“nobody is taking a swing at transparently invalid methods”)…

    I’m reminded of a hard lesson I learned years ago, when I had to do battle with a provably-wrong and over the top building code inspector. I took my complaint to The Boss. Even being as diplomatic as I could be, he still defended his employee. However, he also was diplomatic in telling me that I was gonna have to come up with an airtight case to get him to go against his own employee.

    It took a ridiculous amount of work (but this was my home and my sweetie’s new dream kitchen at stake 🙂 ), but I developed my case. The Boss saw the truth. The inspector went ballistic (sadly, in a public region-wide forum) and ultimately was let go.

    Bottom line: it’s a lot harder to effect change from the outside, because insiders must assume their coworkers are probably right. Imagine how demoralizing to discover that teammates are doing poor work!

    In that sense, this is why it is triply valuable when Steve is able to write things up for publication. The truth is no different when blogged or published, but the medicine tastes better swallowed from a GRL or Journal of Statistical Climatology spoon. (And yes, it can be a royal pain to get the prescription approved 😉 )

  82. Steve McIntyre
    Posted Aug 10, 2008 at 7:42 AM | Permalink

    While we experimented in the early days with names for the Team – there were obvious suggestions like Flame and Heat – I sort of like Texas Sharpshooters. I think that their costumes should be more urban cowboy than rodeo, something along the line of the one below, the ersatz Bollywood interpretation capturing Ammann’s statistical style rather nicely, I think.

  83. Ron Cram
    Posted Aug 10, 2008 at 7:46 AM | Permalink

    Steve,

    I love your sense of humor! That’s hilarious!

  84. bender
    Posted Aug 10, 2008 at 7:51 AM | Permalink

    #82 Yes. Compared to other fields dendro is a quiet science that does not have a culture of progressing by “taking swings” at invalid methods. To expect it now is silly.

    #79 You misinterpret my remark, that's why you disagree with it. There is more than one path to self-correction. The paths range from revolution to evolution. The bcp boondoggle was on its way to being debunked, with or without CA. [Yes, I'm sure you disagree.]

    You agree with Loehle's #76 as though there was something revelatory in that statement. All dendros know that their linear models are approximations. Yawn. So reconstructions have uncertainty in them. Does this mean a field is corrupt? Get a grip. And note Loehle's careful choice of words. The fact is that sometimes assumptions of moisture or temperature limitation are tested. Sometimes they do experiments. Sometimes they do independent sampling. Not everyone is addicted to uncalibrated bcps.

    #77 What part of “expect different responses on different sites” do you not understand? Treeline and desert are expected to experience temperature and moisture limitation respectively. Everyone knows this, but you pretend this is news. There’s probably a reason why people invent such straw men.

    Before pretending to be an authority on a subject the least you can do is read the blog and see if what you’re trying to say has been said better before.

    CA is at its best when it focuses on analysis and at its worst when it devolves into wars of opinion.

    My point – to bring this back to the thread title – is that generating hypotheses requires some “texas sharpshooting”, aka a posteriori analysis. The hope is that you eventually go beyond this, to hypothesis testing. Dendros need to do more of that, as #76 (and Wegman) argues. But think about what this means – testing these infernal assumptions. Who cares to grant me $1M and 5000y to grow ancient bristlecone pines under controlled greenhouse conditions? Thought so.

    Does anyone here think about what it means to "test assumptions"? Or is the pile-on an involuntary uncontrollable urge? There are reasons why these assumptions are often not tested.

    Please try to be more thoughtful in your criticisms.

  85. bender
    Posted Aug 10, 2008 at 7:53 AM | Permalink

    #83
    man, looks like he’s “wingin’ it”.

  86. Steve McIntyre
    Posted Aug 10, 2008 at 8:56 AM | Permalink

    #85. bender, for what it's worth, I've found the thoughts of Greene on data mining in an econometric context useful in thinking about "a posteriori analysis". Like you, Greene notes that you form hypotheses from looking at data. Then the conundrum comes in whether you can also apply the data used in forming the hypothesis to proving the hypothesis. Greene observed (and I've used this illustration in presentations) that one way of testing an economics hypothesis, if time isn't important, is to wait 30 years and see if it holds.

    I’ve observed that there are ideal circumstances to do this at very low cost in paleo, especially bristlecones. No need to wait 5000 years. Bring the Graybill chronologies up to date and see first if you can replicate them and second if they record recent global warming. We’ve proved the Starbucks Hypothesis – it is neither expensive nor time consuming to update the proxies.

    Given that we’re pretty much on the same page, I’m not sure why I’m belaboring the point. I’d better go watch the Olympics.

    As to the BSP boondoggle being on its way out – I don’t think that we can exclude the possibility that our criticisms may have ended up prolonging its life in the paleoclimate community. Mann’s PC1 has been used more by third parties AFTER the problems were identified than before; it’s as if the paleoclimate community is showing solidarity with Mann because he’s been criticized by outsiders. Sort of like tribal behavior all over the world, where cousins feud with cousins, tribes with tribes, but if a foreign invader appears, they forget their feuds. It’s understandable in human terms, but pretty pathetic when it’s endorsed by IPCC.

  87. Ron Cram
    Posted Aug 10, 2008 at 8:59 AM | Permalink

    re: 85
    bender,

    I am sorry you feel like people are piling on, but I am not sure you understand my criticism yet. Your assumption that bad methods will die from underuse is fine. It may even be true once the Team have all retired. But that hardly solves the problems of dendroclimatology. Sloppy methods have led to bad conclusions. Bad conclusions are now “science” and are assumed to be true by every researcher coming after.

    You say dendroclimatology is a “quiet science” and therefore not combative. That’s the problem! The dendros do not seem to understand this. The science will never be robust. It will never have integrity and command respect if they do not police themselves. If you want to make an omelet, will you use two good eggs and one spoiled egg?

    Yes, I understand the differences between treeline and desert and the limitations of temp and moisture. But these are not the only confounding variables. When a dendro can point to a tree ring and say “Based on this ring, formed in 1657, we know the annual temperature for the region, the precipitation, the natural fertilization rate and the amount of sunlight the tree received,” they have got something!

    According to the literature I've read, there is more than one way a narrow or wide ring is formed. A narrow ring could happen in a warm year with lots of precipitation, if it was a year the tree had to deal with insects or disease. All of the dendros know this, but they pretend it doesn't matter. It does matter. Until someone challenges the assumptions, provides a public dunking to dendros doing poor work, and otherwise cleans up their own house – dendroclimatology will never be science.

    It is time for the dendros to justify their existence.

    Steve: Ron, let me referee this little food fight a bit. I have no problem with dendros collecting data even if we’re not sure right now exactly what it means. Maybe patterns will emerge. In the scheme of things, it’s very cheap data to collect. Having said that, the very difficulties in interpretation place all the more onus on the dendros to archive their data in case a later interpreter can find a pattern that they can’t. For example, let’s say that a chronology is “screwed up” as a temperature record because of recurrent attacks of spruce budworm. Under Jacoby rules, that data gets thrown out because it doesn’t contribute to the story. But it would be just what the doctor ordered if you were studying spruce budworm patterns – and, who knows, maybe that might contribute to disentangling other information. The dendro data sets are big complicated data sets. They are interesting statistically and it’s too bad that bright young statisticians work on far less interesting data sets. If I’d been organizing the Paleo Challenge workshop, I’d have invited a lot of statistics grad students and post-docs as well as the dendros and then asked the dendros to describe their data, all the problems and issues; so that maybe some young statisticians with a clean slate looking for interesting problems would get interested.

  88. Kenneth Fritsch
    Posted Aug 10, 2008 at 9:04 AM | Permalink

    Re: #85

    My point – to bring this back to the thread title – is that generating hypotheses requires some “texas sharpshooting”, aka a posteriori analysis. The hope is that you eventually go beyond this, to hypothesis testing.

    Bender, I thoughtfully submit that part of the problem and frustration is not calling a conjecture a conjecture. Conjecturing isn't bad in itself, but it is when it is passed off as something else. Also, data mining for hypotheses to test makes for complications when doing the statistical testing, and further, the "texas sharpshooting" appears to me to be different from mining data. How many scientific papers are accepted where the author(s) admit to "texas sharpshooting" and conjecturing? And, even more frustrating, how often do reviewers of papers where these processes are used, but not labeled, call them out as such?

  89. Dave Dardinger
    Posted Aug 10, 2008 at 9:13 AM | Permalink

    I went over to look at the link to the “Texas Sharpshooter Fallacy” at the top of the article and it reminded me of another, similar fallacy which may have a name but which I’m not aware of. I call it the “Busy Store Fallacy.”

    Say you get talking to the owner of a small store and she begins complaining about how bad business is. You retort, "Oh, come on! Almost every time I come here there are lots of customers." The fact is that both of you can be right. There are two reasons you may think the store is busier than she does. First, you, like most people, are likely only to come at certain times of the day or week. Thus the store is going to be busier when you come. But even beyond this, there are going to be natural clumps of customers. And if there are N people who come during a particular clump, then there are going to be N people who see a busy store. There may be another M people who come in by themselves (between clumps), but N/(N + M) people are going to report a busy store even though it was really only busy once. Further, there will be times, perhaps a large % of the time, when there will be no customers, and this will skew the opinions of the store owner vs a typical customer even more.

    It might be fun for someone to draw some graphs showing the % of people claiming a busy store vs the actual degree of business. Of course a definition of just what constitutes business would be necessary and it would also be necessary to make assumptions as to the form of the distribution of visits.
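
    Short of drawing the graphs, here is a quick Monte Carlo sketch of the idea in Python. The arrival rates and clump size are entirely made up; the point is only the gap between the owner's view and the customers' view.

        import random

        # Busy Store sketch: customers arrive alone or in occasional clumps,
        # so the average customer sees a busier store than the average minute.
        random.seed(1)
        MINUTES = 10_000
        P_CLUMP, CLUMP_SIZE, P_SINGLE = 0.02, 5, 0.05   # assumed, illustrative rates

        minutes = []
        for _ in range(MINUTES):
            n = CLUMP_SIZE if random.random() < P_CLUMP else 0
            if random.random() < P_SINGLE:
                n += 1
            minutes.append(n)

        # Owner's view: fraction of minutes with any customer at all
        owner_busy = sum(n > 0 for n in minutes) / MINUTES

        # Customers' view: each arriving customer reports "busy" if anyone
        # else arrived in the same minute
        reports = [n for n in minutes for _ in range(n)]
        customer_busy = sum(n > 1 for n in reports) / len(reports)

        print(f"owner sees customers in {owner_busy:.0%} of minutes")
        print(f"{customer_busy:.0%} of customers report a 'busy' store")

    Both parties report honestly, yet the two percentages differ by a wide margin, which is the whole point of the fallacy.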

  90. Steve McIntyre
    Posted Aug 10, 2008 at 9:15 AM | Permalink

    Folks, as always, I urge people not to over-generalize. We have here a particularly egregious example of Texas Sharpshooting, one that interests me because I have a personal involvement in the dispute. And yes, Texas Sharpshooting is a problem with Team paleoclimate studies. But not everything in the world is Texas Sharpshooting. When people try to go a bridge too far, as readers often do, all it does is generate easy ripostes for critics of this site. They point to the exaggerated claim – one that I didn’t make – and then use that as an excuse not to consider the issue that prompted the post.

  91. bender
    Posted Aug 10, 2008 at 9:19 AM | Permalink

    good contrast in helpful vs. unhelpful rhetoric:
    #88

    It is time for the dendros to justify their existence.

    #89

    I thoughtfully submit that part of the problem and frustration is not calling a conjecture a conjecture.

    #89 is a balanced assessment. #88 is over the top.

    #87

    As to the BSP boondoggle being on its way out – I don’t think that we can exclude the possibility that our criticisms may have ended up prolonging its life in the paleoclimate community. Mann’s PC1 has been used more by third parties AFTER the problems were identified than before; it’s as if the paleoclimate community is showing solidarity with Mann because he’s been criticized by outsiders.

    Solidarity in support of a person is not the same thing as acceptance of an invalid method. The objective scientist will agree that the untenable PC1 demon (aka Chucky) needs to be exorcised. When the “young dendros rebel”, one of the things they rebel against is the unmitigated use of Chucky in support of policy. That the policy makers have made Chucky their favorite pet is not the young rebels’ fault; it is beyond their control.

  92. Steve McIntyre
    Posted Aug 10, 2008 at 10:28 AM | Permalink

    #92, bender, you’re being a bit unfair to the policy makers here. They are working with what they are given. Houghton’s backdrop at the WG1 press conference was the HS. Using the HS as a crutch seems to be common practice among concerned scientists.

    In my opinion, as I've said many times, I think that too many scientists under-estimate the public. If it were up to me (and I suggested this to IPCC AR4 scopers), they should go down the throat of all the technical issues – all the CO2 lines, all the water vapor lines, whatever. Put their best science in a place where people can look at it. Stop saying that it's Met 101 or Atmospheric Radiation 101 or whatever. Or if there are text expositions that they endorse, they should state the endorsed expositions and discuss what a scientist from another field should look for in those texts.

    By using the HS and HS-type arguments, scientists have taken a bit of an easy road, treating the public like pawns. Their argument, I suppose, is that no harm is done, because even if the story for the public isn't exactly right, there's a story known to the illuminati that is correct and gives the same answer. It's a bit like saying it doesn't matter if WMD was wrong, because there was another good reason. Maybe so. But that obviously doesn't justify the original use of flawed "facts" and the public eventually catches on to such things. The whole anti-MM thing feels exactly like that.

  93. bender
    Posted Aug 10, 2008 at 11:17 AM | Permalink

    you’re being a bit unfair to the policy makers here. They are working with what they are given

    You're right; I need to clarify. It's not the policymakers per se who are at fault, but the science promoters that sit in between the scientists and the policymakers. They are the ones who systematically suppress scientific uncertainty for fear of muddying the waters. I tend to call them "policymakers" although they are more like science-policy middlemen: senior science editors who don't do science anymore and junior policy analysts. Yes, the promoters use what they were given in 1998, and conveniently don't ask if anything's changed in 10 years of research. Will Chucky resurface in, and survive beyond, the 5th assessment? If Chucky lives, it's the middlemen who can be blamed. Is that unfair?

  94. Raven
    Posted Aug 10, 2008 at 11:27 AM | Permalink

    bender says:
    “Will Chucky resurface in, and survive beyond, the 5th assessment? If Chucky lives, it’s the middlemen who can be blamed. Is that unfair?”

    They have gotten away with it for 10 years so Chucky will not die unless the entire AGW premise is discredited by a climate system addicted to chaos. Science self corrects but sometimes the old guard needs to die off first.

  95. jae
    Posted Aug 10, 2008 at 11:42 AM | Permalink

    Raven, 95:

    They have gotten away with it for 10 years so Chucky will not die unless the entire AGW premise is discredited by a climate system addicted to chaos. Science self corrects but sometimes the old guard needs to die off first.

    I think the matter will finally be settled by straight physics.

  96. bender
    Posted Aug 10, 2008 at 11:59 AM | Permalink

    I think the matter will finally be settled by straight physics.

    “The matter” is whether CWP is “unprecedented” compared to MWP and HTO and PETM. “Straight physics” is not going to settle what is an empirical (and paleoclimatological) problem. That is, unless your “straight physics” includes time machines. In which case, you could be right.

  97. DeWitt Payne
    Posted Aug 10, 2008 at 12:34 PM | Permalink

    bender,

    I know that there are comparisons of the current release rate of CO2 into the atmosphere with that of the PETM, but surely no one is saying current global or polar temperatures are in any way comparable to those during the PETM. There were boreal forests in Antarctica then vs miles of ice now. Comparing the CWP to the MWP, Roman WP, Holocene optimum, Eemian and other recent interglacial temperatures is a reasonable thing to do, if the data were reliable. Did I miss the sarcasm tag?

  98. jae
    Posted Aug 10, 2008 at 2:06 PM | Permalink

    bender: Sorry I wasn’t clear. By “matter” I was only referring to the effects, or lack thereof, of the so-called GHgs.

  99. bender
    Posted Aug 10, 2008 at 2:18 PM | Permalink

    #98 The most relevant comparison is MWP vs CWP because this is the one paleoclimatology has the greatest probability of resolving. MWP is what the sharpshooters said must be erased and has been erased.
    #99 Which, as usual, has nothing to do with the topic, which is texas sharpshooting.

  100. Ron Cram
    Posted Aug 10, 2008 at 3:50 PM | Permalink

    Steve,
    re: your comment on my #88,

    I should not have used the more general term “dendros” when I was thinking specifically of the dendroclimatologists. I assumed everyone would know my meaning from the context. Of course, I agree with you that there is no problem in collecting data. I consider this the work of dendrochronology, which is not the same thing. And of course, more bright statisticians should be involved.

    My problem is with dendroclimatologists. They make claims for the science that are not supportable. This has all kinds of ramifications. I'm not just talking about global warming. Put yourself in the position of a young student who is convinced to get a Ph.D. in dendroclimatology. After getting your degree, you begin to evaluate all of the un-examined assumptions. You begin to see how bad methods have led to wrong conclusions and further research headed in the wrong direction. What do you do now? Some of these people may believe in the science and try to fight for it (but I haven't seen one of these yet). Others will go along with the crowd because it is easy. Others will chuck it and quietly look for a new career. This is a terrible state of affairs.

    Steve, I understand that you may not agree with my comments. But anyone who attributes my comments to you is simply looking for a reason to discount your site so they do not have to deal with the facts. If it were not my comments, it would be someone else's.

    Steve: I still think that you're being too strident. Dendroclimatology originated out of the study of droughts in the Southwest and arguably they can accomplish something in that area. Ed Cook's work on droughts seems sensible, for example. But they want to get to temperature and that's where things get a lot hairier. Enter, in particular, Jacoby/D'Arrigo and Briffa, who started down this road, a path now being trod by D'Arrigo, Esper and Rob Wilson. Again, I encourage you to retain nuance.

  101. Ron Cram
    Posted Aug 10, 2008 at 3:54 PM | Permalink

    re:92
    bender,

    I stand by my statement that it is time the dendroclimatologists justified their existence. In fact, if I could find the time, I would like to follow Pat Frank’s lead and write an article for Skeptic on this very topic. Who knows? Maybe Pat will even agree to team up with me on the article.

  102. tty
    Posted Aug 10, 2008 at 4:23 PM | Permalink

    Re 97

    I would say that "the matter" is very easily settled with respect to the HTO, not to mention the PETM. The CWP is not nearly as warm. With respect to the MWP I think that the historical record suggests that the CWP is slightly cooler, but I'm not dogmatic about it.

  103. bender
    Posted Aug 10, 2008 at 4:52 PM | Permalink

    #103 Can you cite a statistical analysis that includes correctly estimated uncertainties? Does IPCC? No need to be dogmatic about any scientific proposition. If it’s a known fact, it can be shown with a citation; rhetoric is unnecessary. If it’s not a fact, or can never be proven, then it’s propaganda. You see that sometimes.

  104. Steve McIntyre
    Posted Aug 10, 2008 at 5:39 PM | Permalink

    I've added another example of Texas sharpshooting in my Replicating Ammann post, one that I just noticed. They do not calculate calibration RE and verification RE on the same series. When we had our Nature correspondence, Mann hyperventilated about us doing something VERY VERY WRONG in our calculation of verification statistics, by missing this strange and undocumented splicing procedure. Mann calculates the calibration RE on his "dense" network of 1000+ gridcells in the 1902-1980 period, but the verification RE on his "sparse" subset of 172 cells. What happens if you are consistent and, at least for the record, calculate calibration and verification statistics on the same thing? Something that seems particularly desirable if, like Ammann, you believe that these two ratios have some meaning. Well, if you do this, the calibration RE falls to 0.177, failing Ammann's redneck voter test.
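
    To make "on the same thing" concrete, here is a toy Python sketch (invented series, not the MBH data) of computing calibration and verification RE consistently against a single target, with the calibration-period mean as the common reference in both cases.

        import numpy as np

        def re_stat(obs, est, ref_mean):
            # Reduction of Error: skill relative to always guessing ref_mean
            obs, est = np.asarray(obs, float), np.asarray(est, float)
            return 1.0 - np.sum((obs - est) ** 2) / np.sum((obs - ref_mean) ** 2)

        rng = np.random.default_rng(0)
        years = np.arange(1854, 1981)
        target = 0.005 * (years - years[0]) + rng.normal(0, 0.15, years.size)
        recon = 0.6 * target + rng.normal(0, 0.15, years.size)  # an imperfect estimate

        cal = years >= 1902            # calibration window
        ver = ~cal                     # verification window
        ref = target[cal].mean()       # one reference mean for both statistics

        print("calibration RE:", round(re_stat(target[cal], recon[cal], ref), 3))
        print("verification RE:", round(re_stat(target[ver], recon[ver], ref), 3))

    Whatever the merits of the RE statistic itself, at least both numbers then refer to the same series.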

  105. bender
    Posted Aug 10, 2008 at 11:47 PM | Permalink

    Another, older example of good old-fashioned rootin’ tootin’ sharpshootin’ cherrypickin’ pie:
    A few good men

  106. Posted Aug 11, 2008 at 12:52 AM | Permalink

    #105,

    There are some sparse vs. dense issues in figure 2 in MBH99 as well,

    http://www.climateaudit.org/?p=647#comment-103485

    Seems to me that Mann’s sparse reconstruction,

    http://www.ncdc.noaa.gov/paleo/ei/ei_data/nhem-sparse.dat

    is done without TPC variance matching, but dense

    ftp://ftp.ncdc.noaa.gov/pub/data/paleo/paleocean/by_contributor/mann1998/nhem-dense.dat

    is with variance matching.

    Variance matching affects REs, so we have even more choices 🙂
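
    A rough sketch of why the choice matters: rescaling an estimate so that its calibration-period variance matches the target changes the squared error, and hence the RE. Made-up numbers, for illustration only.

        import numpy as np

        rng = np.random.default_rng(1)
        target = rng.normal(0, 0.2, 79)
        recon = 0.5 * target + rng.normal(0, 0.1, 79)   # variance-damped estimate

        def cal_re(obs, est):
            return 1 - np.sum((obs - est) ** 2) / np.sum((obs - obs.mean()) ** 2)

        # "variance matching": rescale so the estimate's std equals the target's
        matched = (recon - recon.mean()) * (target.std() / recon.std()) + target.mean()

        print("RE, no variance matching:  ", round(cal_re(target, recon), 3))
        print("RE, with variance matching:", round(cal_re(target, matched), 3))

    So two otherwise identical reconstructions, one rescaled and one not, will post different RE values.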

  107. KevinUK
    Posted Aug 11, 2008 at 4:32 AM | Permalink

    #100 Bender

    The MWP has not been removed. It is a matter of (thankfully) very well documented fact, and so, despite the attempts of certain cherry-picking 'sharpshooting' Team members, it can never be erased. They can continue to try to discredit it by claiming it occurred only in Europe or the Northern Hemisphere, but it will nonetheless remain an historical fact. The purveyors of the new eco-religion can deny its existence as much as they want, but there is more than enough historical evidence that it was at least as warm (more likely significantly warmer) during this period of the millennial past than it may have been during the recent warming period at the end of the 20th century.

    The attempts by the HT to remove it constitute the greatest example of ‘sharpshooting’ (or as we would say in the UK ‘moving the goal posts’) that has ever occurred to support an agenda in man’s history IMO.

    KevinUK

  108. Geoff Sherrington
    Posted Aug 11, 2008 at 5:37 AM | Permalink

    Steve # 83

    That cowboy sharpshooter picture will have to be erased. The man has his mouth open and that increases the probability of danger. He might be speaking.

  109. Francois Ouellette
    Posted Aug 11, 2008 at 2:26 PM | Permalink

    You’d think that if they sent out a Texas sharpshooter gunning for Ross and me, that they’d send out a guy that wouldn’t shoot himself in his own foot.

    Or, as one of my Aussie buddies once said: “He didn’t just shoot himself in the foot, he deepthroated a bazooka!”

  110. Raven
    Posted Aug 11, 2008 at 8:40 PM | Permalink

    My jaw dropped when I read this quote from Tamino:

    You really are confused. Cherry-picking is leaving out data because it implies an undesired result. Leaving out data because it increases error bars and obscures signal is not.

    IOW: "I know what signal I want to see, so anything that does not show it must be noise". I always found it hard to believe that the Team was intentionally manipulating data but I have often wondered how they managed to justify their actions in their own minds. I now know.

  111. bender
    Posted Aug 11, 2008 at 9:53 PM | Permalink

    #11 open mind => addled fundamentalism

  112. MrPete
    Posted Aug 11, 2008 at 10:21 PM | Permalink

    #111 — oh well. Tamino appears to have blocked my reply. I would love for him to explain how he knows what is “signal” vs “noise” in the raw data, when we’re dealing with unknown sources of “signal.”

    There appear to be some questions that Must Not Be Asked.

    (I’ve responded to dhogaza as well. I was saying precip and temp are not correlated, he shifted that to “not related”. Of course, there’s a complex relationship, as highlighted many times here. The hard part is how to describe growth as a function of temp without precip getting in the way as a separate variable… particularly in precip-limited places like the Nevada desert. Ah well.)

    It’s really all the same problem, when you get down to it.

    Goal: temp proxy.
    Reality: a variety of known and unknown factors influence the phenomenon being measured.
    Challenge: how to express the phenomenon as a function of temp and validly eliminate all other factors in the expression.

    If you can’t do that, then variance in the other factors affects your signal.

    AFAIK, what we’re seeing is people matching up their signal to a portion of modern temps, and assuming that means they’ve eliminated the other factors.

    Kinda big assumption.

  113. Raven
    Posted Aug 11, 2008 at 10:58 PM | Permalink

    #113 – He refused to post my reply too. I must give the guy some credit for being a master of propaganda because he recognizes when posters are bringing up issues that he cannot possibly reply to, so he accuses the poster of smearing 'honest scientists'. This gives him the cover he needs to delete further posts on the topic. I am thinking of proposing a web blog award for the most ironic blog title – tammy would win hands down.

  114. Lost and Confused
    Posted Aug 12, 2008 at 1:05 AM | Permalink

    I have never posted on Climate Audit before, and I had not intended to either. I read this site fairly frequently, but I never had much to contribute. Considering people here are obviously interested in what I effectively caused, I thought I would post a comment here. I submitted a post a short while ago which is still in the moderation queue. Two of my posts before it disappeared, and I imagine it will too. (I am not particularly fond of shadow posting, but in this case I want some record.)

    I have just had two posts disappear from the moderation queue. I have had posts get caught by an automatic filter on here before (due to links), but it seems extremely unlikely that is the case now.

    Assuming they were deleted intentionally and not by some fluke, I have to say I am dumbfounded. Both posts were made only to respond to Tamino’s response to me, and deleting them is dishonest. The first post made an “accusation” in that I repeated Jacoby’s own claims and interpreted them. Perhaps that is a basis for it being deleted, so I made a second one. That one asked a hypothetical question, in which I demonstrated the problem with Tamino’s response to me. There was no accusation, insult or even bad language.

    Once before I had a post deleted in which I defended myself when people said I accused scientists of fraud. I explained the significance of accusations of fraud, and showed I had never made such claims. This seems to be something along the same line. If my posts will be filtered just so I am misrepresented, there is no point in me posting here. Routinely accusing “denialists” of cherry picking, then quietly deleting posts causing me to be misrepresented is really, really ridiculous.

    Unfortunately I did not make copies of the deleted posts. The best I can manage is a paraphrase of the second deleted post. The wording is different, but the content should effectively be the same.

    This intrigues me. Imagine if you had 100 temperature records (hypothetical ones with no microsite issues, UHI or the like). Thirty stations have a signal demonstrating global warming. Seventy have no signal. Is it cherry picking to only discuss the 30 while throwing away (not even archiving) the other 70?

    I wholeheartedly support open and honest dialog. I attempted to have it at Open Mind, but it seems that will be impossible. I have now had several posts deleted without any explanation. These posts did not violate any rules of the website, nor contain inappropriate material. These deletions have effectively misrepresented me. I apologize for spending bandwidth on this as I am not sure it belongs here, but I thought it might be of some interest. And to be honest, I feel some urge to defend myself.

  115. Raven
    Posted Aug 12, 2008 at 1:34 AM | Permalink

    #115 – Here is the post I had deleted. As you can see, there is nothing rude about the tone other than the fact that I stated quite strongly that I disagreed with his opinions.

    Tamino says:
    “Cherry-picking is leaving out data because it implies an undesired result. Leaving out data because it increases error bars and obscures signal is not”

    This only makes sense if there is a robust physical explanation for why the data in question is noise rather than the real signal. Assuming that you know what the signal is supposed to be and then coming up with hand-waving excuses for rejecting everything that contradicts it as "noise" is cherry picking, whether you want to admit it or not.

    In fact, this is exactly the criticism that Dr. Svalgaard levels at most of the solar fans engaging in wiggle matching exercises. I happen to think that Dr. Svalgaard has a point, which is why I agree with him on the GCR-climate link and disagree with you on pretty much anything you write about tree-rings and climate reconstructions.

    I would modify your comparison to state:

    Imagine if you had 100 temperature records with no metadata describing the siting or station moves. Thirty stations have a signal demonstrating global warming. Seventy have no signal. Is it legitimate to only discuss the 30 while throwing away (not even archiving) the other 70 because the ones not showing warming are 'obviously' contaminated by microsite issues?

    Steve may need to correct me if I am wrong but I believe that is exactly what the Team is doing with their samples.
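
    As a toy illustration of why that kind of selection matters, here is a Python sketch: generate 100 series of pure red noise, keep only those that happen to correlate with a rising "instrumental" record, and average the keepers. The numbers below are invented, but the composite picks up a recent upturn purely from the selection step.

        import numpy as np

        rng = np.random.default_rng(2)
        n_series, n_years, cal = 100, 200, 50
        instr = np.linspace(0.0, 1.0, cal)          # rising "instrumental" record

        def red_noise(n, phi=0.7):
            x = np.zeros(n)
            for t in range(1, n):
                x[t] = phi * x[t - 1] + rng.normal()
            return x

        series = np.array([red_noise(n_years) for _ in range(n_series)])  # no signal at all

        # "screening": keep only series correlating with the instrumental record
        r = np.array([np.corrcoef(s[-cal:], instr)[0, 1] for s in series])
        kept = series[r > 0.3]
        composite = kept.mean(axis=0)

        print(f"kept {len(kept)} of {n_series} noise series")
        print("composite, pre-'instrumental' mean :", round(composite[:-cal].mean(), 2))
        print("composite, start of screened window:", round(composite[-cal:-cal + 10].mean(), 2))
        print("composite, end of screened window  :", round(composite[-10:].mean(), 2))

    None of the kept series contains any climate signal, yet the average of the survivors trends upward over the screened window.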

  116. Lost and Confused
    Posted Aug 12, 2008 at 2:30 AM | Permalink

    I dislike your changes Raven. They greatly modify the meaning. It may make the comparison more accurate for this thread, but not for anything that was being discussed where this originated.

  117. trevor
    Posted Aug 12, 2008 at 3:01 AM | Permalink

    I am thinking of proposing a web blog award for the most ironic blog title – tammy would win hand’s down.

    In the irony stakes I think it is a close run thing between Open Mind, Real Climate and Fair and Balanced.

  118. kim
    Posted Aug 12, 2008 at 6:49 AM | Permalink

    Tamino has selectively deleted my posts in order to misrepresent me. I quit posting there in frustration. That mind is open indeed, and the sight is not pretty.
    ==================================================================

  119. bender
    Posted Aug 12, 2008 at 7:29 AM | Permalink

    To keep an open mind you have to sharpshoot denialist postings. Credit to the sharpshooter for recognizing devastating critique.

  120. Francois Ouellette
    Posted Aug 12, 2008 at 7:51 AM | Permalink

    #118: Remember what “Pravda” means? “The truth”

  121. Posted Aug 12, 2008 at 8:25 AM | Permalink

    Re #115

    You have to be very careful posting to Tamino’s site, it seems to have a very aggressive Spam checker which consigns posts to some imaginary list, from which they never return, for containing the names of certain drugs!
    I had the experience of having a post multiply rejected; detailed investigation revealed it was because I used the word 'ambient'! What's wrong with that, you ask? It was rejected for containing the name of the drug 'Ambien'!

  122. Lost and Confused
    Posted Aug 12, 2008 at 8:29 AM | Permalink

    Phil, three different posts of mine have vanished in less than 12 hours. Immediately after one post was accepted. The three posts which were deleted each showed Tamino to be wrong, while the accepted one barely said anything. There can be no doubt Tamino intentionally deleted posts that would be damaging to him.

    snip

  123. bender
    Posted Aug 12, 2008 at 8:51 AM | Permalink

    OP:

    The bulls’ eye was re-drawn in the next step – where Ammann proposed a “conservative” ratio of 0.75 between the calibration RE and verification RE statistics. Using this “conservative” ratio, he threw out 419 out of 1000 votes. The salient question is whether this “conservative” procedure has any validity … I’ll provide some details below and you decide.

    I have decided. This is the very definition of sharpshooting – drawing your target (i.e. benchmarking) after you have shot (i.e. run your data through the analytical sausage-grinder and previewed the outcome). Some would call it "cherry-picking". However these cherries weren't hand-picked individually. They were selected algorithmically, which helps sustain the illusion of "hands-off".
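
    To see how the target moves, here is a schematic Python sketch. The (calibration RE, verification RE) pairs are invented, and the ratio screen is only my paraphrase of the rule described in the post, not Ammann's actual code; the point is simply that each screen changes which simulations count, and hence where the "99% significance" line sits.

        import numpy as np

        rng = np.random.default_rng(3)
        # 1000 invented (calibration RE, verification RE) pairs standing in
        # for a red-noise benchmark (illustration only)
        cal_re = rng.normal(0.35, 0.15, 1000)
        ver_re = 0.5 * cal_re + rng.normal(0.15, 0.20, 1000)

        screens = {
            "no screen": np.ones(1000, dtype=bool),
            "drop cal RE < 0": cal_re >= 0,
            # assumed reading of the 0.75 "ratio" rule, for illustration only
            "ratio screen": (cal_re >= 0) & (ver_re <= 0.75 * cal_re),
        }
        for name, keep in screens.items():
            print(f"{name:16s} kept {keep.sum():4d}  "
                  f"99th pct verification RE = {np.percentile(ver_re[keep], 99):.3f}")

    Whatever the merits of any particular screen, the benchmark against which "significance" is declared is being drawn after looking at where the shots landed.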

  124. MrPete
    Posted Aug 12, 2008 at 11:05 AM | Permalink

    Phil, Tamino’s site tells you if your post is waiting for the moderator. If it is, Tamino is the one deciding what stays, what goes. Here’s my current contribution, waiting for moderation. I have better things to do than try to poke through a one-sided moderation queue and do safety-postings here. Time to get back to the Real World.

    In deference to our gracious host’s desire for brevity, here’s a shorter response to dhogaza:

    I wrote about lack of correlation.
    dhogaza writes about relationship.
    We agree more than is suspected.

    Precip and temp are not (statistically, mathematically, etc) correlated. They do have a complex relationship, e.g. very high temp results in lower precip.

    Hard part: growth = f(temp, precip, etc). Can’t factor precip out of the equation because precip is not a function of temp.

    To use growth data as a temp proxy, you need such an equation. Doesn’t exist. Big problem.
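
    A quick sketch of the problem with purely invented numbers: let growth depend on both temp and precip, calibrate a temp-only relationship over a short "instrumental" window, and see how much of the earlier temperature actually comes back.

        import numpy as np

        rng = np.random.default_rng(4)
        n, cal = 500, 80
        temp = rng.normal(0, 1, n)
        precip = rng.normal(0, 1, n)                 # independent of temp here
        growth = 0.5 * temp + 0.8 * precip + rng.normal(0, 0.3, n)

        # calibrate temp against growth over the "instrumental" window only
        slope, intercept = np.polyfit(growth[-cal:], temp[-cal:], 1)
        temp_hat = slope * growth + intercept

        pre = slice(0, n - cal)
        corr = np.corrcoef(temp_hat[pre], temp[pre])[0, 1]
        rmse = np.sqrt(np.mean((temp_hat[pre] - temp[pre]) ** 2))
        print("pre-instrumental correlation with true temp:", round(corr, 2))
        print("pre-instrumental RMSE:", round(rmse, 2), "(true temp std = 1.0)")

    The unmodelled precip variance carries straight through into the reconstruction error; you cannot rescale it away after the fact.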

  125. Posted Aug 12, 2008 at 3:37 PM | Permalink

    2 questions, Steve —

    First, what is the actual percentile for validation RE = 0, if not .99? I suppose this varies somewhat with the MBH “step” involved.

    Second, how do WA get negative calibration RE’s? According to the NAS North report cited by Fred above in #60, “in the calibration period, RE, CE and r^2 are all equal.” But the calibration r^2 (aka the regression R^2) must be nonnegative, and can only be zero with probability zero. The adjusted R^2 can be negative, but its expectation under the null of no explanatory power is 0, so it’s negative about half the time, not 7/1000 of the time as here.

    Is the discrepancy because WA (and MBH) are in fact computing RE in terms of temperature errors and mean temperatures, even though MBH in effect are regressing the proxies on temperature? Regressing proxies on temperature as in MBH is the correct way to go, even if they inefficiently ignored the covariance matrix across proxies and therefore could not compute correct confidence intervals. However, it would then be more appropriate to measure the ability of the model to fit the proxies in the validation period rather than its ability to back out temperature, as they apparently have done.
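
    For what it's worth, here is a small Python sketch of the arithmetic (invented data, not WA's): when the calibration-period estimate is the ordinary least-squares fit, calibration RE (= CE) and r^2 coincide and are nonnegative, but once the estimate is rescaled or shifted away from the least-squares fit (as variance matching or splicing can do), calibration RE can go negative while r^2 stays positive.

        import numpy as np

        rng = np.random.default_rng(5)
        x = rng.normal(0, 1, 79)                      # some proxy-based predictor
        obs = 0.4 * x + rng.normal(0, 0.3, 79)        # "observed" calibration temps

        def cal_stats(o, e):
            r2 = np.corrcoef(o, e)[0, 1] ** 2
            re = 1 - np.sum((o - e) ** 2) / np.sum((o - o.mean()) ** 2)
            return round(r2, 3), round(re, 3)         # in calibration, RE equals CE

        b, a = np.polyfit(x, obs, 1)
        fitted = b * x + a                            # least-squares estimate
        print("OLS fit        (r2, RE):", cal_stats(obs, fitted))

        shifted = 2.5 * fitted + 0.3                  # rescaled and offset estimate
        print("rescaled shift (r2, RE):", cal_stats(obs, shifted))

    That is at least one mechanical route to a negative calibration RE; whether it is the route actually taken in WA is a separate question.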

  126. Kevin
    Posted Aug 12, 2008 at 4:16 PM | Permalink

    What a long thread, thanks for the good work.

    Please friends, stop debating dendro with bender. New ideas are not being added, just repeated.

    If the hockey stick is a phony due to low statistical significance, what is the next best chart? I mean, is there better data from ice cores or silt layers or something?

  127. Pat Keating
    Posted Aug 12, 2008 at 4:35 PM | Permalink

    127 Kevin
    How about Craig Loehle's?

  128. Kevin
    Posted Aug 13, 2008 at 6:43 AM | Permalink

    Just read "A 2000-YEAR GLOBAL TEMPERATURE RECONSTRUCTION BASED ON NON-TREERING PROXIES".

    Thanks Pat, that's very good. Lying is a strong word, but somebody seems to be on the borderline. Now I must go to some site with the opposite argument. One where the hockeystick is valid and McIntyre et al are cast as paid oil thugs. Are there any good (science not politics) ones out there?

  129. UK John
    Posted Aug 13, 2008 at 12:43 PM | Permalink

    #129

    You could try the UK Met office web site, they have consulted their wet seaweed, and all seems to be OK with the hockey stick.

    Mind you, their forecasting record is appalling; any UK Met office forecast beyond tomorrow, you would be advised to treat with caution. Such is the nature of the UK climate.

    This summer is appalling, we are meant to be in severe drought, that is what the climate change models predicted in 2006, and it has rained ever since that forecast was made.

  130. MrPete
    Posted Aug 13, 2008 at 1:26 PM | Permalink

    It's interesting how much data is being pulled in at Tamino's to suggest that other factors are not independent strip-bark BCP growth factors. Perhaps we can create a catalog one of these days. My latest response (it's getting easier to be brief):

    Trying to stay close to our ever-patient host's topic: what does the data tell us? Thus, I'm not interested in loosening my concrete assertion about correlations.

    Nth time: strip bark BCP growth = g(precip,temp,etc). Where is precip=f(temp) [let alone etc=f(temp)] that allows us to eliminate non-temp factors?

    dhogaza: Epochal changes in (precip,temp) do not answer. Useless to remove precip as a growth factor.

    LB: yup, nice refs! They show: tree rings help see ENSO precip record, fire, etc. (Frost kill too, BTW.) Just not growth as f(temp). I wish it were so! ‘Twould be much simpler.

    Find a way to remove precip as a growth factor in the equation. I’ll not hold my breath. In the meantime, Lazar is not doing a temp analysis as long as precip (and etc) is left in there.

    Remember, this is the easy part. Intra-tree data with 300%+ growth excursions is harder. (cf Almagre tree 31)

  131. Steve McIntyre
    Posted Aug 13, 2008 at 6:04 PM | Permalink

    The related big issue with respect to bristlecones is reconciling the Ababneh and Graybill results. I don’t see how anyone can use the Graybill Sheep Mt results without showing why Ababneh’s results are wrong. And if Ababneh is right about Sheep Mt, then there’s something almost certainly wrong with Campito Mt and other related Graybill sites.

    Graybill’s own analysis in Graybill and Idso shows a big difference between the strip bark and whole bark chronologies. You and I have a pretty good idea what’s wrong with the strip bark – but it’s crazy that the dendros haven’t analyzed exactly what’s going on with strip bark – even after the problem was highlighted by the NAS panel. You’d think that someone would do a technical report.

    But nope, the same tired old re-hashing of Graybill chronologies, which, aside from any other consideration, should be regarded as unusable without re-confirmation simply due to age, missing records and inconsistency with Ababneh.

  132. bender
    Posted Aug 13, 2008 at 6:09 PM | Permalink

    #132 Agree completely. Choosing Graybill over Ababneh is akin to selecting "a few good men", i.e. cherrypicking.

  133. Steve McIntyre
    Posted Aug 13, 2008 at 7:32 PM | Permalink

    #133. The “few good men” thing goes beyond cherry picking when it infects the archiving and they only archive “good results”.

    Remember that they withheld the Gaspe updated results that didn’t show the HS, which have never been archived or published. I learned of their existence and, when I asked for them officially, Jacoby refused, saying that the earlier results showed the “signal” better. Once they go down the road of selective archiving, unfortunately it ends up eroding confidence in the representativeness of what they do archive.

  134. MrPete
    Posted Aug 13, 2008 at 7:50 PM | Permalink

    Steve, you’ve read all these papers. When it is said that we’re selecting for a “good signal” have you seen any definition of what “signal” means? Trying to be as open as possible to whatever process is in place.

  135. bender
    Posted Aug 13, 2008 at 8:16 PM | Permalink

    #134 True.
    #135 “Signal” means correlation with the instrumental record of whatever is being proxied.

  136. Steve McIntyre
    Posted Aug 14, 2008 at 7:43 AM | Permalink

    Except that they now purport to disdain “interannual” correlation – the despised r2 statistic. There’s a tendency to assume ex ante that the HS is the “signal”.

  137. bender
    Posted Aug 14, 2008 at 12:29 PM | Permalink

    #137 Really? They disdain interannual correlation? What makes you say that? I hope they're prepared to start dealing with monstrous levels of autocorrelation. It's really "pick your poison".

  138. Sam Urbinto
    Posted Aug 14, 2008 at 2:09 PM | Permalink

    MrPete:

    I was saying precip and temp are not correlated, he shifted that to “not related”.

    Isn't that a usual tactic? Saying there's no clear cause/effect relationship in the first place, much less one with directionality, turns into the person saying there's no relationship at all.

    We know carbon dioxide is related to temperature.

    “You’re stupid, Arrhenius showed increasing carbon dioxide increases temperature in 1896!”

    I wonder if some people have ever actually read “On the Influence of Carbonic Acid in the Air upon the Temperature of the Ground”.

    Steve:

    I’d have invited a lot of statistics grad students and post-docs as well as the dendros and then asked the dendros to describe their data, all the problems and issues; so that maybe some young statisticians with a clean slate looking for interesting problems would get interested.

    I wonder why people in the field seemingly aren’t interested in something helpful like this.

  139. Mark T.
    Posted Aug 14, 2008 at 2:37 PM | Permalink

    Because it is career suicide.

    Mark

  140. Geoff Sherrington
    Posted Aug 14, 2008 at 6:15 PM | Permalink

    Don’t know where to put this, Unthreaded is closed, but I think it needs saying.

    A longtime friend suicided by hanging the other day. His father had been an eminent medico, much decorated, and the son tried for recognition too. He chose a path of protest and was active at anti-uranium and human rights protests.

    We shall not know the full story, but it is plausible that he ended it because of increasing mental agitation that not enough was being done to combat his acquired view of Global Warming.

    In this difficult week, I have appreciated more than ever the leadership of Steve McIntyre and his balanced, interesting writing; and to name just one more of many, Steven Mosher for his delightful turn of phrase coupled with evident knowledge. You have all provided a refuge.

    It was not sharpshooting, it was the rope. And it was a day or two after I wrote lightly about the epitaph of Lester Moore. There are strange, sometimes savage, twists and coincidences dished out by Mother Nature, not just climate-wise.

  141. Jonathan Schafer
    Posted Aug 14, 2008 at 9:56 PM | Permalink

    #141,

    My condolences on the loss of your friend. May he rest in peace.

  142. Geoff Sherrington
    Posted Aug 17, 2008 at 3:00 AM | Permalink

    Thank you, Jonathan. Geoff.

  143. jeez
    Posted Aug 17, 2008 at 3:10 AM | Permalink

    I think you’re probably getting silent condolences from many of the posters. People often don’t know how to react when confronted with such events.

    My condolences as well.

  144. TAC
    Posted Aug 17, 2008 at 4:14 AM | Permalink

    #141, Jeez has expressed my sentiment. It is hard to know what to write. Words on a blog are inadequate and seem vaguely inappropriate.

    My condolences on the loss of your friend.

  145. bender
    Posted Aug 17, 2008 at 5:53 AM | Permalink

    I have several good friends who are depressed by the thought of AGW = “man’s ruination of the planet”. Anyone “sounding the alarm” had better be aware of what they could be triggering. A false alarm could prove very costly. (Yes, I know – a failure to alarm in the case of a disaster would be costly too. A familiar message.)

    Let's just get the science right, and let's talk openly about what the data really say. (The GCMs in particular, and the forcing estimation exercise especially. (The GCMs aren't the big problem. It's the estimation exercise.)) RC is too autocratic a forum to be trusted. They are under no obligation to answer any question, and are unaccountable for the answers they give. Their mandate is clear: subdue all questions that may threaten policy momentum. Dudes, it's the science. Folks want answers. Some folks need them.

  146. steven mosher
    Posted Aug 17, 2008 at 8:15 AM | Permalink

    re 141

    It says a lot about a man when he reaches out to thank others in his time of grief. CA has also been a refuge for me.
    My condolences on your loss.

  147. Ron Cram
    Posted Aug 17, 2008 at 8:18 AM | Permalink

    Geoff,

    My condolences as well. Words fail me.

  148. MrPete
    Posted Aug 17, 2008 at 12:00 PM | Permalink

    Wow. Loss of hope is always tragic. Geoff, may you have opportunity to give an overwhelmingly good hug to the family.

  149. Geoff Sherrington
    Posted Aug 18, 2008 at 5:09 AM | Permalink

    Thank you, Steve, for allowing this space which can be closed off now, and to others for kind comments. I have had cause for some deep thought and have resolved never again to use the ad hom, unless it is clearly seen as light and humorous. Different people have different sensitivity to games. Some at RC might take note of such sensitivity.

  150. TerryB
    Posted Aug 18, 2008 at 7:12 AM | Permalink

    I might be going mad but I’m sure I saw Steve Mc say something about poetry in a recent thread???
    (I don’t know which thread so I’ll put my ridiculously poor effort in a couple of threads. Sorry!)

    There was a doctor called Mann,
    But his data got him into a jam,
    He said the millennium was cool,
    But he was shown as a fool,
    ……Then to the rescue came Caspar Ammann.

    But McIntyre wouldn’t let it be,
    He wanted to see the RE,
    And when it eventually came,
    The statistics were lame,
    But ’twas too late for the IPCC.

    McIntyre’s instinct was right,
    Ammann should be feeling contrite,
    But he’s probably not,
    He’ll still say it’s hot,
    And the Team will no doubt put up a fight.

    So was it as warm in the past?
    Do we really need to act fast?
    Who knows the real truth?
    But in their search for a proof,
    The Team’s antics leave us all aghast.

    The actions of just a few,
    Might probably destroy peer review,
    They don’t follow the rules,
    Treat the curious like fools,
    And make honest scientists cry “mon Dieu!!”

    So who should we really believe
    If it wasn’t for good blokes like Steve?
    We should keep open minds,
    And eventually find,
    That climate science might be reprieved.

    I’ll give you this poem for free,
    And in time we’ll eventually see,
    That when the science is sound
    Steve will eventually have found,
    How CO2 raises temperature by 3.

    (Not worth a copyright, Terry B August 2008)

  151. gda
    Posted Aug 22, 2008 at 11:07 AM | Permalink

    After a long period of (stunned?) silence, I note that supporters of the Team over at Tamino appear to think they have dismissed Steve’s demolition of W&A.
    http://tamino.wordpress.com/2008/08/10/open-thread-5-2/#comment-21340
    So does their argument really have any merit, or is it just the usual handwaving?
    BTW – first time poster, long time lurker – love your blog Steve, even if I struggle to follow some of the maths!

  152. Sam Urbinto
    Posted Aug 22, 2008 at 1:49 PM | Permalink

    Well, the NAS panel said, two issues were brought up “regarding the metrics used in the reconstruction exercise”:

    1. “…the choice of ‘significance level’ for the reduction of error (RE) validation statistic is not appropriate.”
    2. “…different statistics, specifically the coefficient of efficiency (CE) and the squared correlation (r2), should have been used….”

    And said about them:

    Some of these criticisms are more relevant than others, but taken together, they are an important aspect of a more general finding of this committee, which is that uncertainties of the published reconstructions have been underestimated.

    As I might put it plainly, inappropriate validation metrics were used, so the results of the reconstruction are far more uncertain than claimed. Or in other words, they didn’t prove what they said they did.

  153. Steve Reynolds
    Posted Aug 22, 2008 at 4:37 PM | Permalink

    Interesting article on an open-science movement in which raw data are being posted on the Internet to speed research:

    http://www.boston.com/news/local/massachusetts/articles/2008/08/21/out_in_the_open_some_scientists_sharing_results/

  154. John F. Pittman
    Posted Aug 22, 2008 at 5:11 PM | Permalink

    #152 gda

    Eq (1): y = ax + b + noise

    Eq (2): y = x + noise

    Note that there is some confusion here from Gavin's pussycat. Eq 1 = Eq 2 when the intercept is 0.0 and a is 1. One would assume that noise would meet the definition of noise. Equation 1 is the general form; Eq 2 is a specific form of Eq 1. R^2 is still good for both. Hard to see how Steve McI, according to pussycat,

    was pretty sharp — evil, dishonest but sharp

    by using a general form. Without showing proof that this specific form of Eq 2 somehow should not use R^2, the claim made is more than just suspect. Quoting from pussycat

    understanding tests like r2 are “lightweight” tests, typically only used as a first cut at deciding if “there is something to it”.

    Let's see: it can't pass a "lightweight" test that is used as a first cut at deciding. Yet MBH claims "robustness" or some such.

    I would like to see proof of this claim

    It's also fairly easy to cheat

    in the acknowledged statisticians' peer-reviewed literature; and of course how MBH fit the specific and not the general case. It would be interesting. Of course, this claim of ease is not mine nor MBH's, as far as I know.

    Not to doubt his claimed ability to cheat. Just that I was taught that this was one of those “lightweight” tests used to help make sure you could support the claim you were making.
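
    A quick Python check of the point, with made-up data: the r^2 "first cut" is just the squared correlation of y with x, so it is unaffected by the slope a and the intercept b, and applies equally to the general form Eq (1) and the specific form Eq (2).

        import numpy as np

        rng = np.random.default_rng(6)
        x = rng.normal(0, 1, 100)
        y = 0.7 * x + 0.2 + rng.normal(0, 0.5, 100)   # data generated by the general form

        r2 = lambda u, v: np.corrcoef(u, v)[0, 1] ** 2
        print("r2(y, x)      :", round(r2(y, x), 3))                # the "specific form" check
        print("r2(y, a*x + b):", round(r2(y, 0.7 * x + 0.2), 3))    # identical, by invariance

    Since correlation is unchanged by a positive linear rescaling of x, failing this "lightweight" test means the same thing under either form.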

  155. Gerald Machnee
    Posted Aug 22, 2008 at 6:01 PM | Permalink

    Re #152 gda – Whether some of the posters at Tamino’s site have the expertise does not matter. Too many at that site speak from a biased point of view. I would stick to Steve M and Dr. Wegman.

  156. Ryan O
    Posted May 25, 2009 at 4:02 PM | Permalink

    I suppose it’s about time to resurrect this thread. 🙂 After having done some of the Steig analysis, I re-read AW & WA and figured I’d post my thoughts. Not necessarily connected to the discussions immediately above.
    .
    Steve,
    .
    My biggest issue with their 2 papers is not the methodology or even the statistics. It’s the whole-body-waving when it comes to the multivariate aspect of the analysis. I will quote a relevant passage from AW (my bold):
    .

    It is important to note in this context that, because they employ an eigenvector-based CFR technique, MBH do not claim that all proxies used in their reconstruction are closely related to local-site variations in surface temperature. Rather, they invoke a less restrictive assumption that “whatever combination of local meteorological variables influence the proxy record, they find expression in one or more of the largest-scale patterns of annual climate variability” to which the proxy records are calibrated in the reconstruction process (Mann et al., 2000). MM directly note the link between bristlecone/foxtail pines and precipitation (p. 85, MM05b), which is exactly the kind of large-scale pattern registration that the MBH CFR method takes as axiomatic because large portions of this region are known to have important ENSO/precipitation teleconnections (cf. Rajagopalan et al., 2000; Cole and Cook, 1998). Since ENSO has a strong role in modulating global temperatures as well as affecting regional precipitation patterns, a CFR method of temperature reconstruction can effectively exploit regional ENSO/precipitation teleconnections that register in proxy data.

    .
    Without being melodramatic, I am seriously struggling to see how any peer reviewer would not demand that this entire paragraph be stricken prior to publication. Did MBH “calibrate” the regional precipitation/ENSO teleconnection (note that WA juxtaposed the precip and ENSO, rendering their statement meaningless) to global temperature? No. Has Mann et al., WA, or anyone else done this subsequently? No. Has anything been published anywhere that suggests that such a generic calibration is even remotely possible? No.
    .
    There is also an untrue statement (the first bolded statement) about the MBH methodology. The calibration step is not to some amorphous “large scale pattern of climate variability” (whatever that means); the calibration is to temperature. The MBH method does not use bristlecones as ENSO proxies (which is of doubtful plausibility in the first place), precipitation proxies, CO2 proxies (except by accident), or nitrogen proxies. It directly uses bristlecones as temperature proxies. It is not possible to calibrate a proxy to temperature and then claim that it is accurately representing ENSO via an unproven precipitation teleconnection – which, by the way, just happens to be an unquantified proxy for global temperature. Calculate the confidence intervals on that.
    .
    The middle statement, that this is exactly the kind of thing that MBH takes as axiomatic – rather than supporting the MBH-type analysis – should be considered a devastating condemnation of the MBH analysis. No link in the chain connecting local site variations to large-scale climatic patterns and then to global temperature has ever been quantified, and it is not clear that it is even possible to do so. In essence, this sentence admits that the proxies are not good temperature proxies. Instead, they are good proxies for whatever makes tree ring widths change, and this is then defined by MBH as representing a climatic signal with zero supporting evidence and no means to quantify the relationship in terms of temperature (a toy illustration below makes the point concrete).
    .
    I’ll stop there for the moment. 🙂
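    A toy illustration of the shared-trend point, with made-up numbers (this is my own construction, not the MBH or WA procedure): a series that responds only to precipitation can still produce a respectable calibration fit against temperature if temperature and precipitation happen to trend together over the calibration window, while showing essentially no temperature skill outside it.

    ```python
    # Toy example: a "proxy" driven by precipitation, not temperature.
    # Temperature and precipitation share a trend only during the calibration
    # window, so the proxy "calibrates" against temperature there even though
    # it carries no temperature signal of its own.
    import numpy as np

    rng = np.random.default_rng(1)
    n_ver, n_cal = 100, 50                                   # verification and calibration years
    trend = np.concatenate([np.zeros(n_ver), np.linspace(0.0, 1.0, n_cal)])

    temp   = trend + 0.2 * rng.normal(size=n_ver + n_cal)    # temperature
    precip = trend + 0.2 * rng.normal(size=n_ver + n_cal)    # precipitation (independent weather noise)
    proxy  = precip + 0.1 * rng.normal(size=n_ver + n_cal)   # proxy responds to precipitation only

    def r2(a, b):
        return np.corrcoef(a, b)[0, 1] ** 2

    cal, ver = slice(n_ver, None), slice(0, n_ver)
    print("calibration r2 vs temperature:", r2(proxy[cal], temp[cal]))    # respectable, thanks to the shared trend
    print("verification r2 vs temperature:", r2(proxy[ver], temp[ver]))   # typically close to zero
    ```

    The calibration fit by itself cannot distinguish a genuine temperature response from a shared trend, which is why the unquantified teleconnection chain matters.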

  157. Mike B
    Posted May 26, 2009 at 1:50 PM | Permalink

    Ryan, I so enjoyed your comments that I felt I needed to “bump” them.

    Someday (I hope it is in my lifetime) serious scientists are going to wake up, call MBHxx, WA, and AW what they really are, and ask, “What were we doing? Why did we put up with this?”

  158. anonymous
    Posted Jul 16, 2009 at 4:33 PM | Permalink

    What happens if you apply all the “rules” used for throwing out simulations (in order to get the “99%”) to the actual MBH proxies themselves?

    Would there be any proxies left in the reconstruction?

    Also, I’m guessing that running this 0.75 ratio cutoff with the calibration set over any part of the temperature record that is not strongly rising would give fantastically different results. Why would changing the calibration region affect the validation statistics to such a high degree? (A schematic sketch of how the screen moves the benchmark follows this comment.)
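    A schematic sketch of why that question matters, using synthetic numbers of my own and a placeholder ratio rule rather than the actual simulations or the exact screening criterion: the 99% benchmark is simply the 99th percentile of verification RE over whichever simulations survive the screen, so the screening rule, and anything that feeds it such as the calibration window, directly moves the target the reconstruction has to beat.

    ```python
    # Schematic: how a screen applied to benchmark simulations moves the
    # 99th-percentile verification RE "bullseye".  The (cal RE, ver RE) pairs
    # below are synthetic, and the 0.75 ratio rule is a placeholder, not a
    # claim about the exact form of the rule used.
    import numpy as np

    rng = np.random.default_rng(2)
    n_sims = 1000
    cal_re = rng.normal(loc=0.3, scale=0.15, size=n_sims)            # calibration RE per simulation
    ver_re = 0.8 * cal_re + rng.normal(scale=0.15, size=n_sims)      # correlated verification RE

    def benchmark(ver, keep):
        """99th-percentile verification RE over the simulations kept by a screen."""
        return np.percentile(ver[keep], 99)

    everyone = np.ones(n_sims, dtype=bool)                           # no screen: all 1000 vote
    print("unscreened 99% benchmark:", benchmark(ver_re, everyone))

    screen = ver_re <= 0.75 * cal_re                                 # placeholder ratio rule
    print("simulations discarded:", int((~screen).sum()))
    print("screened 99% benchmark:  ", benchmark(ver_re, screen))    # lower: high ver RE runs were removed
    ```

    Whatever the precise form of the 0.75 rule, the mechanism is the same: remove enough of the high verification RE simulations and the 99th percentile drops below the value you want to certify as significant.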

  159. Wayne Richards
    Posted Jan 27, 2010 at 1:39 AM | Permalink

    Steve, whenever I read one of your critiques of The Team’s math, I become 99% certain that these – snip –
    Robustly!

2 Trackbacks

  1. […] Caspar Ammann, Texas Sharpshooter By Steve McIntyre    “The Texas Sharpshooter fallacy is a logical fallacy where a man shoots a barn thirty times then circles the bullet holes nearest each other after the fact calling that his target. It’s of particular concern in epidemiology. Folks, you are never going to see a better example of the Texas Sharpshooter work itself out in real life than Caspar Ammann’s handling of Mann’s RE benchmark. I introduce you to Caspar Ammann, the Texas Sharpshooter. Go get ‘em, cowboy.” […]

  2. […] We have a strong suspicion that this is the case, but, of course, no proof because we do not know *who* the reviewers of these papers have been. This was the charge made against those editors who published the articles the CRU gang produced. They refused to disclose the reviewers. The emails detail how they made sure “appropriate” reviewers were provided, knowing they would not be revealed. Perhaps now is the time to make this a direct accusation and request (or demand) that this information be made available. They don’t seem to realize this would expose their malfeasance. In order to properly defend the good science it is essential that the reasons for bad science appearing in the literature be investigated. Frightening comment, because only they know what is “good science” and you bully “bad science” by personal attacks. The lever here is that the Subcommittee on Oversight and Investigations of the House Committee on Energy and Commerce is suggesting that your papers are bad science and asking (their point 8e) for the identity of people who reviewed your work. The Committee is investigating the charge they were peer reviewing each other’s work, which was confirmed by the Wegman report. In response, it is completely fair and justifiable to point out that it is the papers that criticize your and related work that are bad science, and that, through the Subcommittee you can request the identities of the reviewers of all of these critical papers—starting with M&M. Amazing! When you respond, there are a number of items that require a direct response from you alone. There are also a number of scientific points where you could give a multi-authored response. Safety in numbers and whose names should appear as authors is a game documented in the emails. Multiple authors appear on many of their articles. There are many people who have expertise in this area and familiarity with the scientific issues who I am sure would be willing to join you (I would be happy to do so). At this stage, however, I would keep the group small. This appears to indicate an awareness of keeping control of the issue. A few others could be added to the original email list nevertheless. I took the liberty of copying your plea and the Subcommittee’s letter to Caspar Ammann, primarily because I think he can help with the scientific aspects better than most people. Amman later tried to ‘help’ but ended up right in McIntyre’s sights and likely regretted getting involved. […]