Principal component analysis

I just finished reading Michael Mann's The Hockey Stick and the Climate Wars, and was interested to find that he spends quite a bit of space attempting to give a non-technical explanation of principal component analysis (PCA). I am not sure how successful he is, but he tries hard.

I understand what PCA is. Can someone explain, with more mathematical detail, how PCA is used in Mann's work and in similar paleoclimate reconstructions? (John, perhaps this is a topic for your graduate course!)

John Roe

Comments

  • 1.

    You can find a bit of a review in Smith (2010) (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.176.3465), although it's not completely explicit about Mann's original algorithm. Actually, there are many PCA-based reconstruction methods that take different approaches (and they are now often being supplanted by Bayesian spatiotemporal models).
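
    To make the general idea concrete, here is a minimal sketch of one generic PCA-based reconstruction scheme, principal component regression. It is not Mann's original algorithm. The sketch assumes a proxy matrix with one row per year (oldest first), an instrumental temperature series that overlaps only the most recent years, and an ordinary least-squares calibration. The function name `pcr_reconstruct` and all of the synthetic data are hypothetical.

```python
import numpy as np


def pcr_reconstruct(proxies, temp_calib, n_pcs):
    """Hypothetical principal-component-regression reconstruction (illustrative only).

    proxies    : (n_years, n_proxies) array, oldest year first
    temp_calib : instrumental temperatures for the most recent len(temp_calib) years
    n_pcs      : number of principal components kept
    """
    n_calib = len(temp_calib)
    # 1. Standardize each proxy over its full record (conventional full-period PCA).
    Z = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0)
    # 2. PCA via SVD: the columns of U * s are the PC time series.
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    pcs = U[:, :n_pcs] * s[:n_pcs]
    # 3. Calibrate: least-squares fit of temperature on [1, PCs] over the overlap only.
    design = np.column_stack([np.ones(len(pcs)), pcs])
    coef, *_ = np.linalg.lstsq(design[-n_calib:], temp_calib, rcond=None)
    # 4. Reconstruct: apply the fitted relation to every year, including pre-instrumental ones.
    return design @ coef


# Synthetic example: 581 "years" (1400-1980), calibration on the last 79 (1902-1980).
rng = np.random.default_rng(0)
n_years, n_proxies, n_calib = 581, 20, 79
true_temp = np.cumsum(rng.normal(0.0, 0.1, n_years))        # made-up temperature history
weights = rng.uniform(0.5, 1.5, n_proxies)                  # made-up proxy sensitivities
proxies = np.outer(true_temp, weights) + rng.normal(0.0, 1.0, (n_years, n_proxies))
recon = pcr_reconstruct(proxies, true_temp[-n_calib:], n_pcs=3)
print("correlation with the synthetic truth:", np.corrcoef(recon, true_temp)[0, 1])
```

    The choices that distinguish real methods, such as how many PCs to keep, how to centre and scale the proxies, and how to calibrate, are exactly the parameters exposed here.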

  • 2.
    edited October 2012

    Hi John Roe - I have long wanted to explain how PCA is a special case of the method known as Matrix Product States (MPS)! As far as I understand, PCA is essentially the singular value decomposition (SVD) of the mean-centred data matrix. It allows one to discard either the very large or the very small singular values, together with their singular vectors. MPS gives better compression because it applies the decomposition recursively.
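
    As a small illustration of the SVD view of PCA, and of the "recursive" point about MPS, here is a NumPy sketch. `pca_truncate` computes PCA as the SVD of the mean-centred data and keeps only the k largest singular values. `tt_svd` is the standard sequential-SVD construction of an MPS (also called a tensor train). On a two-index tensor, that is, a matrix, it reduces to a single truncated SVD. The function names and the rank cap `max_rank` are illustrative assumptions.

```python
import numpy as np


def pca_truncate(X, k):
    """PCA of X (rows = observations) via the SVD of the centred data; keep the k largest singular values."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)               # fraction of variance per component
    X_k = (U[:, :k] * s[:k]) @ Vt[:k] + mean      # best rank-k approximation (plus the mean)
    return X_k, explained


def tt_svd(T, max_rank):
    """Sequential truncated SVDs, one index at a time: an MPS / tensor-train factorization."""
    dims = T.shape
    cores, r_prev = [], 1
    C = T.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        r = min(max_rank, s.size)                 # truncate the bond dimension
        cores.append(U[:, :r].reshape(r_prev, dims[k], r))
        # Push the remaining factor on to the next index and repeat (the recursive step).
        C = (s[:r, None] * Vt[:r]).reshape(r * dims[k + 1], -1)
        r_prev = r
    cores.append(C.reshape(r_prev, dims[-1], 1))
    return cores


def tt_to_full(cores):
    """Contract the MPS cores back into a full tensor."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X2, frac = pca_truncate(X, 2)                     # keep 2 principal components
T = rng.normal(size=(2, 3, 4))
cores = tt_svd(T, max_rank=10)                    # rank cap never binds here, so this is exact
print([c.shape for c in cores])                   # [(1, 2, 2), (2, 3, 4), (4, 4, 1)]
```

    Lowering `max_rank` truncates every bond. The PCA-style compression is the special case with only one bond to cut.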

    I don't know anything about the application of either of these methods to the climate. I'm interested in learning about this and talking more. I'll have a look at the links you sent, but I just wanted to chime in that I'm interested.

    Cheers from Qutar!

  • 3.
    edited November 2012

    Qutar or Qatar? Either way, what are you doing there, Jake?

    > Can someone explain with more mathematical detail how PCA is used in Mann’s work and similar paleoclimate reconstructions? (John, perhaps this is a topic for your graduate course!)

    Alas, I know nothing about that. All I know is that Richard Muller, now head of the BEST (Berkeley Earth Surface Temperature) project, questioned Mann's use of PCA. He wrote (http://www.technologyreview.com/news/403256/global-warming-bombshell/):

    > Now comes the real shocker. This improper normalization procedure tends to emphasize any data that do have the hockey stick shape, and to suppress all data that do not. To demonstrate this effect, McIntyre and McKitrick created some meaningless test data that had, on average, no trends. This method of generating random data is called “Monte Carlo” analysis, after the famous casino, and it is widely used in statistical analysis to test procedures. When McIntyre and McKitrick fed these random data into the Mann procedure, out popped a hockey stick shape!

    > That discovery hit me like a bombshell, and I suspect it is having the same effect on many others. Suddenly the hockey stick, the poster-child of the global warming community, turns out to be an artifact of poor mathematics. How could it happen? Let me digress into a short technical discussion of how this incredible error took place.

    > In PCA and similar techniques, each of the (in this case, typically 70) different data sets have their averages subtracted (so they have a mean of zero), and then are multiplied by a number to make their average variation around that mean to be equal to one; in technical jargon, we say that each data set is normalized to zero mean and unit variance. In standard PCA, each data set is normalized over its complete data period; for key climate data sets that Mann used to create his hockey stick graph, this was the interval 1400-1980. But the computer program Mann used did not do that. Instead, it forced each data set to have zero mean for the time period 1902-1980, and to match the historical records for this interval. This is the time when the historical temperature is well known, so this procedure does guarantee the most accurate temperature scale. But it completely screws up PCA. PCA is mostly concerned with the data sets that have high variance, and the Mann normalization procedure tends to give very high variance to any data set with a hockey stick shape. (Such data sets have zero mean only over the 1902-1980 period, not over the longer 1400-1980 period.)

    > The net result: the principal component will have a hockey stick shape even if most of the data do not.
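
    The centring issue Muller describes can be seen in a small numerical sketch. It makes several assumptions:

    - 581 annual values (1400-1980), with the "short" centring window taken as the last 79 years (1902-1980).
    - 70 trendless AR(1) "red noise" series with coefficient 0.9.
    - An ad hoc "hockey stick index": how far the calibration-period mean of PC1 sits from its full-period mean, in standard deviations.

    None of these choices come from Mann's code or from McIntyre and McKitrick's own noise model. The sketch also only re-centres the data and skips any further rescaling, so it isolates the zero-mean step alone.

```python
import numpy as np


def center(X, rows):
    """Subtract from each column its mean over the selected rows only."""
    return X - X[rows].mean(axis=0)


def leading_pc(Xc):
    """Time series of the first principal component of an already-centred matrix."""
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, 0] * s[0]


def hockey_stick_index(pc, calib):
    """Ad hoc index: |calibration-period mean - full-period mean| in standard deviations."""
    return abs(pc[calib].mean() - pc.mean()) / pc.std()


def red_noise(n_years, n_series, phi, rng):
    """Trendless AR(1) noise: x[t] = phi * x[t-1] + white noise (assumed noise model)."""
    eps = rng.normal(size=(n_years, n_series))
    x = np.zeros((n_years, n_series))
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + eps[t]
    return x


n_years = 581                                  # 1400-1980
full = slice(0, n_years)                       # conventional full-period centring
calib = slice(n_years - 79, n_years)           # 1902-1980 "short" centring

# 1. Mechanism: a series that is 0 before 1902 and 1 afterwards.
step = np.r_[np.zeros(n_years - 79), np.ones(79)][:, None]
ss_full = np.sum(center(step, full) ** 2)      # 79 * 502 / 581, about 68.3
ss_short = np.sum(center(step, calib) ** 2)    # 502: every pre-1902 value becomes -1
print(ss_full, ss_short)

# 2. Monte Carlo: PC1 of trendless red noise under both centring conventions.
rng = np.random.default_rng(42)
hsi_full, hsi_short = [], []
for _ in range(200):
    X = red_noise(n_years, 70, 0.9, rng)
    X = X / X.std(axis=0)                      # unit variance over the full period
    hsi_full.append(hockey_stick_index(leading_pc(center(X, full)), calib))
    hsi_short.append(hockey_stick_index(leading_pc(center(X, calib)), calib))
print("mean index, full centring :", np.mean(hsi_full))
print("mean index, short centring:", np.mean(hsi_short))
```

    The step series shows the mechanism directly. Short centring raises its sum of squares from about 68 to 502, so it carries far more weight in the SVD. The Monte Carlo part lets you compare the two printed averages on data that have no built-in trend.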

    Of course, after doing a lot more work, Muller now agrees with one of Mann's key conclusions: the Earth is warming up. But I've never read a good discussion of the methodological point Muller raised. In the online conversations I've seen, everyone is too passionate and argumentative for me to trust them: people who believe in global warming defend Mann, and people who don't, attack him.

    I know a microscopic amount about a related but different topic: the use of principal component analysis to study 'teleconnections' (http://en.wikipedia.org/wiki/Teleconnection) in the Earth's weather. The idea of a teleconnection is that the weather at one place and time on Earth may be strongly correlated with the weather at a remote place and perhaps a shifted time. Teleconnections are quite fascinating. There are lots of them, but the most famous is the El Niño/La Niña cycle, more technically called ENSO, the El Niño-Southern Oscillation (http://www.azimuthproject.org/azimuth/show/ENSO). William Kessler mentions two theories of how an El Niño starts:

    > There are two main theories at present. The first is that the event is initiated by the reflection from the western boundary of the Pacific of an oceanic Rossby wave (a type of low-frequency planetary wave that moves only west). The reflected wave is supposed to lower the thermocline in the west-central Pacific and thereby warm the SST (sea surface temperature) by reducing the efficiency of upwelling to cool the surface. Then that makes winds blow towards the (slightly) warmer water and really start the event. The nice part about this theory is that the Rossby waves can be observed for months before the reflection, which implies that El Niño is predictable.

    > The other idea is that the trigger is essentially random. The tropical convection (organized large-scale thunderstorm activity) in the rising air tends to occur in bursts that last for about a month, and these bursts propagate out of the Indian Ocean (known as the Madden-Julian Oscillation). Since the storms are geostrophic (rotating according to the turning of the earth, which means they rotate clockwise in the southern hemisphere and counter-clockwise in the north), storm winds on the equator always blow towards the east. If the storms are strong enough, or last long enough, then those eastward winds may be enough to start the sloshing. But specific Madden-Julian Oscillation events are not predictable much in advance (just as specific weather events are not predictable in advance), and so to the extent that this is the main element, then El Niño will not be predictable.

    Given the complexity here, it's not surprising that people try to use PCA algorithms to analyze piles of data to find and study teleconnections.
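
    Here is a minimal sketch of the kind of analysis meant here. In climate science, PCA applied to a space-time field of anomalies is usually called EOF (empirical orthogonal function) analysis. The sketch adds a lagged correlation for the "shifted time" part of a teleconnection. The synthetic dipole field, the 48-step period and the 3-step lag are made-up assumptions for illustration. Real studies would use gridded observations such as sea surface temperatures.

```python
import numpy as np


def eof_analysis(field, n_modes):
    """EOF analysis (PCA) of a space-time field.

    field : (n_times, n_points) array, e.g. monthly values at grid points
    Returns the spatial patterns (EOFs), their time series (PCs) and variance fractions.
    """
    anomalies = field - field.mean(axis=0)     # remove each point's time mean
    U, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = Vt[:n_modes]                        # one spatial pattern per row
    pcs = U[:, :n_modes] * s[:n_modes]         # how strongly each pattern is present over time
    frac = s[:n_modes] ** 2 / np.sum(s ** 2)
    return eofs, pcs, frac


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag], for lag >= 0."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]


# Made-up field: one oscillation seen with opposite sign in two regions (a "dipole").
rng = np.random.default_rng(0)
n_times, n_points = 600, 40
t = np.arange(n_times)
signal = np.sin(2 * np.pi * t / 48)
pattern = np.r_[np.ones(20), -np.ones(20)]     # points 0-19 vs points 20-39
field = np.outer(signal, pattern) + 0.5 * rng.normal(size=(n_times, n_points))
eofs, pcs, frac = eof_analysis(field, n_modes=3)
print("variance fractions:", frac)

# Shifted-time part: a remote series that follows `signal` 3 steps later.
remote = np.sin(2 * np.pi * (t - 3) / 48)
print([round(lagged_corr(signal, remote, k), 3) for k in range(7)])
```

    The first EOF recovers the dipole pattern (up to sign) and its PC tracks the underlying oscillation. Scanning lags picks out the delay between the two locations.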

  • 4.

    Hi - Qatar has plans to get involved in some "quantum stuff".

  • 5.

    Neat! Since they're the only country whose name starts with Q, I guess they have an advantage.

    Did you see the desert, or just stay in town?
