11/3/22
It's tough to go a day browsing investment 'news' sites, or especially investing fora, without seeing graphs comparing current market movements to the market movements immediately preceding big, scary crashes. Here's the genre of graph I'm referring to, found readily on those aforementioned investing sites:
2022 and 1937:
...or 2022 and 2000:
The S&P 500 May Be Near The Most Dangerous Phase Of The Bear Market -
https://seekingalpha.com/article/4531046-sp-500-near-most-dangerous-phase-of-bear-market
...or 2022 and 2018:
For the S&P 500, it's looking a lot like 2018 - https://www.reuters.com/markets/asia/live-markets-sp-500-its-looking-lot-like-2018-2022-02-23/
...or 2015 and 1937?:
Scary chart for the stock market bulls..de ja vu 1937? - https://twitter.com/RonStoeferle/status/633619679806926848/photo/1
...or 2018 and 1937?:
...Is It Time for a 1937-Style Collapse? - https://www.barrons.com/articles/stock-market-surge-chart-forecast-51552925879
I think you get the picture. There's no shortage of graphs drawing visual parallels between year 20XX and year XXXX. Plugging one of the above graphs into a Google image search and scrolling through related images will turn up many more. They've accurately predicted thousands of the last two market crashes, but more importantly, they're guaranteed to draw attention. Attention that may lead to social media followers, engagement that draws in advertising dollars, or, most heinously, investor dollars to manage. I'm certainly not the first to point out the bad statistics these malicious graphs use to play with your emotions (see This Infamous Stock Market Crash Chart Just Won't Die or Lying with Charts), so I won't take this post in that direction. Rather, let's play with the data ourselves and use Python to generate some bad graphs that show whatever we want to communicate.
What are the rules here? How far should we look back, and how far into the future should we show? What are we even measuring? It's the wild west.
My first thought was looking at all the day-over-day price changes for some arbitrary period looking backward from today, then finding the correlation between those daily changes and the daily changes of every other (same-sized) period in the past. High correlation coefficient = similar graphs. This was not the right approach. The problem is that day-to-day changes can be very highly correlated while still producing very different-looking graphs. For example, the daily changes of the two example series below are perfectly correlated; a constant shift in daily returns can produce visually unrelated price charts:
Silly mistake, waste of time. What we really want are highly correlated total return series, not correlated daily return series. For the example above, while the daily returns are perfectly correlated, the total returns are not (correlation coefficient of 0.592):
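To see this concretely, here's a toy sketch (made-up random return series, not the exact ones pictured above): shift one series' daily returns by a constant and the daily returns stay perfectly correlated, while the compounded total return paths drift apart:

```python
import numpy as np

rng = np.random.default_rng(0)
daily_a = rng.normal(0.0005, 0.01, 250)  # made-up daily returns for series A
daily_b = daily_a + 0.004                # series B: the same returns shifted by a constant

# daily returns: perfectly correlated (correlation of x and x+c is exactly 1)
daily_corr = np.corrcoef(daily_a, daily_b)[0, 1]

# total returns: compound each series into a cumulative return path
total_a = np.cumprod(1 + daily_a) - 1
total_b = np.cumprod(1 + daily_b) - 1
total_corr = np.corrcoef(total_a, total_b)[0, 1]

print(daily_corr)  # 1.0 (up to floating point)
print(total_corr)  # noticeably below 1 - the price charts no longer match
```

The constant shift compounds over time, so series B's path is dominated by its extra drift while series A's path still wanders, and the two charts stop resembling each other.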
Now that we have our objective figured out, let's get to coding. Import the necessary libraries and a spreadsheet of daily S&P 500 close data (retrieved from Yahoo Finance, starting 12/30/1927):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from matplotlib import ticker
from datetime import datetime
data = pd.read_excel('S&P 500 Daily Data.xlsx') #columns: date, close
"data" variable
Next we have to decide our "look back" period: ending today, how many days backward do we want to look when searching for correlations and plotting? It's totally arbitrary, but I chose 500 trading days, roughly two years. We then loop over the entire time series, calculating the correlation between the total returns of the most recent 500 days and the total returns of every other 500-day period. The correlation coefficients are stored in a new column. We create a new dataframe 'sort' so we can sort the results by correlation coefficient without disturbing our main dataframe:
lookback = 500 #days
data['correl'] = 0.0 #add a new column to the dataframe to store the correlation values we calculate
correl_col = data.columns.get_loc('correl') #column position, rather than hard-coding it
#total return series for the most recent `lookback` days
curr = data.iloc[-lookback:]['close'].divide(data.iloc[-lookback]['close'])-1
i = 0
while i < len(data)-2*lookback:
    #total return series for the 500-day window starting at row i
    past = data.iloc[i:i+lookback]['close'].divide(data.iloc[i]['close'])-1
    data.iloc[i, correl_col] = np.corrcoef(curr, past)[0,1]
    i += 1
sort = data.sort_values('correl', ascending=False)
"sort" variable
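As an aside, the loop above recomputes a 500-point correlation tens of thousands of times. If it feels slow, the same calculation can be vectorized with NumPy. This is just a sketch of mine (the function name and structure are my own, not part of the analysis above), using `sliding_window_view` to build every window at once and a hand-rolled Pearson formula:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_correlations(close, lookback):
    """Correlation of the most recent `lookback`-day total-return series
    against every `lookback`-day window in the history, all at once."""
    close = np.asarray(close, dtype=float)
    curr = close[-lookback:] / close[-lookback] - 1
    # every contiguous window, each normalized into a total-return series
    windows = sliding_window_view(close, lookback)
    returns = windows / windows[:, [0]] - 1
    # Pearson correlation of each row against `curr`: center, then dot/norm
    x = returns - returns.mean(axis=1, keepdims=True)
    y = curr - curr.mean()
    return (x @ y) / (np.linalg.norm(x, axis=1) * np.linalg.norm(y))

#correl = window_correlations(data['close'], lookback)
```

Note that this returns a correlation for every window, including ones that overlap the current period (the last entry is the current window against itself, correlation 1), whereas the loop above deliberately stops 2×lookback rows from the end.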
Now we scan through the most correlated time periods, picking out dates that look interesting or that support the message we're trying to convey. "target" is the index value we want to look at. In the image above, the "target" value referring to the first row (dated 1980-03-13) would be 13072. We plug that into the code, decide how many days to look forward (make sure to include any scary crashes), and plot:
target = 13072
lookfwd = 400
#GRAB DATA
a = data.iloc[-lookback:]['close'].divide(data.iloc[-lookback]['close'])-1
b = data.iloc[target:target+lookback+lookfwd]['close'].divide(data.iloc[target]['close'])-1
a0 = "{:%m/%d/%Y}".format(data.iloc[-lookback]['date'])
a1 = "{:%m/%d/%Y}".format(data.iloc[-1]['date'])
b0 = "{:%m/%d/%Y}".format(data.iloc[target]['date'])
b1 = "{:%m/%d/%Y}".format(data.iloc[target+lookback]['date'])
lega = "{:%Y}".format(data.iloc[-1]['date'])
legb = "{:%Y}".format(data.iloc[target+lookback]['date'])
c = data.iloc[target]['correl']
#GRAPHING
#style - set rcParams before creating the figure so they actually apply
plt.rcParams['axes.facecolor'] = 'black'
plt.rcParams['axes.edgecolor'] = 'white'
plt.rcParams['axes.labelcolor'] = 'white'
plt.rcParams['axes.grid'] = True
plt.rcParams['axes.grid.axis'] = 'both'
plt.rcParams['figure.facecolor'] = 'black'
plt.rcParams['ytick.color'] = 'white'
plt.rcParams['xtick.color'] = 'white'
plt.rcParams['grid.linestyle'] = ':'
plt.rcParams['grid.linewidth'] = 0.5
plt.rcParams['font.monospace'] = 'Computer Modern Typewriter'
#set up subplots
fig = plt.figure(figsize=(12,4), dpi=300, tight_layout=True, frameon=True)
gs = gridspec.GridSpec(1, 2, width_ratios=[2, 1])
ax1 = fig.add_subplot(gs[0])
ax3 = fig.add_subplot(gs[1])
ax2 = ax1.twinx()
#plot data
ax1.plot(np.arange(-lookback,lookfwd), b, color='#FF0000', lw=1)
ax2.plot(np.arange(-lookback,0), a, color='white', lw=1)
ax3.scatter(a, b.iloc[:-lookfwd], s=1, color='blue')
ax3.plot(np.unique(a),
         np.poly1d(np.polyfit(a, b.iloc[:-lookfwd], 1))(np.unique(a)),
         color='white', lw=1)
#legend
fig.legend([legb, lega], loc=2, bbox_to_anchor=(0.1, 0.95))
#label text
ax1.set_xlabel("Days from {}, {}".format(a1,b1))
ax1.set_ylabel("S&P 500 Return, starting {}".format(b0))
ax2.set_ylabel("S&P 500 Return, starting {}".format(a0))
ax3.set_xlabel("Correlation: {}".format(c))
#y-axis formatting
ax1.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1,decimals=0))
ax2.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1,decimals=0))
#axis limits, comment out for auto-fit
#ax1.set_xlim([-250, 500])
ax1.set_ylim([-0.1, 0.65])
ax2.set_ylim([-0.1, 0.65])
ax3.set_xlim([-0.5, 0.5])
ax3.set_ylim([-0.5, 0.5])
The output:
Hey, what gives? That graph isn't implying doom and gloom. Let's try another:
Hmmm... another recovery. 1947?:
Inconclusive. Down the list there's a 2008 - surely that'll get us a nice graph!:
Finally! Now we can write our "Is there another 2008 right around the corner?" article and get lots of clicks. Here's the 1937 example that was shown at the very top of this post:
The point is that we can make the graphs say just about anything we want. There are 23,825 trading days in the dataset I'm using - that's a lot of data to draw upon. For any arbitrary time period, I can find many other time periods that roughly match visually, especially if I'm stretching the axes and carefully choosing which time periods I show to make the graphs more convincing. I urge you to scroll back to the top of the post and look at all the manipulative tweaks present in those example graphs that make the two lines look similar: arbitrary look-back time, look-forward time, y-axis scales, x-axis scales... I mean, what's the point? It's all pseudo-scientific garbage.
Moral of the story: "technical analysis" - financial astrology, the art of drawing lines and shapes or otherwise manipulating graphs to create patterns - is no good at predicting the future. But I don't want to belittle anyone for falling into the trap of believing that you can predict the future by spotting patterns on a graph, because it's a very easy trap to fall into. There is a gargantuan amount of historical financial data out there - we've thus far only looked at one index, but imagine the enormity of daily (or more granular) price data for every stock, bond, currency, commodity, or whatever, that has ever existed.
By the sheer size of financial data alone, there exist countless examples of any arbitrary pattern showing up over and over again. It's not hard to find evidence of repeating patterns, obvious in hindsight, but it is hard to spot the absence of a pattern. Watching a graph in real time, without the benefit of hindsight: if a pattern looks to be implying something and the market then does something other than what's expected, we conclude the pattern simply didn't exist in the first place. Maybe it was actually a different pattern, obvious now with new data, or maybe it was just noise to brush under the rug. If the market did go on to behave the way the initially hypothesized pattern implied, then the pattern was there all along. It's really hard to recognize and recall when patterns don't work.
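We can demonstrate this with a price series that is pure noise by construction (all made up: a simulated random walk, no real market data, and the drift/volatility numbers are arbitrary). Even here, some past window almost always bears a "striking resemblance" to the most recent one:

```python
import numpy as np

rng = np.random.default_rng(42)
# a fake "price history": 20,000 days of random daily returns - no patterns, by construction
prices = np.cumprod(1 + rng.normal(0.0003, 0.01, 20_000))

lookback = 500
curr = prices[-lookback:] / prices[-lookback] - 1  # most recent 500-day total-return series

# find the historical 500-day window whose total returns best match the current one
best = -1.0
for i in range(len(prices) - 2 * lookback):
    past = prices[i:i + lookback] / prices[i] - 1
    best = max(best, np.corrcoef(curr, past)[0, 1])

print(best)  # a high correlation, found in pure noise
```

With roughly 19,000 candidate windows to search, a near-perfect visual match turns up despite the data containing no information about the future whatsoever.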
I've shown several graphs that correlate highly with recent history that imply both bearish and bullish cases for the future... can you see how one of them will prove prophetic, while the others will be forgotten?
Appendix:
An example with coin flips - I randomly generated 20 coin flips and observed the following results:
H-T-H-T-T-H-T-H-H-T-H-H-H-H-T-T-H-T-T-H
We can identify patterns of behavior that show up several times (with only 20 data points, much less tens of thousands!):
A: T-H-T (3 occurrences)
B: T-H-H (2 occurrences)
C: H-T-T-H (3 occurrences)
D: T-T-H-T (2 occurrences)
E: H-T-H-H (2 occurrences)
F: H-T-T-H-T (2 occurrences)
If we fall into the trap of trying to use historical patterns to predict the future (the next coin flip), we have several choices. If we want to look at the last two flips, Tails then Heads, we could use pattern A to convince ourselves that Tails is coming up next. We could similarly predict that Heads is next, as evidenced by pattern B. We could also look at the last three flips, Tails-Tails-Heads, and use pattern D to predict Tails. Remember, the look-back period is totally arbitrary. Regardless of whether we use our patterns to predict Heads or Tails next, the coin doesn't care. One pattern will be "right" and celebrated, the others will be "wrong" and forgotten, and none of them have any value.
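Counting overlapping occurrences of a pattern is a few lines of bookkeeping, if you want to check the tallies yourself (the helper function here is just illustrative, not part of the market analysis above):

```python
def count_overlapping(flips, pattern):
    """Count occurrences of `pattern` in `flips`, allowing overlaps."""
    flips = flips.replace('-', '')
    pattern = pattern.replace('-', '')
    return sum(flips[i:i + len(pattern)] == pattern
               for i in range(len(flips) - len(pattern) + 1))

flips = 'H-T-H-T-T-H-T-H-H-T-H-H-H-H-T-T-H-T-T-H'
for pat in ('T-H-T', 'T-H-H', 'H-T-T-H', 'T-T-H-T', 'H-T-H-H', 'H-T-T-H-T'):
    print(pat, count_overlapping(flips, pat))
```

Overlaps matter: a built-in substring count like `str.count` only finds non-overlapping matches, which would undercount some of these patterns.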
Here's another post about how difficult prediction is, this time using fundamental data rather than historical price data: https://www.loganr.io/blog/pendulums