Ok, so I’ll warn you up front: this one’s going to be a little bit difficult. The response to my post on the 200-day moving average in the DJIA and the S&P 500 was very positive, and I received many thought-provoking questions and requests for more detail. I thought the best way to answer those questions might just be to share some work I have done previously. I will try to summarize the important points and strike a balance between writing an easily digestible blog post and giving enough detail. Here we go.

### Testing moving average breaks

There are many ways to structure tests of moving averages. I did quite a few, but the one I want to share with you today is what I called the “Moving average penetration (Fade Break)” test. For a buy, the criteria are:

- Yesterday’s low was above the moving average
- Today’s close is below the moving average
- Buy on today’s close
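
The buy rule above can be sketched in a few lines of Python. This is a minimal, hypothetical version, not the code behind the original tests: it assumes a simple moving average, compares yesterday's low with yesterday's value of the average, and fills in the sell side as the assumed mirror image (yesterday's high below the average, today's close above it), which matches how the sell signals are described later in this post. The names `sma` and `fade_break_signals` are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple


def sma(values: Sequence[float], period: int) -> List[Optional[float]]:
    """Simple moving average; None until `period` values are available."""
    out: List[Optional[float]] = [None] * len(values)
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= period:
            running -= values[i - period]  # drop the value leaving the window
        if i >= period - 1:
            out[i] = running / period
    return out


def fade_break_signals(
    high: Sequence[float], low: Sequence[float], close: Sequence[float], period: int
) -> Tuple[List[int], List[int]]:
    """Return (buy_bars, sell_bars) for the Fade Break rule.

    Buy:  yesterday's low above yesterday's MA, today's close below today's MA.
    Sell (assumed mirror): yesterday's high below yesterday's MA,
          today's close above today's MA.
    Entries are taken on the close of the returned bar index.
    """
    ma = sma(close, period)
    buys: List[int] = []
    sells: List[int] = []
    for t in range(1, len(close)):
        prev_ma, cur_ma = ma[t - 1], ma[t]
        if prev_ma is None or cur_ma is None:
            continue  # not enough history for the average yet
        if low[t - 1] > prev_ma and close[t] < cur_ma:
            buys.append(t)
        elif high[t - 1] < prev_ma and close[t] > cur_ma:
            sells.append(t)
    return buys, sells


# Tiny hand-made example: a rally, a sharp drop, then a bounce.
closes = [10, 11, 12, 13, 11, 9, 7, 5, 8]
highs = [c + 0.5 for c in closes]
lows = [c - 0.5 for c in closes]
print(fade_break_signals(highs, lows, closes, period=3))  # ([4], [8])
```

Bar 4 is the first close back below the 3-bar average after a bar that traded entirely above it; bar 8 is the mirror case on the way back up.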

As you can see, the concept is that we are fading a close through the moving average. A visual may be helpful:

[Let me continue with some text from the unpublished portion of my book. (I already did the work once, so it makes sense to share it in its original form.) I’ll summarize at the end of this post.]

What happens after price breaks through a moving average? If moving averages are, in fact, important support or resistance levels, if large traders are making trading decisions based on the relationship of price to the average, we should see some reaction after the moving average fails to contain prices. It would be reasonable to assume that traders will exit or adjust positions on the break of the average, and this buying or selling pressure should cause distortions in the returns. We call this test the Moving Average Penetration test.

This set of conditions would have the trader always *fading*, or going against, price movements through a moving average: if price breaks below a moving average after being above it, this rule set will generate a buy signal. It is entirely possible that this is backwards, and perhaps these should be traded as breakouts by going with the direction of the price movement. Again, it does not matter; if the criteria are flipped for buy and sell signals, we will simply see negative excess returns for buys and positive for sells.

### Fade test results

These results appear to be interesting, at least for the equities sample. The sell signals (which, remember, are based on *shorting* the first bar that closes *above* a moving average) show a consistent negative edge, and this edge is statistically significant. The buy signals also show an interesting pattern, but it is not as clear or as strong. The buys (again, this is *buying* the first bar that closes *below* a moving average) show an initial small positive edge that appears to decay into a negative edge between 5 and 10 days from the signal. This decay of a positive signal into a statistically significant sell signal may be a bit surprising; to better understand the dynamics involved, we should ask whether it could be due to the effect of a large outlier. Though the data is not reproduced in these tables, this effect *does not* seem to be attributable to a single outlier: when the equities universe is split into large-cap, mid-cap, and small-cap samples, the same signal decay is apparent in all market capitalization slices. If this were due to an aberration in a single stock, the decay would most likely be limited to a single market cap. It is also interesting to note that, while we have interesting patterns in equities, the futures and forex groups do not show any predictable pattern. This is the strongest clue we have had so far in these tests that perhaps not all assets trade the same from a quantitative perspective. If we continue to see evidence that assets behave differently, this would seem to present a significant challenge to the claims that all technical tools can be applied to any market or time frame with no adaptation.
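
The "edge" in these results is an excess return: the average forward return after a signal minus a baseline average forward return over the same data. The sketch below assumes exactly that definition, uses simple (not log) returns, and reports a Welch-style t-statistic per horizon. The baseline adjustment used in the original tests is not reproduced here, so treat this as an illustration of the idea rather than a replica; the function names are hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Dict, Iterable, List, Sequence, Tuple


def forward_returns(close: Sequence[float], idx: Iterable[int], horizon: int) -> List[float]:
    """Simple returns from the close of bar i to the close of bar i + horizon."""
    return [close[i + horizon] / close[i] - 1.0 for i in idx if i + horizon < len(close)]


def excess_vs_baseline(
    close: Sequence[float], signal_idx: Sequence[int], horizons=(1, 5, 10, 20)
) -> Dict[int, Tuple[float, float]]:
    """Map horizon -> (mean signal return - mean baseline return, Welch t-stat)."""
    results: Dict[int, Tuple[float, float]] = {}
    for h in horizons:
        sig = forward_returns(close, signal_idx, h)
        base = forward_returns(close, range(len(close)), h)  # every bar = baseline
        if len(sig) < 2 or len(base) < 2:
            continue  # not enough observations at this horizon
        diff = fmean(sig) - fmean(base)
        se = math.sqrt(stdev(sig) ** 2 / len(sig) + stdev(base) ** 2 / len(base))
        results[h] = (diff, diff / se if se > 0 else float("nan"))
    return results
```

Tabulating these values across horizons is how a pattern like the buy signal's early positive edge decaying into a negative one would show up. One caution: multi-day forward returns from neighboring bars overlap, so their t-statistics overstate significance unless that overlap is accounted for.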

### The 200-, 100-, and 50-day averages are not special

Results from other tests, though not reproduced here, look very similar regardless of period (from 10 to 200) or type (exponential or simple) of moving average used in the test: the curious distortion in equity returns persists. Also, running the test on the random walk period moving average, not surprisingly, generates similar results. This might be a good place to pause and think about what is going on here. Based on these tests, we see absolutely no evidence validating moving averages as important levels. In the data and the results, we cannot distinguish between the different periods of moving averages: 20, 45, 50, 65, 150, 185, 200, 233, and any others basically all look the same. However, there is an unusual pattern in the Moving Average Penetration tests that warrants deeper investigation. Regardless of what moving average is used, there appears to be a statistically significant edge, at least in equities, for buying closes below and shorting closes above the moving average. Here is a radical thought: what happens if we repeat this test *without* the moving average?

### No, it’s not the averages. It’s mean reversion.

Yes, a test of moving averages without the average. Before you decide I have gone completely insane, consider the criteria for this Moving Average Penetration test. For a buy, price has to close below the average, and the previous bar’s low had to be above the average. In almost all cases, this means that *the entry bar’s close is below yesterday’s low*. Sure, it is possible that, in a few rare cases, the moving average could have risen enough that it is above yesterday’s low, but this is unlikely. It is far more likely that a close below the moving average is also a close below yesterday’s low. Figure 16.17 shows a graphical example of fading a close outside the previous day’s range, and Table 16.17 presents summary statistics for this test.
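
The MA-free version of the test only needs the prior bar's range. A minimal sketch, assuming plain lists of highs, lows, and closes (function names are hypothetical): `outside_prior_range_signals` finds closes below yesterday's low (buys, in the fading sense) and above yesterday's high (sells), and `overlap_fraction` measures what share of moving-average signal bars are also outside-range bars, which is the overlap the argument above relies on.

```python
from typing import List, Sequence, Tuple


def outside_prior_range_signals(
    high: Sequence[float], low: Sequence[float], close: Sequence[float]
) -> Tuple[List[int], List[int]]:
    """Fade closes outside the previous bar's range.

    Buy:  today's close below yesterday's low.
    Sell: today's close above yesterday's high.
    """
    buys = [t for t in range(1, len(close)) if close[t] < low[t - 1]]
    sells = [t for t in range(1, len(close)) if close[t] > high[t - 1]]
    return buys, sells


def overlap_fraction(ma_signals: Sequence[int], range_signals: Sequence[int]) -> float:
    """Share of moving-average signal bars that are also outside-range signal bars."""
    if not ma_signals:
        return float("nan")
    range_set = set(range_signals)
    return sum(1 for t in ma_signals if t in range_set) / len(ma_signals)


closes = [10, 11, 12, 13, 11, 9, 7, 5, 8]
highs = [c + 0.5 for c in closes]
lows = [c - 0.5 for c in closes]
buys, sells = outside_prior_range_signals(highs, lows, closes)
print(buys, sells)  # [4, 5, 6, 7] [1, 2, 3, 8]
```

In a real test you would pass the bar indices produced by a moving-average rule as `ma_signals`; a value close to 1.0 would support the point that an MA-fade buy is nearly always a close below yesterday's low.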

Now we are getting somewhere, and this is important, so make sure you understand this next point: First of all, these results look remarkably similar to the moving average breaks, at least for the first five days: Equities show a fairly large and statistically significant negative return after the sell condition. Equities also show a much smaller, but still significant, positive return following the buy condition. **Though this is not conclusive evidence, it strongly suggests that the observed statistical edge around the moving average is simply a function of stocks’ tendency to reverse after a close outside of the previous day’s range.** This is an expression of mean reversion, which is one of the verifiable, fundamental aspects of price movement.

It is also worth considering that what you see in Table 16.17 is significant on another level as well—these results strongly suggest that equities do not follow a random walk. Random walk markets would not show this anomaly. (Though the results are not presented here, in general, deviations of less than 2 basis points were seen from the baseline when this test was reproduced on random walk markets.) This is an extremely simple test with one criterion that produces a result that raises a serious challenge to one of the accepted academic hypotheses. We can say, based on this sample of 600 stocks over the past 10 years, that we find sufficient evidence to reject the random walk hypothesis for equities.
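
A quick way to see why a random walk should not produce this anomaly is to simulate one. The sketch below is an illustration, not the original random-walk test: it builds daily bars from a Gaussian random walk in log price (assumed parameters: 8 intraday steps per day, a 0.3% step size, no overnight gaps), applies the close-outside-prior-range rule, and compares next-day returns after signals with the all-days baseline. Because future increments of a random walk are independent of the past, the excess returns should differ from zero only by sampling noise.

```python
import random
from statistics import fmean
from typing import List, Tuple


def simulate_random_walk_bars(
    n_days: int, steps_per_day: int = 8, step_sd: float = 0.003, seed: int = 0
) -> Tuple[List[float], List[float], List[float]]:
    """Daily high/low/close (in log-price units) built from a Gaussian random walk."""
    rng = random.Random(seed)
    log_p = 0.0
    highs, lows, closes = [], [], []
    for _ in range(n_days):
        hi = lo = log_p  # the day opens at the prior close (no gaps)
        for _ in range(steps_per_day):
            log_p += rng.gauss(0.0, step_sd)
            hi = max(hi, log_p)
            lo = min(lo, log_p)
        highs.append(hi)
        lows.append(lo)
        closes.append(log_p)
    return highs, lows, closes


def next_day_excess(highs, lows, closes):
    """(buy excess, sell excess, n buys, n sells) for fading closes outside the prior range."""
    fwd = [closes[t + 1] - closes[t] for t in range(len(closes) - 1)]  # next-day log returns
    baseline = fmean(fwd)
    buys = [fwd[t] for t in range(1, len(fwd)) if closes[t] < lows[t - 1]]
    sells = [fwd[t] for t in range(1, len(fwd)) if closes[t] > highs[t - 1]]
    return fmean(buys) - baseline, fmean(sells) - baseline, len(buys), len(sells)


h, l, c = simulate_random_walk_bars(50_000)
buy_ex, sell_ex, n_buy, n_sell = next_day_excess(h, l, c)
print(f"buy excess {buy_ex * 1e4:.2f} bp (n={n_buy}), sell excess {sell_ex * 1e4:.2f} bp (n={n_sell})")
```

With these assumed parameters, changing the seed should only move the excess returns around by sampling noise; what it should not produce is a persistent sign pattern like the one seen in equities.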

We’re not done yet, however. The situation for futures and forex is a bit more complicated. On one hand, there is a measurable difference in the proportion of positive closes on the first day after the signal. The futures baseline closes up 50.6 percent of the time, compared to 52.6 percent and 47.9 percent for the buy and sell signals, respectively, and the forex baseline closes up 51.0 percent of the time, compared to 54.2 percent and 47.5 percent for buy and sell signals. These differences are statistically significant and could potentially give an edge in some situations. However, we have to note that the magnitude of the signal, in terms of deviation from the baseline, is very, very small. This is certainly too small to be *economically significant* on its own, but it could perhaps provide a head start when combined with other factors. This is something we are going to see again and again in quantitative tests: futures and forex consistently tend to more closely approximate random walks than equities.
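
The post does not give the number of signals behind these percentages, and whether a 2-point difference in hit rate is statistically significant depends heavily on that count. The sketch below uses a one-sample z-test for a proportion against the baseline rate (normal approximation); the signal counts in the example are hypothetical, chosen only to show how the same hit rate can be significant or not.

```python
import math


def proportion_z(hits: int, n: int, baseline_p: float) -> float:
    """z-score of an observed up-close rate against a baseline rate."""
    p_hat = hits / n
    se = math.sqrt(baseline_p * (1.0 - baseline_p) / n)
    return (p_hat - baseline_p) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Futures buy signals: 52.6% up closes vs a 50.6% baseline.
for n in (500, 10_000):  # hypothetical signal counts
    hits = round(0.526 * n)
    z = proportion_z(hits, n, 0.506)
    print(n, round(z, 2), round(two_sided_p(z), 5))
```

With 10,000 signals the z-score is about 4; with 500 it is under 1. Either way, a 2-point shift in hit rate says little about whether the trade is economically worthwhile.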

### Conclusions and further lessons

This is just one test of many that I ran when I was looking at moving averages. You can play with this in many different ways: what happens when the average breaks (as it has here), or holds? Does it matter if price moved a certain distance from the average? (Yes.) Do different periods or types of moving averages make a difference? (No.) And the questions go on.

As an interesting aside, I think I have learned something new since doing this work a few years ago. I couldn’t quite understand the decay of the buy signal into a sell, or the strength of the overall effect of the sell. I think the reason is this: stocks tend to have stronger returns following closes in the top and bottom deciles of their range; that’s where a lot of the “juice” is. In addition, this sample is biased in that it did not include any companies that delisted or went bankrupt. The baseline adjustment methodology is one way to compensate for that bias, but we still may be picking up some asymmetry that biases the baseline too strongly positive. (This is speculation, but I spend a lot of time trying to shoot holes in my own tests!) In any case, moving average events typically cluster somewhere closer to the median, so we are naturally picking up a lower set of returns in these events. More testing is necessary, but I think that’s a promising direction.

At any rate, there are a few important lessons here:

- There appear to be no special moving averages (100, 200, etc.) in stocks, futures, or currencies.
- Price touching or crossing a moving average does not appear to be a tradable event.
- We observe a small effect in stocks when a moving average breaks, but this effect is explainable through mean reversion.
- We have also seen evidence that not all asset classes trade the same. Again, this calls into question claims that any technical tool can be used on any asset or timeframe.

I apologize for the length of this post, but I received numerous requests for this information. If the information here is overwhelming, at least read the bullet point conclusions above a few times: those are critical, objective lessons that all technical traders should consider.

Your buy criterion is too restrictive. What happens if you buy at the close of the day when the price moves above the moving average on the previous day?

The difference between the price and the moving average is some sort of smoothed approximation to the derivative of the price, and so the methods based on it are just trend following methods. It of course may not work for all cases.

In backtests it works very well, e.g., for VFINX (Vanguard S&P 500) if you also hold, say, VUSTX when not holding VFINX. Results for 1992–2014:

- 5-month (100-day) based: 12.4% CAGR
- 10-month (200-day) based: 12.7% CAGR
- 8-month (160-day) based: 14% CAGR
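
For readers who want to try the commenter's rule, here is a minimal sketch of a moving-average switch between a risk asset and a safe asset, plus a CAGR helper. The details are assumptions: it checks the signal daily on the prior close (to avoid look-ahead), ignores costs, taxes, and dividends unless the price series are total-return adjusted, and says nothing about how the numbers above were actually produced. The function names are hypothetical.

```python
from typing import List, Sequence


def ma_switch_equity(
    risk_close: Sequence[float], safe_close: Sequence[float], period: int
) -> List[float]:
    """Equity curve (starting at 1.0) that holds the risk asset for day t if its
    close on day t-1 was above its `period`-day SMA, and the safe asset otherwise."""
    equity = 1.0
    curve = [equity]
    for t in range(period, len(risk_close)):
        window = risk_close[t - period:t]  # closes through day t-1 only
        in_risk = risk_close[t - 1] > sum(window) / period
        prices = risk_close if in_risk else safe_close
        equity *= prices[t] / prices[t - 1]
        curve.append(equity)
    return curve


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate."""
    return (end_value / start_value) ** (1.0 / years) - 1.0
```

Assuming roughly 252 trading days per year, `cagr(curve[0], curve[-1], (len(curve) - 1) / 252)` turns the curve into the kind of figure quoted above.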

I agree that there is nothing special about any particular number of days.

Some good points here, but I’m not sure my buy criterion was too restrictive… it was simply “buy when it closes under the MA”. I’m not sure I understand exactly what you’re proposing as an alternative, but I ran nearly a dozen different test structures… and you see very similar effects.

Holding stock indexes above a long-term MA does have some validity in some contexts, so I think the backtest you present probably does make good sense.

Thank you!

Thanks for this great post, Adam!

I have especially marked this point:

“We have also seen evidence that not all asset classes trade the same. Again, this calls into question claims that any technical tool can be used on any asset or timeframe.”

As far as I have seen, futures and forex are more momentum-driven on longer timeframes. Might this also be a reason why they show such a tiny effect on this mean reversion test?

Cheers,

Markus

Perhaps, and I think in general they show stronger momentum and weaker mean reversion. This may be why they require a significantly different mindset than stocks do.

Thanks again for sharing your analyses and thoughts. I largely agree with what you wrote. Two minor points/questions:

1. There are many cases where the mean and median differences differ considerably, which can often hint at large outliers in the data. Did you consider a non-parametric test for the median differences?

2. I think you are probably right in saying that the effect relates to mean reversion, or at least that this seems more likely. I’d like to point out, however, that showing significant effects associated with “fading closes outside the day’s range” by no means proves that mean reversion causes the observed effect; first and foremost, it just shows the association. That is, since, as you pointed out, the chosen MA is arbitrary and fading the MA is highly related to closing outside previous lows/highs, in theory the market could trade according to fading the MA and you would probably still obtain the same test results.

1. Yes. Those tables even have difference of medians. That’s something to consider and non-parametric tests often do make sense.

2. OK… valid point. My thinking was that I tested about a bazillion MA values and found them all the same, then found mean reversion, which I assumed explained the effect. I think that’s a valid assumption, but you’re correct… it is an assumption and does not necessarily mean that it fully explains everything. Good point. Thanks!
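
As one example of the non-parametric route suggested above, a permutation test on the difference of medians makes no normality assumption and is less sensitive to a few large outliers than a t-test on means. This is a generic sketch (the function name and defaults are assumptions), not the analysis behind the tables in the post.

```python
import random
from statistics import median
from typing import Sequence, Tuple


def median_diff_permutation_test(
    signal: Sequence[float], baseline: Sequence[float], n_perm: int = 2000, seed: int = 0
) -> Tuple[float, float]:
    """Two-sided permutation test for median(signal) - median(baseline).

    Returns (observed difference, permutation p-value).
    """
    rng = random.Random(seed)
    observed = median(signal) - median(baseline)
    pooled = list(signal) + list(baseline)
    n = len(signal)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel observations at random under the null
        diff = median(pooled[:n]) - median(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

The `+ 1` in numerator and denominator keeps the estimated p-value from ever being exactly zero with a finite number of shuffles.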

In my view, mean reversion is also an abstract concept, and in this regard not much different from MA crossings or the like. That is, although it shows a significant effect as described above, in the end it just *describes* the effect. So it would be really interesting to know what is actually *causing* the effect: for example, a certain human behaviour (which obviously would be specific to the equity market), some big players, or …

Regarding the equity-specific effect I’d also like to know: were the sample sizes for all three markets the same, or statistically speaking, did the t-tests for the three sectors all have the same statistical power?

Interesting, and I see the difference in perspective. I have always thought of mean reversion and momentum as kind of the root level tendencies and beyond that, so much is unknowable. Your interest in understanding the psychology that motivates these forces makes sense, and I spent some time thinking about it. I guess the reason I never really went there was 1) a sense that these motivations might actually be unknowable and 2) I didn’t need to go deeper because I found tradable tendencies at this level.

I’d argue that mean reversion is very different because it does not require a structured value to be added… we can simply see it in the return series (though we’ve obviously imposed some structure even to get to that point). I don’t think you’re wrong in what you say… just different perspective.

The sample sizes and effects were basically the same, but I think there are some issues… sample sizes are too big, universe is too correlated, so p-values are probably unrealistically low in general. That’s my latest thinking, but the short answer to your question is yes.
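
One rough way to quantify the concern about a correlated universe is the design-effect correction for clustered samples: if signals tend to fire on the same day across many stocks, and returns within a day share a common correlation ρ, then n correlated observations carry roughly the information of n / (1 + (m − 1)ρ) independent ones, where m is the average number of signals per signal day. The equal-correlation model is a simplification, and the numbers below are hypothetical.

```python
import math


def effective_n(n: int, signals_per_day: float, rho: float) -> float:
    """Effective sample size under a simple equal-correlation (design effect) model."""
    design_effect = 1.0 + (signals_per_day - 1.0) * rho
    return n / design_effect


def deflated_t(t_stat: float, n: int, signals_per_day: float, rho: float) -> float:
    """Scale a t-statistic computed as if all n observations were independent."""
    return t_stat * math.sqrt(effective_n(n, signals_per_day, rho) / n)


# Hypothetical: 60,000 signals, about 20 on a typical signal day, rho = 0.2.
print(round(effective_n(60_000, 20, 0.2)))  # 12500
print(round(deflated_t(6.0, 60_000, 20, 0.2), 2))  # 2.74
```

Under these made-up inputs, a naive t-statistic of 6 shrinks to under 3, which is the direction of the concern above even if the exact size of the correction is uncertain.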

Actually, I’d totally agree with you that if you find a tradeable edge using a certain model, the degree of abstraction of that model is of no further importance. I was just hinting at this, because your text basically appeared to me as stating “Averages (spuriously) show the effect while mean reversion is causing the effect” whereas I think mean reversion is just a better model to explain the tendencies. I hope this makes sense 🙂

Correlation within the equity universe indeed could be a problem, but I was actually not concerned about the significant p-values 🙂 I rather had the non-significant results for futures and forex in mind, because, by the definition of the p-value, non-significant results are not proof of the null hypothesis per se. In some passages of your text, however, the reader might get the impression of a “non-significance proof”. So basically I was thinking, well, maybe futures show the same effect but the sample size was just too small to show it. I’d generally suggest confidence intervals instead of p-values. They provide more information and often allow for a better overall interpretation of the effect.
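
Following the suggestion to report confidence intervals rather than bare p-values, here is a minimal sketch of a normal-approximation interval for the difference between mean signal returns and mean baseline returns (function name hypothetical). It assumes independent observations, which is optimistic for a correlated equity universe. An interval that is narrow and straddles zero says something different from one that is wide and straddles zero, and that distinction is exactly what a bare p-value hides.

```python
import math
from statistics import NormalDist, fmean, stdev
from typing import Sequence, Tuple


def diff_of_means_ci(
    signal: Sequence[float], baseline: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Normal-approximation CI for mean(signal) - mean(baseline), unequal variances."""
    diff = fmean(signal) - fmean(baseline)
    se = math.sqrt(stdev(signal) ** 2 / len(signal) + stdev(baseline) ** 2 / len(baseline))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    return diff - z * se, diff + z * se


lo, hi = diff_of_means_ci([1, 2, 3, 4, 5], [0, 1, 2, 3, 4])
print(round(lo, 2), round(hi, 2))  # -0.96 2.96
```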
