In November a curious paper dropped on SSRN, an online hub widely used by economists to circulate their work. The paper’s authors are two of the most important figures in academic finance: Kenneth French, of Dartmouth College’s Tuck School of Business, and Eugene Fama, a Nobel Prize-winning economist from the University of Chicago. The article itself seems modest, reading like a technical footnote to research the duo published 30 years ago, about stock market returns.

Without context, “Production of U.S. Rm-Rf, SMB, and HML in the Fama-French Data Library” is a bit mystifying. The paper seems to be answering questions about Fama and French’s data—yet it doesn’t spell out what the questions are or who’s asking them. To understand its 18 pages, one might need a Ph.D. in finance. To read the subtext, you’d have had to know a few dozen people with one. The paper is the culmination of a quiet but sharp-elbowed debate in a corner of academic economics about the reliability of a dataset that’s crucial not only to professors and grad students but also to professional traders, investors, securities litigators and corporate executives.

Two days before the paper hit SSRN, a PDF of it landed in the email inboxes of some of the country’s top economists and legal scholars. The accompanying message from Fama made it clear why he and French had written the article—and also that he was annoyed. “Most of you are in the acknowledgements of two papers,” he wrote. “There is lots of strong language in those papers about the effects of updates in the Fama-French factors. ... Our view is that their results are not surprising for those with experience in asset pricing research.” Fama added: “For us, the whole experience is a great example of the old saw: No good deed goes unpunished.”

The stakes of even the smallest argument over Fama and French’s research are high. Their work is about nothing less than how we know what companies are worth and how well money managers are doing their jobs—and how to make money in the market. For example, they found that companies with certain characteristics, such as a small size or a relatively cheap share price, tend to perform better in the long run. These special features are known as “factors.”

Trillions of dollars of assets ride on strategies informed by Fama and French’s research. Factor-based index funds and exchange-traded funds allow investors to profit by exploiting their academic findings. Those include funds offered by Dimensional Fund Advisors, a $677 billion investment firm founded by a former student of Fama’s. Both Fama and French serve the firm as directors and consultants.

French provides their datasets for free on his Dartmouth homepage—the good deed that allows others to use and test them, and which magnified the influence of these models. For every month going back to 1926, anyone can look up the market’s return as well as the gains or losses of portfolios sorted by factors.

In 2021, three professors then based at the University of Toronto noticed something strange and potentially unsettling. The numbers were “noisy”—that is, they changed in significant ways depending on when they were downloaded from the site. Pat Akey, Adriana Robertson and Mikhail Simutin wrote up their puzzling findings in two working papers—the ones Fama cited in his email. The authors noted that changes in the numbers seemed to improve the historical record of the value investment strategy—that is, of buying cheap stocks—and wrote that a “lack of transparency” made it impossible to know what to make of that.

Fama and French’s November article confirms that the numbers were changing and lays out the reasons. Among them: Corrections in historical stock market records and adjustments related to accounting rule changes that affected how some stocks are categorized. French’s Dartmouth website now contains an archive of earlier versions of the data, so researchers can compare them.

That’s science: The Toronto researchers pointed out something odd, and Fama and French answered with more data and information. But the story of how the exchange happened is revealing. About the hierarchical world of academic finance, where journal articles can become the basis of multibillion-dollar trading strategies. And about the inherent messiness of the numbers that shape investors’ understanding of markets.

Knowledge has a factory floor. The process of coming to know anything about financial markets and the economy occurs not only on trading desks but also at universities, in a collaborative global assembly line. Someone spots an idea in the market’s flood of numbers. They send it to production to test it, shape it, convert it into research. Over time it gets sent out for broad consumption, and the world knows a little bit more and acts on what it knows.

This process relies on a few assumptions: that it’s possible to know anything solid about something as ever-shifting as the markets; that it’s useful to build models that can only approximate reality; that each piece of research is a starting point from which we can learn more. The debate over factor data began with researchers trying to learn more, specifically about indexes. Well into the Covid-19 pandemic, Akey, Robertson and Simutin were working together in the great seminar room of that time—a Zoom conference. They were updating a paper that used the Fama-French data library.

As part of this process, they re-ran the code from their draft—basically, they refreshed the page. When Simutin pushed the button, the numbers changed. “What made it super, super weird was that only half of the numbers changed,” Robertson says. Just the numbers generated from the Fama-French dataset of stock returns.

Simutin remembered he’d had this problem before, when revising a paper while getting his Ph.D. in finance at the University of British Columbia. He opened the Excel file with the raw data and made a simple line chart comparing the numbers, old minus new. In a perfect world, the line would have been flat. “But, instead, it was wobbling around,” says Robertson, who teaches business law and holds both a law degree and a doctorate in finance from Yale. “And then the rabbit hole opened, and down we went.”

Like many modern rabbit holes, this one plunged through the deepest depths of the web. The team went to a site called the Internet Archive to see if its Wayback Machine—a repository of old webpages—had happened to grab previous versions of the Fama-French dataset. It had: They were able to put together multiple sets of numbers, going back to 2005. They called each one a “vintage” of the data.

Naturally, the numbers changed as time passed and new months were added to the library. But that wasn’t the issue. Data for the same months changed from version to version. “Your historical data is retroactively changed,” says Robertson. “It would just never occur to you that the equivalent of the temperature on Jan. 27, 1989, depends on when you downloaded, if you checked this year or last year. Which is the equivalent of what we were finding.”
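The check the team describes, old minus new for the same months, can be sketched in a few lines of Python. This is only an illustration: the two dictionaries, their values and the function name `vintage_differences` are invented here, and real vintages would be read from the archived files rather than typed in.

```python
# Hypothetical monthly returns (in percent) for one factor, as downloaded at
# two different times. The numbers are made up for illustration only.
vintage_2005 = {"1926-07": -2.87, "1926-08": 4.19, "1926-09": 0.01}
vintage_2022 = {"1926-07": -2.43, "1926-08": 3.82, "1926-09": 0.01, "2005-01": 2.06}


def vintage_differences(old, new):
    """Return {month: old - new} for the months that appear in both vintages."""
    shared_months = sorted(set(old) & set(new))
    return {month: round(old[month] - new[month], 4) for month in shared_months}


diffs = vintage_differences(vintage_2005, vintage_2022)

# Months added after the older download are ignored; only revisions to
# already-published history count as a change.
changed = [month for month, diff in diffs.items() if diff != 0]
```

If history were never revised, every difference would be zero and `changed` would be empty, which is the flat line Simutin expected to see.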

The website had disclosures acknowledging some revisions, such as in the underlying data from providers Compustat and the Center for Research in Security Prices (CRSP), an affiliate of the University of Chicago that tracks prices and creates market indexes licensed by fund companies including Vanguard Group. (Bloomberg LP, which owns Bloomberg News, sells data and indexes, including factor-based indexes, to institutional clients.) CRSP had undertaken a big project to update its data, and it otherwise fixes errors as it finds them, merging data and tidying up the number of shares outstanding. The team found that these operations explained only part of the changes. French’s site also noted adjustments related to accounting rule changes, without quantifying the impact.

Models Baked In To Finance
The differences in return are measured in basis points—or hundredths of a percentage point—per month. But Akey, Robertson and Simutin write in their main paper, “Noisy Factors,” that they are pervasive and add up. For example, 77% of the monthly numbers for the model’s value portfolio differ by more than 1% annualized between the 2005 and 2022 vintages, according to a version of their paper updated last year and posted on SSRN. And that’s worrying, they say, because Fama and French’s models are so baked into modern finance.
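A statistic like that 77% figure can be computed along the following lines. The sketch assumes that “annualized” means multiplying a monthly gap by 12, which the article does not spell out and which may differ from the paper’s exact method. The function name and the sample differences are hypothetical.

```python
def share_above_threshold(monthly_diffs_pct, annual_threshold_pct=1.0):
    """Fraction of months whose absolute difference between two vintages,
    annualized by multiplying by 12 (an assumption), exceeds the threshold."""
    if not monthly_diffs_pct:
        raise ValueError("need at least one monthly difference")
    flagged = [d for d in monthly_diffs_pct if abs(d) * 12 > annual_threshold_pct]
    return len(flagged) / len(monthly_diffs_pct)


# Invented differences in percentage points per month (0.10 = 10 basis points).
example_diffs = [0.10, -0.02, 0.00, 0.25, -0.09, 0.05]
share = share_above_threshold(example_diffs)
```

Under this convention, a monthly gap of a little more than 8 basis points is already enough to clear the 1% annualized bar.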

Scholars have cited Fama and French’s 1993 paper at least 35,000 times. It was a major achievement in a quantitative-finance revolution that began four decades earlier with a University of Chicago Ph.D. candidate named Harry Markowitz. Up to that point, stock-picking was a largely artisanal affair, based on a trader’s alleged skill and good judgment. Markowitz proposed another way: Instead of trying to predict and pick winners, one should diversify. Pick several different companies whose risks balance each other out, and you can use math to create a portfolio that’s more stable and still delivers good returns. Markowitz was awarded the Nobel Prize in economics in 1990.

More economists, including John Lintner and future Nobel laureate William Sharpe, built on Markowitz’s insight. Instead of looking at how to make a portfolio better in isolation, they considered how it did relative to the overall market. Some stocks reliably take off whenever the market goes up and tank when it goes down, while others are less responsive to the market’s gyrations. You could build your portfolio around whether you wanted it to be less volatile than the overall market, or more volatile in exchange for a higher potential return. “And for the longest time, financial academics thought that there was this one big factor, which is the aggregate stock market, and that’s how everything was measured. It was against this one factor,” says Andrew Lo, a professor of finance at MIT Sloan School of Management who researches asset pricing. “And then along come Fama and French.”

Fama had laid the groundwork in the 1960s with a different blockbuster insight: Markets are efficient. Traders quickly “price in” the available information about a stock, whether it’s news about slipping earnings or the success of a product or a hurricane bound for the company’s main factory. As a result, it’s very hard for even professionals to beat the market, especially after fees. Fama’s efficient-markets hypothesis is a big reason so many investors own index funds today. It’s also why he won a Nobel Prize in 2013.

But in trying to explain why some stocks do better than others, Fama and French found a kind of loophole to that you-can’t-beat-the-market rule—or at least a modification that redefines what “the market” means. Certain types of stocks have given investors a little extra return. Namely, small company stocks and cheap value stocks, as measured by share price relative to what the companies’ assets are worth, also known as their book value. Now there were three factors that could explain a stock’s return.

That “three-factor model” largely shaped how professionals now talk about performance. It helps determine if a money manager is truly skilled. If you put your money in a mutual fund and it delivers a return of 15%, it might feel nice to be 15% richer, but you don’t know if the manager added anything of value. “In any time period, you can have a fantastic return because you took a lot of risk and the risk paid off,” says Akey, now a visiting professor at Insead, a business school in France. “Looking at raw returns is not the right metric. That’s luck.”

The Fama-French three-factor model allows a finer-grained reading. You can measure how a fund did relative to the market risk it took—did the market also go up 15% that year?—and relative to its exposure to small-cap and value stocks. After you strip out those factors, you might discover that 15% was kind of disappointing.
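Stripping out the factors comes down to simple arithmetic once a fund’s exposures are known. The sketch below uses the standard three-factor form, with the market’s excess return (Rm-Rf), the size factor (SMB) and the value factor (HML). Every number in it, including the three exposures, is hypothetical. In practice the exposures would be estimated by regressing the fund’s returns on the factor data rather than assumed.

```python
def three_factor_alpha(fund_return, risk_free, mkt_excess, smb, hml,
                       beta_mkt, beta_smb, beta_hml):
    """Return what is left of a fund's excess return after removing the part
    explained by its factor exposures. All inputs are percentages for the
    same period."""
    explained = beta_mkt * mkt_excess + beta_smb * smb + beta_hml * hml
    return (fund_return - risk_free) - explained


# Hypothetical year: the fund returns 15%, Treasury bills 2%, the market beats
# bills by 13%, small stocks beat big by 3%, value beats growth by 4%.
alpha = three_factor_alpha(
    fund_return=15.0, risk_free=2.0,
    mkt_excess=13.0, smb=3.0, hml=4.0,
    beta_mkt=1.0, beta_smb=0.5, beta_hml=0.5,  # assumed exposures
)
```

With these made-up inputs the factors account for 16.5 percentage points of return against the fund’s 13-point excess return, so the manager’s alpha is negative even though the headline number was 15%.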

The factors have sprouted other uses. In a second paper, “Noisy Factors in Law,” the authors look at how the data might be used in litigation. Say a company is being sued by shareholders who allege that managers made a mistake that caused the stock price to go down. Factors can provide a finer-grained estimate of how the stock would have performed without the mistake.

And of course the factors point to a way to make money: Instead of buying an S&P 500 index fund, you could invest in a diversified small-cap or value fund in hopes of capturing a higher return. Or in an index-like portfolio that’s tilted toward smaller or cheaper stocks. These are the kinds of funds Austin-based Dimensional specializes in. David Booth co-founded the company in 1981 and invited his former teacher, Fama, to become a shareholder. The company today presents itself as a kind of mix of Vanguard and a think tank, with a roster of top academics associated with it. “They feel that they should just let us do our research, and if something comes out of that that they can use, then they will do it,” says Fama in a video on Dimensional’s website. In 2008, Booth donated $300 million to Chicago’s business program—and his mentor’s academic home is now known as the Booth School of Business.

So factors are a very big business and a very big deal. But how important are the data discrepancies the Toronto group found for investors? You might care if you are a mutual fund manager—whether or not you outperformed could change depending on the vintage of the data you’re measured against. But for investors the obvious question is whether the factors still hold up, no matter when the data are pulled.

The Toronto authors found that the factors were there—it’s a question of size. The most visible change was to the value factor. Consider a hypothetical portfolio that bet $10,000 on value stocks starting in 1926 while betting against growth stocks. From 1926 to 2005, it would have gained an average of 0.41% per month, if you looked at the 2005 vintage of the data. The $10,000 portfolio would have grown to about $250,000 in 2005. In the 2022 version of the numbers, however, the value edge grows to 0.45%. Now the portfolio would have grown to about $400,000 over the same period.
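Why four-hundredths of a percentage point a month matters so much over eight decades is a matter of compounding. The sketch below is deliberately simplified: it assumes the portfolio earns exactly the average return every month and that the period is 79 years, neither of which comes from the paper. Because real monthly returns bounce around, compounding at a constant average overstates the ending balance, so this will not reproduce the dollar figures above. What it shows is how sensitive the ending value is to a small change in the monthly rate.

```python
def compound(initial, monthly_return_pct, months):
    """Grow `initial` at a constant monthly return for `months` months."""
    return initial * (1 + monthly_return_pct / 100) ** months


MONTHS = 79 * 12  # assumption: roughly 1926 through 2005

ending_2005_vintage = compound(10_000, 0.41, MONTHS)
ending_2022_vintage = compound(10_000, 0.45, MONTHS)

# How much larger the ending balance is under the higher monthly average.
ratio = ending_2022_vintage / ending_2005_vintage
```

Even in this stylized version, the higher monthly average leaves the portfolio roughly 46% larger at the end of the period.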

Moving Numbers
The moving numbers weren’t the only thing the “Noisy Factors” paper noted. The underlying source code of French’s website for the data library—at mba.tuck.dartmouth.edu—says: “Site Developed by Dimensional Fund Advisors Web Team.” It’s well-known that Fama and French work with Dimensional, and it’s disclosed on French’s homepage. Some academics, especially those who work or consult with fund companies, say they always assumed Dimensional did the data-crunching. But the relationship between the company and the factor data wasn’t clear, the Toronto group wrote.

Fama and French spell out the relationship in more detail in their latest paper, which is now linked at the top of French’s data page. It says Dimensional employees produce and post the monthly updates under Fama and French’s guidance.

The factor premiums are part of Dimensional’s pitch to clients. But value investing has been a bumpy ride in recent years. After the global financial crisis, a value investor was betting against the era’s big winners—the tech darlings that kept rallying despite rich valuations. According to French’s century-long data, 2020 was the worst year ever for value returns, and it followed three straight years of losses. More recently, in 2021 and 2022, value staged a comeback; even more recently, it has again underperformed.

The changes in the data between 2005 and 2022 make the historical value premium look larger than it used to. What to make of that? The “Noisy Factors” authors offer a rhetorical raised eyebrow about Dimensional’s relationship to the data so many academics rely on. “Rather than speculate,” they write, “we simply note that this lack of transparency, coupled with the pattern of changes to the factors, may be concerning to researchers who rely on the factors for empirical analysis.”

Both the lack of a specific conclusion and the subtext bothered some academics who read “Noisy Factors.” The authors say it worried them, too. “There’s always that thought—and I really do mean it—you’re like, ‘Maybe we need to let this go,’” Robertson says. “‘Maybe this is…’” she trails off. “But you can’t just let it go, because you need to understand what’s going on with your data.”

Robertson and her fellow authors may be constitutionally ill-suited to letting things go. Akey, a former competitive fencer who did his graduate work in finance at London Business School, says he enjoys solving puzzles: “I like synthesizing ideas, putting them together and seeing what maybe other people have missed.” Robertson is “more of a Lego person.” Speaking over Zoom, she says she has “four gigantic Lego sets in my living room right now. They’re all over 6,000 pieces.” Simutin has a black belt in kung fu but says he’s more focused on tennis—a hobby shared by Fama in his younger days.

Neither Fama nor French seemed eager to return the trio’s serve. To Fama, the “Noisy Factors” paper is about basically nothing. “I think you are wasting your time,” he said when asked about the changes, prior to his November paper with French. People try to mess with his work all the time; there’s a bias to make a big splash with research. Data get updated, so the factors change. “Academics are used to that and applaud the effort. It drives industry people crazy. Unfortunately, all databases are subject to a version of this problem.” French declined to comment, as did Taylor Smith, a spokesperson for Dimensional.

In the paper Fama ultimately published with French, they conclude that changes in the underlying data and their methodology since 2002 added up to an improvement in the value premium of 0.03 percentage points, or 3 basis points, on average per month. That’s smaller than what the Toronto group found, but Fama and French also considered a different sample than the “Noisy Factors” group. Most of the change, according to Fama and French, was due to CRSP correcting data for the number of company shares outstanding prior to 1947, which affected their calculations. In September 2021, Fama and French also decided to stop using a proprietary method to connect data from Compustat and CRSP and began using standard links provided by CRSP. This had a partially countervailing effect, pulling down the value premium a bit. Fama and French also wrote that the changes in the data make the small-cap effect look weaker, by a little under 0.01 percentage point per month. Dimensional also runs funds that tilt to small companies—so, if anything, that change is to the firm’s disadvantage.

After “Noisy Factors” started circulating, some changes showed up on French’s site. It started publishing the old vintages—file after file of monthly and annual returns—going back to 2005. (The Toronto group updated the paper with this data and got similar results.) French “didn’t tell us, nobody told us, they just did it one day,” says Robertson, who has since taken a job at the University of Chicago’s law school, not far from Fama’s office. “I can’t prove intentionality, but the timing is interesting.”

Economic researchers and practitioners know their data are fragile. Tiny tweaks in inputs, parameters and samples can destabilize everything. Throughout the sciences, researchers are increasingly sensitive to the problem of data mining—that when you look hard enough at any big dataset, you are bound to find some number of patterns, even if many of them are just coincidences. The more researchers are looking, each one turning knobs and settings of their tests in different ways, the more coincidences will be found.
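A small simulation makes the point concrete. It generates strategies whose returns are pure noise, with a true average of zero, and counts how many nonetheless clear a conventional significance bar. Everything here is an assumption chosen for illustration: 1,000 strategies, 120 months each, a t-statistic cutoff of 1.96 and the function name itself.

```python
import math
import random
import statistics


def count_chance_findings(n_strategies=1000, n_months=120, t_cutoff=1.96, seed=0):
    """Count pure-noise strategies whose average return looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_strategies):
        # Monthly returns with a true mean of zero: there is nothing to find.
        returns = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
        mean = statistics.fmean(returns)
        std_err = statistics.stdev(returns) / math.sqrt(n_months)
        if abs(mean / std_err) > t_cutoff:
            hits += 1
    return hits


hits = count_chance_findings()
share = hits / 1000
```

By construction, roughly 1 in 20 of these meaningless strategies should pass the test, and the count of accidental discoveries rises in step with the number of strategies tried.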

After the Fama and French discovery, the hunt for more factors became a kind of cottage industry. In 2019, Duke University’s Campbell Harvey and Purdue University’s Yan Liu published a paper called “A Census of the Factor Zoo,” in which they listed more than 400 factors discovered in top journals. “Surely, many of them are false,” they wrote. The sheer number of findings, they argued, implied that the data were being strip-mined to produce factors. Try to invest based on one, and you’ll likely find it doesn’t hold up. The paper noted that the value factor is still strong enough to survive their most stringent statistical tests.

The “Noisy Factors” authors don’t think their work revealed a false finding. But with that backdrop, it was bound to create some buzz. Other researchers set to work trying to figure out how worrisome the discrepancies were. At the Northern Finance Association’s conference in Banff, Alberta, Rice University’s Robert Dittmar cautioned against implying that Fama and French were doing anything fishy. He quoted Omar Little from The Wire: “You come at the king, you best not miss.” Since then, Dittmar’s seen the paper presented multiple times at academic-finance conferences, to the point where it feels a little like group therapy. “Almost all of us who work in this field have tried to re-create the Fama-French data that Ken posts on his website, and you get really close, but you’re never quite there,” he says.

Akey presented “Noisy Factors” at the Western Finance Association’s conference in San Francisco in June 2023. Afterward, Charlie Clarke, then at the University of Kentucky, offered a skeptical analysis in response. Clarke’s tests, using a different statistical measure, found that CRSP updates could largely explain the changes in the older half of the Fama-French data, and that adjustments related to changes in corporate accounting rules in the early 1990s were responsible for a huge proportion of the remaining changes. This is similar to the explanations Fama and French would later offer. “I don’t see anything that looks unsavory, untoward, and I think we can move forward on that. Or I hope we can,” Clarke says.

At the conference, Akey said that he appreciated Clarke’s careful work, and that they were only able to have this discussion about why the data might be changing because he and his co-authors had shown that it had. There were still unanswered questions: Even if reactions to two specific accounting rules could explain many changes, were there other perfectly reasonable adjustments Fama and French might have made that could have pushed the numbers a different way? “We don’t necessarily know what could have been,” Akey said. He referred to an idea in statistics known as the Garden of Forking Paths, which takes its name from a fantastical Jorge Luis Borges short story. There are many ways a well-intentioned researcher can choose to look at data—even without trying to find a preset answer—and it’s all too easy to make choices that seem to fit the information they have into a pattern.

Boston College visiting assistant finance professor Mathias Hasler has done his own work on Fama-French factors, unrelated to the research in Toronto. He likes to study when a phenomenon shows up in a paper’s sample—in a certain cut of the numbers—but evaporates or shrinks if you look at things from a slightly different angle. He dug into how Fama and French decided what constitutes value. “The decisions that they make in their paper to construct the value premium leads to an additional return,” he says, relative to what they might have gotten with other reasonable decisions. The alternative decisions he considered are seemingly picayune—for example, which days to use when grabbing a slice of market data.

Hasler says his findings don’t mean that Fama and French were hunting for a good result. “They’re very smart, and they know about the dangers of data mining,” he says, adding that they have a reputation for the genuine pursuit of knowledge. “Another explanation is that maybe they simply went for a set of decisions, kind of assuming that these decisions do not matter, and then, maybe by just chance, they hit the decisions that led to large returns.”

Hasler’s article is forthcoming in the somewhat contrarian Critical Finance Review. Fama and French responded in a single page published on CFR’s website, saying it’s hard to evaluate his different definitions of value: “Is it good or bad news for our [value factor] that its average return is higher than those of most of his alternatives?”

Fama has said that the factor model isn’t perfect. “We use that word ‘model’ because it’s not reality—it’s an approximation, and it’s gonna have problems,” he told University of San Francisco finance professor Ludwig Chincarini in a 2021 interview on YouTube. “I’ve become kind of negative on factor models, because it just kind of opened a Pandora’s box.” People, he said, would hunt for factors and “try to not necessarily develop models but develop investment products they wanted to sell.” He acknowledged that the value premium seemed to have gotten lower in recent years. But he said performance is so volatile that it would take decades to conclude anything for sure.

Past performance, as the disclaimers on mutual fund ads say, is no guarantee of future results. Even if you could get perfect historical data, it only tells you history. Why did the small-cap and value factors work? One possibility is that something—perhaps irrational fear or a flaw in the structure of markets—caused investors to unjustifiably avoid these stocks, so that buyers got a bargain. If investors were irrational in the past, these effects might go away now that everyone has read Fama and French.

Another possibility is that the higher returns are the rational reward for some kind of risk. In that case, maybe the factors will stick around. “For over 50 years, academics have been trying to explain why one stock has a higher expected return than another,” says Duke’s Harvey. Researchers led by Fama and French have made a lot of progress. “However, we have only scratched the surface. The next generation of finance researchers has much work to do.”

The scrutiny on old factors underscores a change in academic research toward more data transparency. Some financial journals now require or publish research code alongside new papers. The most basic data and code were once the yield of hours and hours of manual and mental labor, pens and paper and columns and clumsy software. Even as technology advanced, Fama and French’s sharing of data was seen as generous, even radical. But expectations keep rising, in part because of better, faster computing, according to Michael Gallmeyer, a professor at the University of Virginia’s McIntire School of Commerce. “Nowadays you find a good Ph.D. student and set them down over a weekend, they might be able to replicate stuff people had done in a whole paper,” he says.

The “Noisy Factors” trio see Fama and French’s November paper as a partial answer to their questions, providing the “how” of the changes, if not exactly the “why.” At the very least, they wish that Fama and French had acknowledged “Noisy Factors” when they published their rejoinder. “It’s good academic practice to cite work that you’ve built on,” Simutin says.

Fama and French’s argument is that changes in complex data should come as no shock. At the end of their note, they point to the confounding problem at the heart of it all. “The details of factor construction are arguable, and there is no magic,” they write. “The appropriate caveat is: Use at your own risk.”

Childs is a co-host of Planet Money, NPR’s economics podcast, and the author of The Bond King: How One Man Made a Market, Built an Empire, and Lost It All, from Flatiron Books. Lee reports on quantitative investing for Bloomberg News in London.

This article was provided by Bloomberg News.