The annual report, like other regulatory filings, is more than a legal requirement; it gives public companies an opportunity to communicate their financial health, promote their culture and brand, and engage with a full spectrum of stakeholders. How readers process all this information significantly affects their perception of, and hence their participation in, the business. More and more companies are realizing that the audience for their disclosures is no longer just human analysts and investors, but also robots and algorithms that recommend which shares to buy and sell after processing information with machine learning tools and natural language processing toolkits.
This development was probably inevitable, given technological progress and the sheer volume of disclosure materials. In any event, companies that wish to communicate and engage with stakeholders in the age of AI need to adjust how they talk about their finances and brands and how they make forecasts. That means heeding the logic and techniques underlying the language and sentiment analysis performed by large-scale machine learning computation. One example of such computation is a process that identifies positive, negative, and neutral opinions in, say, all of a company's disclosures, a task beyond the processing capacity of the human brain. While the literature is catching up with, and guiding, investors' use of machine learning and computational tools to extract qualitative information from disclosures and news, there has been no analysis of the feedback effect: how companies adjust the way they talk when they know that machines are listening. Our new paper fills this void.
We start with a diagnostic test that connects the expected extent of AI readership of a company's SEC filings on EDGAR (measured by Machine Downloads) with how machine-friendly its disclosure is (measured by Machine Readability). The first variable, Machine Downloads, is constructed from historical information by tracking IP addresses that download filings in batches. We treat Machine Downloads as a proxy for AI readership, both because a machine request is a necessary condition for machine reading, and because the sheer volume of machine downloads makes it unlikely that human readers alone could process them. The second variable, Machine Readability, builds on five elements identified by recent literature as affecting how easily a machine can parse, script, and synthesize a filing.
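The post does not spell out the batch-download rule, so the following is only a hedged sketch of the idea. It assumes a simplified log in which each record names the requesting IP, the date, the company (by CIK) and the filing, and it treats an IP as a machine on a given day if it fetches filings from more than a chosen number of distinct companies. The `LogRecord` fields, the `max_firms_per_day` threshold and the function names are hypothetical illustrations, not the authors' exact definition. The sketch counts realized downloads; the paper's Machine Downloads is an expected measure built from historical data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical, simplified download-log entry."""
    ip: str         # requesting IP address
    date: str       # download date, e.g. "2015-03-02"
    cik: str        # identifier of the company whose filing was fetched
    accession: str  # identifier of the filing itself


def flag_batch_ips(records, max_firms_per_day=50):
    """Return (ip, date) pairs that fetched filings of more than
    `max_firms_per_day` distinct companies on that day (assumed rule)."""
    firms_seen = defaultdict(set)
    for r in records:
        firms_seen[(r.ip, r.date)].add(r.cik)
    return {key for key, ciks in firms_seen.items() if len(ciks) > max_firms_per_day}


def count_downloads(records, max_firms_per_day=50):
    """Split each filing's downloads into machine and other downloads."""
    batch = flag_batch_ips(records, max_firms_per_day)
    machine, other = defaultdict(int), defaultdict(int)
    for r in records:
        # Every download by a flagged (ip, date) pair counts as a machine download.
        target = machine if (r.ip, r.date) in batch else other
        target[r.accession] += 1
    return dict(machine), dict(other)


# Toy log: IP "a" fetches three companies' filings in one day; IP "b" fetches one.
records = [
    LogRecord("a", "d1", "1", "f1"),
    LogRecord("a", "d1", "2", "f2"),
    LogRecord("a", "d1", "3", "f3"),
    LogRecord("b", "d1", "1", "f1"),
]
machine, other = count_downloads(records, max_firms_per_day=2)
print(machine)  # {'f1': 1, 'f2': 1, 'f3': 1}
print(other)    # {'f1': 1}
```

Keeping the "other" counts alongside the machine counts mirrors the comparison in the post, where non-machine downloads serve as a placebo for the machine measure.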
We show that, in the cross-section of filings, a one-standard-deviation increase in expected Machine Downloads is associated with a 0.24-standard-deviation increase in the Machine Readability of the filing. By contrast, other (non-machine) downloads bear no meaningful correlation with Machine Readability, which supports Machine Downloads as a proxy for machine readership. We further validate Machine Downloads and Machine Readability as reasonable proxies (for the presence of machine readers and for the ease of machine processing, respectively) by showing that trades in a company's shares happen more quickly after a filing becomes public when Machine Downloads is higher, and that this effect is even stronger when Machine Readability is also high. This result also demonstrates the real impact of machine processing on information dissemination.
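To make the "one standard deviation ... 0.24 standard deviation" phrasing concrete, here is a hedged sketch: regressing the z-score of one variable on the z-score of another gives a slope measured in standard deviations. With no controls that slope equals the Pearson correlation. The post does not describe the authors' regression specification, and this univariate toy does not attempt to reproduce it; the function names and the data are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a series to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def standardized_slope(x, y):
    """OLS slope of z(y) on z(x): SDs of y per one-SD change in x.

    Both z-scored series have mean zero, so the OLS slope reduces to
    sum(zx * zy) / sum(zx ** 2); with no controls this equals Pearson's r.
    """
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)


x = [1, 2, 3, 4, 5]  # toy "expected machine downloads"
y = [2, 1, 4, 3, 5]  # toy "machine readability"
print(round(standardized_slope(x, y), 6))  # 0.8
```

Read the same way, a standardized slope of 0.24 means a filing whose expected Machine Downloads sits one standard deviation above the mean is predicted to score 0.24 standard deviations higher on Machine Readability.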
Having established a positive association between a large AI reader base and more machine-friendly disclosure documents, we next explore how firms manage the 'sentiment' and 'tone' perceived by machines. It is well documented that corporate disclosures try to strike the right tone with (human) readers by conveying positive sentiment and a favorable tone without being explicitly dishonest or noncompliant. We therefore expect a similar strategy tailored to machine readers. Researchers and practitioners had long relied on the Harvard Psychosociological Dictionary to construct 'sentiment' as perceived by (mostly human) readers, counting and contrasting 'positive' and 'negative' words. The publication of Loughran and McDonald (2011) in the Journal of Finance ('LM' hereafter) provides an instrumental event for testing our hypothesis about machine readers. This is because Loughran and McDonald (2011) not only presented a new, specialized finance dictionary of positive and negative words, along with words that are informative about liability and uncertainty, but their word lists have also served as a leading lexicon for algorithms that sort out sentiment in both industry and academia.
As a first step, we establish that firms that expect many machine downloads avoid LM-negative words, but only after 2011, the year the LM dictionary was published. No such structural change appears for words deemed negative by the Harvard Dictionary, which had been available to human readers for many years. As a result, the difference between LM Sentiment and Harvard Sentiment follows the same path as LM Sentiment itself, suggesting that the change in disclosure style is indeed driven by the publication of the LM dictionary.
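The dictionary approach of counting and contrasting positive and negative words can be sketched in a few lines. The word lists below are tiny hypothetical stand-ins, not the actual LM or Harvard dictionaries (each of which is far larger), and the net-sentiment formula (positive - negative) / (positive + negative) is one common scaling among several; dividing by the total word count is another. The two toy lexicons differ in a single word, so the same sentence scores differently under each, which is how a gap between LM and Harvard sentiment can arise.

```python
import re

# Hypothetical toy lexicons for illustration only; the real LM and Harvard
# word lists are far larger and differ in many entries.
LM_LIKE = {
    "positive": {"achieve", "improve", "strong"},
    "negative": {"loss", "decline", "impairment"},
}
HARVARD_LIKE = {
    "positive": {"achieve", "improve", "strong"},
    "negative": {"loss", "decline", "tax"},  # toy list differs from LM_LIKE in one word
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def net_sentiment(text, lexicon):
    """(positive - negative) / (positive + negative); 0.0 if no word matches."""
    words = tokenize(text)
    pos = sum(w in lexicon["positive"] for w in words)
    neg = sum(w in lexicon["negative"] for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def sentiment_gap(text):
    """LM-style sentiment minus Harvard-style sentiment for the same text."""
    return net_sentiment(text, LM_LIKE) - net_sentiment(text, HARVARD_LIKE)


text = "Strong results; we improve margins despite a higher tax rate."
print(net_sentiment(text, LM_LIKE))       # 1.0
print(net_sentiment(text, HARVARD_LIKE))  # 0.3333333333333333
```

A firm writing for machines that use an LM-style lexicon would push the first score up by avoiding that lexicon's negative words, whether or not the Harvard-style score moves.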
Loughran and McDonald (2011) also developed several additional dictionaries of 'tone' words aimed at capturing a richer set of annotations of a financial document, including dictionaries of litigious, uncertain, weak modal, and strong modal words. The authors show that the prevalence of words in each category predicts firm outcomes such as legal liability and capital-market reactions. We find that firms with higher expected machine readership became more averse to words from these dictionaries after the Loughran and McDonald (2011) publication. Together, these results suggest that managers revise their corporate disclosures with the multi-dimensional effects of their words in the eyes of machines in mind.
While our analyses thus far focus on textual information, applying the underlying theme (ie, 'how to talk when a machine is listening') to speech provides a test beyond the textual setting. Earlier work found that managers' vocal expressions can convey incremental information that is valuable to analysts covering the firm. Because machine learning software makes vocal analytics increasingly effective, managers should also recognize that their speech may need to impress machines as well as humans. Using popular pre-trained machine learning software to extract two emotional features well established in the psychology literature, valence and arousal (corresponding to the positivity and excitement of a voice), from managerial speech in conference calls, we find that managers of firms with higher expected machine readership speak in more positive and excited tones, consistent with anecdotal evidence that managers increasingly seek professional voice coaches.
Our paper is the first to show how written and oral corporate disclosure has been reshaped by the machine readers employed by algorithmic traders and quantitative analysts. Our findings indicate that increasing AI readership motivates firms to prepare filings that are friendlier to machine parsing and processing, highlighting the growing role of AI in financial markets and its potential impact on corporate decisions. Firms also manage the sentiment and tone perceived by AI readers by avoiding words that algorithms treat as negative. While the literature has shown how investors and researchers apply machine learning and computational tools to extract information from disclosures and news, our study is the first to identify and analyze the feedback effect, which can lead not only to better dissemination of information but also to unexpected outcomes, such as manipulation and collusion.
This post first appeared on The CLS Blue Sky Blog.
Sean Cao is Assistant Professor at J Mack Robinson College of Business, Georgia State University.
Wei Jiang is the Arthur F Burns Professor of Free and Competitive Enterprise at Columbia Business School.
Baozhong Yang is Associate Professor at J Mack Robinson College of Business, Georgia State University.
Alan L Zhang is PhD Candidate in Finance at J Mack Robinson College of Business, Georgia State University.