Generating WordClouds of Earnings Call Transcripts in Python
I have been exploring ways to analyze text data, and WordClouds kept coming up when I searched for visualization options. In this post, I will look at earnings call transcripts from two industries: cloud computing and large banks. I chose cloud computing because I work with companies that have a SaaS or subscription revenue business model, and large banks because I was hoping to learn more about what the bellwethers of the economy are discussing.
SaaS Companies (Cloud Computing)
I selected the following cloud companies because I have an interest in SaaS platforms. Platforms generally take the longest to build, but can yield massive economic benefits if built right. For example, there is significant value in becoming the “glue” between layers of tech, application and cloud tools because it creates high retention (“stickiness”) among customers and allows for an effective land-and-expand strategy. Here is my company list:
Atlassian: platform for workflow tools (Jira, Confluence, Bitbucket) that integrate well together
Cloudera: merged with Hortonworks. Both grew out of the Hadoop open source platform
New Relic: platform that enables organizations to monitor the performance of their tech infrastructure
Okta: the Identity Cloud platform provides security for customers by giving their workforces strong identification tools
Zuora: literally wrote the book on the subscription economy and wants to be the recurring revenue data hub for SaaS businesses
I analyzed the transcript of each company's most recent earnings call. The call dates and fiscal quarters were: Atlassian, Jan 17, 2019 (Q2); Cloudera, Dec 5, 2018 (Q3); New Relic, Nov 6, 2018 (Q2); Okta, Dec 5, 2018 (Q3); and Zuora, Nov 29, 2018 (Q3).
I used the NLTK Python library to help with the text parsing, but as can be seen in the first image, it is not perfect. The second image shows the result after I filtered out what I would call “earnings speak” (fiscal, million, quarter, growth, highlight, expect, strong, etc.).
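To make the two-pass cleaning concrete, here is a minimal sketch of the idea, not the code from my notebooks. It swaps in plain regular expressions for NLTK so it runs without downloading any corpora. The stop word lists, the function names and the sample sentence are all made up for illustration; only the seven “earnings speak” words come from the list above. The rendering step assumes the wordcloud package is installed.

```python
import re
from collections import Counter

# Hypothetical filter lists: the post names only a few "earnings speak"
# words, and the general stop words here are a small illustrative sample.
EARNINGS_SPEAK = {"fiscal", "million", "quarter", "growth",
                  "highlight", "expect", "strong"}
COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "we", "our",
                "is", "for", "on", "with", "that", "this"}


def word_frequencies(text, extra_stopwords=frozenset()):
    """Count lowercase alphabetic tokens, skipping common and extra stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    skip = COMMON_WORDS | set(extra_stopwords)
    # Tokens of one or two letters are dropped as noise.
    return Counter(t for t in tokens if t not in skip and len(t) > 2)


def save_wordcloud(frequencies, path):
    """Render a frequency mapping to an image file with the wordcloud package."""
    from wordcloud import WordCloud  # lazy import; requires `pip install wordcloud`
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(path)


# Made-up sample text standing in for a transcript.
sample = ("Strong quarter: customer growth on our platform, "
          "and customer retention was strong.")
raw = word_frequencies(sample)                      # first pass: light cleaning
cleaned = word_frequencies(sample, EARNINGS_SPEAK)  # second pass: earnings speak removed
```

Calling save_wordcloud(cleaned, "saas_cleaned.png") would write the second-pass image. Keeping the earnings speak list separate from the general stop words makes it easy to produce both versions of the chart from the same text.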
Customer is obviously a major focus because a SaaS company’s success depends on customer retention and growth. Other words that jump out are platform, cloud, technology, merger, non gaap and partner. Platform makes sense because this peer group was selected on the basis of having a strong platform. Cloud and technology are terms used frequently in this industry. Merger probably refers to the recent Cloudera/Hortonworks combination, and analysts likely had questions on integration and progress. Non gaap appears because many SaaS unit economic metrics are classified this way (such as ARR), but it could also be used in the context of data that the company feels helps investors compare performance against a previous period. Lastly, partner is probably a reference to channel selling and the ecosystem (integration partners) that drives additional sales.
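A two-word term like non gaap can only show up in a cloud if adjacent words are counted as pairs. As far as I know, the wordcloud package does this itself through its collocations option when it is given raw text. The sketch below shows the idea by hand with the standard library only. The function name and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs so two-word terms stay together."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Pair each token with the one that follows it. This simple version
    # also pairs words across sentence boundaries.
    return Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))


# Made-up sample text standing in for a transcript.
sample = "Non-GAAP operating margin improved. Non-GAAP net income also rose."
pairs = bigram_counts(sample)
top_pair, top_count = pairs.most_common(1)[0]
```

Here the hyphen is treated as a separator, so "Non-GAAP" is counted as the pair "non gaap", which matches how the term appears in the cloud.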
Big Banks (Citi, JP Morgan and Bank of America)
As mentioned above, I chose the big banks to learn about the economic outlook. All three banks reported earnings during the week of January 14, 2019. As with the SaaS companies above, I first generated a chart with little text cleaning and then a second chart after a round of “earnings speak” cleaning.
The first chart isn’t that surprising given that growth was a concern leading into earnings and the fourth quarter saw a significant amount of market volatility. The second chart is interesting because capital, consumer, global, card and deposits/loans really jump out. Since the financial crisis 10 years ago, banks have been required to hold a certain level of capital (and not take on excess leverage) to prevent insolvency in case of another financial downturn. Capital could be used in the context of capital ratio, capital markets, capital returns, etc. Consumer in this context probably relates to the banks’ important consumer business lines. Global is mentioned because the three banks are multinational. Card refers to the important credit card business and merchant processing. Card losses would be an indicator of economic health, and more research could be performed here. Finally, deposits and loans are important factors for a bank’s net interest income, which is the gap between what it earns on loans and what it pays on deposits. In general, net interest income has improved year-over-year for the big banks.
Below are links to the Jupyter Notebooks and my GitHub, which has the text data.