Words will not stay in place: tracking vocabulary micro-shifts

2021-08-22

Understanding when and how often terms are used is an important part of mapping how VC chatter changes over time, but it's also important to understand how they are used. The most basic way to do this is to look at co-usage patterns - you can tell what a word does by the company it keeps.

As a concrete example, let's look at language patterns in the investment community between late February and late May of 2021, and at how they changed between then and late August.

Getting lost in a map

The initial steps of the analysis are fairly standard: we pick a list of around fifty key terms among the most used during the period (definitely use a proper NLP library instead of just manual string wrangling), build the co-occurrence matrix for them, and then use that to build a distance metric between the terms.
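As a rough illustration of those steps, here is a minimal sketch in plain Python. It assumes the documents have already been tokenized (ideally by a proper NLP library), counts co-occurrence at the document level, and uses cosine distance between co-occurrence rows. Both of those choices, as well as the function names and the toy documents, are assumptions made for the example rather than a description of the pipeline behind this post.

```python
import math
from itertools import combinations

def cooccurrence(docs, terms):
    """Count, for every pair of key terms, the documents where both appear.

    docs: iterable of token lists; terms: list of key terms.
    The diagonal holds the number of documents containing each term.
    """
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    counts = [[0] * n for _ in range(n)]
    for doc in docs:
        present = sorted({index[tok] for tok in doc if tok in index})
        for i in present:
            counts[i][i] += 1
        for i, j in combinations(present, 2):
            counts[i][j] += 1
            counts[j][i] += 1
    return counts

def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def distance_matrix(counts):
    """Pairwise cosine distance between the rows of the co-occurrence matrix."""
    n = len(counts)
    return [[cosine_distance(counts[i], counts[j]) for j in range(n)]
            for i in range(n)]

# Toy, made-up documents, only to show the shape of the data.
toy_docs = [
    ["ceo", "co-founder", "startup"],
    ["ceo", "co-founder", "fund"],
    ["fund", "venture", "invest"],
    ["venture", "invest", "startup"],
]
toy_terms = ["ceo", "co-founder", "fund", "venture", "invest"]
toy_dist = distance_matrix(cooccurrence(toy_docs, toy_terms))
```

Two terms end up close when they tend to show up alongside the same other terms, which is the "company it keeps" idea in numeric form. Other reasonable choices (sentence-level windows, PMI weighting, a different metric) would give a somewhat different map.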

This approach has worked very well for mapping the SP 500 "latent cartography":

The same general idea, this time leveraging the BERT neural model, underlies the unofficial map of TED World:

For the investor chatter we're tracking in the Investment Climatology Project, though, the results are not quite as enlightening (but in an interesting way):

It's not a random assortment of words. You can see, for example, how CEO and co-founder form a close pair, or the loose cluster of investment-related terms at the top left corner of the graph. But it's still not as informative as we'd like, and it's not clear how it could be leveraged to describe changes in word usage. It's better than a word cloud — a word cloud can't distinguish between Hamlet and Hamlet, But With the Words in Alphabetic Order — but not precise enough for our purposes.

The world may be flat, but language isn't

To build a slightly more detailed view of word usage, we leverage the distance matrix to build a hierarchical clustering of terms, going from large, loosely related sets of terms to more and more fine-grained groupings:
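For readers who want to see the mechanics, below is a minimal sketch of agglomerative clustering over a distance matrix, in plain Python. The average-linkage rule is an assumption on my part for the example (the post doesn't depend on a specific linkage), and the distances are invented just to reproduce the shape of a small hierarchy. In practice a library routine such as SciPy's scipy.cluster.hierarchy.linkage would be the sensible choice.

```python
from itertools import combinations

def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    labels: list of term names; dist: square symmetric matrix (list of lists).
    Returns (tree, merges): tree is a nested tuple of labels, merges is the
    list of (left_subtree, right_subtree, distance) in the order they happened.
    """
    # Each active cluster maps an id to (member indices, nested-tuple subtree).
    clusters = {i: ((i,), labels[i]) for i in range(len(labels))}
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            members_a, members_b = clusters[a][0], clusters[b][0]
            # Average of all pairwise distances between the two clusters.
            d = sum(dist[i][j] for i in members_a for j in members_b)
            d /= len(members_a) * len(members_b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        members_a, tree_a = clusters.pop(a)
        members_b, tree_b = clusters.pop(b)
        merges.append((tree_a, tree_b, d))
        clusters[next_id] = (members_a + members_b, (tree_a, tree_b))
        next_id += 1
    (_, tree), = clusters.values()
    return tree, merges

# Invented distances, chosen only so that the hierarchy is easy to follow.
toy_labels = ["customer", "service", "business", "platform", "fund"]
toy_dist = [
    [0.0, 0.1, 0.3, 0.5, 0.9],
    [0.1, 0.0, 0.3, 0.5, 0.9],
    [0.3, 0.3, 0.0, 0.5, 0.9],
    [0.5, 0.5, 0.5, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.9, 0.0],
]
toy_tree, toy_merges = average_linkage(toy_labels, toy_dist)
```

The merge list, read from first to last, is the "fine-grained to loose" direction of the hierarchy: the earliest merges are the tightest pairs, and the last one joins everything into a single group.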

Looking at the first few items at the top, you can see how they come together in a hierarchy. Customer and service go closely together (well, at least linguistically), to which you add business and then platform to round out a nice semantic unit.

By the way, perhaps mostly for aesthetic reasons, but also because there's really no particular meaning to being at the top or the bottom of the list above, I prefer the fanned-out version of that graph. Going around the circle you'll see interesting pairs like strategy and challenge or social and content, by themselves already indicative of the community's worldview:

So we have built a sort of very primitive map of which terms are, or are not, used together in part of the investment world's chatter between late February and late May of 2021. We can build the equivalent one for the period between late May and late August, but, either as a tree or as a fan, it's not very easy to get a useful idea of how language changed by comparing the hierarchical clusters by eye:

A more fruitful approach is to pick terms of interest and see how they changed over time by comparing their neighborhoods: the hierarchical clusters we can build by only focusing on the, say, four terms closest to them during the period.
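Extracting a neighborhood can be sketched in a few lines: find the k terms closest to the term of interest and keep only the corresponding slice of the distance matrix, which can then be fed to whatever hierarchical clustering routine you are using. The function name and the toy distances below are hypothetical, and k=4 simply follows the "say, four" in the text.

```python
def neighborhood(term, labels, dist, k=4):
    """Return the term plus its k nearest terms, and their sub-matrix.

    labels: list of term names; dist: square symmetric distance matrix.
    The term of interest comes first, followed by neighbors sorted from
    closest to farthest.
    """
    i = labels.index(term)
    others = [j for j in range(len(labels)) if j != i]
    others.sort(key=lambda j: dist[i][j])
    keep = [i] + others[:k]
    names = [labels[j] for j in keep]
    sub = [[dist[a][b] for b in keep] for a in keep]
    return names, sub

# Made-up labels and distances, only to show the call.
toy_labels = ["a", "b", "c", "d"]
toy_dist = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.8],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
]
toy_names, toy_sub = neighborhood("d", toy_labels, toy_dist, k=2)
```

Running this once per period, on each period's own distance matrix, gives two small neighborhoods for the same term that are much easier to compare than two full trees.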

We begin by looking at the term fund:

As could be expected, it's such a basic term in this context that its basic usage patterns didn't really change between those two periods, except for a minimal swap between venture and invest; the order of terms within specific groups can be safely ignored. Other terms, however, show interesting usage shifts, hinting at changes in the concepts and concerns of the underlying community.
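Since the order of terms within a group carries no meaning, one simple way to compare a term's neighborhood across two periods is to treat each neighborhood as a set. The sketch below is only one possible way to do it; the function name, the Jaccard-style stability score, and the placeholder neighbors term_x and term_y are assumptions for illustration, not the actual neighbors of fund.

```python
def compare_neighborhoods(before, after):
    """Compare two neighborhoods as sets, ignoring the order of terms.

    Returns which terms were kept, dropped and gained, plus a stability
    score: the Jaccard similarity of the two sets (1.0 means no change).
    """
    b, a = set(before), set(after)
    union = b | a
    stability = len(b & a) / len(union) if union else 1.0
    return {
        "kept": sorted(b & a),
        "dropped": sorted(b - a),
        "gained": sorted(a - b),
        "stability": stability,
    }

# A swap in order, as with venture and invest, is not a change at all.
# term_x and term_y are placeholders, not real neighbors.
toy_before = ["venture", "invest", "term_x", "term_y"]
toy_after = ["invest", "venture", "term_x", "term_y"]
toy_result = compare_neighborhoods(toy_before, toy_after)
```

A term whose neighborhood is stable will show empty "dropped" and "gained" lists; the interesting terms are the ones where those lists are not empty.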

Words that did not stay in place

The obvious term to look at is COVID:

That COVID remains closely partnered to pandemic was a given — let's hope that remains the case for a long while — but there was a key change in the neighborhood of the term: where there was strategy, now we find improve. This is an interesting linguistic correlate, and highlight, of how the focus of the investment community has shifted in part from figuring out strategies to handle COVID to the hoped-for (and at this point in time not fully solidified) post-pandemic improvement in the sectors and activities hit by it.

The way this focus shifted is even clearer when we look at the term at the core of the investment worldview: opportunity.

COVID is there! And so is, in fact, community, displacing (one suspects, not fully) financial. It's always dangerous to read too much into this sort of analysis, but it does contribute to framing recent changes in investors' moods - at least up to a few days ago - towards seeing the relatively-or-at-some-point-post-COVID world through the lens of a set of opportunities beyond a linear recovery.

That these opportunities are partly community-related is also reinforced by changes in the linguistic neighborhood of public:

If the close pair public/private was linked months ago, and very plausibly, to COVID and information, now it's linked to social and impact, which does parallel a continuously increasing level of discussion — not all of it in praise — of the investment world's social impact.

On the uses of words and their changes

None of the observations made in this post is likely to surprise anybody closely following the field, but it's not meant to; we aren't yet at a point where automated language analysis tools surpass the human domain expert (which isn't to say that this is impossible - just that we aren't there right now). The point is to hint at how they can be used to understand changes in specific, narrow sub-domains that are simply too complex or too numerous to be continuously monitored in a scalable way. The more specialized the domain, the more interesting and potentially useful it is to find and model changes in the usage patterns of key terms.

And, of course, as interesting as it is to know how usage patterns have changed, it's even more interesting to get hints about how they are changing. But that's for another post.
