David Brooks waxes poetic about word frequencies and the good old days in today's NYT: What Our Words Tell Us. Brooks cherry-picks three recent Google Ngram analyses (by non-linguists), provides paper-thin summaries of their findings, and then concludes that America has lost its moral core.

These analyses all depend crucially on the creation of word categories like “individualistic words” and “moral terms”. The words in each category are not quite synonyms, but they must bear some semantic link to one another. This begs the question: Are these groupings natural? Is there something psychologically real about them? Linguists care about word classes quite a bit (computational linguists even more so), and there are ways of constructing naturalistic sets of words. However, Brooks says nothing about how these studies performed their categorizations, so I thought I'd post a quick review, since the method is important in judging the validity of the results.

Twenge et al

The first study, by Twenge et al (which he doesn’t link to, but I do below), followed a scientifically reasonable path to create its word sets. They asked 53 Mechanical Turk participants to generate words “characteristic of individualism and communalism.” Then they had a different set of 55 Mechanical Turk participants rate those words on a 7-point Likert scale. The top 20 words in each category were then used as their search set. FYI, here are their two sets:

Individualistic: independent, individual, individually, unique, uniqueness, self, independence, oneself, soloist, identity, personalized, solo, solitary, personalize, loner, standout, single, personal, sole, and singularity

Communal: communal, community, commune, unity, communitarian, united, teamwork, team, collective, village, tribe, collectivization, group, collectivism, everyone, family, share, socialism, tribal, and union

Kesebir and Kesebir

Kesebir and Kesebir did two studies.
In study one, they took ten words listed as synonyms of “virtue” in an unnamed thesaurus and searched Google’s Ngram corpus for those words. Here are the ten: character, conscience, decency, dignity, ethics, morality, rectitude, righteousness, uprightness, and virtue. In their second study, they constructed a set of 80 virtue words taken from websites about virtue in literature (e.g., honesty, patience, honor), then asked participants to rate each one as No = -1, Perhaps = 0, or Yes = 1. Finally, they took the 50 words with the highest average rating and searched Ngrams for their frequencies.

Klein

Klein unapologetically gives no motivation for his word sets whatsoever. A “very casual paper” indeed.

The Problem

While I respect the attempt of the first two sets of authors to add some psychological reality to their linguistic categories, they fall for the same naïve assumption that has plagued linguistics for hundreds of years: that people's conscious metalinguistic judgments are valid. Syntacticians discovered the folly of relying on grammaticality judgments. I have recently been involved in a number of Mechanical Turk rating tasks, and we're finding that it is very difficult to get consistent ratings. I believe the same issue is at play here. Plus, ratings can easily be affected by context such as surrounding text, yet none is given in these tasks. It's not clear what it means to rate isolated words; word meaning is contextual by its very nature.

Words are not thought. These studies seem to be a variation on the “No word for X” syndrome (see here for a recent rant). Certain types of words may be used more or less frequently over some time scale (like one century), but that doesn’t necessarily mean that we are thinking differently over that time scale.

Unlike Brooks, I’ll link to the actual papers (all free, but the last two require email registration):

Increases in Individualistic Words and Phrases in American Books, 1960–2008. Jean M. Twenge, W. Keith Campbell and Brittany Gentile
The Cultural Salience of Moral Character and Virtue Declined in Twentieth Century America. Kesebir and Kesebir
Ngra