For a long time, psychologists have studied how the words people use correlate with characteristics like gender, personality type and age. They would answer questions like "do older people use more positive words than younger people?" by making lists of words they deemed positive or negative and then counting them up in language samples given by people from different age groups. Now, researchers have come up with a new way of looking at the relationship between language and social characteristics, in which the differences between groups are suggested by the data itself, and not by the researchers. Instead of asking whether characteristics (young, old) correlate with words (positive, negative), the new approach asks which words best distinguish these groups from each other.
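To make the older, list-based approach concrete, here is a minimal Python sketch. The word lists, the sample texts and the function names are all made up for illustration; real studies rely on much larger, carefully built lexicons.

```python
from collections import Counter
import re

# Hypothetical, tiny word lists; real lexicons are far larger.
POSITIVE = {"happy", "love", "great"}
NEGATIVE = {"hate", "sad", "awful"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def lexicon_rates(samples):
    """Return, per group, the share of tokens found in each word list.

    `samples` is a list of (group, text) pairs (an assumed input format).
    """
    totals = Counter()
    hits = {"positive": Counter(), "negative": Counter()}
    for group, text in samples:
        for token in tokenize(text):
            totals[group] += 1
            if token in POSITIVE:
                hits["positive"][group] += 1
            elif token in NEGATIVE:
                hits["negative"][group] += 1
    return {
        group: {label: hits[label][group] / totals[group] for label in hits}
        for group in totals
    }

# Invented sample data, one short text per group.
samples = [
    ("younger", "I hate homework"),
    ("older", "love my great family"),
]
rates = lexicon_rates(samples)
```

The limitation is visible in the code: only words the researcher thought to put on a list can ever show up in the result.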
The technique allows you to find differences you may not have even thought of. But its sophisticated statistical algorithms require massive quantities of text. A recent study by H. Andrew Schwartz and colleagues at the Positive Psychology Center of the University of Pennsylvania and the Psychometrics Centre at the University of Cambridge analyzed 15.4 million Facebook messages from 75,000 volunteers who provided information about their gender, age and personality type (in the form of a standard personality test). As might be expected from Facebook messages, some of the researchers' findings below include a lot of profanity.
Among the insights uncovered were a strong correlation between introversion and Japanese media ("anime," "manga," "pokemon") and a stronger tendency for males to say "my girlfriend/wife" than for females to say "my boyfriend/husband."
With respect to age, the major concerns of each life stage, unsurprisingly, are represented in the words people use at those stages. These word clouds show the words used in four different age groups. Unlike most word clouds, the size of a word doesn't indicate how frequent that word is. Rather, it shows how well that word distinguishes that age group from the others. So the four life stages might be summed up like this:
13-18: emoticons, school, homework
19-22: profanity, campus, semester, 21st
23-29: at_work, days_off, office, beer, wedding
30-65: family and friends, daughter, son, kids, repost, copy_and
High school kids are talking about school, college kids are swearing and talking about their 21st birthdays, young adults are talking about work, weddings and beer, and older adults are talking about family and forwarding those "please repost" and "copy and paste" Facebook messages.
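One simple way to score how well a word distinguishes one group from the others is a smoothed log-ratio of relative frequencies. This is a stand-in chosen for illustration, not the statistical method the study used, and the function name and toy data below are invented.

```python
from collections import Counter
import math

def distinguishing_words(group_tokens, target, top_n=5):
    """Rank words by how strongly they mark `target` against all other groups.

    `group_tokens` maps a group name to a list of tokens. The score is a
    smoothed log-ratio of relative frequencies (an assumption of this sketch).
    """
    inside = Counter(group_tokens[target])
    outside = Counter()
    for group, tokens in group_tokens.items():
        if group != target:
            outside.update(tokens)
    vocab = set(inside) | set(outside)
    # Add-one smoothing so words absent from one side still get a finite score.
    n_in = sum(inside.values()) + len(vocab)
    n_out = sum(outside.values()) + len(vocab)
    scores = {
        w: math.log((inside[w] + 1) / n_in) - math.log((outside[w] + 1) / n_out)
        for w in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Invented toy data: "the" is common everywhere, so it scores low.
groups = {
    "13-18": "homework school homework the".split(),
    "30-65": "daughter kids the the repost".split(),
}
top = distinguishing_words(groups, "13-18", top_n=2)
```

A word that is frequent in every group, like "the," ends up with a low score even though its raw count is high, which is why size in these clouds differs from plain frequency.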
Other graphs tell different types of stories. These plots of the frequency of certain types of words show how negativity seems to decrease with age while positivity increases (confirming previous research on the age-related positivity effect) and how people use "I" less and "we" more as they get older (indicating an increasing focus on social relationships).
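A plot like these boils down to the share of each age's words taken up by the word you care about. The sketch below computes that series; the (age, text) input format, the function name and the two sample messages are assumptions made for illustration.

```python
from collections import Counter, defaultdict
import re

def frequency_by_age(messages, words):
    """Return {word: [(age, share of that age's tokens), ...]} sorted by age.

    `messages` is a list of (age, text) pairs; this input format is assumed.
    """
    totals = Counter()
    counts = defaultdict(Counter)
    targets = {w.lower() for w in words}
    for age, text in messages:
        for token in re.findall(r"[a-z']+", text.lower()):
            totals[age] += 1
            if token in targets:
                counts[token][age] += 1
    return {
        w: [(age, counts[w][age] / totals[age]) for age in sorted(totals)]
        for w in sorted(targets)
    }

# Invented sample messages, one per age.
messages = [
    (16, "I think I failed"),
    (45, "we love when we visit"),
]
series = frequency_by_age(messages, ["I", "we"])
```

Dividing by each age's total word count matters because some age groups post far more than others, and raw counts would mostly reflect that.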
You can explore the age data yourself at the World Well-Being Project site, where you can generate a plot for words of your choice. Here are a few that I tried:
FOR BABY, CAT, AND DOG
Looks like pets get a bum deal during the baby-making years, but they get the attention back once the kids are older.
FOR WHO, WHAT, WHERE, WHEN, WHY, AND HOW
Most question words cluster together, with a big dip in the 20s. But "why" and "where" break the pattern. Do 20-somethings think they know everything except why? Do people worry more about the facts and less about the reasons as they get older? Why do they stop asking why? Do teenagers not care about where things are happening until they start driving? Are older people forgetting where they left their car keys?
FOR SUCKS VS. BUMMER
The use of anything with a touch of profanity goes down with age, but slang doesn't necessarily decrease. I thought "bummer" was a young person's word. Guess I'm showing my age.
FOR PICS VS. PIX
Pics at 13, pix at 20, pics at 30, pix at 50. Artifact of people not really settling on one or the other? Or something else? I prefer pics, but it looks like my preferences may soon change.