The Bar-Ilan team says that the same automated techniques they used for detecting the gender of an anonymous author should work just as well for forensics.
A computer program that can determine if the author of a book or article is …
Using a handful of linguistic cues, Bar-Ilan University computer scientists Moshe Koppel and Shlomo Argamon have developed a computer program which correctly determines the gender of the author of a previously unseen document about five times out of six.
How does the computer do it? First, techniques from a branch of computer science known as “machine learning” are used to program a computer to analyze examples of male and female writing. The computer is programmed to learn for itself how to distinguish between them based on statistical regularities it finds in the examples. The lessons that the computer learns are then applied to other documents which it has never seen before.
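The learn-then-apply loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. Each document is assumed to already be reduced to a list of numbers (for example, rates of particular words). The two classes are encoded as +1 and -1. A plain perceptron stands in for whatever learning algorithm the Bar-Ilan team actually used. The functions `train_perceptron` and `predict` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: a perceptron stands in for the researchers' actual
# learner. Each example is (feature_vector, label) with label in {+1, -1}.
Example = Tuple[Sequence[float], int]


def train_perceptron(examples: Sequence[Example], epochs: int = 20) -> Tuple[List[float], float]:
    """Learn a weight vector w and bias b from labeled examples."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # Misclassified (or exactly on the boundary): nudge the weights
            # toward the correct label.
            if y * score <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Apply the learned weights to a document the model has not seen."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else -1


# Invented toy vectors, e.g. [pronoun_rate, determiner_rate]; not study data.
training = [
    ([0.08, 0.03], 1),
    ([0.07, 0.04], 1),
    ([0.02, 0.09], -1),
    ([0.03, 0.08], -1),
]
w, b = train_perceptron(training)
print(predict(w, b, [0.09, 0.02]))  # 1
print(predict(w, b, [0.01, 0.10]))  # -1
```

With real data, a common way to estimate accuracy on unseen documents is to hold some labeled documents out of training and score the classifier only on those.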
“The kind of things that the computer program looks for is frequency of words, parts of speech,” Koppel told ISRAEL21c. “The computer found that women use more pronouns, particularly singular pronouns, like ‘I’ and ‘my.’ Women use the word ‘not’ far more and other negative words like ‘wouldn’t,’ ‘couldn’t,’ and ‘shouldn’t.’ The prepositions ‘for’ and ‘with’ are used more by women, all others by men. Men use the word ‘the’ and ‘and’ more than women do.”
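The word-frequency side of this can be illustrated with a short Python function. It counts the specific cue words Koppel lists and reports each one as a rate per 1,000 tokens. This is a sketch, not the researchers' feature set. The simple regular-expression tokenizer, the per-1,000 normalization, and the names `tokenize`, `cue_word_rates` and `CUE_WORDS` are all assumptions. Part-of-speech features are left out entirely.

```python
import re
from collections import Counter
from typing import Dict, List

# Cue words taken from Koppel's quote; the grouping is illustrative only.
CUE_WORDS = {
    "pronoun": ["i", "my"],
    "negation": ["not", "wouldn't", "couldn't", "shouldn't"],
    "preposition": ["for", "with"],
    "determiner_conjunction": ["the", "and"],
}


def tokenize(text: str) -> List[str]:
    # Lowercase, turn curly apostrophes into straight ones, and keep words
    # with an optional internal apostrophe (so "wouldn't" stays one token).
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text)


def cue_word_rates(text: str) -> Dict[str, float]:
    """Occurrences of each cue word per 1,000 tokens of the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        word: 1000.0 * counts[word] / total
        for group in CUE_WORDS.values()
        for word in group
    }


rates = cue_word_rates("I could not find my keys, and the door was locked.")
print(round(rates["i"], 1))  # 90.9
```

Each document's dictionary of rates could then be turned into a fixed-order list of numbers and passed to a classifier.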
Koppel is wary of drawing general conclusions for fear of angering various gender theorists, but he ventures that there is a difference between the way men and women write in a variety of areas.
“It seems that women use the kinds of words and phrases that create a relationship with the reader. Men tend more towards just the facts,” he said.
He noted that even when similar articles on similar subjects were taken from the same academic journal, “the computer could tell which article was written by a man and which was written by a woman.”
The Bar-Ilan University study was carried out on about 600 books and articles taken from the British National Corpus, a massive collection of written texts assembled for the purpose of linguistics research. The research will be released this week in Literary and Linguistic Computing, a journal published by Oxford University Press.
The researchers found that the same differences between male and female authors held across the entire range of topics in the corpus, including art, politics, science, biography and many other areas. The results indicate that in both fiction and non-fiction writing, women writers tend to use words indicating a relationship between the writer and reader more often than men, while men tend to use more words that describe, specify, and quantify things than women do.
One intriguing potential application of these findings is in the area of forensics, helping police solve crimes.
In the famous Unabomber case, the key to catching the fugitive was identifying his writing style. With the current computer program, detectives should be able to make an informed guess at the gender of a criminal who leaves any kind of written evidence, such as a letter or e-mail. Koppel said programs that would allow profilers to assess a suspect’s age and linguistic and educational background could also be developed soon.
The program is already being used for academic purposes, in determining authorship of documents and literature.
Koppel and Argamon, who is at the Illinois Institute of Technology, conducted their research together with the linguist Jonathan Fine and graduate student Anat Shimony.
Koppel noted that much has been made in recent years of observed differences between male and female conversational styles, by academics such as Deborah Tannen and in pop psychology books such as Men Are from Mars, Women Are from Venus. But these focused on differences in social interaction and conversational style. That research has also been challenged by critics who say its claims of gender differences are too subjective and anecdotal, and that the researchers have “gone fishing” for data that backs up their theories.
Not so for the gender differences revealed by the computer program, Koppel says. “This is science.”