It's called “opinion spam.” Someone who wants to convince consumers that a product is good (or not a scam, when it really is) will put up a website, or submit content to an existing site, filled with favorable reviews.

How do you tell opinion spam from sincere reviews? Researchers at Cornell University say they are developing computer software that’s pretty good at it.

In a test on 800 Chicago hotel reviews, their software picked out 90 percent of the deceptive ones. In the process, the researchers uncovered some key features that help determine whether a review is spam, and even evidence of a correspondence between the linguistic structure of deceptive reviews and that of fiction writing.

Help spot the fraudsters

“While this is the first study of its kind, and there's a lot more to be done, I think our approach will eventually help review sites identify and eliminate these fraudulent reviews,” said Myle Ott, Cornell doctoral candidate in computer science.

The researchers asked 400 people to deliberately write fake positive reviews of 20 Chicago hotels. These were compared with an equal number (400) of randomly chosen truthful reviews.

As a baseline, the researchers gave a subset of the reviews to three human judges, all volunteer Cornell undergraduates, who scored no better than chance at identifying deception. The three did not even agree with one another on which reviews were deceptive, further evidence that they were essentially guessing.

Humans suffer from 'truth bias'

This may not be surprising, since humans don't appear very skilled at figuring out when someone is blowing smoke. As Ott notes, people have long been known to suffer from a “truth bias,” assuming that what they are reading is true until they find evidence to the contrary.

When people are trained to detect deception, they become overly skeptical and report deception too often, so they still generally score at chance levels.

The researchers then applied statistical machine learning algorithms to uncover subtle cues to deception. Deceptive hotel reviews, for example, are more likely to contain language that sets the scene, such as “vacation,” “business” or “my husband.”

Truth-tellers use more concrete words relating to the hotel, such as “bathroom,” “check-in” and “price.” Truth-tellers and deceivers also differ in their use of certain keywords and punctuation, and even in how much they talk about themselves. Consistent with previous studies of imaginative versus informative writing, deceivers use more verbs, while truth-tellers use more nouns.
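To make these cues concrete, here is a minimal Python sketch that counts a few of them in a single review. It is a toy illustration, not the Cornell team's software: the cue lists (`SCENE_SETTING`, `HOTEL_CONCRETE`, `FIRST_PERSON`) and the functions `count_term` and `cue_features` are hypothetical, seeded only with the example words quoted above. The verb-versus-noun cue is left out because it would need a part-of-speech tagger, and a real system would feed many learned features into a trained statistical classifier rather than rely on hand-picked word lists.

```python
import re
from collections import Counter

# Hypothetical cue lists, seeded only with the example words quoted in the
# article; a trained model would learn its own features and weights.
SCENE_SETTING = ["vacation", "business", "my husband"]
HOTEL_CONCRETE = ["bathroom", "check-in", "price"]
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def count_term(text: str, term: str) -> int:
    """Count whole-word (or whole-phrase) occurrences of term in text."""
    return len(re.findall(r"\b" + re.escape(term) + r"\b", text))


def cue_features(review: str) -> dict:
    """Count a few simple lexical cues in one review (toy illustration)."""
    text = review.lower()
    # Split into words, keeping hyphenated forms like "check-in" as one token.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text)
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        # Scene-setting language (more common in deceptive reviews).
        "scene_setting": sum(count_term(text, t) for t in SCENE_SETTING),
        # Concrete hotel vocabulary (more common in truthful reviews).
        "hotel_concrete": sum(count_term(text, t) for t in HOTEL_CONCRETE),
        # How much the writer talks about themselves.
        "first_person": sum(counts[w] for w in FIRST_PERSON),
        # One example of a punctuation cue.
        "exclamations": text.count("!"),
    }


print(cue_features("My husband and I came for a vacation! I loved it!"))
# {'tokens': 11, 'scene_setting': 2, 'hotel_concrete': 0, 'first_person': 3, 'exclamations': 2}
```

Counts like these are only raw inputs; deciding how much each one should count toward a “deceptive” verdict is the job of the machine learning step the researchers describe.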

Not foolproof

Ott cautions that the work so far has been validated only for hotel reviews, and for that matter, only for reviews of hotels in Chicago. The next step, he said, is to see whether the techniques can be extended to other categories, perhaps starting with restaurants and eventually moving to consumer products. He also wants to look at negative reviews.

“Ultimately, cutting down on deception helps everyone,” Ott said. “Customers need to be able to trust the reviews they read, and sellers need feedback on how best to improve their services.”