Google’s latest changes, prompted by the JCPenney/Searchdex debacle, have a lot of search engine optimization people scratching their heads and worrying about what the changes will do to their search rankings. Google has also declared war on content farms, going after the black hat backlink builders who build crappy sites and try to game search engines by filling websites and blogs with lots and lots of useless, poorly written content.
Don’t ask me how they’re doing it. Google’s remaining mum on the situation, saying only:
Many of the changes we make are so subtle that very few people notice them. But in the last day or so we launched a pretty big algorithmic improvement to our ranking—a change that noticeably impacts 11.8% of our queries—and we wanted to let people know what’s going on. This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on [emphasis added — Erik].
It’s this last statement that has me intrigued about how Google is going to recognize some of this. How will they know whether sites have original content, do their own research, or provide thoughtful analysis?
I think the answer lies in the foundation of semantic search.
Semantic search, says Wikipedia, “…seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results.”
In other words, semantic search tries to figure out what you mean, not what you said.
For example, if you’re doing a search for “bark” and “dog,” a regular search engine may give you results not only about dogs, but about the bark of a dogwood tree. But semantic search will know that you’re inquiring about a dog, and return only those results that meet your requirements.
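To make the “bark” example concrete, here is a rough sketch of the idea in Python. It is a toy, not how Google or any real search engine works: the sense list, the cue words, and the disambiguate function are all made up for illustration. It simply picks the meaning whose cue words overlap most with the rest of the query.

```python
# Toy sense inventory. The senses and cue words are invented for this example.
SENSES = {
    "bark": {
        "dog sound": {"dog", "puppy", "loud", "growl", "howl"},
        "tree covering": {"tree", "dogwood", "trunk", "wood", "birch"},
    }
}


def disambiguate(term, query):
    """Pick the sense whose cue words overlap most with the other query words."""
    context = set(query.lower().split()) - {term}
    scores = {
        sense: len(cues & context)
        for sense, cues in SENSES[term].items()
    }
    # On a tie, max() returns the first sense listed.
    return max(scores, key=scores.get)


print(disambiguate("bark", "bark dog"))           # dog sound
print(disambiguate("bark", "dogwood tree bark"))  # tree covering
```

A real system would learn those associations from enormous amounts of text instead of a hand-written list, but the principle is the same: the surrounding words decide which meaning wins.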
Right now, Google is looking at content farms as a group and dropping them — as a group — from their search index. And that’s fine. For the most part, it shouldn’t hurt anyone who is writing original, thoughtful content.
But what happens when Google decides to take a look at some previously ignored places where people are writing bad content trying to game the system? What happens when they look at WordPress.com and Blogger.com, two favorite targets of the search spammers, who dump crappy article after crappy article into throwaway blogs? Google isn’t going to dump their own blog platform (Blogger) from their index, and they won’t do it to WordPress.com without hundreds of thousands of people crying foul. So how will they do it?
My prediction is that Google will be able to figure out what’s good and what’s bad by using semantic search technology. They’ll determine what’s well-written and what sucks, what’s original and what was barfed out of an article spinner.
We’ve seen some examples of this technology already. Anyone who has ever run the grammar checker in Microsoft Word (which was apparently written by my 7th grade English teacher) has seen how this works. It checks the grammar and usage in your documents to see if there are any serious errors. It’s not great, and it often flags things that aren’t errors or enforces outdated rules, but it can at least find some problems.
So why can’t Google do this? By using semantics, a good grammar checker, and a thesaurus, Google could determine what is original content and what is crap. By examining the language used, Google may be able to determine the intent of the content writer, and whether they’re truly creating original, thoughtful content or just trying to game the system again. They could raise up some content while flagging or penalizing the rest.
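Here is one way the thesaurus part could work, sketched in Python. This is purely my guess at the general approach, not anything Google has described. The mini-thesaurus, the sample sentences, and the function names are all hypothetical. The idea is to map synonyms back to one common word, then measure how many three-word runs two texts share. An article spinner swaps synonyms, so once the swaps are undone, the spun copy looks just like the original.

```python
import re

# Hypothetical mini-thesaurus mapping synonyms to one canonical word.
CANONICAL = {
    "purchase": "buy", "acquire": "buy",
    "inexpensive": "cheap", "affordable": "cheap",
    "footwear": "shoes", "sneakers": "shoes",
}


def normalize(text):
    """Lowercase the text, split it into words, and undo synonym swaps."""
    words = re.findall(r"[a-z']+", text.lower())
    return [CANONICAL.get(word, word) for word in words]


def shingles(words, size=3):
    """Return the set of overlapping word runs of the given size."""
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa = shingles(normalize(a), size)
    sb = shingles(normalize(b), size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "You can buy cheap shoes online today"
spun = "You can purchase inexpensive footwear online today"
unrelated = "Our research team measured rainfall across four counties"

print(similarity(original, spun))       # 1.0
print(similarity(original, unrelated))  # 0.0
```

A score near 1.0 suggests one text is a reworded copy of the other, while a score near 0.0 suggests they have nothing in common. Judging whether writing is actually thoughtful is a much harder problem than this.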
The best part is that this strategy would encourage people to create valuable content, rather than standing on the shoulders of others and stealing or spinning their work to game the system. It means your stuff has to be well-written. You need a decent grasp of the English language, and the ability to string more than two sentences together.
(Of course, this could have a detrimental effect on people who just can’t write, people who don’t speak English as a first language, and teenagers who insist on writing in text speak, but that’s a post for another day.)
What do you think? Will a semantic indexing system help bloggers who are trying to do the right thing, or will it hurt the industry as a whole? Do you think people will mistakenly be caught up in a new semantic system? How would you avoid it, either from Google’s view or the writer’s?
Photo credit: arbyreed (Flickr)