Google’s latest changes, thanks to the JCPenney/Searchdex debacle, have a lot of search engine optimization people scratching their heads, worrying about what the changes will do to their search rankings. Google has also declared war on content farms, going after the black hat backlink builders who build crappy sites and try to game search engines by filling websites and blogs with lots and lots of useless, poorly written content.
Don’t ask me how they’re doing it. Google’s remaining mum on the situation, saying only:
Many of the changes we make are so subtle that very few people notice them. But in the last day or so we launched a pretty big algorithmic improvement to our ranking—a change that noticeably impacts 11.8% of our queries—and we wanted to let people know what’s going on. This update is designed to reduce rankings for low-quality sites—sites which are low-value add for users, copy content from other websites or sites that are just not very useful. At the same time, it will provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on [emphasis added — Erik].
It’s this last statement that has me intrigued about how Google is going to recognize some of this. How will they know whether sites have original content, do their own research, or provide thoughtful analysis?
I think the answer lies in the foundation of semantic search.
Semantic search, says Wikipedia, “…seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results.”
In other words, semantic search tries to figure out what you mean, not what you said.
For example, if you’re doing a search for “bark” and “dog,” a regular search engine may give you results not only about dogs, but about the bark of a dogwood tree. But semantic search will know that you’re inquiring about a dog, and return only those results that meet your requirements.
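To make the "bark" example concrete, here is a toy sketch of how a program might pick between the two meanings by looking at the other words in the query. The sense list and clue words are hand-written assumptions for illustration, and nothing here reflects how Google actually does it.

```python
# Hypothetical, hand-written sense inventory for illustration only.
# A real system would learn these associations from far more data.
SENSES = {
    "bark": {
        "dog sound": {"dog", "puppy", "growl", "howl", "loud"},
        "tree covering": {"tree", "dogwood", "trunk", "birch", "wood"},
    },
}


def pick_sense(term, query):
    """Return the sense of `term` whose clue words overlap most with the query."""
    # Every other word in the query counts as context for the ambiguous term.
    context = set(query.lower().split()) - {term}
    senses = SENSES[term]
    # Score each sense by how many of its clue words appear in the context.
    return max(senses, key=lambda name: len(senses[name] & context))


print(pick_sense("bark", "bark dog"))           # dog sound
print(pick_sense("bark", "dogwood tree bark"))  # tree covering
```

The point is only that context words can settle an ambiguous term. A real semantic search engine would need a much richer model of meaning than a lookup table.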
Right now, Google is looking at content farms as a group and dropping them — as a group — from their search index. And that’s fine. For the most part, it shouldn’t hurt anyone who is writing original, thoughtful content.
But what happens when Google decides to take a look at some previously ignored places where people are writing bad content trying to game the system? What happens when they look at WordPress.com and Blogger.com, two favorite targets of the search spammers, who dump crappy article after crappy article into throwaway blogs? Google isn’t going to dump their own blog platform (Blogger) from their index, and they won’t do it to WordPress.com without hundreds of thousands of people crying foul. So how will they do it?
My prediction is that Google will be able to figure out what’s good and what’s bad by using the semantic search technology. They’ll determine what’s well-written and what sucks, what’s original and what was barfed out of an article spinner.
We’ve seen some examples of this technology already. Anyone who has ever run the grammar checker on Microsoft Word (which was apparently written by my 7th grade English teacher) has seen how this works. It checks the grammar and usage in your documents to see if there are any serious errors. It’s not great, and often flags errors that are inaccurate or based on outdated rules, but it can at least find some problems.
So why can’t Google do this? By using semantics, a good grammar checker, and a thesaurus, Google could determine what is original content and what is crap. By examining the language used, Google may be able to determine the intent of the content writer, and whether they’re truly creating original, thoughtful content, or just trying to game the system again. They could raise up some content while flagging or penalizing others.
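Here is a rough sketch of the thesaurus part of that idea, and it is purely my guess at what such a check could look like. It maps synonyms back to one canonical word and then measures how much two sentences overlap. The mini-thesaurus, the two sample sentences, and the function names are all made up for the example.

```python
import re

# Hypothetical mini-thesaurus: each synonym maps to one canonical word.
CANONICAL = {
    "final": "last",
    "tale": "statement",
    "determine": "recognize",
}


def word_set(text, use_thesaurus):
    """Lowercase the text, split it into words, and optionally undo synonym swaps."""
    words = re.findall(r"[a-z']+", text.lower())
    if use_thesaurus:
        words = [CANONICAL.get(word, word) for word in words]
    return set(words)


def overlap(first, second, use_thesaurus=True):
    """Jaccard similarity: shared words divided by all distinct words."""
    a = word_set(first, use_thesaurus)
    b = word_set(second, use_thesaurus)
    return len(a & b) / len(a | b)


original = "the last statement is hard to recognize"
spun = "the final tale is hard to determine"

print(overlap(original, spun, use_thesaurus=False))  # 0.4
print(overlap(original, spun, use_thesaurus=True))   # 1.0
```

Without the thesaurus, the two sentences look only loosely related. With it, the spun sentence collapses back into the original, which is the kind of signal that could mark it as a copy. This says nothing about judging writing quality, which is a much harder problem.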
The best part is this strategy would encourage people to create valuable content, rather than just trying to stand on the shoulders of others by stealing their content or spinning it as a way to game the system. It means your stuff has to be well-written. You need a decent grasp of the English language, and the ability to string more than two sentences together.
(Of course, this could have a detrimental effect on people who just can’t write, people who don’t speak English as a first language, and teenagers who insist on writing in text speak, but that’s a post for another day.)
What do you think? Will a semantic indexing system help bloggers who are trying to do the right thing, or will it hurt the industry as a whole? Do you think people will mistakenly be caught up in a new semantic system? How would you avoid it, either from Google’s view or the writer’s?
Photo credit: arbyreed (Flickr)
@Andre, Google has a priority to serve the customer: the search consumer. Without that market (and those pageviews) the company would not have an advertising platform. I believe we may all need to take a step back – look at history, and realize that Google has insight into search – years ahead.
It’s good that Google deindexes duplicate content.
But, I think Google should not try to determine which article is best or worst.
Google is a machine, and it’s human beings that can determine which content is best or not.
I think Google just SUCKS!!!
@Andre, man, I hope Google never resorts to that. I think you’d see a lot of people migrating to Bing. Google makes a great argument for the freemium model. Provide stuff for free, and people will pay you in other ways. Their gigantic success has demonstrated that so far, so I hope they don’t change tactics.
soon, the more google ads you buy the higher your page rank will be…let’s face it, google is in business to make money so why not slap any domain that doesn’t pay for ads.
@Mike, what you’re talking about would be an interesting application of the semantic search. I like that idea.
My guess is that someone has an account on eHow.co.uk (or whatever their URL is), and managed to post that article in there. The basic idea is that a backlink, any backlink, is a valuable one, even when the connection doesn’t make sense.
But you’re right that this particular one is a little fishy, since it’s a UK website geared toward Buckeye truck drivers. Hopefully that’s something Google will be able to determine as they tweak their new changes.
@Brian, eww. ;-)
@Julie, I’m proposing that Google should develop an algorithm that would recognize that my article is well-written, but the version I (hypothetically) run through an article spinner is crap.
An article spinner will take a line like “It’s this last statement that has me intrigued about how Google is going to recognize some of this” and spin it as “It’s the final tale that bewilders me about how Google might determine a portion of this.” I would love to see Google determine that the spun version just, well, sucks, and at the very least, ignore it.
I know they do their best to recognize canonical URLs, (which is why the original URL of this post has the date in it), but I’m not suggesting they change that algorithm at all. I would just like to see them recognize good quality content from people who are trying to work within Google’s requirements, instead of those spammers who are trying to trick it.
What Google is attempting to do is a step in the right direction. I was delighted when I noticed the change in my Google Alerts. Instead of seeing one spammy article after another, the alerts seemed to be “real,” honest content with some value and relevance. That is, until today, when an eHow article appeared in one of my alerts. Better yet, it was an eHow article from their UK site about attending CDL school in Ohio. Why does the eHow UK site feature an article about Ohio truck driving schools? There isn’t an Oxford of truck driving schools located in Ohio. And I’m pretty sure the driving laws of the UK are not the same as those of the state of Ohio. Why isn’t Google’s semantic indexing filtering out this result? It seems pretty basic to me. And even if you are an Ohio resident looking for CDL training, the article provides very little substance.
I know a lot of people make a lot of money from Demand Media’s sites, but until Google understands the lack of quality information that the Demand sites offer, things haven’t really improved for bloggers and businesses trying to produce relevant content in their area(s) of expertise.
I wonder how many articles the Demand Media sites offer on the term “content farm.”
Erik, I have a question for you. If we feed the article you just wrote into an industry-associated blog, say on WordPress.com, should Google index your article on the industry-associated blog page? Today, it would. And GoogleBots would most likely index your article URL first, giving your URL original content priority in search.
Are you proposing a difference in that algorithm? Or is it your point that Google should be able to recognize the good writers from the poor quality writers?
That’s a tough one, Erik. I spent two hours yesterday thinking of different scenarios, and ended up right back where I started in the end. Freakin’ JCPenney! I bought a pair of underwear there in 1998, and I am tempted to mail it back to them now :)
Trader Room, when you say article writers, do you mean ezine articles, or bloggers in general?
I think it will lower demand for article writers and push down the industry.