How Old Invalid Content Can Outrank New, Accurate Content & How to Beat Established SERPs
Acquired trust is assumed to be a ranking factor by many search engine optimization professionals.
We also know using HTTPS is a ranking signal, but how far can that signal reach? And can it be a quick sign of content validity?
Professionals have to guess because Google keeps a necessary veil of secrecy around the search algorithm's details, so we make educated conclusions about what matters for good search rankings.
My background is web development, and over the past year I have been able to attract new SEO clients in the development space. There is a common scenario I have encountered trying to rank my own content and my clients' content for developer topics.
A few days after the hangout Roger Montti wrote about the discussion on Search Engine Journal. He wanted to dive into why older content outranks newer content.
That was not exactly where I was going with my question, but close.
Roger's article was then picked up on a recent SEO 101 podcast episode, which I heard last night while driving. Yeah, I was pretty stoked hearing the hosts discuss the question and discussion from the hangout!
The discussion starts around minute 24 in the podcast episode. Ross Dunn leads the discussion, which is more based on the SEJ article than the hangout discussion.
Ross interprets the article thread a little incorrectly and attributes the scenario I built to John Mueller, which leads to some confusion. So maybe I can clarify things here today.
It also gave me more insight into how the professional search community perceives content trust and age.
Let me try to add more color to the scenarios I have seen and what triggered my question.
Software development is unique, and all niches have their own nuances, but the platforms, APIs, techniques, etc. update often. This means content written 10 years ago may not be valid anymore.
For example, my history is in Microsoft ASP.NET and related Microsoft developer technologies. The .NET platform is almost 20 years old at this point and has grown and matured many times in the past two decades.
The libraries, C# syntax and best practices have completely changed in that time. Articles written back in 2003 cover techniques and APIs that today are bad practice or simply don't work because the API has been deprecated and replaced.
A good example for the web is any content explaining how to code for Internet Explorer, a browser that was replaced by Edge several years ago and reaches official end of life in less than a year.
In fact, old developer content rarely matters and often wastes engineering time, or worse, obsolete techniques are applied to new software. This creates bugs, bad user experiences and many other 'code smells'.
While trying to rank developer-centric content I have encountered this scenario several times.
There may be an article I or a client has written or updated that is more thorough and accurate than a page listed in the top 10 for our target keywords.
The problem is, even after a few months, we can't seem to outrank the old, outdated content.
What can we do to try and unseat the old content for a higher search result?
What are stale content signals and how does Google evaluate content accuracy?
Is HTTP the only signal of obsolete content? Does it mean the content is bad?
Is old, established content sort of perma-ranked due to built-up search equity or 'Google Trust'?
I thought this would be a fairly cut-and-dried article, but the SEO 101 episode posed a scenario that drove me through a great case study in new content outranking old, established content, including an established, authoritative site using HTTP.
So let's learn together.
What Does Trust Mean to Google?
Trust is a quirky term search professionals use. A popular acronym floating around our space today is E-A-T: Expertise, Authoritativeness and Trustworthiness.
In all honesty, each of those terms is more like a synonym for the same thing. To me it means: is the content good, and if I read it, will I have a solid answer to my question?
My understanding is Google tries to classify sites by content niches and gives them a mythical authority for a content niche. This equates to a form of trust and affects how a site can rank across different search terms.
A good example is this site. I am in the process of steering the site's content away from being a developer centric resource to more of an online marketing and SEO authority.
It is not easy. It will take time and a new set of backlinks from sites with trust in the SEO niche.
Right now I don't have a lot of acquired trust for SEO, but over time that will change.
We know that older sites and pages tend to rank better than newer content. The keyword there is 'tend'.
We also know or assume Google likes fresh content, but what exactly is fresh content?
What Exactly is Fresh Content?
On the SEO 101 episode one of the hosts used the scenario of 'how to boil eggs', a task that has not changed in years. It is a topic where a page written 10 years ago should easily outrank a new article based on accumulated search ranking factors, or the mythical Google Trust.
I decided to use this as a case study. You would expect stable search results and old, crusty content.
You would be wrong.
In fact the current #1 article, according to the AHrefs update, for How to Boil Eggs was written in July 2018, about 6 months ago.
And it outranks an older article on All Recipes, which still uses HTTP, for the term 'boil an egg'. A great example to study!
And, according to AHrefs, the newer article has no backlinks (it does now that I just linked to it). And they conservatively estimate this page gets over 50,000 monthly search visitors!
I would kill for a page like that on any site I manage, and I am certain you would too.
So how is this page out ranking established pages?
Honestly, at first I had no idea! Stick with me and I will tell you how.
The site has a decent AHrefs Domain Rating and the page has over a dozen internal links from related content. The on-page SEO factors look pretty solid, nothing overly impressive.
It’s just a standard WordPress blog site, with nothing too fancy standing out.
The content seems to be well thought out with SEO in mind. The title and main headline are targeting the primary keyword, "how to boil eggs", which according to AHrefs has over 176,000 global searches per month. And the page ranks for nearly 3000 keywords, so lots of traffic opportunities.
In fact, when I ran a WebPageTest analysis on the site it scored well on the primary report card categories. But overall it has some issues, loading over 1000 resources from all sorts of third-party sites, and it does not use a CDN.
The page load time is bad, very bad. I would not let this page go live if it were mine.
And it ranks well for many key terms, without links.
One thing is for sure: it is not age and acquired search trust signals, at least at the page level.
So dang, there go the theories.
Back to square one.
This page is the top result, according to the AHrefs update, simply because its content is good and they did not screw up basic on-page stuff. Or is it?
Thanks to the AHrefs Content Explorer I think I figured it out, Facebook!
The page has about 6700 Facebook shares!
While I have heard John Mueller state Google does not really use social signals as a ranking factor because they can't verify the sources, it appears social worked here.
So the key to ranking #1 now appears to be a strong social engagement profile. Problem solved, we can all now redirect to Facebook and retire.
OK, a little sarcasm there.
Here is what I think is going on.
The Stay at Home Chef has over 2.7 million Facebook followers, which gives it a large audience that generates a lot of natural 'traffic'.
I am also going to speculate the article post was promoted to increase engagement. I mean that is how Facebook works today.
As you can see, the post I found only has a few hundred engagements, not the thousands reported by AHrefs. So there have to be other posts floating around, as well as promoted post statistics.
With 2.7 million followers on Facebook, I guarantee the site has a large e-mail list, again more non-search-driven traffic.
Many in the SEO world speculate direct traffic affects your ability to rank, and it certainly seems to be the case here.
SEO is hard and requires getting many things right. Google obviously noticed the attention this article received and gave it a priority result. That gradually triggered other signals like a good click through rate, low bounce rate, etc. used by Google to track search result success.
In short, the page gets lots of things right, just not everything. But it got some key factors right to such a degree it is able to outrank old, established content rather quickly.
It helps to have a multi-million person audience to generate instant traffic for a new post.
The All Recipes article is about 3 years old (October 2015) and has 31 referring domains. And if you use the Content Explorer you will see it has over 7000 Facebook shares, more than the Stay at Home Chef article.
So, it should outrank the newer content, right?
The social shares are probably dated, which means their effect, if any, has dissipated. I also suspect the direct traffic its audience generated has faded as well. I doubt All Recipes has promoted the article in years.
This means the article retains rankings based solely on on-page SEO and search activity, like click-through rates from the search results.
I suspect it is not doing well these days and I think part of that is due to the lack of HTTPS.
A small number of searchers in a niche like recipes will pause when they see HTTP. But the All Recipes brand has a built-in trust factor, etc.
Of course, now that browsers like Chrome display the 'not secure' message for HTTP-served sites, I suspect a percentage of visitors pogo-stick back to the search results, sending Google some bad signals about the All Recipes article.
This is what makes SEO so much fun!
Dissecting John Mueller's Comments
This is where we can have some fun and try to read between the lines of John's comments.
“It feels more like we just have so many signals associated with these pages. And it’s not that, like if they were to change, they would disappear from rankings.
It’s more well, they’ve been around, they’re not doing things clearly wrong for as long a time. And maybe people are still referring to them, still linking to them. And maybe they’re kind of misled in linking to them because they don’t realize that actually the web has moved on.”
It is the second paragraph that draws my attention. I interpret this statement as saying newer signals can outweigh older signals, like backlinks.
I think my boiled egg research proves this point. The current #1 is very new and has a lot of recent activity-based signals. Other content may have equal or better 'established' signals over a long period of time, but the weight of the older signals may have diminished.
So in a way the backlink you got 10 years ago may have a half-life. Its effectiveness decays over the years.
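The half-life idea maps neatly onto a standard exponential-decay formula. To be clear, Google has never confirmed that link equity decays, let alone the shape or rate of any curve; the function and the 3-year half-life below are purely illustrative assumptions, a sketch of the concept rather than anything measured.

```python
from math import exp, log

def link_value(initial_value: float, age_years: float, half_life_years: float) -> float:
    """Hypothetical remaining value of a backlink after age_years.

    Assumes exponential decay with a chosen half-life; both the curve
    and the half-life are speculative, not confirmed Google behavior.
    """
    decay_rate = log(2) / half_life_years
    return initial_value * exp(-decay_rate * age_years)

# With an assumed 3-year half-life, a link earned 10 years ago
# retains roughly 10% of its original weight.
print(round(link_value(100.0, 10, 3), 1))  # → 9.9
```

Under those assumptions, a decade-old link would carry only about a tenth of its original weight, which fits the boiled-egg result where fresh activity beat old equity.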
Is this something that would always be true?
I doubt it.
If a referencing page is 'active', with traffic and other authority and trust signals, its link may not decay at the same rate as a link from a site that has very little activity from both the author and visitors.
In the boiled egg case study, I suspect even though All Recipes is a trusted brand with lots of activity, it just does not look as attractive to current searchers as the new article by the Stay At Home Chef.
Let's just say, to Google, real user activity appears to have waned for All Recipes while the Stay at Home Chef is waxing. So it rides the bandwagon with today's more popular article.
Also note that John says 'maybe' in his quote.
Since I am a developer, I am going to use the 'it depends' statement here because it can and will vary.
Is HTTP a Sign of Stale Content?
More often than not pages and sites still using HTTP are old, often 10 or more years old. The layouts are obviously dated, sometimes still using table layouts. In many cases the layouts are not responsive and do not render well for mobile devices.
Basically, the site looks 'run down' by modern standards.
It is the Broken Windows Theory applied to online marketing.
Several dozen times I have visited such a site to see other content and get a general vibe for the site. Many times, the latest article on the site has a date at least 3-5 years in the past.
Another sign the site is out of date, and worse, abandoned. The owner just keeps paying for the site's hosting and domain registration, so the content is still there.
This seems to be more common in the technology space. I think this is due to the geek in charge liking the fact they have a web site and it is cheap to keep it live.
“HTTPS is a ranking factor for us. But it’s really kind of a soft ranking factor. A really small ranking factor.”
We know HTTPS is used as a ranking signal for search engines. In his answer to my question Mueller said it is a 'soft' signal. By saying soft he means it does not carry as much weight as other factors, and there are hundreds we think we know.
As I evaluate search results, subjectively I see fewer than 5% of top 10 results using HTTP today. The share is slightly higher for developer content than for other niches I research, which again shows the developer space lagging the trend.
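For anyone curious how I eyeball that share, tallying URL schemes in a scraped top-10 list is trivial to script. The `example.com` URLs below are placeholders for illustration, not real results:

```python
from urllib.parse import urlparse

def http_share(urls: list[str]) -> float:
    """Fraction of result URLs served over plain HTTP (not HTTPS)."""
    http_count = sum(1 for url in urls if urlparse(url).scheme == "http")
    return http_count / len(urls)

# Hypothetical top-10 snapshot: one plain-HTTP result out of ten.
serp = ["https://example.com/page%d" % i for i in range(9)]
serp.append("http://example.com/legacy-page")
print(f"{http_share(serp):.0%}")  # → 10%
```

Run that over a batch of SERPs per niche and you can compare the HTTP share for developer queries against other verticals.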
In my boiled egg example you can see how other signals and their weights can outweigh the HTTP/HTTPS signal. My theory is that using HTTP drives other negative signals like bounce rate, time on page and pogo-sticking.
While John says HTTPS is a small signal, I think it probably triggers other behaviors that can hurt a page's and site's ability to rank for targeted keywords.
I think it is more of a signal used, consciously or unconsciously, by real people than by search engines. We see those insecure messages, anxiety increases, and we leave the site.
It is not as much about the search engine trusting the site, it is about actual humans trusting the site.
From a content perspective there seems to be a direct correlation between sites that have not upgraded to HTTPS and sites that are no longer active. For technical content this is a problem because the content can be inaccurate and lead to bad software and problem solving.
Can Google Determine Content Validity?
If you watch the exchange at the hangout, John touches on whether Google can determine if a page's content is accurate or not.
In short, they can't. It is not their responsibility per se, but general consumers use top results as a trust signal. Sort of like Google saying this page is the authority.
There is an implied authority humans give Google because of their brand authority. We trust that Google has figured out the right answer and assume what it provides to be correct.
Most of the time it is. But not always.
This is why Google relies on backlinks and other off page signals, like the direct traffic I discussed earlier, to determine a page's reliability.
In my scenarios the search bot and ranking algorithm do not know if a library's class or method has changed its signature or functionality since 2003. Therefore, when I publish an article showing how to use the latest interface and functionality, Google can't determine my article is more authoritative than an article that accumulated search value over the past 15 years.
Over time my new content, assuming it stays accurate, should outrank the stale, obsolete content. But Google or Bing can't determine the accuracy difference between my article and the old, obsolete one.
This is another reason why you need to keep updating your old content.
Google likes fresh content, at least that is the theory.
Does Google Like Fresh Content?
The general consensus is that Google, or more accurately consumers, like fresh content.
But what exactly is fresh content?
John Carcutt said to him fresh content is brand new, greenfield articles. But to many it is old content that has been significantly refreshed as well as brand new content.
I look at fresh content as being either. I have learned to update my articles from time to time. Sometimes based on data I see in the search console, other times it is because the technique is not valid today, etc.
So in essence I add new content even to existing articles. My attitude is by changing the content I am making the page stronger and should reap the SEO rewards for doing so. I don't worry about losing existing rankings.
I think the best answer is, yes Google likes fresh content, but not as a direct ranking factor. Fresh content is appreciated by consumers and that drives the signals search engines use to rank pages.
If all goes well the natural human response to articles triggers the signals that Google, Bing and other search engines use to curate results for queries.
Create content for humans before bots. If you do all the human factors right, you will trigger the synthetic factors for better results.
SEO is so fascinating and complex. This is why it is fun!
So using insecure HTTP can be a ranking factor. It is probably a weak direct signal but can indicate weakness in other direct factors used by search engines to rank content for search results.
I noticed obsolete content ranking well that was produced years ago and has sort of built up ranking signals over the years. This is problematic in the development space because platforms, syntax and technique progress quickly.
This can sometimes make it tricky for us to create fresh content for technical search phrases. But it does not mean you can't unseat established content, even when it is still correct.
The boiled egg scenario proved to be a great case study because the current top content is very new and outranks older, established content.
It also just happened to outrank an authoritative article using HTTP, a perfect example for my hangout question.
We learned that natural, recent human activity has probably triggered many positive signals to Google, helping the Stay at Home Chef article rank quickly against established content.
This also opened a whole separate thread of thought, the power of direct traffic and more recent activity against older signals.
Can ranking signals decay over time? It appears they can.
This means you need to always be on top of your game and keep your content fresh. Not just for newer content, but to continue to trigger natural interest so Google knows the page still appeals to consumers.
If you lose the natural human interest eventually search engines will too.