GRAV Duplicate Data?

We want to share a duplicate data SEO issue we found with one of our GRAV client sites and the steps we used to correct it. Let's start by first explaining the SEO issue and the steps we took to identify it.

What is duplicate data

Our friends at SEMRUSH have done a great job explaining why this metric is important and why it could be harming your site.

This issue falls into two categories: Duplicates & Indexability.

ABOUT THIS ISSUE
Webpages are considered duplicates if their content is 85% identical.
Having duplicate content may significantly affect your SEO performance.
First of all, Google will typically show only one duplicate page, filtering other instances out of its index and search results, and this page may not be the one you want to rank.
In some cases, search engines may consider duplicate pages as an attempt to manipulate search engine rankings and, as a result, your website may be downgraded or even banned from search results. Moreover, duplicate pages may dilute your link profile.

Cited from SEMRUSH
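
To get a feel for what an 85% threshold means, here is a minimal Python sketch that compares two page texts word by word. It is an illustration only: the passage above does not say how SEMRUSH measures similarity, so the use of difflib's ratio and the function names similarity and is_duplicate are our assumptions.

```python
from difflib import SequenceMatcher


def similarity(text_a: str, text_b: str) -> float:
    """Return a 0.0-1.0 similarity score based on matching word sequences."""
    words_a = text_a.split()
    words_b = text_b.split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def is_duplicate(text_a: str, text_b: str, threshold: float = 0.85) -> bool:
    """Flag two texts as duplicates when their similarity reaches the threshold.

    The 0.85 default mirrors the 85% figure quoted above; the comparison
    method itself is an assumption, not SEMRUSH's actual algorithm.
    """
    return similarity(text_a, text_b) >= threshold
```

With this measure, the www and non-www versions of the same page would score 1.0, since their visible text is identical.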

Now the suggested fix:

HOW TO FIX IT
Here are a few ways to fix duplicate content issues:

  • Add a rel="canonical" link to one of your duplicate pages to inform search engines which page to show in search results
  • Use a 301 redirect from a duplicate page to the original one
  • Use a rel="next" and a rel="prev" link attribute to fix pagination duplicates
  • Instruct GoogleBot to handle URL parameters differently using Google Search Console
  • Provide some unique content on the webpage

    Cited from SEMRUSH

After further inspection

After reviewing the test results, we found that GRAV serves both the www and the non-www versions of each page. The warning is triggered because the canonical link tag reflects whichever version of the URL the visitor used to enter the site, so each version ends up declaring itself canonical.

You will find the tag inside the <head> element of any of your pages, and it will look similar to one of these:

<link rel="canonical" href="https://www.example.com">
# or
<link rel="canonical" href="https://example.com">
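
If you want to check which canonical URL a page actually emits, a small script can pull the tag out of the HTML. The sketch below uses only Python's standard html.parser module; the class name CanonicalParser and the helper find_canonical are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        if tag == "link" and rel == "canonical" and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical
```

Feeding it the HTML of the same page fetched once via www. and once without should reveal whether the two versions disagree.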

Example of How GRAV Creates the Canonical Link Tag

Here is how GRAV creates the canonical URL in a theme like Antimatter (note: your GRAV theme could vary)

For this example, we can find the template file at /user/templates/partials/base.html.twig

 <link rel="canonical" href="{{ page.url(true, true) }}" />

This method works well; however, if you enter the site via www., {{ page.url }} will carry that hostname into the canonical URL. You could always set the URL manually in each .md file or via the admin plugin, but this could be very time-consuming.

Setting this globally is our preferred way of dealing with this challenge, and we only need to tweak two settings.

Find these parameters (absolute_urls and custom_base_url) in your /user/config/system.yaml file and change them to match your site. If absolute_urls is false, change it to true, and replace example.com with your own URL.

absolute_urls: true
custom_base_url: 'https://example.com'

If you are running the admin plugin, you can find these settings under "configuration -> system -> advanced"

After clearing the cache and reloading your page, the canonical URL will stay the same whether the visitor enters the site via www. or not.
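
One way to confirm the result is to collect the canonical URLs you see when loading pages through both hostnames and check that they all share a single scheme and host. The following is a minimal sketch under that assumption; canonical_origins and is_consistent are hypothetical helper names, and the URLs you pass in are whatever you gathered by hand or with a script.

```python
from urllib.parse import urlsplit


def canonical_origins(canonical_urls):
    """Return the set of (scheme, hostname) pairs used by the given URLs."""
    return {(urlsplit(url).scheme, urlsplit(url).hostname) for url in canonical_urls}


def is_consistent(canonical_urls) -> bool:
    """True when every canonical URL points at exactly one scheme and host."""
    return len(canonical_origins(canonical_urls)) == 1
```

If is_consistent returns False after the change, the cache may not have been cleared, or the theme may be building the tag some other way.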

Taking it Another Step

We have now resolved the core issue, but we like to take one more step: we redirect all URLs with a 301 redirect so they always match. Why do we do this? Any service, application, or website in a sub-directory that falls outside of the GRAV environment will not honor the settings in GRAV's system.yaml; therefore, your www. can come back into play. The easy solution is to set it and forget it.

Configure DNS Correctly

We have a CNAME DNS record pointing www. to the A record of example.com. You can use an A record for www. instead if you'd like.
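
A quick way to confirm the DNS setup is to check that both names resolve to the same address. This is a minimal sketch using Python's socket.gethostbyname, which returns a single IPv4 address per name; the function name resolves_to_same_address and the injectable resolver parameter are our own additions for illustration.

```python
import socket


def resolves_to_same_address(apex: str, resolver=socket.gethostbyname) -> bool:
    """Compare the address of the bare domain with that of its www. name.

    `resolver` defaults to a real DNS lookup; it can be swapped for a
    stand-in function when no network is available.
    """
    return resolver(apex) == resolver("www." + apex)
```

Calling resolves_to_same_address("example.com") with your own domain performs two live lookups. Hosts behind several addresses or IPv6 only would need a more thorough check.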

Once this is confirmed, we find these two options work in most circumstances:

Option 1: .htaccess
This solution is available for the majority of hosting providers and tends to be the most straightforward solution.

  1. Log in via SSH or FTP to the root directory of your website and locate a file called .htaccess. If this file does not exist, you will need to create it.
  2. Once located or created, paste this code near the top of the file.
    Be aware that you will need to replace example.com with your own domain, and use https:// in the target if your site is served over HTTPS.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]

If you would rather redirect from non-www. to www., use this instead:

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

It is important to note that this method only works on Apache web servers with the mod_rewrite module enabled. If you are not sure whether that applies to your server, contact your hosting provider or server administrator.
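
To make the logic of the two rule sets easier to follow, here is a rough Python model of the decision each one makes. It is a simplified sketch, not a reimplementation of mod_rewrite: the function names to_non_www and to_www are hypothetical, query strings are not modeled, and the path is passed in by hand.

```python
import re


def to_non_www(host: str, path: str):
    """Model of the first rule set: redirect unless the host is example.com.

    The [NC] flag makes the host comparison case-insensitive.
    Returns the 301 target, or None when no redirect is needed.
    """
    if re.fullmatch(r"example\.com", host, flags=re.IGNORECASE):
        return None
    return "http://example.com/" + path.lstrip("/")


def to_www(host: str, path: str):
    """Model of the second rule set: redirect unless the host starts with www.

    Returns the 301 target, or None when no redirect is needed.
    """
    if re.match(r"www\.", host):
        return None
    return "http://www." + host + "/" + path.lstrip("/")
```

Note that the first rule set redirects every host that is not exactly example.com, so any other sub-domain served from the same directory would be redirected as well.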

Option 2: Apache Redirects
This solution is a bit more involved but can also do the trick.
Take note that not all hosting providers will grant you access to this configuration. If yours does not, it will typically provide a Control Panel section that allows you to create a redirect. That section accomplishes essentially the same thing as the configuration below.

If you have access to Apache configuration files, below is an example of how to set up the redirect:

<VirtualHost *:80>
    # Redirect the www host to the non-www host
    ServerName www.example.com
    Redirect permanent / http://example.com/
</VirtualHost>

<VirtualHost *:80>
    # Main site, served on the non-www host
    ServerName example.com
</VirtualHost>
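
Whichever option you choose, it is worth confirming that the server really answers with a 301. The sketch below uses Python's standard http.client module to send a HEAD request over plain HTTP on port 80; the function names fetch_redirect and is_permanent_redirect_to are hypothetical, and the host you pass in is your own.

```python
import http.client


def fetch_redirect(host: str, path: str = "/"):
    """Send a HEAD request and return (status code, Location header or None)."""
    connection = http.client.HTTPConnection(host, 80, timeout=10)
    try:
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_permanent_redirect_to(status: int, location, expected_prefix: str) -> bool:
    """True when the response is a 301 whose target starts with the expected URL."""
    return status == 301 and location is not None and location.startswith(expected_prefix)
```

For example, pass the result of fetch_redirect("www.example.com") to is_permanent_redirect_to together with "http://example.com/". A 302 or a missing Location header means the redirect is not set up as intended.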

You are now ready to combat the dreaded duplicate data error. Feel free to leave your comments below.
