As we hurtle through the cosmos on this tiny rock, spending time thinking about where you have been, and subsequently where you are going, is a valuable and powerful tool to have in your bag.
I wanted to spend a little bit of time sharing what I have learned about Search Engine Optimization (SEO) in 2023 to hopefully help anyone looking for a bit of advice about SEO.
The first thing that I learned is a bit lengthy and technology focused and if that is not your bag, feel free to skip ahead to the more focused learnings below!
Focus on Writing Over Using Fancy Tools
When I first started this blog, it was a WordPress instance running in my homelab: powered by Kubernetes, living on top of VMware, connected to a FreeNAS server via iSCSI. I was using Longhorn to provision storage for the MySQL database and WordPress uploads. Everything was provisioned via Kubespray. Linkerd supported mTLS, with certificates issued via Let's Encrypt. Prometheus supplied the metrics behind my Grafana dashboards, and I integrated everything into a personal Slack channel to get notifications about the current state of my system. I had the site proxied behind Cloudflare Tunnels so that I did not need to open any ports on my home firewall to the internet. I used the Cloudflare CDN to front the site and reduce overall load, with custom firewall rules blocking access to /wp-login.php and all admin pages.
It was a thing of beauty, but it was also a distraction. I was spending more time maintaining my systems than I was actually blogging or learning about how internet SEO has changed in the 10 years that I was away from publicly hosted websites.
The original problem I noticed was when my house lost power, which happens somewhat frequently around here. I should really move into the 21st century.
My equipment was down overnight, which meant that Cloudflare ended up flushing the cache on my entire website and I woke up to a site outage. The site was not that old at this point, so I decided that I should tweak all of my settings to ensure that things turned themselves back on automagically when power was restored. This sounds good in theory, but in practice it was a nightmare. I spent weeks working on scripts to automagically recover Longhorn partitions, recover MySQL databases, automatically bounce services if things did not come up properly, resync etcd, blah, blah, blah.
Additionally, and this is more of an aside than anything, I am a big fan of clean and pure HTML output that is fast and responsive. In reality, I am a Google PageSpeed junkie. WordPress just wasn't cutting it when it came to the speed I needed. It felt bad, as a talented technology purple unicorn, to have a website that was in the red when measured by PageSpeed.
I decided it was time to really take a step back and holistically look at my situation. I ended up realizing that my homelab being down was not actually the root problem I needed to solve. What I really needed to solve was the complexity of running a website. I needed to realign my priorities around what is important and what is not. I needed to admit that my existing requirements sucked.
Long story short, it was time to simplify.
I started from scratch looking to build a content publishing cannon that was reliable and without unnecessary bells and whistles. Below is what I came up with:
- 99% uptime
- Promote the ability to create and publish multiple articles per day
- Push all new content through CI/CD for traceability
- Eliminate technology distractions to focus on requirements 1 & 2
From these newly refined requirements, I took on the task of reconstructing the website onto a new platform!
Step 0: Choose a Workflow
Being a technology enthusiast, I really wanted to keep a workflow that emulates rapid technology development. I do not have a team working on this site; it is just me, and my time needs to be spent effectively. I wanted the opportunity to introduce emerging technology into my workflows as long as it did not violate any of my requirements.
I settled on a GitOps style workflow. When I commit content, I want it pushed somewhere it can be previewed before opening a PR and pushing it to production. Because I was already utilizing Cloudflare for CDN and Zero Trust, I decided I would utilize GitHub for version control and Cloudflare Pages to support this workflow.
Step 1: Export
I quickly threw together some scripting that scraped my entire WordPress instance. My goal was to export the content in both Markdown and HTML, along with a page manifest including links and assets. The HTML package was pushed into the code repository and then hosted on Cloudflare Pages so that I could completely remove my entire WordPress setup.
The Markdown exports and page manifest will be used in Step 2.
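The scraping step can be sketched roughly like this. To be clear, this is not my actual export script; it is a minimal illustration, assuming one HTML document per page, where `PageScraper` and the manifest shape are hypothetical names I am using for this example.

```python
from html.parser import HTMLParser

class PageScraper(HTMLParser):
    """Collect the title, links, and image assets from one page of HTML."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False
        self.links = []
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.assets.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def build_manifest_entry(url, html):
    """One manifest record per exported page: URL, title, links, assets."""
    scraper = PageScraper()
    scraper.feed(html)
    return {"url": url, "title": scraper.title.strip(),
            "links": scraper.links, "assets": scraper.assets}
```

Run against every exported page, this produces the manifest of links and assets that the later steps lean on.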
Step 2: Wrap the content in a Static Site Generator
I have had some experience with Hugo in the past, so I started throwing a new theme together to support the import of content from my existing website. After some Markdown cleanup, I had a working website again. After committing and pushing the changes, everything looked really good in the Cloudflare Pages preview environment. I ran the preview environment against Google PageSpeed and saw that I was still in the red... WTF.
After an unreasonable amount of time and energy, I was able to track down most of the PageSpeed issues to CSS layout shifts and image sizes. I wrote some logic for Hugo that would utilize its built-in image processor to automagically scale my images to the numerous different sizes I needed, along with automagically linking the resources for me. Additionally, I spent way too much time optimizing ad loading on the site, which is a story for a different blog post.
Step 3: Re-Publish
Because of the work I did with the site export, this was simply a branch merge from my Hugo branch onto the main branch.
Step 4: Iterate
One benefit of the architecture of this new site is that I am able to make broad sweeping changes to the site very rapidly.
One of the first iterations I took on was to write an auto-linker for the site which would utilize keywords and ensure that a single page with keywords matching another page has at least one link. I don't need to remember to create internal site links anymore, this happens automagically for me.
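The core of an auto-linker like that can be sketched in a few lines. This is a simplified stand-in for my actual implementation, assuming each page carries a list of keywords; `auto_link` and its signature are illustrative.

```python
import re

def auto_link(body, pages, self_url):
    """Link the first occurrence of another page's keyword to that page.

    `pages` maps a URL to its list of keywords. The real logic needs to
    avoid linking inside existing links and code blocks; this sketch
    only shows the core keyword-matching idea.
    """
    for url, keywords in pages.items():
        if url == self_url:
            continue  # never link a page to itself
        for kw in keywords:
            pattern = re.compile(r"\b%s\b" % re.escape(kw), re.IGNORECASE)
            match = pattern.search(body)
            if match:
                link = "[%s](%s)" % (match.group(0), url)
                body = body[:match.start()] + link + body[match.end():]
                break  # one link per target page is enough
    return body
```

For example, a page about cats that mentions "litter box" would automatically pick up a link to the litter box page without me remembering to add it.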
Another iteration that I took on was to create both a Tweet generator and a Cloudflare Worker bot that I call Social Worker. This allows me to scan all of the pages in Markdown looking at the frontmatter for any page that does not have generated tweets. From there, I can use OpenAI to generate tweets and add them to the frontmatter for consumption by Social Worker which runs on a schedule posting tweets on my behalf.
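The scan for pages missing generated tweets boils down to a frontmatter check. A minimal sketch, assuming simple `---` delimited frontmatter with a `tweets:` key; the actual frontmatter schema, the OpenAI call, and the Social Worker posting logic are all out of scope here.

```python
def needs_tweets(markdown):
    """Return True when the page's frontmatter has no `tweets:` key.

    Assumes `---` delimited YAML-style frontmatter at the top of the
    file; pages flagged here would be handed to the tweet generator.
    """
    if not markdown.startswith("---"):
        return True  # no frontmatter at all
    end = markdown.find("---", 3)
    if end == -1:
        return True  # frontmatter never closed
    frontmatter = markdown[3:end]
    return not any(line.strip().startswith("tweets:")
                   for line in frontmatter.splitlines())
```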
Publishing at Warp Speed
Publishing is now a breeze. I can churn out multiple articles per day focusing on content first. I don't need to worry about my WYSIWYG editor moving elements around a page, or version upgrades, or power outages, or anything else. I can simply write content in Markdown, find a hero image, run my linker and tweet generator, and then push the content.
While I love where the overall conversion to Hugo ended up, I do have some regrets about the tooling I selected and overall site composition.
I am using Gulp for file watching, which makes real-time recompilation of my site happen very quickly. Tailwind has also been very nice for keeping my site styling lightweight and relatively consistent.
I also used this site as an opportunity to dive into a bit of React. The top nav and table of contents are written in React. While React itself is interesting to write, React compilers/bundlers are Satan's plaything.
I will more than likely be converting everything into Golang over 2024.
Root Articles VS Organized Content
On my WordPress implementation, all articles were organized via /year/month/day/title. In this new structure, I did not want to be bound by that convention, so I started to shove all of my content at the root of the website. This made the URL structure very wide instead of deep.
What I have learned is that you need to have depth in your content to provide additional context about your content to Google search classifiers. If you write an article about cats (/cats), and you end up writing an additional piece of content about litter boxes (/litter-boxes), then you really should put litter boxes under the cats content (/cats/litter-boxes).
Where depth hurts is when you also increase crawl depth. If Google needs to follow a very long chain of links to get to your deepest content, you will exceed your crawl budget and Google will just give up. This is where internal link building takes over. Throughout this site I have added contextually relevant internal links (based off arrays of keywords) that link very deep content to more shallow content (and vice versa). This makes it very easy for Google to find all of my content regardless of its depth in the content tree.
I know this section was a bit long and rambly, but the moral of the story is this: Focus Drives Results.
If you want more context, watch this:
While I don't know anything about music, his quest to build a marble machine that plays music and his journey to become a better engineer is inspiring!
This is just one video in an entire series where you can watch someone evolve their thinking and processes.
SEO is a Game of Patience
At the time of writing, this blog has roughly 250 pages in total. That breaks down to 65 blog posts with 148 topic pages.
Early on, I would sit on Google Search Console (GSC) hitting refresh every few hours, looking at the Average Position metric to see if my articles were moving in the right direction with minor tweaks here and there. I was always disappointed by the results, which is VERY deflating.
After the site re-platforming I detailed above, I made a conscious decision to focus on content first. To compete in the market of DevOps information, I needed to publish a lot of articles.
As I was publishing, I would keep notes about information to update old articles. This helped me keep a flow of creating new content while updating older content to be more complete.
What I noticed was interesting: all of my pages started to move up the ranks together. I didn't need to focus on a single subject; I could publish a variety of tangentially related content to help fight off the monotonous boredom that comes from churning out new content.
So, what did I learn? Write good content, be complete with your content, and you will rank. Watching GSC for changes is a waste of time because a watched pot never boils. I now check GSC about once per week just to make sure nothing looks funky, but I ignore the Average Position metric altogether. I know that writing good quality content and covering topics more completely is the path to success with SEO.
▶ Key Insight
I forgot to mention, the Google Sandbox doesn't exist.
In the various corners of the internet where I see SEO "experts" hanging out, a question that is posed more frequently than it should be is "Am I stuck in the Google Sandbox?".
I am always amused by the answers that everyone brings to the conversation, but the reality is simply this:
If you feel like you are stuck in the "Sandbox", then your site is actually being viewed as not competitive or authoritative enough to rank well. If any site could rank on day 1 simply because it had a subjectively good article, SEO would not exist. The SERP ranks would be changing minute by minute with new hot shots climbing the charts. That would be a poor search experience, and Google actively promotes a good search experience that people want to use, all in the name of selling ads of course.
The reality is your site has simply not built up enough indicators of authority or credibility to compete with other sites doing the same thing in the same space. This may feel like a sandbox, but in reality, it is just time working against you.
P.S. I have no evidence to support these claims other than referential experience.
Everyone Everywhere Needs Some Kind of QA
I don't care if you are the best content writer the earth has ever seen. I don't care if you are the most gifted pastry chef, engineer, DoorDash delivery driver, or any other kind of person; you need someone or something to check the quality of your work.
When this site was very small, I would manually QA and check over things to see if I had made any errors. From there, I would push the site to a preview environment and then double check my work before pushing to production. As a team of 1, I only have myself to look after my own work. Needless to say, would you be surprised to learn that I make mistakes? I know, shocker, right? This article probably has a few mistakes already.
Knowing my own tendency to work quickly and my general overconfidence, I knew I needed some automated QA help.
I have developed some scripting that helps QA my site for me. First is a dead link checker. This script goes through the entire website (either locally or live) and pulls down all of the URLs for scripts, anchor tags, JSON, images, CSS, etc., validating that every URL responds with a 200 status code. Since I don't have any advanced functionality like forms, I do not need to worry about checking anything other than GETs.
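A dead link checker of that shape has two halves: extracting every URL from a page, then probing each one. This is a simplified sketch rather than my actual script; the `check` hook exists so the network half can be stubbed out, and the default GET behavior is an assumption about how you would run it live.

```python
from html.parser import HTMLParser
from urllib.request import urlopen
from urllib.error import URLError

class LinkExtractor(HTMLParser):
    """Pull every URL out of a page: anchors, images, scripts, styles."""
    URL_ATTRS = {"a": "href", "img": "src", "script": "src", "link": "href"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = self.URL_ATTRS.get(tag)
        if attr:
            value = dict(attrs).get(attr)
            if value:
                self.urls.append(value)

def dead_links(html, check=None):
    """Return the URLs that do not answer with HTTP 200."""
    if check is None:
        def check(url):
            try:
                return urlopen(url).status == 200  # real GET per URL
            except URLError:
                return False
    extractor = LinkExtractor()
    extractor.feed(html)
    return [u for u in extractor.urls if not check(u)]
```

Pointed at every page of a rendered site, this surfaces the rot that quietly accumulates as content ages.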
When I first developed the script, I was shocked at the number of dead links I had on a site that was only a year and a half old. Now I religiously check the site with a variety of QA scripts to get a full assessment of its state. This has given me more focus time for writing and publishing content and less time worrying about the technical quality under the covers.
So, what have I learned? Quality is more than just an idea; it is a facet of delivery that you need to stay on top of constantly to produce products and services that your consumers respond positively to. On every iteration of this website, I now take some time out to continue to improve my quality tooling. This is generally in the form of small tweaks or lightweight scripts, but if I notice a glaring issue I am more apt to put a pause on content creation to focus on developing a tool that will not only remediate the issue, but also continually validate that the issue has not come back.
What does quality have to do with SEO? Some of the key ranking factors that search engines use are specifically focused on quality. If your site is littered with spelling errors, slow pages, dead links, pages that shift their content after initial load, and a whole host of other quality related issues, you are setting yourself up for failure from the start.
▶ Key Insight
Watching the /r/SEO subreddit, there is always some kind of conversation happening around why a site is being penalized by Google. More often than not, some of the basic fundamentals of hosting a website are completely ignored, along with an irrational notion that a site, regardless of its quality characteristics, can rank #1 whenever the poster deems it should. The reality is, this website started to rank better on Google when I put more focus on the quality of the overall product instead of only focusing on the quality of the content.
AI is a Tool, Not The Answer
Artificial Intelligence (AI) serves as an invaluable tool in enhancing and expanding content. While it's not advisable to rely solely on AI to generate content from scratch, its real power lies in its ability to refine and develop initial ideas. Across this website, AI is extensively employed to transform basic concepts into well-structured paragraphs, comprehensive lists, and cohesive sections.
On this website, each article undergoes meticulous review to ensure consistency and accuracy. Additionally, edits are made to more effectively communicate the intended message. While AI may not be the ideal starting point for content creation, it certainly excels as a finishing touch, adding a layer of polish and depth to the final product. This approach leverages AI's strengths in augmentation and refinement, ensuring that the content not only resonates with the audience but also maintains the essence and authenticity of the original idea.
In fact, what you just read in this section was generated by AI using a few core ideas that I provided it. It reads much better than the original points that I gave OpenAI.
AI also has a dangerous side to it in the context of SEO. An article was recently posted which highlights how "easy" it is to blow a competitor away with AI generated content. You need to think of Google as a set of scales. It only knows about what it has seen before. It is calculating relative truth and quality content against subject matter competitors. If a critical mass is developed in a niche by using the power of AI to pollute that niche, Google may be slow to fight back. I am personally choosing not to engage in that kind of activity, but I know that the genie is out of the bottle and it will be extremely difficult, if not impossible, to stuff it back in.
▶ Key Insight
With the rapid emergence of AI in the SEO space, and because the internet is an uncontrollable force where everyone needs some sort of competitive advantage to stay on top, you need to keep an "adapt or die" mentality when updating and maintaining your website. Assume that your competitors are already utilizing AI to generate their content, meaning they have a competitive advantage over you.
One perspective to keep in mind with respect to SEO is how Google feels about AI content. Google's core mission is twofold:
- Create the best search experience on the internet that gets users to their desired answer as quickly as possible
- Sell ads
Examining the first point, we need to realize that Google understands AI is here to stay. They can either fight it or they can "adapt or die". Google doesn't really know anything, but it can build inference of knowledge based on the huge amount of content that it scrapes and how closely the ideas in that content align with other ideas. Ranking factors and E-E-A-T help Google understand who to trust and who not to trust when developing their inference and subsequent scoring. As long as the content matches what Google is inferring is correct and high quality information, it doesn't really matter whether it was AI generated or not. The end user is still getting the quality search experience that they went to Google to get.
I understand this is a very rudimentary view of scoring and ranking. I am keeping this very high level to make a point.
Now, layering on point 2, Google would not be able to sell ads if they did not get search results in front of users. Google keeps users coming back by ensuring they provide a quality search experience. With an average ad click-through rate of 1.9%, there are a lot more ads displayed than are ever actually clicked. Google's hope is that by getting a user to an appropriate destination, the user will click on ads that are served by AdSense on the website itself. Additionally, Google hopes that users will come back to Google for their next search because the experience was exactly what they were looking for.
AI content purely copy-pasta'd from ChatGPT directly into a website is going to be viewed as very low quality content. ChatGPT is a predictive model that is not dreaming up brand new things; it normally lives within the constraints of the content already encoded in its neural network. I am not saying that the content is inaccurate, but it is run-of-the-mill generalized content that does not stand out. This kind of content is not "Quality Content".
If you take your AI generated content that is accurate and properly reviewed, and then spend the time adding in additional context and information that makes it stand out, you have now achieved Google's goal of providing a high quality search experience and you have a much better chance at winning in the rankings.
This Article Will Never Rank on Google
It's not supposed to rank on Google.
This article deviates from the typical pursuit of long-tail keywords or aligning with the DevOps niche. I'm not aiming for high search rankings, and I'm okay with that. Writing, for me, is more about crystallizing thoughts than driving web traffic. It may inadvertently get a few clicks, but that's not the goal.
I often see a recurring question on Reddit: "How can I make my article about X rank higher?" But that's not the right question. A better question is, "How do I better understand what a search index is expecting so I can improve my page's rankings?"
The common advice for better ranking is to "Write quality content!" Yet, what defines quality content is subjective. While I consider this blog post as an example of quality, it's important to remember that I'm not an algorithm processing and categorizing the vast amount of information on the web.
The key is to produce content that a search engine's algorithm finds more valuable than your competitors'. You are not just ranking in Google, you are really competing in the index against what Google already assumes is the right answer. They are in the business of answering questions, and of course selling ad space in the process.
My experience has been enlightening: much of the SEO advice online doesn't live up to its promise. Personal experimentation with various types of content and strategies is vital. Being ready to assess and interpret these results is just as important. Establish a stable baseline for your content, understand what works, and then don't hesitate to fine-tune. Observe the impact of these minor adjustments.
You'll be amazed at how much of the original question you can answer by adopting the perspective of a search index, delving into the mechanics of how content is evaluated and ranked.
Right, wrong, or indifferent, this is what I have learned. Not everybody will have the same experiences that I am having. I wouldn't take what I have written here as a series of definitive truths, but I would encourage anyone reading this to incorporate it into their global perspective.
Thanks for your time and I am looking forward to continuous learning in 2024!