Blog fixes for Google

I got an email from Google the other day about indexing issues… normally I ignore these, because I really don’t give a shit about traffic (I don’t keep logs, analytics, etc.; I literally don’t care). My audience is myself, because if I don’t write something down I will absolutely forget it happened, as I have the attention span of a demented goldfish. But then I remembered that sometimes when I can’t find anything, I’ll use the Google-powered search bar, and if Google is failing to index stuff then that might actually be bad for me, so I took a look.

I somehow ended up looking at the Lighthouse/PageSpeed Insights/whatever the heck it’s called today scores, and found a few things I could improve there. Some of them were legitimate things I didn’t care about before but do now (contrast ratio, for instance), and much of it was fairly simple to fix.

Some of it was less so: it complains about the time to the Largest Contentful Paint or whatever, reckoning ~2 seconds to render the front page. Considering I work on a 3rd-gen i7 that’s very tired, I kinda call bullshit, but let’s have a look. It wants me to load my ~350KB of CSS asynchronously so the page can render without it. I guess! Copy+paste some code and it works, but I do not like it: the page really does render without CSS, so it flashes white briefly before the stylesheet is applied, and it does this on every page load, even if the CSS is already sitting in the cache!
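For reference, the usual async-CSS trick looks something like this (I won’t swear mine is byte-for-byte identical, and the stylesheet path is a placeholder, not my actual one):

```html
<!-- preload the stylesheet, then flip it to a real stylesheet once it arrives -->
<link rel="preload" href="/css/main.css" as="style"
      onload="this.onload=null;this.rel='stylesheet'">
<!-- fallback for browsers with JavaScript disabled -->
<noscript><link rel="stylesheet" href="/css/main.css"></noscript>
```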

The really annoying part is that it stopped complaining about the LCP and instead complained about elements moving around after the first draw… no shit, you had me remove my CSS from the first draw’s critical path! I briefly considered splitting the CSS up, blocking on the critical bits and loading the fancy, shiny bits async after the fact, but then I thought fuck it, a 95 or so is good enough. Stick your red triangle up your arse; my site works fine in links, for fuck’s sake, and the only thing stopping it from working on a much older browser is that Netlify refuses to allow plain HTTP connections (HTTPS is forced, always).

Anyway, back to the indexing… it looks like a lot of it is caused by canonical URLs being fucky. There are a couple of specific issues here. First, for some reason, probably dating back to when I had this site on Blogger, some of the pages are linked with query strings, and Netlify treats them all as the same page. I think I can declare a canonical URL with a <link> tag in the <head>, so let’s do that. Easy.
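With Hugo that should just be a line in whatever head partial the theme uses, something like this ({{ .Permalink }} being Hugo’s absolute URL for the current page; adjust for your own theme):

```html
<!-- collapse query-string variants (and anything else) onto one canonical URL -->
<link rel="canonical" href="{{ .Permalink }}">
```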

Next up, all these goddamn redirects from missing trailing slashes. You see, when I had this site on Blogger, and then WordPress, and then Pelican hosted on AWS S3, I could have a URL like /2024/02/cool-blog-post and everything was happy. Netlify + Hugo (not sure which of the two is to blame) doesn’t like this, so every post is actually a directory with an index.html inside, and when you link directly to /2024/02/cool-blog-post it 301-redirects to /2024/02/cool-blog-post/.
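You can see the redirect with curl; the domain here is a placeholder, but the behaviour is the same for any post:

```sh
# the slash-less form answers with a 301...
curl -sI https://example.org/2024/02/cool-blog-post | head -n 1    # HTTP/2 301
# ...and the trailing-slash form is the actual page
curl -sI https://example.org/2024/02/cool-blog-post/ | head -n 1   # HTTP/2 200
```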

That’s probably easy enough to fix the vast majority:

```sh
# find markdown links of the form (/YYYY/MM/slug) with no trailing slash,
# then rewrite every file that contains one to append the slash
grep -Ern '\(/[0-9]{4}/[0-9]{1,2}/[^/]+\)' . | cut -d ':' -f 1 | sort -u | xargs sed -Ei 's#(\(/[0-9]{4}/[0-9]{1,2}/[^/ \)]+)\)#\1/)#g'
```

The resulting git commit is a monster, but I don’t think I broke anything. To be sure, I dusted off my old webspyder project, made a few changes to get it working on Python 3 (which I probably should commit, though anyone with any sense would just use Scrapy instead), and changed it to report 301s as a fault too, which found a few more.
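If you don’t feel like spinning up a crawler, even a dumb loop over a list of internal URLs will flag them; a minimal sketch, assuming a hypothetical urls.txt with one URL per line:

```sh
# curl doesn't follow redirects unless told to, so a 301 comes back as-is
while read -r url; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  [ "$code" = "301" ] && echo "301: $url"
done < urls.txt
```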

Will this make Google like me more? Not sure; let’s wait and see.

Oh, while I was at it: by running my spider, pulling out the URLs it visited, and comparing those with the list of directories in public/ after a hugo -D, I found a small handful of pages that weren’t linked from anywhere, so I fixed that too. I don’t think I’m missing any now.
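The comparison itself was nothing clever; roughly this, with hypothetical file names (visited.txt being the list of URLs the spider actually reached):

```sh
# build everything, drafts included, then list the paths Hugo produced
hugo -D
find public -name index.html | sed 's#^public##; s#index\.html$##' | sort > built.txt

# strip the scheme and host off the spider's URLs, then diff the two lists
sed -E 's#^https?://[^/]+##' visited.txt | sort -u > crawled.txt
comm -23 built.txt crawled.txt    # pages that exist but nothing links to
```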
