Tag Archives: spam

How I cleaned up 3M spam comments in a WordPress blog

Tl;dr: delete them.

Last week, I received an alert, my blog’s database was too large (> 3GB) and had to be upgraded to a premium plan. Paying a new monthly fee for something I don’t need made no sense, so I looked into the database.

Most of the space was used by two WP tables: wp_comments & wp_commentmeta. Their sizes being roughly 2.2 GB & 1.8 GB respectively.

Upon logging into the WP admin panel, I could see a long neglected blog. There were 3.1M comments, and somehow 2.7M were approved.

3M comments (the vast majority are assumed to be spam)

My first approach was to use akismet (Automattic’s spam filtering service) to process the comments. I went into my MySQL database and marked the comments from late 2019 (arbitrary cutoff date) from approved (comment_approved=1) to pending (comment_approved=0).

WordPress’s comments section has a handy button to Check For Spam with Akismet. I tried this, but it was taking too long. I honestly think it would take a month to check my ~3M comments. Since the browser would time out, this process had to be manually monitored (not practical).

While your comments section may have lots of legitimate comments, my understanding of my comments is that they are 99.999% spam for the past few years. This is because my blog has been largely ignored and is no longer current. In fact, searching for my blog doesn’t even show up on Google. The lack of HTTPS (now fixed) and endless spam comments probably pushed the SEO so low that *only* spammers know about the site to post comments.

With the assumption of nearly 3M comments as spam, here’s how I cleaned it up:

  1. As mentioned above, I moved all the comments from late 2019 to pending (comment_approved=0). I tried using phpMyAdmin but had to eventually use a GUI (TablePlus on Mac) to run SQL.
    UPDATE wp_comments set comment_approved = '0' WHERE comment_date > '2019-10-01' AND comment_approved = '1';
  2. Delete all my pending comments:
    DELETE FROM wp_comments WHERE comment_approved = '0';
  3. Clean up my meta comments table (wp_commentmeta):
    DELETE FROM wp_commentmeta WHERE comment_id NOT IN ( SELECT comment_id FROM wp_comments );
  4. Clean up table overhead. Overhead was making my tables still take up GBs of space even though the vast majority of comments were deleted.
    OPTIMIZE TABLE wp_comments;
    OPTIMIZE TABLE wp_commentmeta;

Using a desktop GUI to run MySQL was fast and efficient. Before and after I executed my SQL clean up commands, I checked the row counts so I could have some confidence in the executed commands.

I still have ~10K comments left on my site. Some are legitimate, some are spam. I’m letting Akismet take a pass through these 10K comments, and then I’ll clean up the rest manually.

Housekeeping

As I’m settling into NY, I find that I’m eating out less as a tourist with my DSLR. I’m eating in more by buying groceries to save money.  Getting groceries is a very different beast in NY than elsewhere. In this mega-urban setting, it’s popular to order groceries online or pay for delivery service at the store. If I buy something, I have to physically carry it back – as in walk it all the way back to my residence. Contrast this to anywhere else where you simply load up loads of food in your car. This makes getting milk, juice, or enough food to last a week annoying.

There is also a distinct lack of generic supermarkets. As in Safeway, Lucky’s, Costco, etc. Real estate is simply too expensive to find large supermarket chains with even larger parking lots in front. There are Whole Foods and Trader Joe’s scattered around the city, but they have their weaknesses. There are also small mom & pop corner shops, but the selection is not as convenient as a supermarket. Have you ever waited an hour in line at Trader Joe’s? Even Whole Foods can easily have a half hour line zig-zagging throughout the store. The other downside to not having generic supermarkets (such as Safeway) is that you can’t get cheap nationwide brand name staples like cereal, snacks, frozen foods, etc. Trader Joe’s and Whole Foods carry their own private labeled food for good or for worse.

All this above is my way to say that I haven’t been posting much on this blog. With the weather getting nicer, I should be out more and able to take more pictures. This is a good thing.

Note that I’ve disabled comments for this blog. For the time being, I don’t need hundreds of dead-obvious link spam comments piling up each hour. And to think, this isn’t what I would consider a trafficked blog.