Tag Archives: db

How I cleaned up 3M spam comments in a WordPress blog

Tl;dr: delete them.

Last week, I received an alert, my blog’s database was too large (> 3GB) and had to be upgraded to a premium plan. Paying a new monthly fee for something I don’t need made no sense, so I looked into the database.

Most of the space was used by two WP tables: wp_comments & wp_commentmeta. Their sizes being roughly 2.2 GB & 1.8 GB respectively.

Upon logging into the WP admin panel, I could see a long neglected blog. There were 3.1M comments, and somehow 2.7M were approved.

3M comments (the vast majority are assumed to be spam)

My first approach was to use akismet (Automattic’s spam filtering service) to process the comments. I went into my MySQL database and marked the comments from late 2019 (arbitrary cutoff date) from approved (comment_approved=1) to pending (comment_approved=0).

WordPress’s comments section has a handy button to Check For Spam with Akismet. I tried this, but it was taking too long. I honestly think it would take a month to check my ~3M comments. Since the browser would time out, this process had to be manually monitored (not practical).

While your comments section may have lots of legitimate comments, my understanding of my comments is that they are 99.999% spam for the past few years. This is because my blog has been largely ignored and is no longer current. In fact, searching for my blog doesn’t even show up on Google. The lack of HTTPS (now fixed) and endless spam comments probably pushed the SEO so low that *only* spammers know about the site to post comments.

With the assumption of nearly 3M comments as spam, here’s how I cleaned it up:

  1. As mentioned above, I moved all the comments from late 2019 to pending (comment_approved=0). I tried using phpMyAdmin but had to eventually use a GUI (TablePlus on Mac) to run SQL.
    UPDATE wp_comments set comment_approved = '0' WHERE comment_date > '2019-10-01' AND comment_approved = '1';
  2. Delete all my pending comments:
    DELETE FROM wp_comments WHERE comment_approved = '0';
  3. Clean up my meta comments table (wp_commentmeta):
    DELETE FROM wp_commentmeta WHERE comment_id NOT IN ( SELECT comment_id FROM wp_comments );
  4. Clean up table overhead. Overhead was making my tables still take up GBs of space even though the vast majority of comments were deleted.
    OPTIMIZE TABLE wp_comments;
    OPTIMIZE TABLE wp_commentmeta;

Using a desktop GUI to run MySQL was fast and efficient. Before and after I executed my SQL clean up commands, I checked the row counts so I could have some confidence in the executed commands.

I still have ~10K comments left on my site. Some are legitimate, some are spam. I’m letting Akismet take a pass through these 10K comments, and then I’ll clean up the rest manually.