{"id":2410,"date":"2022-10-27T04:45:39","date_gmt":"2022-10-27T08:45:39","guid":{"rendered":"https:\/\/www.rexfeng.com\/blog\/?p=2410"},"modified":"2022-10-27T04:45:39","modified_gmt":"2022-10-27T08:45:39","slug":"how-i-cleaned-up-3m-spam-comments-in-a-wordpress-blog","status":"publish","type":"post","link":"https:\/\/www.rexfeng.com\/blog\/2022\/10\/how-i-cleaned-up-3m-spam-comments-in-a-wordpress-blog\/","title":{"rendered":"How I cleaned up 3M spam comments in a WordPress blog"},"content":{"rendered":"\n<p><em>Tl;dr: delete them.<\/em><\/p>\n\n\n\n<p>Last week, I received an alert, my blog&#8217;s database was too large (> 3GB) and had to be upgraded to a premium plan. Paying a new monthly fee for something I don&#8217;t need made no sense, so I looked into the database.<\/p>\n\n\n\n<p>Most of the space was used by two WP tables: wp_comments &amp; wp_commentmeta. Their sizes being roughly 2.2 GB &amp; 1.8 GB respectively.<\/p>\n\n\n\n<p>Upon logging into the WP admin panel, I could see a long neglected blog. There were 3.1M comments, and somehow 2.7M were approved.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"540\" height=\"69\" src=\"https:\/\/www.rexfeng.com\/blog\/wp-content\/uploads\/2022\/10\/comments-540x69.png\" alt=\"\" class=\"wp-image-2411\" srcset=\"https:\/\/www.rexfeng.com\/blog\/wp-content\/uploads\/2022\/10\/comments-540x69.png 540w, https:\/\/www.rexfeng.com\/blog\/wp-content\/uploads\/2022\/10\/comments-300x38.png 300w, https:\/\/www.rexfeng.com\/blog\/wp-content\/uploads\/2022\/10\/comments-150x19.png 150w, https:\/\/www.rexfeng.com\/blog\/wp-content\/uploads\/2022\/10\/comments-624x80.png 624w, https:\/\/www.rexfeng.com\/blog\/wp-content\/uploads\/2022\/10\/comments.png 671w\" sizes=\"auto, (max-width: 540px) 100vw, 540px\" \/><figcaption>3M comments (the vast majority are assumed to be spam)<\/figcaption><\/figure>\n\n\n\n<p>My first approach was to use akismet (Automattic&#8217;s spam filtering service) to process the comments. I went into my MySQL database and marked the comments from late 2019 (arbitrary cutoff date) from approved (comment_approved=1) to pending (comment_approved=0). <\/p>\n\n\n\n<p>WordPress&#8217;s comments section has a handy button to <strong>Check For Spam<\/strong> with Akismet. I tried this, but it was taking too long. I honestly think it would take a month to check my ~3M comments. Since the browser would time out, this process had to be manually monitored (not practical).<\/p>\n\n\n\n<p>While your comments section may have lots of legitimate comments, my understanding of my comments is that they are 99.999% spam for the past few years. This is because my blog has been largely ignored and is no longer current. In fact, searching for my blog doesn&#8217;t even show up on Google. The lack of HTTPS (now fixed) and endless spam comments probably pushed the SEO so low that *only* spammers know about the site to post comments.<\/p>\n\n\n\n<p>With the assumption of nearly 3M comments as spam, here&#8217;s how I cleaned it up:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>As mentioned above, I moved all the comments from late 2019 to pending (comment_approved=0). I tried using phpMyAdmin but had to eventually use a GUI (TablePlus on Mac) to run SQL.<br><code>UPDATE wp_comments set comment_approved = '0' WHERE comment_date > '2019-10-01' AND comment_approved = '1';<\/code><\/li><li>Delete all my pending comments:<br><code>     DELETE FROM wp_comments WHERE comment_approved = '0';<\/code><\/li><li>Clean up my meta comments table (wp_commentmeta):<br><code>     DELETE FROM wp_commentmeta WHERE comment_id NOT IN ( SELECT comment_id FROM wp_comments );<\/code><\/li><li>Clean up table overhead. Overhead was making my tables still take up GBs of space even though the vast majority of comments were deleted.<br><code>     OPTIMIZE TABLE wp_comments;<br>     OPTIMIZE TABLE wp_commentmeta;<\/code><\/li><\/ol>\n\n\n\n<p>Using a desktop GUI to run MySQL was fast and efficient. Before and after I executed my SQL clean up commands, I checked the row counts so I could have some confidence in the executed commands.<\/p>\n\n\n\n<p>I still have ~10K comments left on my site. Some are legitimate, some are spam. I&#8217;m letting Akismet take a pass through these 10K comments, and then I&#8217;ll clean up the rest manually. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tl;dr: delete them. Last week, I received an alert, my blog&#8217;s database was too large (> 3GB) and had to be upgraded to a premium plan. Paying a new monthly fee for something I don&#8217;t need made no sense, so I looked into the database. Most of the space was used by two WP tables: [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[1492,1106,1103,152,1491],"class_list":["post-2410","post","type-post","status-publish","format-standard","hentry","category-tech","tag-akismet","tag-db","tag-mysql","tag-spam","tag-wordpress"],"_links":{"self":[{"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/posts\/2410","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/comments?post=2410"}],"version-history":[{"count":9,"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/posts\/2410\/revisions"}],"predecessor-version":[{"id":2420,"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/posts\/2410\/revisions\/2420"}],"wp:attachment":[{"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/media?parent=2410"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/categories?post=2410"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rexfeng.com\/blog\/wp-json\/wp\/v2\/tags?post=2410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}