There is no magic solution for counting table contents, especially when a client has a table with 40+ million records and you need to filter it using the SQL OR operator.
This article was originally published on the codeboost.com domain.
The original query was the following:
SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') OR field2 IN ('val3','val4');
In the first benchmark, before any optimization, the query took 4 minutes to return a result.
My first step was to create a compound key (multi-column key) on field1 and field2. I created it as follows:
ALTER TABLE table1 ADD INDEX `field1_field2_idx` (`field1`,`field2`);
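After waiting for 3 hours (remember, the table had 40 million records), I was able to continue.
The new compound key did not give me any performance increase, because the original query uses OR. An index on (field1, field2) can be used to look up rows by field1, but not by field2 alone, since field2 is not the leftmost column of the index.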
As some of you know “ORs are notoriously bad performers because it splinters the execution path” (from http://stackoverflow.com/questions/6551682/mysql-indexing-in-an-or-statement).
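One way to check whether MySQL actually uses the new index is to run EXPLAIN on the original query. This is a minimal sketch: the plan you get depends on your MySQL version and table statistics, so no output is shown here.

```sql
-- Ask MySQL how it plans to execute the original OR query.
EXPLAIN
SELECT COUNT(*) FROM table1
WHERE field1 IN ('val1','val2') OR field2 IN ('val3','val4');

-- Columns worth checking in the EXPLAIN output:
--   type : ALL means a full table scan, index means a full index scan
--   key  : the index the optimizer chose (NULL if none)
--   rows : the optimizer's estimate of how many rows it must examine
```

The same EXPLAIN prefix can be put in front of any of the queries in this article to compare plans before and after adding an index.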
After some thinking, I decided to get rid of the OR and create 2 queries instead.
The following 2 queries gave me the same result (I had to sum up the two counts):
SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2') AND field2 NOT IN ('val3','val4');
SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4');
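If you prefer not to add the two counts by hand, they can be summed in a single statement using scalar subqueries. This is only a sketch: the alias total is my own name, and it assumes field2 is declared NOT NULL.

```sql
-- Sketch: return the combined count as one value (alias "total" is hypothetical).
-- Assumes field2 is NOT NULL; see the note below the query.
SELECT
  (SELECT COUNT(*) FROM table1
    WHERE field1 IN ('val1','val2') AND field2 NOT IN ('val3','val4'))
  +
  (SELECT COUNT(*) FROM table1
    WHERE field2 IN ('val3','val4')) AS total;

-- If field2 can be NULL, "field2 NOT IN (...)" evaluates to NULL for those rows,
-- so they are skipped here although the original OR query counts them.
-- A NULL-aware condition for the first subquery would be:
--   field1 IN ('val1','val2')
--   AND (field2 IS NULL OR field2 NOT IN ('val3','val4'))
```

The NOT IN condition is what keeps rows matching both conditions from being counted twice, so it should not be dropped when combining the queries.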
Now the results were ready after 2 minutes. The first query was almost instant because it used the compound key, while the second query took nearly all of the time because there was no index on field2.
So, as a final step, I had to add another index on field2. I used the following command:
ALTER TABLE table1 ADD INDEX `field2_idx` (`field2`);
I had to wait another 3 hours for the index to be built, and finally:
Counting rows on a table with 40+ million records took JUST 5 SECONDS.
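In hindsight, if you already know that both indexes are needed, MySQL lets you create them in one ALTER TABLE statement. The sketch below assumes an InnoDB table; I did not measure it on this data set, so treat any time saving as a possibility rather than a result.

```sql
-- Hypothetical alternative: build both indexes in a single statement,
-- so the table data only has to be read once instead of once per index.
ALTER TABLE table1
  ADD INDEX `field1_field2_idx` (`field1`,`field2`),
  ADD INDEX `field2_idx` (`field2`);
```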
I hope you find this information useful in your case.
In the original article Vladimir Sotirov provided another solution:
You can use only the first index if you run the following SQL:
SELECT COUNT(*) from table1 WHERE field1 IN ('val1','val2');
SELECT COUNT(*) from table1 WHERE field2 IN ('val3','val4')
AND field1 NOT IN ('val1','val2');
That will save you having both indexes, but may give you slower performance.
About the author
In my daily life, I am the founder of a GDPR privacy automation service available at https://privacybunker.io/. I maintain the following open-source privacy project https://databunker.org/.
Among my credits, I was a founder of a database security company GreenSQL/Hexatier which was acquired by Huawei.
Specialties: Software and cloud architecture, Compliance (GDPR, HIPAA, PCI, SOX), blockchain technologies, software development, secure architectures, project management, and low-level research.