Your workload is almost the worst possible case for a general-purpose file system: millions of files, frequent enumeration, lots of reads and writes, enormous metadata I/O. With a large number of files, it is rarely the bandwidth of transferring the files themselves that is the problem, but rather the number of IOPS needed to query directory entries and inodes over and over.
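To see the metadata cost in isolation, compare a pure directory enumeration with a walk that also stats every file. The sketch below is a minimal illustration of that comparison; the path is a placeholder, so point it at whatever tree you want to measure.

```python
#!/usr/bin/env python3
"""Rough illustration of metadata cost: enumeration alone vs. enumeration plus per-file stat."""
import os
import time

TREE = "/srv/data"  # hypothetical path; replace with your real tree


def enumerate_only(root):
    """Walk the tree reading only directory entries (one readdir stream per directory)."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += len(filenames)
    return count


def enumerate_and_stat(root):
    """Walk the tree and stat every file -- the pattern that multiplies metadata IOPS."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
            count += 1
    return count


for fn in (enumerate_only, enumerate_and_stat):
    start = time.monotonic()
    n = fn(TREE)
    print(f"{fn.__name__}: {n} files in {time.monotonic() - start:.2f}s")
```

On a cold cache the second pass is typically far slower, which is the effect you will be fighting at scale.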
Test this workload synthetically, while monitoring the application to be sure it performs acceptably, on realistic production-scale storage and IOPS levels. Be sure to match the folder structure: 300 files per directory is very different from 3,000,000 files per directory. Try a couple of different file systems; on Linux, XFS and ext4.
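A minimal generator along these lines can build a tree that mirrors your real layout and then time an enumeration pass over it. All the numbers and the mount point below are placeholders to tune toward production; dedicated benchmarks such as fio or mdtest will exercise metadata more thoroughly, but even this much exposes the 300-vs-3,000,000 difference.

```python
#!/usr/bin/env python3
"""Synthetic small-file tree: create dirs x files-per-dir x file-size, then time enumeration."""
import os
import time

ROOT = "/mnt/testfs/synthetic"   # mount point of the file system under test (assumed)
NUM_DIRS = 1000                  # placeholder: number of directories
FILES_PER_DIR = 300              # placeholder: try 300 vs. much larger values
FILE_SIZE = 4096                 # placeholder: bytes per file

payload = os.urandom(FILE_SIZE)

start = time.monotonic()
for d in range(NUM_DIRS):
    dirpath = os.path.join(ROOT, f"dir{d:05d}")
    os.makedirs(dirpath, exist_ok=True)
    for f in range(FILES_PER_DIR):
        with open(os.path.join(dirpath, f"file{f:07d}"), "wb") as fh:
            fh.write(payload)
print(f"created {NUM_DIRS * FILES_PER_DIR} files in {time.monotonic() - start:.1f}s")

start = time.monotonic()
total = sum(len(files) for _, _, files in os.walk(ROOT))
print(f"enumerated {total} files in {time.monotonic() - start:.1f}s")
```

Run it once with the cache warm and once after dropping caches so you measure the storage, not the page cache.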
Possibly you will need very fast SSD storage and lots of RAM to make this perform adequately.
Maybe you have a support contract with your OS vendor where you can have a performance specialist look at it.
If getting acceptable performance demands it, consider application changes. For example, store and query the file lists in a database rather than in the file system itself. Many databases can return a few million results faster than a file system constrained by POSIX semantics in general and the Linux VFS in particular.
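As a sketch of that idea, the snippet below indexes file metadata in SQLite and answers a "newest N files" query from the index instead of walking directories. The schema and column names are made up for illustration; any database your application already uses would do.

```python
#!/usr/bin/env python3
"""Sketch: keep a file index in SQLite so listing queries avoid directory enumeration."""
import os
import sqlite3

db = sqlite3.connect("file_index.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS files (
        path  TEXT PRIMARY KEY,
        size  INTEGER,
        mtime REAL
    )
""")
db.execute("CREATE INDEX IF NOT EXISTS idx_files_mtime ON files (mtime)")


def index_tree(root):
    """One-time (or periodic) scan that records every file in the database."""
    with db:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                st = os.stat(full)
                db.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                    (full, st.st_size, st.st_mtime),
                )


def recent_files(limit=1000):
    """Answer 'list the newest N files' from the index; no directory walk required."""
    return [row[0] for row in db.execute(
        "SELECT path FROM files ORDER BY mtime DESC LIMIT ?", (limit,)
    )]
```

The application still has to keep the index in sync with writes (or rebuild it periodically), but the query side no longer depends on directory-entry and inode lookups at all.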