4

I run a ClamAV scan weekly on my servers. On one server with a 30 TB RAID 6 array, the scan takes more than 24 hours to run.

So I wonder: how can I run clamscan on the whole filesystem while taking advantage of the server's multiple cores? The server has good I/O capacity and I would like the scan to go as fast as the hardware allows.

I know about the --multiscan parameter of clamdscan. The main issue I have with clamdscan is that it cannot scan files the clamav user cannot access, and running the daemon as root seems to be discouraged.
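For context, the daemon's concurrency when clamdscan is invoked with --multiscan is bounded by the MaxThreads directive in clamd.conf. A minimal excerpt (the paths shown are a Debian-style assumption; adjust for your distribution):

```
# /etc/clamav/clamd.conf (excerpt) - paths are a Debian-style assumption
User clamav                              # daemon drops privileges to this user
LocalSocket /var/run/clamav/clamd.ctl    # socket that clamdscan talks to
MaxThreads 12                            # upper bound on concurrent scan threads
```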

I have seen people use GNU parallel to achieve this, but I could not find a clean command that would really scan the whole filesystem.
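One way to sketch such a command, assuming GNU findutils (`xargs -P` is used here instead of GNU parallel since it ships everywhere). `wc -c` is a harmless stand-in so the plumbing can be tried safely; for a real scan replace it with `clamscan --infected --no-summary` and raise `-P` to `$(nproc)`:

```shell
# Sketch: fan the file list out to one scanner batch per worker.
# `wc -c` is a harmless stand-in for `clamscan --infected --no-summary`.
tmp=$(mktemp -d)
for i in 1 2 3 4; do echo "data$i" > "$tmp/file$i"; done

find "$tmp" -xdev -type f -print0 \
  | xargs -0 -n 2 -P 2 wc -c        # 2 files per batch, 2 jobs at once

rm -rf "$tmp"
```

`-xdev` keeps find on one filesystem so each mount point can be dispatched separately.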

azmeuk
  • What is the limiting factor? Actually being able to scan 30 TB per day means the disk array is delivering 364 MB/sec to the scanner - are you sure it is able to deliver substantially more I/O performance to begin with? – Sven Aug 01 '18 at 07:49
  • If you really have more I/O than 364 MB/s, why don't you use [`clamdscan` with the `-m`](https://linux.die.net/man/1/clamdscan) option? – Lenniey Aug 01 '18 at 08:15
  • I understand that the clamav daemon does not run as root but as the user clamav by default on most Linux distributions. To scan the whole filesystem I need the scanning program to run as root, yet most of the documentation I found advises never running the daemon as root. What do you think? – azmeuk Aug 01 '18 at 13:18
  • I'm a Linux newbie here, but could you mount /home from another server and start the scan from there? – yagmoth555 Aug 20 '18 at 13:00

2 Answers

4

You've got two separate questions:

  1. Parallelize clamdscan - apart from combining --multiscan and --fdpass there's little you can do. Alternatively, you can run multiple instances of clamscan on separate folders, independently of the daemon.
  2. Scan files that clamd can't access - this isn't possible. clamd requires at least read access to any file you want scanned and reported, and write access to any file you want scanned and cleaned. I'd run the daemon with read access only and handle the reports manually. If you don't trust ClamAV to handle malicious files, you should use another scanner.
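The alternative in point 1 - independent clamscan instances on separate folders - can be sketched with plain shell job control. The directories here are hypothetical, and `wc -c` stands in for clamscan so the plumbing can be tried safely:

```shell
# Run one scanner process per top-level directory, then wait for all.
# Replace the `find … wc -c` pipeline with something like
# `clamscan -r --infected --log=/var/log/clam-$dir.log "$dir"` for real use.
tmp=$(mktemp -d)
mkdir -p "$tmp/home" "$tmp/srv" "$tmp/var"
echo a > "$tmp/home/f"; echo b > "$tmp/srv/f"; echo c > "$tmp/var/f"

for dir in "$tmp"/home "$tmp"/srv "$tmp"/var; do
    ( find "$dir" -type f -exec wc -c {} + ) &   # one background job per directory
done
wait    # block until every scanner has finished

rm -rf "$tmp"
```

This parallelizes well when the directories sit on different physical devices, so the jobs don't compete for the same spindles.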
Zac67
  • So if a malicious piece of software wants to hide a virus, it just has to remove read access for `clamd`? Isn't that a huge security hole? – azmeuk Sep 05 '18 at 09:36
  • `--fdpass` allows a lot more files to be scanned though. – azmeuk Sep 05 '18 at 09:47
  • If some malicious software manipulates the system you've lost. But that wasn't the question. – Zac67 Sep 05 '18 at 09:57
0
  1. The best way would be to run multiple instances of clamdscan, ensuring that the daemons have affinity to different cores and that each uses different physical devices (i.e. disks), and even better, separate bus channels. I/O will be the bottleneck in this task.
  2. Ensure you're scanning only what you really need. Scanning archives or disk images is CPU-, I/O- and RAM-hungry, because the process has to read them (I/O), unpack them (CPU, plus RAM to map files and I/O to write cache) and only then scan them. It might be a good idea to exclude ISO files, MKV files and JPEGs.
  3. You might want to consider scanning only recently changed files, because re-scanning a big ISO that nobody ever changes wastes a full pass every time.
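Points 2 and 3 can be sketched as a single find filter: select only files modified within the last 7 days and skip bulky container formats (the exclusions are examples; tune them to your threat model). `-print` lets you inspect the selection first; for the real scan use `-print0 | xargs -0 clamscan --infected` instead:

```shell
# Sketch: recently-changed files only, minus excluded container formats.
tmp=$(mktemp -d)
echo new > "$tmp/report.doc"              # modified just now
echo old > "$tmp/archive.doc"
touch -t 202001010000 "$tmp/archive.doc"  # modified long ago
echo big > "$tmp/backup.iso"              # excluded by extension

find "$tmp" -xdev -type f -mtime -7 \
     ! \( -iname '*.iso' -o -iname '*.mkv' \) \
     -print                               # only report.doc is listed

rm -rf "$tmp"
```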
  • Why would I need to not scan a file? If I am searching for malicious files, it seems sensible to scan absolutely everything I can, whatever resources it takes, don't you think? Viruses can hide in ISO files, so why would I exclude them from my scan? – azmeuk Aug 21 '18 at 09:09
  • You need to be performance-wise and address your threat model. After 10 years at an AV vendor and more years in infosec, I would say that scanning file storage daily is mostly a waste of resources, so scanning executables is enough. If you really need to scan your file storage every day, repacking every archive you have, you have bigger problems. – Andrey Bondarenko Sep 04 '18 at 13:27
  • 'Whatever resources it takes' is fundamentally flawed. You always have limited resources available, so you need to make usability-security trade-offs. – Miloš Đakonović Jan 23 '19 at 13:56