I'm writing an antivirus in Python, mostly to learn and for research purposes. I understand it would be more efficient to do this in something like C, and I plan to port it eventually. So far I have written the first part of the AV, which checks VirusShare and downloads the latest hashes to a file.
From here, I'm not sure how to compare the hashes against a database so I can see which malware family each one belongs to. Is there an online resource or API I can use? I would try VirusTotal, but I only have a free account, which is limited to 4 requests per minute.
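To make the rate-limit problem concrete, this is roughly how I imagine staying under 4 requests per minute while caching results. It is only a sketch: `Throttle`, `lookup_all`, and the `lookup` callable are names I made up for illustration, and `lookup` stands in for whatever service ends up doing the actual family lookup.

```python
import time


class Throttle:
    """Allow at most max_calls calls within any sliding window of period seconds."""

    def __init__(self, max_calls=4, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._stamps = []      # start times of recent calls

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        self._stamps = [t for t in self._stamps if now - t < self.period]
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps = [t for t in self._stamps if now - t < self.period]
        self._stamps.append(now)


def lookup_all(hashes, lookup, throttle, cache=None):
    """Call lookup(h) once per distinct hash, throttled; return {hash: result}.

    lookup is a hypothetical callable (an assumption, not a real API client).
    """
    cache = {} if cache is None else cache
    for h in hashes:
        if h not in cache:      # never spend a request on a hash already seen
            throttle.wait()
            cache[h] = lookup(h)
    return cache
```

With this, only hashes that actually match something on disk would need a lookup, so the 4-per-minute limit might matter less than it first seems.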
Lastly, when scanning, does the AV need to hash every file on the system and then compare each hash to my list of malicious signatures? I plan to build on this and eventually use ML, but for now I want to keep it as simple as possible and learn as I go.
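For reference, this is roughly what I have in mind for the scan step. It is a minimal sketch, assuming the downloaded hash file holds one MD5 hex digest per line with `#` comment lines; `load_signatures`, `hash_file`, and `scan_tree` are placeholder names of my own, not from any library.

```python
import hashlib
import os


def load_signatures(path):
    """Read one hex digest per line into a set, skipping blanks and '#' comments."""
    signatures = set()
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip().lower()
            if line and not line.startswith("#"):
                signatures.add(line)
    return signatures


def hash_file(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so large files are never loaded into memory at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_tree(root, signatures):
    """Walk root and return the paths whose hash appears in the signature set."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if hash_file(full) in signatures:
                    matches.append(full)
            except OSError:
                # Unreadable or vanished files are skipped in this sketch.
                continue
    return matches
```

Usage would be something like `scan_tree("/some/dir", load_signatures("hashes.md5"))`, where both paths are placeholders. Keeping the signatures in a set makes each comparison a constant-time lookup, so the cost is dominated by reading and hashing the files.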