I've collected a large number of "Web Shell by oRb" (a.k.a. the "FilesMan" backdoor, a.k.a. the antichat backdoor) files by running a WordPress honeypot and searching Pastebin. The code in the variants is obviously related, and I'd like to figure out a "phylogeny" of WSO variants. I found D. Gusfield's "Efficient algorithms for inferring evolutionary trees"; Caroline Uhler's "Finding a Perfect Phylogeny" gives a clearer explanation of its phylogeny-generating algorithm.
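Below is a minimal Python sketch of the kind of column-sorting test Gusfield's paper is built around. It assumes each variant is encoded as a row of binary characters (1 = a hypothetical code feature, such as a particular function, is present) and that the root ancestor has none of them. Columns are read as binary numbers, deduplicated and sorted in decreasing order. A rooted perfect phylogeny exists exactly when, for every column, all rows containing it agree on the nearest earlier column they also contain. `has_perfect_phylogeny` is an illustrative name, and the sketch only decides existence; it does not build the tree.

```python
from typing import Sequence


def has_perfect_phylogeny(matrix: Sequence[Sequence[int]]) -> bool:
    """Test for a rooted binary perfect phylogeny with an all-zero root.

    Rows are taxa (e.g. hypothetical WSO variants); columns are binary
    characters (e.g. hypothetical presence/absence of a code feature).
    """
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    columns = [tuple(matrix[i][j] for i in range(n)) for j in range(m)]
    # For equal-length 0/1 tuples, lexicographic order equals numeric order
    # when row 0 is treated as the most significant bit, so this sorts the
    # distinct columns by binary value, largest first.
    ordered = sorted(set(columns), reverse=True)
    # last_one[i]: index (in sorted order) of the latest column seen so far
    # in which row i has a 1, or -1 if none yet.
    last_one = [-1] * n
    for j, col in enumerate(ordered):
        rows_with_char = [i for i in range(n) if col[i] == 1]
        # Every row that has character j must share the same predecessor.
        if len({last_one[i] for i in rows_with_char}) > 1:
            return False
        for i in rows_with_char:
            last_one[i] = j
    return True
```

For example, `[[1, 1, 0], [1, 0, 1], [0, 0, 0]]` passes (character 0 sits above characters 1 and 2), while `[[1, 0], [0, 1], [1, 1]]` fails, because the third row mixes two characters that otherwise occur separately.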
Unfortunately, the authors of some WSO variants borrowed code from other variants, so a "perfect" phylogeny doesn't exist: some variants have two or more immediate ancestors, thanks to the malware-writer equivalent of the "horizontal gene transfer" practiced by real-life bacteria. The only thing I've been able to find about phylogenies that are directed acyclic graphs is "Constructing Computer Virus Phylogenies". The problem with that paper is that it doesn't seem to actually present an algorithm: either none is given, or it's buried among the assorted lemmas, theorems, definitions and proofs.
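One quick way to see where a single tree breaks down is the pairwise compatibility check. With an all-zero root, two binary characters fit on one tree unless some variants have only the first, some have only the second, and some have both. The Python sketch below assumes a hypothetical encoding (rows = variants, columns = presence/absence of a code feature); `conflicting_pairs` is an illustrative name. A conflicting pair is consistent with borrowed code, but it could equally come from a feature being deleted or added independently, so it is a hint about where reticulation might be, not proof of it.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def conflicting_pairs(matrix: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return the column pairs (j, k) that cannot both fit on one rooted tree.

    Assumes binary characters and an all-zero root ancestor.
    """
    m = len(matrix[0]) if matrix else 0
    conflicts = []
    for j, k in combinations(range(m), 2):
        # Collect the distinct (character j, character k) combinations.
        seen = {(row[j], row[k]) for row in matrix}
        # Incompatible iff "only j", "only k" and "both" all occur.
        if {(0, 1), (1, 0), (1, 1)} <= seen:
            conflicts.append((j, k))
    return conflicts
```

Run on `[[1, 0], [0, 1], [1, 1]]`, this reports the single pair `(0, 1)`; on a matrix that does admit a perfect phylogeny it reports nothing.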
Are there other papers that actually give an algorithm, or pseudocode for one, that can generate "phyloDAGs"? I'd even settle for an implementation in a major programming language.