
The discriminative pattern mining (DPM) framework was originally proposed in the data mining community to efficiently enumerate combinations of variables and identify those that are highly predictive. DPM builds on a general search strategy called Apriori, which exploits the anti-monotonicity of a special class of objective functions to enumerate high-order variable combinations efficiently. Conceptually, with an anti-monotonic objective function, a SNP combination can satisfy a threshold on the objective function only if all of its subsets also satisfy it. In other words, if a combination fails the threshold, all of its supersets can be pruned from the search space, and no larger combination that satisfies the threshold is missed. This pruning is the key difference between Apriori-based and brute-force combinatorial search. In this study, we use a recently developed anti-monotonic objective function, SupMaxPair, within the Apriori framework to efficiently search for SNP combinations that discriminate between cases and controls. SupMaxPair captures the association between a SNP combination and a binary disease phenotype: the higher the SupMaxPair value, the more strongly the SNP combination is associated with the phenotype. The Apriori framework with SupMaxPair as the objective function is called SMP. SMP can handle data that are both dense and high-dimensional. This addresses the two key challenges in discovering high-order combinations from SNP datasets. The first is a fixed high density of 33%, which results from the binary encoding of each SNP: each SNP is encoded as three binary variables, one per genotype, and exactly one of them is 1, so one third of the matrix values are 1s. The second is a large number of SNPs. SMP gains this advantage from its effective use of phenotype information during the search, and this is the essence of its better efficiency and scalability compared with other DPM algorithms. It is worth noting that Ma et al. were the first to use an Apriori-based algorithm, FPC, for the efficient enumeration of SNP combinations. However, FPC does not use phenotype information to optimize the search. It is therefore much less efficient and less scalable than SMP. This was shown previously for differential gene expression analysis, and we also demonstrate it on SNP datasets in the Results section.

We compare the DPM framework with two representative existing tools for high-order SNP combination discovery: MDR and FPC. For MDR, we used the Java version with the standard coding, in which each SNP is represented by a categorical variable with three possible values. For DPM and FPC, we used the binary coding. FPC requires a value for the parameter minsup (minimum support). For comparison purposes, we set a maximum runtime of five hours for all three techniques.
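The pruning argument for Apriori with an anti-monotonic objective can be made concrete with a small level-wise search. This is a minimal sketch under stated assumptions, not the SMP implementation. It assumes SupMaxPair(A) = Sup_case(A) minus the largest Sup_control(B) over all 2-item subsets B of A. For a single item there are no pairs, so the maximum is taken as 0. This definition is our reading of the measure and is not stated in this text. With this convention the score never increases when an item is added, so the pruning is safe. Items stand for the binary SNP-genotype indicators, and each sample is the set of indicators equal to 1. The names `support`, `sup_max_pair` and `apriori_smp` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Iterable, List, Set

Item = Hashable
Itemset = FrozenSet[Item]


def support(itemset: Itemset, samples: List[Set[Item]]) -> float:
    """Fraction of samples (sets of items equal to 1) containing every item in itemset."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if itemset <= s) / len(samples)


def sup_max_pair(itemset: Itemset, cases: List[Set[Item]], controls: List[Set[Item]]) -> float:
    """Assumed SupMaxPair: case support minus the largest control support of any
    2-item subset. A single item has no pairs, so the max defaults to 0 (assumption)."""
    pair_control_sups = [support(frozenset(p), controls) for p in combinations(itemset, 2)]
    return support(itemset, cases) - max(pair_control_sups, default=0.0)


def apriori_smp(items: Iterable[Item], cases, controls,
                threshold: float, max_size: int = 4) -> Dict[Itemset, float]:
    """Level-wise Apriori search returning every itemset (up to max_size) whose
    score meets threshold. Pruning is safe only because the score is anti-monotonic."""
    results: Dict[Itemset, float] = {}
    level: List[Itemset] = []
    for i in items:
        s = frozenset([i])
        score = sup_max_pair(s, cases, controls)
        if score >= threshold:
            results[s] = score
            level.append(s)
    size = 1
    while level and size < max_size:
        size += 1
        survivors = set(level)
        candidates = set()
        # Join two surviving (size-1)-itemsets; keep the union only if every
        # (size-1)-subset also survived (the Apriori pruning step).
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in survivors for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        level = []
        for c in candidates:
            score = sup_max_pair(c, cases, controls)
            if score >= threshold:
                results[c] = score
                level.append(c)
    return results


# Toy data: "a" and "b" co-occur in cases but never in controls.
cases = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
controls = [{"a"}, {"b"}, {"a", "c"}, {"b", "c"}]
found = apriori_smp(["a", "b", "c"], cases, controls, threshold=0.4)
print(found[frozenset({"a", "b"})])  # 0.75
```

A candidate is scored only if all of its smaller subsets passed. As a result, no superset of a failing combination is ever evaluated, which is the saving over brute-force enumeration. Only the case-enriched direction is scored here; swapping `cases` and `controls` gives the control-enriched direction.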

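The fixed 33% density follows directly from the binary coding. Each SNP becomes three indicator columns, and exactly one of them is 1 for every sample. The standard coding used for MDR, by contrast, keeps a single categorical column per SNP. The sketch below is a minimal Python illustration. It assumes genotypes are coded 0, 1, 2 (for example, minor-allele count) and that the binary coding uses one indicator per genotype, which is the encoding consistent with the stated density. The function names `binary_encode` and `density` are hypothetical.

```python
from typing import List


def binary_encode(genotypes: List[List[int]]) -> List[List[int]]:
    """One indicator column per (SNP, genotype); genotypes assumed coded 0, 1, 2."""
    encoded = []
    for sample in genotypes:
        row: List[int] = []
        for g in sample:
            if g not in (0, 1, 2):
                raise ValueError(f"genotype must be 0, 1 or 2, got {g!r}")
            # Exactly one of the three indicators for this SNP is set to 1.
            row.extend(1 if g == k else 0 for k in range(3))
        encoded.append(row)
    return encoded


def density(matrix: List[List[int]]) -> float:
    """Fraction of matrix cells equal to 1."""
    cells = sum(len(row) for row in matrix)
    return sum(sum(row) for row in matrix) / cells


# Standard (categorical) coding: one column per SNP, two samples, three SNPs.
categorical = [[0, 1, 2], [2, 2, 0]]
binary = binary_encode(categorical)
print(binary[0])        # [1, 0, 0, 0, 1, 0, 0, 0, 1]
print(density(binary))  # 0.3333333333333333
```

The density is one third regardless of the genotype values, because every SNP contributes exactly one 1 out of three columns.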