Effect of mutations on mRNA
Effect of mutations on mRNA network
Mutation data and transcriptional network
iPAC [17] | Copy number alterations | GE and mutation data
MDPFinder [16] | Mutual exclusivity of gene modules | GE and mutation data
MeMo [10] | Genes correlation | Mutation data and network
MSEA [11] | Combination of data associated to disease | Pathways and networks
MutsigCV [15] | Frequency of mutations and spectrum | GE and exome sequence
NetBox [14] | Functional modules in cellular networks | Mutation data and network
OncodriveCLUST [8] | Somatic mutation clustering | Mutation data
OncodriveFM [7] | Functional mutations impact on gene | Somatic variants data
Simon [6] | Mutations impact protein function | Mutation data
In the next step, we used iMaxDriver to predict CDGs. Then, for iMaxDriver and the other fifteen methods, we assessed the accuracy of the predicted CDGs by comparing each list with the Cancer Gene Census (CGC) [29] gene list as the gold standard (available from https://cancer.sanger.ac.uk/census). The IM approach was applied independently to each of the three GE datasets (see Subsection 3.1) to compute the influence of each TF in the network. The results of iMaxDriver are provided for each cancer type as a list of potential CDGs sorted by their influence (coverage count) in descending order. Subsequently, by discretizing the results based on a threshold value
ACCEPTED MANUSCRIPT
we classified the genes as either CDG or non-CDG. To fine-tune the threshold value used for binary classification, we used the pROC [30] package in R.
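The thresholding step can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study: the gene names and influence scores are made up, and selecting the cutoff by maximizing Youden's J statistic (sensitivity + specificity - 1) is an assumption, since the text does not state which criterion was applied with pROC.

```python
def classify(scores, threshold):
    """Label a gene as CDG when its influence score reaches the threshold."""
    return {gene for gene, score in scores.items() if score >= threshold}


def youden_j(predicted, gold, universe):
    """Sensitivity + specificity - 1 for one binary classification."""
    tp = len(predicted & gold)
    fn = len(gold - predicted)
    fp = len(predicted - gold)
    tn = len(universe - predicted - gold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity + specificity - 1.0


def best_threshold(scores, gold):
    """Try every observed score as a cutoff and keep the one with the highest J."""
    universe = set(scores)
    gold = gold & universe  # only genes that were actually scored
    candidates = sorted(set(scores.values()))
    return max(candidates,
               key=lambda t: youden_j(classify(scores, t), gold, universe))


# Hypothetical influence (coverage count) per gene and a toy gold standard.
scores = {"GENE_A": 120, "GENE_B": 95, "GENE_C": 40, "GENE_D": 12, "GENE_E": 3}
gold = {"GENE_A", "GENE_B", "GENE_D"}

threshold = best_threshold(scores, gold)
cdgs = classify(scores, threshold)
print(threshold, sorted(cdgs))
```

Scanning only the observed scores is sufficient here because the classification can change only at those values.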
The F-measure is a widely used measure for evaluating classifiers because it accounts for both precision and recall. It is the harmonic mean of precision and recall, defined as:
\[ F = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]
where precision is defined as:
\[ \text{precision} = \frac{TP}{TP + FP} \]
and recall is defined as:
\[ \text{recall} = \frac{TP}{TP + FN} \]
In the above equations, TP is the number of true positives, FP is the number of false positives and FN is the number of false negatives. We use the F-measure as the classification quality measure for evaluating iMaxDriver.
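As a small worked illustration of these definitions, the following Python sketch computes precision, recall and the F-measure for a predicted gene list against a gold-standard list. The gene names are placeholders and the function name `f_measure` is chosen here for illustration; it is not taken from the iMaxDriver code.

```python
def f_measure(predicted, gold):
    """Return (precision, recall, F) for a predicted set against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # predicted and in the gold standard
    fp = len(predicted - gold)   # predicted but not in the gold standard
    fn = len(gold - predicted)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder gene lists: 2 true positives, 2 false positives, 1 false negative.
predicted = ["GENE_A", "GENE_B", "GENE_X", "GENE_Y"]
gold = ["GENE_A", "GENE_B", "GENE_C"]

precision, recall, f = f_measure(predicted, gold)
print(precision, recall, f)
```

With these toy lists, precision is 2/4, recall is 2/3 and the F-measure is 4/7.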
3. Results
We weighted the modified TRN independently using the GE data of three cancer types: breast invasive carcinoma (BRCA), lung squamous cell carcinoma (LUSC) and colon adenocarcinoma (COAD). Then, the list of predicted CDGs was generated using iMaxDriver (Supplementary Datasets S1 and S2). iMaxDriverW found 103, 143 and 113 CDGs for BRCA, LUSC and COAD, respectively, while iMaxDriverN found 88 driver genes in each of BRCA and LUSC and 90 driver genes in COAD. We evaluated our method and the other fifteen methods against the Cancer Gene Census (CGC) and the functionally validated driver genes provided by Kumar et al. [31], and gathered the results for BRCA, LUSC and COAD in Table 3. For each tissue type and validation dataset, the three methods with the best prediction results are shown in bold. iMaxDriverW is among the top three methods in all tissue types, and iMaxDriverN is among the top three in BRCA and LUSC.
Table 3 The evaluation of the iMaxDriver and the other methods using CGC and Kumar datasets
[Table 3 columns: Method Name, followed for each of BRCA, LUSC and COAD by the number of predicted drivers and the fraction of predicted drivers, each reported for the Kumar and CGC validation datasets.]
Most CDG-finding tools predict only a limited number of genes. Some tools do predict many CDGs, but generally with low precision. For example, for BRCA tissue, iPAC and MSEA predict 4821 and 855 genes as CDGs, with precision as low as 5.1% and 8.8%, respectively. In contrast, iMaxDriverW predicts 408 genes as CDGs, with a precision of 33.3%. The binary matrix representation of the genes predicted as CDGs by the methods and the dendrogram of the clustering result are shown for BRCA, LUSC and COAD in Fig. 4.
Figure 4 Binary matrix representation of the genes predicted as CDG by the methods.
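A binary gene-by-method matrix of this kind can be assembled as in the sketch below. The method names and gene lists are hypothetical, and the Jaccard distance between methods is an assumption used only to show what a hierarchical clustering could take as input; the text does not specify the distance or linkage behind the dendrogram.

```python
from itertools import combinations


def binary_matrix(predictions):
    """Rows are genes, columns are methods; 1 means the method predicted the gene."""
    methods = sorted(predictions)
    genes = sorted(set().union(*predictions.values()))
    matrix = [[1 if gene in predictions[m] else 0 for m in methods]
              for gene in genes]
    return genes, methods, matrix


def jaccard_distance(a, b):
    """1 - |intersection| / |union|; defined as 0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Hypothetical predictions from three placeholder methods.
predictions = {
    "method_A": {"GENE_A", "GENE_B", "GENE_C"},
    "method_B": {"GENE_B", "GENE_C"},
    "method_C": {"GENE_D"},
}

genes, methods, matrix = binary_matrix(predictions)
distances = {
    (m1, m2): jaccard_distance(predictions[m1], predictions[m2])
    for m1, m2 in combinations(methods, 2)
}

for gene, row in zip(genes, matrix):
    print(gene, row)
```

The pairwise distances could then be passed to any standard hierarchical clustering routine to obtain a dendrogram of the methods.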
The bar plot comparing F-measure values of the methods is shown for BRCA, LUSC and COAD in Fig. 5.
Figure 5 The F-measure of iMaxDriver and other fifteen computational methods proposed for CDG prediction
Furthermore, by comparing the list of classified genes, we can see that more than 38% of the genes classified as