In the cross-validation test, the entire data set was shuffled and split into em n /em folds

In the cross-validation test, the entire data set was shuffled and split into em n /em folds. In the current study, based on the properties of known drug targets, we have developed a sequence-based drug target prediction method for fast recognition of novel drug targets. Results Based on simple physicochemical properties extracted from protein sequences of known drug targets, several support vector machine models have been constructed with this study. The best model can distinguish currently known drug focuses on from non drug focuses on at an accuracy of 84%. By using this model, potential protein drug targets of human being source from Swiss-Prot were predicted, some of which have already captivated much attention as potential drug focuses on in pharmaceutical study. Conclusion We have developed a drug target prediction method based solely on protein sequence information without the knowledge of family/website annotation, or the protein 3D structure. This method can be applied in novel drug target recognition and validation, as well as genome level drug target predictions. Background Although great attempts have been exerted on drug study and development during the past decades, only about 500 drug targets have been recognized for clinically using medicines to day[1]. Recently, this quantity has been revised to be Semagacestat (LY450139) 324[2], which shows that current pharmaceutical market actually relies on only a small pool of drug focuses on, compared to the large number of proteins available in human being genome[3]. Semagacestat (LY450139) On the other hand, a significant quantity of medicines failed in the pipeline of modern drug discovery can be attributed to the wrong drug target definition at the early preclinical phases[4]. Therefore, to address fresh therapies by attacking novel drug targets or to forecast whether a protein can be potentially used as a drug target, is extremely useful in disease treatment, as well as the reduction of time and experimental costs in drug development. Drug target discovery offers received much attention in both academia and pharmaceutical market. Many efforts have been made to estimate the total quantity of drug focuses on[1,2,5-8] and several drug target related databases such Rabbit Polyclonal to MTLR as TTD (restorative drug target database)[9], DrugBank[10], have been also established. According to the existing knowledge, classical restorative drug focuses on fell into approximately 130 protein family members[2,6], which generally include enzymes, G-protein-coupled receptors, ion channels and transporters, and nuclear hormone receptors, etc[1,6]. Many organizations possess attempted to develop experimental and computational tools to find fresh potential drug focuses on[5,6,11-16]. Several strategies have been used in drug target prediction, which can be generally divided into two organizations. The 1st group is to analyze the known restorative drug focuses on from genome level based on sequence homology or website containing method [5,6], which requires protein families into account to find potential novel drug target family members. In fact, not all proteins in the same family can be used as drug targets. The additional one is to search for binding pockets within the protein surface based on protein 3D constructions, and to determine those that may bind to drug-like compounds with sensible affinities[11,13]. Theoretically, this kind of methods is limited to the Semagacestat (LY450139) availability of 3D constructions and cannot be applied to genome scale. Recently, Han et al. [16] used machine learning methods to build a model with 1,484 medical and research drug focuses on from TTD database[9], and expected druggable proteins among different organisms. Clearly, the quality of drug target data restricts the predictive power of models. Unfortunately, several versions of drug target lists have been proposed[1,2,5-8]. Consequently, we have to establish a crucial criterion to select valid drug focuses on for the prediction. The possible reasons for many versions of drug targets are: the definition of drug target is hard and also arbitrary[7]; it is hard to assign Semagacestat (LY450139) each drug to its target due to poorly recognized pharmacology, limited selectivity against related proteins and some targets are.