Conclusions and Future Work

Two flavors of observables were tested, and the resulting predictions were compared with experimental measurements of partition coefficients. Linear regression and its variants, as well as a deep neural network, showed comparable performance once the trade-off between accuracy and flexibility is taken into account. Overall, the partition coefficient showed a linear dependence on each of the variables. Our interpretation is that each identified fragment contributes to the property at the same order of magnitude, and none should be neglected.

Toxicity prediction was also tested with logistic regression, using the same set of chemical fingerprints as observables. High accuracy was achieved, which could lead to a significant decrease in investment at the pre-clinical trial stage of drug development.

Future work on toxicity estimation may incorporate more long-term side-effect data from clinical trials, since we have mostly worked with acute toxicity data. Given that clinical trials cost over two to three times as much as pre-clinical trials, a further reduction in total cost should be achievable. Also, while the Bayes classifier uses a default threshold of 50% on the posterior probability, the threshold can be adjusted to reduce either false positives or false negatives. As far as return on investment is concerned, and given the enormous computational power available, a good practice would be to raise the threshold for the "benign" class. The screening then becomes stricter, but it must cover more compounds in the library to maintain the same number of final candidate compounds.
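
The effect of moving the threshold can be illustrated with a minimal sketch. It assumes the classifier outputs a posterior probability of the "benign" class for each compound. The function names `screen` and `count_errors`, the probabilities, and the labels are all made up for illustration and are not taken from our data or code.

```python
def screen(p_benign, threshold=0.5):
    """Return indices of compounds whose posterior P(benign) meets the threshold."""
    return [i for i, p in enumerate(p_benign) if p >= threshold]


def count_errors(p_benign, is_benign, threshold=0.5):
    """Return (toxic compounds passed as benign, benign compounds rejected)."""
    toxic_passed = sum(
        1 for p, b in zip(p_benign, is_benign) if p >= threshold and not b
    )
    benign_rejected = sum(
        1 for p, b in zip(p_benign, is_benign) if p < threshold and b
    )
    return toxic_passed, benign_rejected


# Hypothetical posterior probabilities and true labels, for illustration only.
p_benign = [0.95, 0.85, 0.70, 0.65, 0.55, 0.40, 0.30, 0.10]
is_benign = [True, True, False, True, False, True, False, False]

for t in (0.5, 0.8):
    passed = screen(p_benign, t)
    toxic_passed, benign_rejected = count_errors(p_benign, is_benign, t)
    print(f"threshold={t}: passed={len(passed)}, "
          f"toxic passed={toxic_passed}, benign rejected={benign_rejected}")
```

With these made-up numbers, raising the threshold from 0.5 to 0.8 removes both toxic compounds from the passed set but shrinks that set from five compounds to two, which is why a stricter screen has to cover a larger library to yield the same number of candidates.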