Cluster Analysis by Fingerprint (ECFP)
What's the BEST way to represent molecules so that a computer can understand them? I know this question itself is somewhat... BUT this is what I would like to do. Isn't this what we all want to do!?
Well, as of April 2019, I would say ECFP (Extended-Connectivity Fingerprints) is the one (yes, I welcome any comments; I'd even want to comment on this myself).
BUT, let's cluster by ECFP coz it's fun!
So, here's what I did in KNIME:
File Reader: reads the CSV file
Column Filter: removes the columns I don't need
RDKit Fingerprint: I love RDKit, it's the best! Morgan fingerprints (RDKit's ECFP-like circular fingerprints) are the default.
Expand Bit Vector: splits the fingerprint into one column per bit. I dunno if I really need this; there's gotta be a smarter way.
k-Means: I just like to start off with this one (clustering into 10 groups)
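For anyone curious what the Expand Bit Vector + k-Means steps amount to, here is a minimal plain-Python sketch. It is not KNIME's implementation, just the same idea under some assumptions: each fingerprint is already available as a 0/1 bit string (for example, exported from the RDKit Fingerprint node), and the six 8-bit fingerprints below are made up for illustration, whereas real Morgan fingerprints are usually 1024 or 2048 bits long. The helper names expand_bits and kmeans are hypothetical.

```python
import random
from typing import List, Sequence


def expand_bits(bitstring: str) -> List[int]:
    """Turn a fingerprint bit string like '1010' into one 0/1 value per bit,
    roughly what an 'expand bit vector' step produces as separate columns."""
    return [int(ch) for ch in bitstring]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance; for two pure 0/1 vectors this equals the Hamming distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[int]], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    rng = random.Random(seed)
    # Start from k distinct input rows picked at random (reproducible via seed).
    centroids = [[float(v) for v in p] for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)  # sentinel: nothing assigned yet
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members (per-bit frequencies).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy fingerprints: two obvious groups of 8-bit vectors.
fingerprints = ["11110000", "11100000", "11110001", "00001111", "00000111", "10001111"]
rows = [expand_bits(fp) for fp in fingerprints]
labels = kmeans(rows, k=2)  # the post used 10 clusters on real data
print(labels)
```

With this toy data, the first three and last three fingerprints land in separate clusters (which one is called 0 depends on the seed). One design note: k-means on expanded bits uses Euclidean distance, while fingerprints are more commonly compared with Tanimoto similarity. RDKit's Butina clustering (rdkit.ML.Cluster.Butina) works directly on Tanimoto distances, so it might be one candidate for the "smarter way".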
It's all cool and so easy with KNIME. Then I copied and pasted it into my previous workflow, where I clustered by descriptors.
SO! What's the difference between the descriptors-to-PCA approach and the fingerprint approach?
OK, quite different. So I guess how you represent molecules really does change the result. Well, not so far off either, though. This is interesting.
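"Quite different" but "not so far off" is an eyeball judgment. If you want a number, one option (not something the post did) is the Rand index: look at every pair of molecules and count how often the two clusterings agree on whether that pair belongs together. Because it compares pairs, it doesn't matter that cluster 3 in one workflow has nothing to do with cluster 3 in the other. Below is a plain-Python sketch; the label lists are made-up placeholders, and in practice you would export the cluster column from each workflow for the same molecules in the same order.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Fraction of molecule pairs on which two clusterings agree
    (both put the pair together, or both split it). Cluster IDs need not match."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same molecules")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # nothing to disagree about
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j]) for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for six molecules from the two workflows.
by_descriptors = [0, 0, 1, 1, 2, 2]
by_fingerprint = [1, 1, 0, 0, 0, 2]
print(rand_index(by_descriptors, by_fingerprint))  # → 0.8 (12 of 15 pairs agree)
```

With many clusters (like 10), most pairs end up "apart" in both clusterings, which inflates the plain Rand index. The chance-corrected version, scikit-learn's sklearn.metrics.adjusted_rand_score, is usually the better summary there.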
For machine learning, I usually start off with ECFP plus SVM (support vector machine) or RF (random forest). Maybe I'll try that next time for more fun. I can do all of this without any programming or scripting. KNIME is so cool.