Active Transfer Learning using KNIME

Recently, I read this article
https://medium.com/pytorch/active-transfer-learning-with-pytorch-71ed889f08c1

Active transfer learning looked fun and doable in KNIME too, so I had a play around with a KNIME workflow and got this.

f:id:hateknime:20200301131648p:plain

It learns the probability of the model being correct or not using transfer learning. Because the output was binary, instead of adding a new output layer I just made the last layer retrainable. I hope I'm doing things right... If not, I can modify this easily anyway. I should probably ask my colleague someday.
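The "retrain only the last layer" idea can be sketched outside KNIME too. This is just a toy pure-Python illustration, not the actual workflow: all numbers and the tiny dataset are made up, a frozen pretend-pretrained feature extractor feeds one retrainable logistic unit, and gradient descent only ever touches that last layer.

```python
import math

# Toy illustration of "retrain only the last layer": a frozen,
# pretend-pretrained feature extractor feeds one trainable logistic unit.
def frozen_features(x):
    # these "pretrained" weights are never updated (the frozen part)
    return [math.tanh(0.8 * x), math.tanh(-0.3 * x + 1.0)]

w = [0.0, 0.0]  # retrainable last layer: weights...
b = 0.0         # ...and bias

def predict(x):
    h = frozen_features(x)
    z = w[0] * h[0] + w[1] * h[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # P(base model was correct)

# tiny made-up labelled set: x -> "was the base model correct?"
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

lr = 0.5
for _ in range(200):            # gradient descent touches ONLY w and b
    for x, y in data:
        h = frozen_features(x)
        g = predict(x) - y      # dLoss/dz for log-loss
        w[0] -= lr * g * h[0]
        w[1] -= lr * g * h[1]
        b -= lr * g

print([round(predict(x), 3) for x, _ in data])
```

In a real network the frozen part would be the pretrained layers and the retrained part the final dense layer, but the mechanics are the same.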

 

SO I let this run and it did seem to plateau at some point. I ran it 1000 times because that was the default setting of the active learning loop, but with only 6000 or so data points I didn't need that many. Went to sleep, woke up, and the loop was on its 968th iteration. Haha, it was another experiment anyway.

 

Running this again to look for the best hyperparameters and everything. More fun for today!!

Connecting to ChEMBL locally with KNIME

Had a play around with the KNIME DB nodes this time. I downloaded the SQLite ChEMBL database from the ChEMBL homepage, then connected to it and made this simple sampling workflow.

f:id:hateknime:20200211154942p:plain

All you need to do is unzip the ChEMBL SQLite file and point the Connector node at it, leaving almost everything at the defaults.

f:id:hateknime:20200211155019p:plain

Many DB nodes are available too...

f:id:hateknime:20200211155042p:plain

What I like about doing DB SQL in KNIME is that you can clearly see the output data at each SQL statement. On the other hand, the SQL statements themselves do tend to get tangled up, but since I'm doing it locally I can test anything anyway. Wait, how do I just say "show tables;"... hmmmm.....
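Turns out SQLite has no SHOW TABLES; the equivalent is querying the sqlite_master catalog. A minimal sketch with Python's built-in sqlite3 (using a throwaway in-memory DB here with two trimmed-down stand-ins for real ChEMBL table names; for the real thing you'd connect to the unzipped chembl .db file instead):

```python
import sqlite3

# throwaway in-memory DB standing in for the unzipped ChEMBL .db file
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE molecule_dictionary (molregno INTEGER, chembl_id TEXT)")
con.execute("CREATE TABLE activities (activity_id INTEGER, molregno INTEGER)")

# SQLite's equivalent of "show tables;"
tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['activities', 'molecule_dictionary']
```

The same SELECT on sqlite_master should work typed into a DB Query node in KNIME, since it's just SQL.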

 

But the ChEMBL DB schema is on the ChEMBL homepage too:
https://www.ebi.ac.uk/chembl/db_schema
So I can just look at that and figure things out. WOW, this DB is complex!

Article

Second post of the day, but I thought I would keep it separate. Glancing through the ChEMBL DB homepage and blogs for the previous post guided me to this article.

http://chembl.blogspot.com/2020/01/new-chembl-ligand-based-target.html

which led me to...

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0325-4

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-019-0388-x

Hmmm, I don't remember reading these, but they look interesting. Got some good fun reading for this weekend.

Experiment

Yesterday, I wondered whether an all-vs-all MMP analysis of ChEMBL is possible or not. It surely can be done, but pure brute force in KNIME on my home PC??? I don't think so, but where is the bottleneck? So I tried this for fun.

f:id:hateknime:20200208132940p:plain

I downloaded the SDF because I didn't even want to deal with a DB. Well, afterwards I found ChEMBL even provides an SQLite file (which I'm trying to use now). But this was fun. I was surprised that up to the Duplicate Row Filter, things weren't so slow. I was even playing a PC game at the same time, which uses some memory/CPU too (well... Monster Hunter...).
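A quick back-of-the-envelope on why brute force hurts here: the number of candidate molecule pairs grows quadratically with library size. (The 1.9M figure below is only my rough guess at ChEMBL's compound count, not an exact number.)

```python
# candidate pairs for all-vs-all comparison: n choose 2
def n_pairs(n):
    return n * (n - 1) // 2

for n in (1_000, 100_000, 1_900_000):
    print(f"{n:>9,} molecules -> {n_pairs(n):>16,} candidate pairs")
```

Fragment-indexing tricks (which the MMP tools use) avoid enumerating most of those pairs, but the intermediate fragment tables still get big, which is presumably where my disk went.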

But then the MMP Molecule Fragment node only gradually, very slowly, crept to 1%... 2%... THANKS TO THE VERNALIS NODES!! So I went to sleep...

This morning I woke up and my C drive said it ONLY HAD 98 MB LEFT!!!!! Hahahaha, there had been about 15 GB free on my C drive SSD, but it was all eaten by the KNIME temp folder. That was a nice stupid experiment.

 

You can move the KNIME temp folder to the D drive from the settings, btw:

f:id:hateknime:20200208133529p:plain
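(If you prefer config files, the same thing can probably be done by overriding the JVM temp directory in knime.ini; the path below is just an example, and the line goes after -vmargs:)

```ini
-Djava.io.tmpdir=D:\knime_temp
```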

How about that now, HA? I'll give you a few hundred GB now! Try that!!!

( ゚Д゚)( ゚Д゚)( ゚Д゚)( ゚Д゚)( ゚Д゚)

 

So I'm trying this again for more fun, but with a bit of a change this time: clustering beforehand. We'll see how this goes. I also want to set up the ChEMBL DB on my home PC now; there is surely some blog post on how to do this... I will write another post on further experiments/stupid ideas. It is fun having KNIME calculating something on one screen while monster hunting on another.

 

New R-Group Decomposition and R-group-table like output using KNIME

One day I checked my favourite node group, the RDKit nodes of course, and found RDKit R-Group Decomposition was deprecated (sayonara, thank you for your service) and a new node with the same name was there.

OK, super excited, because what I wanted was to see where each "cut" was made (R1, R2, etc...).

f:id:hateknime:20200202123212p:plain

I had a quick play with it and got this result:

f:id:hateknime:20200202123217p:plain

 

YES! BRAVO!!!!! I can see where the cuts are!!!

 

 

OK then, I could definitely remove the H's that came up in R3, pivot by R1 and R2, and get this!!!

f:id:hateknime:20200202123225p:plain

 

Nice... the numbers are just molecular weight, but never mind. I will set it up to show R-groups by activity when I'm actually using this. I wonder how I should fancy-ize this table to make it cool and readable. This is so great... just great...
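The pivot step itself is simple enough to sketch in plain Python. (This is a hypothetical decomposition output; the R-group labels and MW numbers are invented, not from my table.)

```python
from collections import defaultdict

# hypothetical R-group decomposition output (labels and MWs invented)
rows = [
    {"R1": "Me", "R2": "H",  "mw": 230.3},
    {"R1": "Me", "R2": "Cl", "mw": 264.7},
    {"R1": "Et", "R2": "H",  "mw": 244.3},
]

# pivot: R1 down the side, R2 across the top, MW in the cells
table = defaultdict(dict)
for r in rows:
    table[r["R1"]][r["R2"]] = r["mw"]

cols = sorted({r["R2"] for r in rows})
print("R1\\R2", *cols)
for r1, cells in table.items():
    print(r1, *[cells.get(c, "-") for c in cols])
```

Missing R1/R2 combinations simply come out as "-" cells, which is exactly the sparse R-group-table look.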

 

Binary classification threshold inspector by KNIME

f:id:hateknime:20200127205457p:plain

This is basically from 
https://hub.knime.com/knime/spaces/Examples/latest/03_Visualization/02_JavaScript/13_Binary_Classification_Inspector

which again was so educational, useful, and fun! All I did was train each of the learners (using default settings), so nothing fancy in what I added.

 

The new Binary Classification Inspector node! I was wondering what it was, but looking at the UI it is definitely cool.

f:id:hateknime:20200127205621p:plain

Yes, I think classification is about the threshold, the distances, and your needs. This node lets you see what you are doing and what your model will achieve. THIS IS SO GREAT!! I maximised accuracy this time, but a custom threshold will be fantastic. So many good nodes are coming out in KNIME that I'm finding it difficult to catch up!
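Under the hood, "make accuracy the highest" boils down to something like this sketch: sweep candidate thresholds over the predicted positive-class probabilities and keep the best-scoring one. (The scores and labels here are toy values, not my model's output.)

```python
# toy positive-class probabilities with their true labels
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.65, 0.8, 0.9]
labels = [0,   0,   0,    1,   0,   1,    1,   1  ]

def accuracy(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# sweep thresholds 0.00 .. 1.00 and keep the best (accuracy, threshold)
best = max((accuracy(t / 100), t / 100) for t in range(0, 101))
print(best)
```

Swapping the accuracy function for precision, recall, or a cost-weighted score gives the "custom threshold for your needs" behaviour.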

Active learning of chemistry in KNIME using ADMET example

f:id:hateknime:20200119002549p:plain

OK, I've been playing around with the active learning loop for a while. This page https://hub.knime.com/knime/spaces/Examples/latest/04_Analytics/12_Active_Learning/03_Active_Learning_Uncertainty_Sampling was very useful for studying uncertainty-score labeling.

I think the workflow says it all; it kept adding 10 data points each round.
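The "add 10 each round" step is uncertainty sampling: score each unlabelled molecule by how unsure the model is and send the top 10 for labelling. A stdlib-Python sketch (the pool names and probabilities are random stand-ins, not my actual data):

```python
import random

random.seed(1)

# stand-in pool: predicted P(active) for 100 unlabelled molecules
pool = {f"mol_{i}": random.random() for i in range(100)}

def most_uncertain(probs, k=10):
    # uncertainty peaks at p = 0.5, so |p - 0.5| is a cheap proxy score
    return sorted(probs, key=lambda m: abs(probs[m] - 0.5))[:k]

batch = most_uncertain(pool)
print(batch)  # the 10 molecules the next round would send for labelling
```

Each loop iteration would retrain on the grown labelled set and rescore the remaining pool before picking the next batch.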

 

The initial result, with around 300 rows of data used, was about 0.7 accuracy (not bad at all actually...).

 

It keeps adding data, but the row count was around 4500 when it plateaued. The full dataset was about 7000 rows, so active learning does get by with less training data. Quite cool, and I can try active learning + transfer learning with this too. COOOOOOOOL.

 

I do want to try density-type labeling too, and this example is useful again: https://hub.knime.com/knime/spaces/Examples/latest/04_Analytics/12_Active_Learning/01_Active_Learning_PBCA_default

I will try this soon too.

 

FIXED a bug in the workflow on 2020/01/21, so the results differ from the original post.