Getting sample datasets from DeepChem/PDBbind to play around with at home

Ok, stay home, play around from home...
NO DATASET TO TRAIN ON!!! AHHHHH!!!!

Then along comes DeepChem, which ships with a nice variety of sample datasets. There is also BindingDB, among others.
https://github.com/deepchem/deepchem#citing-deepchem
http://www.bindingdb.org/bind/index.jsp

I like that DeepChem includes PDBbind (not the latest version, I think),
http://www.pdbbind-cn.org/
and I wanted to extract the data for a certain type of protein. So here is the workflow.

f:id:hateknime:20200421135538p:plain

I also like how the Fixed Width File Reader lets you specify the width of each column.
In the top middle part, with the Value Counter and Sorter, I looked at which proteins have the most data. Beta secretase 1 seems to have a lot, so I extracted the PDB entries by copying the HIV pdb files out of the whole pdb dataset (in the looping section in the middle). String Manipulation is the part where I built the file URLs to get the files, like this!

f:id:hateknime:20200421135641p:plain
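
For anyone who would rather script it than wire up nodes, the same three steps (read fixed-width columns, count entries per protein, build file URLs) could be sketched in plain Python roughly like this. The column offsets, the sample rows, and the base URL are all my own placeholders, not the real PDBbind layout or download location, so adjust them to your file.

```python
from collections import Counter

# Hypothetical fixed-width layout: (start, end) character offsets per column.
# The real index file may use different widths; adjust to match your file.
COLUMNS = {"pdb_id": (0, 4), "year": (6, 10), "protein": (12, None)}

# Made-up rows for illustration only (not real PDBbind entries).
SAMPLE_LINES = [
    "# comment line",
    "aaaa  2001  BETA-SECRETASE 1",
    "bbbb  2005  HIV-1 PROTEASE",
    "cccc  2010  BETA-SECRETASE 1",
]

# Hypothetical base URL; replace with wherever your structure files live.
BASE_URL = "https://example.org/structures/"


def parse_fixed_width(lines, columns=COLUMNS):
    """Slice each non-comment line into named columns by character offset."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(
            {name: line[start:end].strip() for name, (start, end) in columns.items()}
        )
    return rows


def build_file_url(pdb_id, base_url=BASE_URL):
    """Plain string concatenation, like a String Manipulation node."""
    return f"{base_url}{pdb_id}.pdb"


rows = parse_fixed_width(SAMPLE_LINES)

# Count entries per protein name, most frequent first (value counter + sorter).
protein_counts = Counter(row["protein"] for row in rows).most_common()

# Keep only the entries for the most common protein and build their URLs.
target = protein_counts[0][0]
urls = [build_file_url(row["pdb_id"]) for row in rows if row["protein"] == target]
```

With the made-up rows above, `target` ends up as the protein name that appears twice, and `urls` holds one URL per matching entry.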

Then I obtained their Kd values in the bottom row of the workflow.

f:id:hateknime:20200421140151p:plain
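
If the affinity shows up as a text field such as `Kd=10nM`, a small parser can turn it into a number that is easier to model. This is only a sketch: I am assuming that string format, and `parse_affinity` and the unit table are names I made up for illustration, so check them against what your own file actually contains.

```python
import math
import re

# Unit prefixes converted to molar concentration.
UNIT_TO_MOLAR = {
    "M": 1.0,
    "mM": 1e-3,
    "uM": 1e-6,
    "nM": 1e-9,
    "pM": 1e-12,
    "fM": 1e-15,
}

# Assumed format: <type><relation><number><unit>, e.g. "Kd=10nM" or "IC50<5uM".
AFFINITY_RE = re.compile(
    r"^(Kd|Ki|IC50)\s*(<=|>=|=|~|<|>)\s*([0-9]+(?:\.[0-9]+)?)\s*(fM|pM|nM|uM|mM|M)$"
)


def parse_affinity(text):
    """Return type, relation, molar value and -log10(molar), or None if unparseable."""
    match = AFFINITY_RE.match(text.strip())
    if match is None:
        return None
    kind, relation, value, unit = match.groups()
    molar = float(value) * UNIT_TO_MOLAR[unit]
    if molar <= 0:
        return None  # log10 is undefined for zero
    return {
        "type": kind,
        "relation": relation,
        "molar": molar,
        "p_value": -math.log10(molar),
    }


example = parse_affinity("Kd=10nM")
```

Keeping the `relation` field around seems worth it, since values reported as `<` or `>` are bounds rather than measurements.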

Sweet dataset! Machine learning on this dataset may be fun. I also wondered how to treat Kd, IC50, Ki and so on, so I googled it and found someone asking the same question.
https://github.com/deepchem/deepchem/issues/715

Treat them the same to start off with, but use them with caution, I guess.
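
One way to "use them with caution" might be to keep the measurement type next to each value, so everything can be pooled at first and split apart later if the mix turns out to matter. A minimal sketch, with made-up records and hypothetical helper names:

```python
from collections import defaultdict

# Made-up records for illustration only (not real measurements).
records = [
    {"pdb_id": "aaaa", "type": "Kd", "p_value": 8.0},
    {"pdb_id": "bbbb", "type": "IC50", "p_value": 6.5},
    {"pdb_id": "cccc", "type": "Ki", "p_value": 7.2},
    {"pdb_id": "dddd", "type": "Kd", "p_value": 5.1},
]


def split_by_type(rows):
    """Group records by measurement type so each group can be inspected separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["type"]].append(row)
    return dict(groups)


def select_types(rows, allowed=("Kd", "Ki", "IC50")):
    """Keep only the measurement types you are willing to pool together."""
    return [row for row in rows if row["type"] in allowed]


pooled = select_types(records)                    # start: treat them all the same
kd_ki_only = select_types(records, ("Kd", "Ki"))  # later: drop IC50 if it hurts
by_type = split_by_type(records)
```

Starting with `pooled` and comparing against `kd_ki_only` is one cheap way to see whether mixing the types changes anything for your model.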