prospr: I cannot run plmDCA

I could not run plmDCA with the latest Docker environment; it kept crashing. I tried to resolve the issue myself, but without success.

Here is the error message:

[2019-11-12 15:06:28.109831] potts running. inserting gaps...
Reformatted /data/1ubqA/1ubqA.a3m with 2018 sequences from a3m to a2m and written to file /data/1ubqA/1ubqA.a2m
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
/opt/entrypoint.sh: line 2: 6 Aborted (core dumped) /usr/bin/python3 /opt/prospr.py $@

Digging further, I found that hhblits.py fails when it initializes the plmDCA_asymmetric package:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/opt/prospr/hhblits.py", line 90, in run
    p = plmDCA_asymmetric.initialize()
  File "/usr/local/lib/python3.6/site-packages/plmDCA_asymmetric/__init__.py", line 305, in initialize
    return _pir.initialize_package()
  File "/usr/local/lib/python3.6/site-packages/plmDCA_asymmetric/__init__.py", line 253, in initialize_package
    package_handle.initialize()
  File "/usr/local/MATLAB/MATLAB_Runtime/v94/toolbox/compiler_sdk/pysdk_py/matlab_pysdk/runtime/deployablepackage.py", line 33, in initialize
    mcr_handle = self.__cppext_handle.startMatlabRuntimeInstance(self.__ctf_path)
SystemError: CTF file '/usr/local/lib/python3.6/site-packages/plmDCA_asymmetric/plmDCA_asymmetric.ctf' failed to open for 'Read' access. Error message: 'Could not open source package'
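A quick way to rule out the most basic cause of "failed to open for 'Read' access" is to check the CTF file from the filesystem side. This is only a sketch: `check_ctf` is a hypothetical helper, and an empty result does not prove the MATLAB Runtime can load the archive (a corrupt file or a Runtime version mismatch would pass these checks).

```python
import os
import sys


def check_ctf(path):
    """Return a list of problems found with a CTF archive path.

    Only filesystem-level checks are done here; an empty list does NOT
    guarantee that the MATLAB Runtime can actually load the archive.
    """
    problems = []
    if not os.path.exists(path):
        problems.append(f"{path} does not exist")
        return problems
    if not os.path.isfile(path):
        problems.append(f"{path} is not a regular file")
    if not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by uid {os.getuid()}")
    elif os.path.getsize(path) == 0:
        problems.append(f"{path} is empty")
    return problems


if __name__ == "__main__":
    # Default path taken from the traceback; pass another path as argv[1].
    ctf = (sys.argv[1] if len(sys.argv) > 1 else
           "/usr/local/lib/python3.6/site-packages/plmDCA_asymmetric/plmDCA_asymmetric.ctf")
    for p in check_ctf(ctf):
        print("problem:", p)
```

Running it inside the container (for example via `docker run ... python3 check_ctf.py`) tests the file as the same user prospr runs as.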

Most upvoted comments

https://files.physics.byu.edu/data/prospr/potts-code/ has the whole Python package and installer. You’ll need MATLAB R2018a or the free MATLAB Runtime (redistributable) R2018a for it to work.
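When using the Python package outside Docker, the MATLAB Runtime libraries generally also have to be on `LD_LIBRARY_PATH` before Python starts. The sketch below assumes the usual Linux layout under the Runtime root (`runtime/glnxa64`, `bin/glnxa64`, `sys/os/glnxa64`, `extern/bin/glnxa64`); compare it with the paths your Runtime installer prints at the end of installation. `mcr_library_path` is a hypothetical helper name.

```python
import os


def mcr_library_path(mcr_root, existing=None):
    """Build an LD_LIBRARY_PATH value for a MATLAB Runtime install.

    mcr_root is e.g. /usr/local/MATLAB/MATLAB_Runtime/v94 (v94 = R2018a).
    The subdirectory list is an assumption about the standard glnxa64 layout.
    """
    subdirs = ["runtime/glnxa64", "bin/glnxa64", "sys/os/glnxa64", "extern/bin/glnxa64"]
    parts = [os.path.join(mcr_root, s) for s in subdirs]
    if existing:
        parts.insert(0, existing)  # keep whatever was already on the path first
    return os.pathsep.join(parts)


if __name__ == "__main__":
    root = "/usr/local/MATLAB/MATLAB_Runtime/v94"
    print("export LD_LIBRARY_PATH=" + mcr_library_path(root, os.environ.get("LD_LIBRARY_PATH")))
    # Report any expected directory that is not present on this machine.
    for p in mcr_library_path(root).split(os.pathsep):
        if not os.path.isdir(p):
            print("missing:", p)
```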

I would like to run prospr outside of Docker, but it fails at plmDCA_asymmetric. I used the .ctf file from https://files.physics.byu.edu/data/prospr/potts-code/ and get the following error message:

Index in position 3 exceeds array bounds.
Error in plmDCA_asymmetric (line 153)

Is this precompiled plmDCA version not patched yet? @YoshitakaMo seems to have found a solution. Unfortunately, I do not have MATLAB Compiler, so I cannot apply the suggested changes. It would be great if you could share the patched .ctf file.

From what I remember, I compiled plmDCA_asymmetric following the instructions in the README of this repository: https://github.com/Urinx/alphafold_pytorch.

However, that repository does not use the modified version of plmDCA that this one does. All of the required modifications are described in src/potts.patch, but unfortunately that patch is missing some changes, which results in a non-functional plmDCA file. The solution for me was to use the code posted in this issue by YoshitakaMo on Jan 9, 2020.

Fortunately, the alphafold_pytorch instructions use GNU Octave instead of MATLAB, so the build can be done for free and without any proprietary code. It worked well for me.

I might be forgetting something, so please respond if anything is still missing. I’m making a fork of this repository that will hopefully be more functional and better documented.

Hi, I’m also very interested in this excellent work, prospr. I have started learning how to predict distance matrices by studying the provided code. However, as discussed below, plmDCA doesn’t work on my CentOS machine.

I noticed the patch for plmDCA_asymmetric.m.

I see that you used a modified version of plmDCA_asymmetric (https://github.com/magnusekeberg/plmDCA):

  • Added an amino acid type X as 22 (this is a guess).
  • The original version simply returns the upper-triangle entries of the final L x L score matrix. However, I found that we need the full information from the Potts model: J, h, the Frobenius norms, and the scores.
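To illustrate why the original output is not enough: an upper-triangle score list can be mirrored back into a full L x L matrix, but J and h cannot be recovered from it. The sketch assumes the scores arrive as (i, j, score) triples with 1-based indices and i < j; `upper_triangle_to_full` is a hypothetical helper, not part of plmDCA or prospr.

```python
import numpy as np


def upper_triangle_to_full(triples, L):
    """Expand (i, j, score) triples with 1-based i < j into a symmetric L x L matrix.

    Assumes each residue pair is listed exactly once, upper triangle only.
    """
    full = np.zeros((L, L))
    for i, j, s in triples:
        full[i - 1, j - 1] = s  # convert 1-based MATLAB indices to 0-based
        full[j - 1, i - 1] = s  # mirror into the lower triangle
    return full


scores = upper_triangle_to_full([(1, 2, 0.5), (1, 3, 0.1), (2, 3, 0.9)], 3)
print(scores)
```

The diagonal stays zero and only one scalar per pair survives, which is why the per-pair q x q couplings have to be saved by modifying the MATLAB script itself.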

I modified plmDCA_asymmetric to save this information in a MATLAB .mat file. Then I compiled it into a standalone Linux executable rather than packaging it as a Python library, replaced your version with mine, and modified hhblits.py accordingly.

After three days of hard work, I finally got one prediction!

Fortunately, I have a MATLAB license, so I compiled plmDCA_asymmetric_v2 with the potts.patch that tyggna provided. However, the compiled binary still fails with the error "Index in position 3 exceeds array bounds." when I run the following command:

./run_plmDCA_asymmetric.sh /path/to/MATLAB_Runtime/v94 /path/to/plmDCA_asymmetric_v2/test_examples/example1_alignment.txt example.mat 0.2 8
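If the compiled executable is called from hhblits.py instead of the Python package, a thin subprocess wrapper is enough. This is a sketch with hypothetical names (`plmdca_command`, `run_plmdca`); the argument order mirrors the command above, and reading 0.2 as the reweighting threshold and 8 as the core count is an assumption based on the original `plmDCA_asymmetric(fastafile, outputfile, reweighting_threshold, nr_of_cores)` signature. It also assumes the compiled binary exits with a non-zero status when MATLAB raises an error.

```python
import subprocess


def plmdca_command(script, mcr_root, alignment, out_mat, threshold=0.2, cores=8):
    """Build the argument list for the mcc-generated run_*.sh script.

    Order: runtime root, alignment file, output .mat, threshold, cores
    (the meaning of the last two is an assumption, see above).
    """
    # Compiled MATLAB apps receive every argument as a string.
    return [script, mcr_root, alignment, out_mat, str(threshold), str(cores)]


def run_plmdca(*args, **kwargs):
    """Run the compiled plmDCA and return its combined stdout/stderr."""
    cmd = plmdca_command(*args, **kwargs)
    # universal_newlines keeps this compatible with Python 3.6 (no text=).
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True, check=True)
    return result.stdout
```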

I think this error is caused by the following lines:

%Take J_ij as the average of the estimates from g_i and g_j.
    Jflat=0.5*(J1+J2);

    J=zeros(N,N,q,q);
    l=1;
    for i=1:(N-1)
        for j=(i+1):N
            J(i,j,:,:)=Jflat(:,:,l);
            J(j,i,:,:)=Jflat(:,:,l)';
            l=l+1;
        end
    end
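For reference, here is a NumPy rendering of that MATLAB loop. It assumes Jflat has the flat layout of the original code, shape (q, q, N(N-1)/2) with one q x q block per pair i < j in the order (1,2), (1,3), ..., (N-1,N). MATLAB's "Index in position 3 exceeds array bounds" means a third subscript (here `l`) went past `size(Jflat, 3)`, so the sketch checks that size explicitly. `unpack_couplings` is a hypothetical name.

```python
import numpy as np


def unpack_couplings(jflat, N):
    """Convert flat pair couplings (q, q, N*(N-1)//2) into J of shape (N, N, q, q)."""
    q = jflat.shape[0]
    n_pairs = N * (N - 1) // 2
    # Fail early where MATLAB would fail inside the loop.
    if jflat.ndim != 3 or jflat.shape[2] != n_pairs:
        raise ValueError(f"expected Jflat of shape ({q}, {q}, {n_pairs}), got {jflat.shape}")
    J = np.zeros((N, N, q, q))
    l = 0
    for i in range(N - 1):
        for j in range(i + 1, N):
            J[i, j] = jflat[:, :, l]    # J_ij
            J[j, i] = jflat[:, :, l].T  # J_ji is the transpose of J_ij
            l += 1
    return J
```

If Jflat reaching this loop has a smaller third dimension than N(N-1)/2 (for example because part of the patch was not applied upstream), the MATLAB version fails with exactly the reported error.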

If I replace these lines with the original code, J=0.5*(J1+J2);, the whole build process (python3.6 prospr.py build 2o6pA for 2o6pA.fasta) appears to finish successfully, but the resulting .pkl file cannot be used in the subsequent run step:

$ python3.6 prospr.py run 2o6pA 

Making predictions for 2o6pA using network 1 with stride 25
Sequence length is 81
Processing 16 total crops...
         please note this will take longer because not using a(ny) GPU(s)!
Traceback (most recent call last):
  File "prospr.py", line 135, in <module>
    main(args)
  File "prospr.py", line 72, in main
    dist_pred, dist_loss = domain(args.domain, network, args.stride)
  File "/path/to/prospr_test/prospr/prediction.py", line 52, in domain
    input_vector, label, label_ss_i, label_ss_j, label_phi_i, label_phi_j, label_psi_i, label_psi_j = pickled(name, i, j)
  File "/path/to/prospr_test/prospr/dataloader.py", line 60, in pickled
    potts_j_crop = potts_j[lower_i:upper_i, lower_j:upper_j].reshape((irange,jrange,22*22))
ValueError: cannot reshape array of size 6233920 into shape (32,32,484)
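The reshape in dataloader.py implies potts_j is expected as (L, L, 22, 22), so a 32 x 32 crop should hold at most 32 × 32 × 484 = 495616 values, not 6233920. The count is at least consistent with potts_j still being in the flat layout: 6233920 = 22 × 22 × 12880 and 12880 = 161 × 160 / 2, i.e. the (q, q, N(N-1)/2) shape that J=0.5*(J1+J2) produces with q = 22 and N = 161, which the crop would leave untouched. Why N would be 161 for an 81-residue sequence is not clear from this output, so treat that as a lead rather than a diagnosis. A sketch of the crop with an explicit shape check (`crop_couplings` is hypothetical):

```python
import numpy as np


def crop_couplings(potts_j, lower_i, upper_i, lower_j, upper_j, q=22):
    """Crop an (L, L, q, q) coupling tensor and flatten each q x q block.

    The (L, L, 22, 22) layout is inferred from the reshape in dataloader.py;
    it is checked explicitly so a layout mismatch fails with a clear message.
    """
    if potts_j.ndim != 4 or potts_j.shape[2:] != (q, q) or potts_j.shape[0] != potts_j.shape[1]:
        raise ValueError(f"expected potts_j of shape (L, L, {q}, {q}), got {potts_j.shape}")
    crop = potts_j[lower_i:upper_i, lower_j:upper_j]
    # Take the sizes from the crop itself so edge crops shorter than 32 still work.
    irange, jrange = crop.shape[:2]
    return crop.reshape((irange, jrange, q * q))
```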

Also, the 2o6pA00.pkl file downloaded from https://byu.app.box.com/v/ProteinStructurePrediction fails in the same step:

$ python3.6 prospr.py run 2o6pA00
Traceback (most recent call last):
  File "/path/to/prospr_test/prospr.py", line 135, in <module>
    main(args)
  File "/path/to/prospr_test/prospr.py", line 72, in main
    dist_pred, dist_loss = domain(args.domain, network, args.stride)
  File "/path/to/prospr_test/prospr/prediction.py", line 17, in domain
    seq = data['seq']
KeyError: 'seq'

It seems the provided 2o6pA00.pkl doesn’t contain the seq key (I checked with python3.6 -m pickle 2o6pA00.pkl).
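`python -m pickle` prints the entire object, which is hard to read for large arrays. The sketch below prints only the top-level keys, which makes it easy to compare a downloaded .pkl with one produced by `prospr.py build`. The latin1 fallback is only relevant if a file was written under Python 2 with NumPy arrays; whether that applies to these files is unknown. `pickle_keys` is a hypothetical helper.

```python
import pickle
import sys


def pickle_keys(path):
    """Return the sorted top-level keys of a pickled dict, or the type name otherwise."""
    with open(path, "rb") as f:
        try:
            data = pickle.load(f)
        except UnicodeDecodeError:
            # Python 2 pickles containing NumPy arrays often need latin1 decoding.
            f.seek(0)
            data = pickle.load(f, encoding="latin1")
    if isinstance(data, dict):
        return sorted(data.keys(), key=str)
    return type(data).__name__


if __name__ == "__main__":
    for path in sys.argv[1:]:
        keys = pickle_keys(path)
        print(path, keys)
        if isinstance(keys, list) and "seq" not in keys:
            print("  no 'seq' key; prediction.py reads data['seq']")
```

For example, `python3.6 pickle_keys.py 2o6pA.pkl 2o6pA00.pkl` would list both files' keys side by side.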

I’m at a loss for a solution and would appreciate your help.

P.S. Separately from the discussion above, I had to make a few changes to get the original plmDCA MATLAB code (https://github.com/magnusekeberg/plmDCA) to work. In line 48 of ./plmDCA_asymmetric_v2/3rd_party_code/minFunc/autoDiff/autoTensor.m:

- [~ ~ diff(:,:,j)] = funObj(x + mu*e_j,varargin{:});
+ [~, ~, diff(:,:,j)] = funObj(x + mu*e_j,varargin{:});

For ./plmDCA_asymmetric_v2/plmDCA_asymmetric.m,

- addpath(genpath(pwd))
+ % addpath(genpath(pwd))
        % matlabpool('open',nr_of_cores)  
        parpool('local',nr_of_cores) % Use Parallel Computing Toolbox
        tic
        parfor r=1:N
            disp(strcat('Minimizing g_r for node r=',int2str(r)))
            wr=min_g_r(Y,weights,N,q,scaled_lambda_h,scaled_lambda_J,r,options);
            w(:,r)=wr;
        end
        toc
        % matlabpool('close')
        delete(gcp('nocreate')) % Use Parallel Computing Toolbox         

I compiled it with the following command:

mcc -m plmDCA_asymmetric.m -a functions -a 3rd_party_code