Distributed Computing - How to?!

Dear All,

At the moment I have limited computing capacity on my 4-core laptop. I am running covariate searches and bootstraps with the “Local_MPI_4” option. As one can imagine, it takes quite some time.
I am interested in the grid computing presented here:

https://www.youtube.com/watch?v=SLWuczDIZhM

I aim to set up a local multi-core server for computing, but the question is: what do I need? And what are the requirements of Phoenix NLME?

  • Is there a limit to the number of cores that can simultaneously work?

  • Does it matter if I choose windows or unix?

  • How do I set up the local server to run the computations that I send from my laptop?

→ Grid computing is quite new to me and I haven’t set up such a system yet.

Hope you can help!

Best,

Daniel

Dear Daniel,

The Phoenix help file contains a description of this.

For Phoenix 8.0, see the “Job Control Setup” chapter of the “Phoenix NLME User’s Guide”.

Best,
f_yc

Hi f_yc,

thanks! Reading at the moment…

best

Daniel

Hi Daniel,

we just had a webinar around this topic yesterday:

youtu.be/pGOnRqKaDS0?a

Let me know if the descriptions in the manual are sufficient. We can talk over the phone if there are questions.

Bernd

Hi Bernd, hi all,

It seems I may be given THE chance to test and use Phoenix in a multiprocessing environment.

I think the key thing is that the IT folks understood that only a Linux-compiled NLME module is replicated on the Linux cluster (that is our understanding of the Phoenix docs).

While my internal IT support and the (outsourced) Linux cluster operator were discussing this, they exchanged the following.


The key question is how your client and the cluster need to be connected.

We have several constraints due to Sanofi security requirements. One of them is that a direct connection from the Sanofi network to the YYYYYYY platform is not possible. That means that if your job submission goes through protocols other than http/https, it won’t work; FTP is not possible.

Does Certara have technical documentation regarding the connection between Phoenix and the grid?


Do you have documents other than the ones available through the help?

Many Thanks.

Gilles

Hi Gilles, Phoenix uses SFTP (SSH File Transfer Protocol), so this should be possible. Does this help your IT group set up an SSH connection between client and server?
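A rough way for IT to check reachability from a machine on the client side of the firewall is sketched below. It is not a Phoenix tool: `port_open` is a hypothetical helper, the host is a placeholder, and port 22 is only the SSH default (an assumption; your server may listen elsewhere).

```shell
#!/usr/bin/env bash
# Sketch: test whether an SSH/SFTP port is reachable through the firewall.
# port_open is a hypothetical helper; host and port below are placeholders.

port_open() {
    # $1 = host, $2 = TCP port; succeeds only if a TCP connection can be opened.
    # Uses bash's /dev/tcp pseudo-device and gives up after 3 seconds.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

HOST="${1:-127.0.0.1}"   # replace with the Linux server's address
PORT="${2:-22}"          # 22 is the default SSH port (assumption)

if port_open "$HOST" "$PORT"; then
    echo "TCP port $PORT on $HOST is reachable; try: sftp -P $PORT user@$HOST"
else
    echo "TCP port $PORT on $HOST is not reachable from here"
fi
```

If the port is reachable but `sftp` still fails, the problem is more likely in authentication or in a proxy that only passes http/https.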

Simon.

Hi All,

I tried to build a Linux node myself, but I encountered problems.

My environment:

Phoenix client:

Operating system: Windows10

Phoenix version: Phoenix 8.1

Linux server:

Operating system: CentOS Linux 7

IP: 192.168.31.130

Installed software: epel-release, gcc, R, ksh, libxml2-devel, nfs-utils, rpcbind, torque-4.2.9.tar.gz, openssl-devel, boost-devel, libtool-y

R version: 3.5.2

The following R packages are installed: batchtools, XML, reshape, Certara.NLME8

Mounted shared directory: mount -t nfs 192.168.31.130:/var/tmp/nlme /mnt

The TORQUE job control software is installed.

[root@master /]# qnodes

cn1

state = free

np = 2

ntype = cluster

status = rectime=1560150776,varattr=,jobs=,state=free,netload=491676,gres=,loadave=0.00,ncpus=2,physmem=3865308kb,availmem=5562644kb,totmem=5962456kb,idletime=655,nusers=2,nsessions=4,sessions=1543 1555 1605 1677,uname=Linux cn1 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64,opsys=linux

mom_service_port = 15002

mom_manager_port = 15003

master

state = free

np = 2

ntype = cluster

status = rectime=1560150774,varattr=,jobs=,state=free,netload=284510000,gres=,loadave=0.00,ncpus=2,physmem=3865308kb,availmem=4256656kb,totmem=5962456kb,idletime=74353,nusers=3,nsessions=3,sessions=1654 1498 29642,uname=Linux master 3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64,opsys=linux

mom_service_port = 15002

mom_manager_port = 15003
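For anyone repeating this setup, a small sketch for summarising that `qnodes` output before submitting jobs: it adds up `np` over the nodes whose state is `free`. `free_slots` is a hypothetical helper name, and the parsing assumes the `state = …` line precedes the `np = …` line for each node, as in the listing above.

```shell
#!/usr/bin/env bash
# Sketch: count processor slots on TORQUE nodes that report state = free.
# free_slots is a hypothetical helper, not part of TORQUE or Phoenix.

free_slots() {
    # Reads qnodes output on stdin and prints the summed np of free nodes.
    awk '
        $1 == "state" && $2 == "=" { is_free = ($3 == "free") }  # remember node state
        $1 == "np" && $2 == "=" && is_free { total += $3 }       # add slots of free nodes
        END { print total + 0 }
    '
}

# Only query the scheduler when qnodes is actually installed on this machine.
if command -v qnodes >/dev/null 2>&1; then
    echo "free slots: $(qnodes | free_slots)"
fi
```

With the two nodes listed above (cn1 and master, both free with np = 2), this would report 4 slots.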

Scenario 1:

Configuration: 192.168.31.130|Linux|MultiCore|test3||/mnt|/bin/R|2|

1.1 In the “Simple” and “Predictive” run modes, I can submit the NLME task to Linux, Linux completes the calculation, and the result is returned to Phoenix.

1.2 In the “Bootstrap” run mode, I can submit the NLME task to Linux and Linux completes the calculation, but the Phoenix client keeps reporting the job as running and the result is never returned to Phoenix.

You can see all the results of the calculation in the Linux directory.

[sharedmedia=core:attachments:3384]

But the Phoenix client always shows “Running NLME on system”

[sharedmedia=core:attachments:3385]

View the file “DME_BO~1.619-496/NlmeRemote.LOG” in the Linux shared directory to get the following information:

nohup: ignoring input

/usr/bin/R

Rscript /mnt/InstallDirNLME/bootstrap.r MultiCore /mnt /mnt/DME_BO~1.619-496 3 1000 2 2 test.mdl cols1.txt data1.txt 9316 nlmeargs.txt nlmeargs.txt test.mdl nlmeargs.txt cols1.txt data1.txt test.mdl 2 95

WORKING_DIR=/mnt/NLME173ac7f6fe137/NLME173ac12b7b585,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files


-------------------- Translating --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408304074 /L ./test.mdl ./Work

Done


------------------- Compliling *.c -------------------------


----------------------- Linking -----------------------------


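ln: failed to create symbolic link "/mnt/NLME173ac7f6fe137/NLME173ac12b7b585/NLME7.exe": File exists

NULL

Warning messages:

1: In stuff[row] <- currentList : number of items to replace is not a multiple of replacement length

2: In stuff[row] <- currentList : number of items to replace is not a multiple of replacement length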

1.3 In the “Cov.Srch.Stepwise” run mode, I can submit the NLME task to Linux and Linux completes the calculation, but the Phoenix client keeps reporting the job as running and the result is never returned to Phoenix.

You can see all the results of the calculation in the Linux directory.

[sharedmedia=core:attachments:3386]

But the Phoenix client always shows “Running NLME on system”

View the file “DME_SI~1.512-480/NlmeRemote.LOG” in the Linux shared directory to get the following information:

nohup: ignoring input

/usr/bin/R

Rscript /mnt/InstallDirNLME/stepwise_covarsrch.r MultiCore /mnt/InstallDirNLME /mnt /mnt/DME_SI~1.512-480 test.mdl nlmeargs.txt test.mdl cols1.txt data1.txt nlmeargs.txt 3 V-wt V-apgr Ke-wt -2LL:1,1,1 0.01 0.001 2 Pheno Model

WORKING_DIR=/mnt/NLME16aa821e9ed02/NLME16aa87e37534d,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files


-------------------- Translating --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408304609 /L ./test.mdl ./Work

Done


------------------- Compliling *.c -------------------------


----------------------- Linking -----------------------------


unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/01/1//out000.txt to DOS format …

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/02/2//out100.txt to DOS format …

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/03/3//out010.txt to DOS format …

unix2dos: converting file /mnt/NLME16aa821e9ed02/NLME16aa87e37534d/jobs/04/4//out001.txt to DOS format …

WORKING_DIR=/mnt/NLME16aa8771a8725/NLME16aa83f361009,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files


-------------------- Translating --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408304609 /L ./test.mdl ./Work

Done


------------------- Compliling *.c -------------------------


----------------------- Linking -----------------------------


unix2dos: converting file /mnt/NLME16aa8771a8725/NLME16aa83f361009/jobs/01/1//out110.txt to DOS format …

unix2dos: converting file /mnt/NLME16aa8771a8725/NLME16aa83f361009/jobs/02/2//out101.txt to DOS format …

[1] "/mnt/NLME16aa821e9ed02" "/mnt/NLME16aa8771a8725"

Scenario 2:

Configuration: 192.168.31.130|Linux|TORQUE|test4||/mnt|/bin/R|2|

In this scenario, no run mode completes successfully.

2.1 In the “Simple” and “Predictive” run modes, I can submit the NLME task to Linux and Linux completes the calculation, but the Phoenix client keeps reporting the job as running and the result is never returned to Phoenix.

You can see all the results of the calculation in the Linux directory.

[sharedmedia=core:attachments:3387]

But the Phoenix client always shows “Running NLME on system”

[sharedmedia=core:attachments:3388]

DME_SI~1.113-909/NlmeRemote.LOG:

/usr/bin/R

Rscript /mnt/InstallDirNLME/generic_run.r COVAR_SEARCH TORQUE /mnt/InstallDirNLME /mnt /mnt/DME_SI~1.113-909 nlmeControlFile.txt 2 SingleNlme

Loading required package: data.table

No readable configuration file found

Created registry in ‘/mnt/NLME157711491531c/NLME15771147abe8c/registry’ using cluster functions ‘Interactive’

WORKING_DIR=/mnt/NLME157711491531c/NLME15771147abe8c,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files


-------------------- Translating --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1408305253 /L ./test.mdl ./Work

Done


------------------- Compliling *.c -------------------------


----------------------- Linking -----------------------------


Adding 1 jobs …

Submitting 1 jobs in 1 chunks using cluster functions ‘TORQUE’ …

unix2dos: converting file /mnt/NLME157711491531c/NLME15771147abe8c/../out000001.txt to DOS format …

unix2dos: converting file /mnt/NLME157711491531c/NLME15771147abe8c/../nlme7engine.log to DOS format …

[1] "removeRegistry() AGAIN"

[1] "/mnt/NLME157711491531c"

What caused this problem?
How can I solve it?

Best,
0521

Can you attach the content of progress.xml?

Cheers,

Fred

Hi Fred,

Thank you very much for replying to me.

This is the corresponding “progress.xml” file for 1.2:
LocalHost MultiCore 6月 2019 10 07时55分01秒 6月 2019 10 07时55分09秒 Finished 2 2 0 0 0 Preparing files for Bootstrap run

I attached all the files under the “DME~” folder corresponding to the 1.2 scene and the 2.1 scene.

Best,
0521

1.2bootstrap.zip (50 KB), 2.1TORQUE_Simple.zip (66.9 KB)

Please check the progress.xml file on the desktop side; it should be in %TMP%/Phoenix/DME_xxxxx. See whether it matches progress.xml on the Linux side.

I would also check the Phoenix log files for any connection errors. This sounds to me like Phoenix is losing its connection to the remote system and cannot get updates on the job’s status (completion).

Hi Fred,

The “progress.xml” in the Windows “%TMP%/Phoenix/DME_xxxxx” is this:

LocalHost TORQUE 6月 2019 18 16时10分22秒 6月 2019 18 16时10分29秒 Finished 1 1 0 0 0

[sharedmedia=core:attachments:3402]

Phoenix log files:

2019-06-19 00:07:38.1600|Error|Cannot load C:\Users\HASEE\AppData\Local\Temp\Phoenix\DME_PR~1.492\progress.xml 该字符串未被识别为有效的 DateTime。|Application|||||

[sharedmedia=core:attachments:3401]

I have attached all the files in the Windows directory “C:\Users*****\AppData\Local\Temp\Phoenix\DME_PredCheck_12-05-50.492__234ae21c-2ae8-46b3-b49d-562946965154” .

“该字符串未被识别为有效的 DateTime” means “The string was not recognized as a valid DateTime” in English.

Is the error caused by the Chinese characters in the date?

What determines the date format?

  1. Windows, where Phoenix is installed?
  2. The Linux host?
  3. The Linux node?

Thanks

0521

2.1TORQUE_Simple_Windows_file.zip (7.26 KB)

Progress.xml is created on the Linux side. The date is acquired in R by calling:

format(as.POSIXlt(Sys.time(), "UTC"), "%b %Y %d %X")

You should be able to change your Linux settings to report date/time in English.
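To see the effect outside of R, here is a small sketch using GNU `date` with the same format string. It assumes that R's `format()` picks up month names from the process locale (`LC_TIME`) in the same way `date` does; `fmt_stamp` is a hypothetical helper, and the `localectl` line in the comment is one common way to change the system locale on CentOS 7, not something taken from the Phoenix documentation.

```shell
#!/usr/bin/env bash
# Sketch: show how the locale changes a "%b %Y %d %X" timestamp (GNU date).
# fmt_stamp is a hypothetical helper, not part of Phoenix NLME.

fmt_stamp() {
    # $1 = locale name, $2 = seconds since the epoch
    LC_ALL="$1" date -u -d "@$2" +"%b %Y %d %X"
}

# Locale variables of this shell; a process started from here inherits them.
echo "LANG=${LANG:-} LC_ALL=${LC_ALL:-} LC_TIME=${LC_TIME:-}"

# The C (POSIX) locale always yields English month abbreviations.
fmt_stamp C 1560874222

# Possible system-wide change on CentOS 7 (assumption, needs root):
#   localectl set-locale LANG=en_US.UTF-8
# Log in again afterwards so new sessions pick up the setting.
```

Running `fmt_stamp zh_CN.UTF-8 1560874222` on a host with that locale installed should reproduce the kind of Chinese month string seen in progress.xml.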

Good luck,

Fred

Thank you very much Fred!
The problems with 1.2, 1.3, and 2.1 have been resolved.

But there are new problems:
In scenario 2, the “Bootstrap” and “Cov.Srch.Stepwise” run modes report an error.

Scenario 2:

2.2 In the “Bootstrap” run mode, I can submit the NLME task to Linux. An error occurs during the calculation on Linux (after part of the iterations have completed), and Phoenix shows this error message:


Execution Error


There was an error while executing Workflow.Pheno Stdev Covar

Model execution failed.

Unable to run bootstrap

See Remote Execution Log for possible explaination


OK


Linux host file:

[sharedmedia=core:attachments:3404]

/NlmeRemote.LOG:

nohup: ignoring input

/usr/bin/R

Rscript /mnt/InstallDirNLME/bootstrap.r TORQUE /mnt /mnt/DME_BO~1.818-288 3 1000 10 2 test.mdl cols1.txt data1.txt 28844 nlmeargs.txt nlmeargs.txt test.mdl nlmeargs.txt cols1.txt data1.txt test.mdl 2 95

Loading required package: data.table

No readable configuration file found

Created registry in ‘/mnt/NLMEd65e1f1dc513/NLMEd65e1e13f372/registry’ using cluster functions ‘Interactive’

No readable configuration file found

Created registry in ‘/mnt/NLMEd65e1f1dc513/NLMEd65e39cd6bf8/registry’ using cluster functions ‘Interactive’

WORKING_DIR=/mnt/NLMEd65e1f1dc513/NLMEd65e39cd6bf8,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files


-------------------- Translating --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1376165901 /L ./test.mdl ./Work

Done


------------------- Compliling *.c -------------------------


----------------------- Linking -----------------------------


Adding 1 jobs …

Submitting 1 jobs in 1 chunks using cluster functions ‘TORQUE’ …

Adding 10 jobs …

[1] "Failed to performBootstrap()"

[1] "Error is : Error in chunkIds(reg = gridRegistry, ids = findJobs(reg = gridRegistry), : could not find function \"chunkIds\"\n"

used (Mb) gc trigger (Mb) max used (Mb)

Ncells 734206 39.3 1318958 70.5 1318958 70.5

Vcells 1373525 10.5 8388608 64.0 2072528 15.9

I checked the Phoenix log and there was no error log in “Application_Error.txt”.

I attached all the files under the Linux host.

2.3 In the “Cov.Srch.Stepwise” run mode, I can submit the NLME task to Linux. An error occurs during the calculation on Linux, and Phoenix shows this error message:


Execution Error


There was an error while executing Workflow.Pheno Stdev Covar

Model execution failed.

Failed to run stepwise covariate search

See Remote Execution Log for possible explaination


OK


Linux host file:

/NlmeRemote.LOG:

nohup: ignoring input

/usr/bin/R

Rscript /mnt/InstallDirNLME/stepwise_covarsrch.r TORQUE /mnt/InstallDirNLME /mnt /mnt/DME_SI~1.588-338 test.mdl nlmeargs.txt test.mdl cols1.txt data1.txt nlmeargs.txt 3 V-wt Ke-wt V-apgr -2LL:1,1,1 0.01 0.001 2 Pheno Stdev Covar

Loading required package: data.table

No readable configuration file found

Created registry in ‘/mnt/NLME11f757a053d04/NLME11f751f32ab47/registry’ using cluster functions ‘Interactive’

WORKING_DIR=/mnt/NLME11f757a053d04/NLME11f751f32ab47,MPIFLAG=MPINO, LOCAL_HOST=NO,NUM_NODES=1,SHARED_DRIVE=

model=test.mdl, nlmeDir=/mnt/InstallDirNLME

Deleting files


-------------------- Translating --------------------------

/mnt/InstallDirNLME/TDL4 /hash 1376170193 /L ./test.mdl ./Work

Done


------------------- Compliling *.c -------------------------


----------------------- Linking -----------------------------


Adding 4 jobs …

<simpleError in chunkIds(reg = gridRegistry, ids = findJobs(reg = gridRegistry), n.chunks = numberOfChunks): could not find function "chunkIds">

Error in get("jobsDirectoryRoot", envir = nlmeEnv) :

object ‘jobsDirectoryRoot’ not found

Calls: print ... performStepwiseCovarSearch -> summarizeStepwiseCovarSearch -> get

Execution halted

I checked the Phoenix log and there was no error log in “Application_Error.txt”.

I attached all the files under the Linux host.

DME_BO~1.818-288.zip (11.1 KB), NLMEd65e1f1dc513.zip (1.69 MB), 2.3_torque_VPC_DME_SI~1.588-338.zip (12.2 KB), 2.3_torque_VPC_NLME11f757a053d04.zip (791 KB)

You need to downgrade the R package batchtools from 0.9.11 to 0.9.10. This is described in PHX-6979 and is fixed in 8.2.
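One possible way to do the downgrade on the Linux host is sketched below, assuming the 0.9.10 source tarball is available in the usual CRAN archive location and that `curl` and `R` are on the PATH. `cran_archive_url` and `pin_batchtools` are hypothetical helper names; the install step is wrapped in a function so nothing is changed until you call it yourself.

```shell
#!/usr/bin/env bash
# Sketch: pin batchtools to 0.9.10 on the Linux host (run where R is installed).
# cran_archive_url and pin_batchtools are hypothetical helpers.

cran_archive_url() {
    # $1 = package, $2 = version; CRAN keeps superseded source tarballs
    # under src/contrib/Archive/<package>/
    printf 'https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz\n' "$1" "$1" "$2"
}

pin_batchtools() {
    local url tarball
    url="$(cran_archive_url batchtools 0.9.10)"
    tarball="/tmp/${url##*/}"
    # Report the version before and after, so the change is visible in the log.
    Rscript -e 'cat("before:", as.character(packageVersion("batchtools")), "\n")'
    curl -fsSL -o "$tarball" "$url" && R CMD INSTALL "$tarball"
    Rscript -e 'cat("after:", as.character(packageVersion("batchtools")), "\n")'
}

# By default only print the URL; call pin_batchtools explicitly to install.
cran_archive_url batchtools 0.9.10
```

The install has to go into the R library used by the account and the `/bin/R` configured in Phoenix, otherwise the jobs will still load 0.9.11.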

Fred

Hi Fred,

The problem is solved, thank you very much!

Cheers,

0521