MarkerLynx processing of LARGE UPLC-QTOF/MS data sets
Answers
Hello
Apologies for the delay in getting this response to you.
MarkerLynx can only utilise up to 2 GB of RAM, irrespective of how much memory you have on your computer. Once peak detection has been applied to each sample, the detected peaks have to be aligned across the samples. To do this, a large amount of memory has to be allocated so that the marker masses and retention times can be sorted efficiently. I have a version that does not require this memory allocation, but unfortunately it turned a 17-second processing time into 17 hours for a data set that was not particularly large, so that solution is not practical. At present, therefore, we have to manage the data being processed so that the memory allocation is not exceeded. I attach a document that gives some tips to try.
However, my colleagues also raised the question of why such large data sets are required. Can you share what you are doing? There could be an opportunity to design the experiments so as to arrive at a more efficient workflow.
Many TX
Liz
Hi Liz,
Thanks for replying. I actually already have such a document from Waters and have started using it to tackle the issue.
I have tried applying a few of the ideas mentioned, for example a higher Intensity Threshold (40 instead of the 20 we usually work with) and processing narrow m/z ranges (50.00-149.99 m/z, 150.00-249.99 m/z, etc., up to 1000 m/z). However, I did not succeed in getting the samples processed.
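For reference, the m/z slicing described above can be written down as a short Python sketch. It only lists the window boundaries and does not drive MarkerLynx in any way; the function name `mz_windows` and the 0.01 m/z resolution are assumptions made for illustration.

```python
from decimal import Decimal


def mz_windows(start, stop, width, resolution="0.01"):
    """Return non-overlapping (low, high) m/z windows covering [start, stop].

    Hypothetical helper: Decimal is used so that boundaries such as 149.99
    are exact rather than subject to floating-point rounding.
    """
    low = Decimal(str(start))
    stop = Decimal(str(stop))
    width = Decimal(str(width))
    res = Decimal(resolution)
    windows = []
    while low < stop:
        # Each window ends one resolution step before the next one begins,
        # except the last, which is clipped to the upper limit.
        high = min(low + width - res, stop)
        windows.append((low, high))
        low += width
    return windows


if __name__ == "__main__":
    for low, high in mz_windows(50, 1000, 100):
        print(f"{low:.2f}-{high:.2f} m/z")
```

With a start of 50, an upper limit of 1000 and a width of 100, this gives ten windows, the last of which is only 50 m/z wide.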
I am now thinking of processing the samples in retention time slices of the chromatogram, but the risk of missing important features will be high because the slice boundaries would be arbitrary (e.g. 0.00-0.99 min for the first processing, 1.00-1.99 min for the second, 2.00-2.99 min for the third, and so on up to the 6.4 min run time).
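One possible way to reduce the risk at the slice boundaries, offered only as an untested idea, would be to let neighbouring retention time windows overlap slightly, so that a peak eluting at a boundary falls wholly inside at least one window. The Python sketch below just generates such windows. The name `rt_windows` and the 0.2 min overlap are assumptions, and markers found twice in an overlap region would need to be de-duplicated afterwards.

```python
def rt_windows(run_time, width, overlap=0.0):
    """Return (start, end) retention time windows in minutes covering
    [0, run_time]. Consecutive windows share `overlap` minutes.

    Hypothetical helper for planning the slices; it does not call MarkerLynx.
    """
    if not 0 <= overlap < width:
        raise ValueError("overlap must be >= 0 and smaller than width")
    step = width - overlap
    windows = []
    i = 0
    while True:
        start = round(i * step, 2)
        # Clip the final window to the end of the run.
        end = round(min(start + width, run_time), 2)
        windows.append((start, end))
        if end >= run_time:
            break
        i += 1
    return windows


if __name__ == "__main__":
    for start, end in rt_windows(6.4, 1.0, overlap=0.2):
        print(f"{start:.2f}-{end:.2f} min")
```

For the 6.4 min run with 1.0 min windows and a 0.2 min overlap, this gives eight windows instead of seven, so the extra processing cost is modest.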
As for your question about why we want to process such large data sets: we are involved in large projects in which thousands of people were recruited into cohort studies. Among those people we have controls and cases of different diseases, and we are not allowed to have the key for those samples, so we cannot divide them into subgroups and process the data separately. Therefore, the whole data set has to be processed simultaneously.
Any more ideas?
I appreciate it so much.
Cheers
T. Barri