MarkerLynx processing of LARGE UPLC-QTOF/MS data sets

<p>Dear All,</p><p>We make heavy use of a UPLC-QTOF/MS system for omics data generation. MarkerLynx can process small data sets (fewer than 1000 samples) for marker assignment. However, we have been using the same system to generate larger data sets (4000-5000 samples), and MarkerLynx cannot cope with data sets of that size: it always crashes after processing about half the samples, whatever processing parameters are used. We contacted Waters for an explanation and help, but we did not really succeed in getting MarkerLynx working. We suspect that computer capacity (speed) is causing the problem. To use more powerful computers (more RAM, I guess; sorry, I am not good with computer matters), we have been thinking of moving to a 64-bit operating system (Windows 7 or Vista), but MarkerLynx only works with 32-bit operating systems. We are therefore in a dilemma over how to get MarkerLynx to process large data sets produced by UPLC-QTOF/MS. I wonder whether any of you have had this problem; if so, I would be happy to join forces so that we can get on with data processing. Any feedback or comments will also be much appreciated.</p><p>Regards</p><p>T. Barri</p>


  • lizh


    Apologies for the delay on getting this response to you.

    MarkerLynx can only utilise up to 2GB of RAM, irrespective of how much memory you have on your computer. Once peak detection has been applied to each sample, the detected peaks have to be aligned across the samples. To do this a large amount of memory has to be assigned so the marker mass and retention times can be sorted efficiently. I have a version that does not require the memory allocation but unfortunately this turned a 17 second processing time into 17 hours for a data set that was not particularly large. Therefore this solution is not practical. So at present we have to manage the data processed such that memory allocation is not exceeded. I attach a document that gives some tips to try.
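To illustrate why the alignment step needs the peaks from every sample in memory at once, here is a minimal sketch of one generic way to group peaks across samples by sorting on mass and retention time. This is not MarkerLynx's actual algorithm, which is not described in this thread; the `Peak` type, the `align_peaks` function, and the tolerance values are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    """One detected peak (hypothetical structure, for illustration only)."""
    sample: str
    mz: float
    rt: float  # retention time in minutes


def align_peaks(peaks: List[Peak], mz_tol: float = 0.05, rt_tol: float = 0.2) -> List[List[Peak]]:
    """Greedily group peaks from all samples into markers.

    Peaks are sorted by (m/z, RT). Each peak joins the most recent marker
    whose first ("seed") peak lies within both tolerances; otherwise it
    starts a new marker. The tolerances are assumed values.
    """
    markers: List[List[Peak]] = []
    # The sort needs every peak from every sample at the same time,
    # so memory use grows with samples x peaks per sample.
    for peak in sorted(peaks, key=lambda p: (p.mz, p.rt)):
        target = None
        for marker in reversed(markers):
            seed = marker[0]
            if peak.mz - seed.mz > mz_tol:
                break  # earlier markers have even lower seed m/z
            if abs(peak.rt - seed.rt) <= rt_tol:
                target = marker
                break
        if target is None:
            markers.append([peak])
        else:
            target.append(peak)
    return markers


example = [
    Peak("s1", 100.00, 1.00),
    Peak("s2", 100.01, 1.05),
    Peak("s1", 100.02, 5.00),
    Peak("s2", 100.02, 5.10),
    Peak("s1", 250.00, 1.00),
]
markers = align_peaks(example)
```

With thousands of samples and many peaks per sample, the list being sorted becomes very large, which is consistent with the memory limit described above.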

    However, my colleagues also raised the question of why such large data sets are required. Can you share what you are doing? There could be an opportunity to design the experiments for a more efficient workflow.

    Many thanks


  • Hi Liz,

    Thanks for replying. I have actually already received such a document from Waters to start solving the issue.

    I have tried applying a few of the ideas mentioned, for example a higher Intensity Threshold (40 instead of the 20 we usually work with) and narrower processed m/z ranges (50-149.99 m/z, 150.00-249.99 m/z, etc., up to 1000 m/z). However, I did not succeed in getting the samples processed.

    I am now thinking of processing the samples in retention-time slices of the chromatogram, but the risk of missing important features will be high because the slice boundaries are arbitrary (0.00-0.99 min in the first processing, 1.00-1.99 min in the second, 2.00-2.99 min in the third, etc., up to the 6.4 min run time).
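One way to reduce the risk at slice boundaries might be to let neighbouring slices overlap slightly, so that a feature eluting at a boundary falls wholly inside at least one slice. The sketch below only computes such window boundaries, which would then have to be entered into the processing method by hand; `make_windows` is a hypothetical helper, the 0.2 min overlap is an assumed value, and markers found twice in an overlap region would need to be merged afterwards.

```python
from typing import List, Tuple


def make_windows(start: float, stop: float, width: float,
                 overlap: float = 0.0) -> List[Tuple[float, float]]:
    """Return (low, high) windows covering [start, stop].

    Each window is `width` wide (the last may be shorter) and shares
    `overlap` with the previous one. All values are in the same unit
    (minutes for retention time, m/z units for mass).
    """
    if width <= 0 or not 0 <= overlap < width:
        raise ValueError("need width > 0 and 0 <= overlap < width")
    windows = []
    low = start
    while low < stop:
        high = min(low + width, stop)
        # Round to suppress floating-point noise in the printed bounds.
        windows.append((round(low, 4), round(high, 4)))
        if high >= stop:
            break
        low = high - overlap
    return windows


# Retention-time slices for a 6.4 min run, 1 min wide, 0.2 min overlap (assumed).
rt_windows = make_windows(0.0, 6.4, 1.0, overlap=0.2)

# Non-overlapping 100-unit m/z slices from 50 to 1000, as tried above.
mz_windows = make_windows(50, 1000, 100)

for low, high in rt_windows:
    print(f"RT slice: {low:.2f}-{high:.2f} min")
```

The overlap makes each run cover slightly more data, so the number of slices goes up; the trade-off is more processing runs against fewer features split across a boundary.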

    As for your question about why we want to process such large data sets: we are involved in large projects in which thousands of people were recruited into cohort studies. Among those people we have controls and cases of different diseases, and we are not allowed to have the key for those samples, so we cannot divide them into subgroups and process the data separately. Therefore, the whole data set has to be processed simultaneously.

    Any more ideas?

    I would appreciate it very much.


    T. Barri