The aggregation of the data is complex. This article describes the process step by step for a better understanding of the data transformation.
Let’s see how the data is transformed. We will use the example file “KD_180110_CD160_HVEM.csv” from the HaDeX package and focus on just one peptide, “LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL”, in the state “CD160”. The measurements were taken at the 1 min time point.
The original, unaggregated data for the chosen peptide is shown below.
## Protein Start End Sequence Modification
## 1: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 2: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 3: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 4: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 5: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 6: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 7: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 8: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 9: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 10: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 11: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 12: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 13: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 14: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 15: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 16: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 17: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 18: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 19: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## 20: db_CD160 34 69 LCKDRSGDCSPETSLKQLRLKRDPGIDGVGEISSQL NA
## Fragment MaxUptake MHP State Exposure File z
## 1: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_01 3
## 2: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_01 4
## 3: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_01 5
## 4: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_01 6
## 5: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_01 7
## 6: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_02 3
## 7: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_02 4
## 8: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_02 5
## 9: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_02 6
## 10: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_02 7
## 11: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_03 3
## 12: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_03 4
## 13: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_03 5
## 14: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_03 6
## 15: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_03 7
## 16: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_04 3
## 17: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_04 4
## 18: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_04 5
## 19: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_04 6
## 20: NA 33 3900.991 CD160 1 KD_160530_CD160_1min_04 7
## RT Inten Center
## 1: 4.524763 325032 1307.8329
## 2: 4.524404 753259 981.0960
## 3: 4.523707 1340447 785.0905
## 4: 4.524032 2076956 654.3858
## 5: 4.525778 759271 561.0548
## 6: 4.522330 239810 1307.7777
## 7: 4.521255 583325 981.0817
## 8: 4.521919 1011160 785.0811
## 9: 4.521726 1584254 654.4194
## 10: 4.522196 600218 561.0268
## 11: 4.523605 176788 1307.8942
## 12: 4.524804 402630 981.1597
## 13: 4.521938 746309 785.0906
## 14: 4.522238 1117344 654.4344
## 15: 4.524647 397718 561.0833
## 16: 4.531138 189258 1307.8739
## 17: 4.530608 441817 981.1656
## 18: 4.528928 796722 785.1220
## 19: 4.530066 1186263 654.4411
## 20: 4.531333 451071 561.0588
As we can see from the File column, there are four replicates of the experiment. Each replicate provides one measurement for every observed charge state (z) of the peptide. The result of a measurement is in the Center column: this is the geometric centroid of the isotopic envelope, as produced by the mass spectrometer.
Let’s take a look at the values for each replicate.
The centroid values for the different charge states are not useful on their own. We have to transform them into mass values, according to the equation:
\[ aggMass = z \times (Center - protonMass) \]
The results are shown below.
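As a minimal sketch of this step in base R, the snippet below applies the equation to the first replicate (rows 1 to 5 of the table above). The variable names and the proton mass constant (about 1.00727647 Da) are assumptions made for illustration; the constant used internally by HaDeX may be rounded differently.

```r
# Proton mass in daltons (assumed value, used here for illustration only).
proton_mass <- 1.00727647

# Charge states and centroids of the first replicate, copied from the table above.
replicate_1 <- data.frame(
  z      = c(3, 4, 5, 6, 7),
  Center = c(1307.8329, 981.0960, 785.0905, 654.3858, 561.0548)
)

# aggMass = z * (Center - protonMass), one mass value per charge state.
replicate_1$aggMass <- replicate_1$z * (replicate_1$Center - proton_mass)

print(replicate_1)
```

Although the centroids differ widely between charge states, the resulting masses all fall close to 3920 Da, because they describe the same peptide.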
These results are for just one replicate. We have four of them:
The values from each replicate are aggregated into a single value using a weighted mean (with the intensity as the weight):
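Written out for a single replicate, and assuming the weights are the Inten values of the individual charge states, the weighted mean takes the form below. The symbol avgMass is introduced here only for illustration; the sums run over the charge states z measured in that replicate.

\[ avgMass = \frac{\sum_{z} Inten_{z} \times aggMass_{z}}{\sum_{z} Inten_{z}} \]

Charge states with a stronger signal therefore contribute more to the mass of the replicate than weak ones.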
The results from the replicates are then aggregated into the final result (their mean), and the uncertainty (the standard deviation of the mean) is calculated.
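The whole aggregation can be sketched in a few lines of base R. This is an illustrative reconstruction using the values from the table above, not the code used inside HaDeX. The variable names and the proton mass constant are assumptions, and the uncertainty is computed as the sample standard deviation divided by the square root of the number of replicates.

```r
proton_mass <- 1.00727647  # proton mass in Da (assumed value)

# Values copied from the File, z, Inten and Center columns of the table above.
dat <- data.frame(
  File   = rep(paste0("KD_160530_CD160_1min_0", 1:4), each = 5),
  z      = rep(3:7, times = 4),
  Inten  = c(325032, 753259, 1340447, 2076956, 759271,
             239810, 583325, 1011160, 1584254, 600218,
             176788, 402630, 746309, 1117344, 397718,
             189258, 441817, 796722, 1186263, 451071),
  Center = c(1307.8329, 981.0960, 785.0905, 654.3858, 561.0548,
             1307.7777, 981.0817, 785.0811, 654.4194, 561.0268,
             1307.8942, 981.1597, 785.0906, 654.4344, 561.0833,
             1307.8739, 981.1656, 785.1220, 654.4411, 561.0588)
)

# Step 1: transform every centroid into a mass value.
dat$aggMass <- dat$z * (dat$Center - proton_mass)

# Step 2: one intensity-weighted mean mass per replicate (File).
replicate_mass <- sapply(
  split(dat, dat$File),
  function(d) weighted.mean(d$aggMass, w = d$Inten)
)

# Step 3: final mass and its uncertainty (standard deviation of the mean).
n_rep    <- length(replicate_mass)
avg_mass <- mean(replicate_mass)
err_mass <- sd(replicate_mass) / sqrt(n_rep)

print(replicate_mass)
cat("mass:", avg_mass, "uncertainty:", err_mass, "\n")
```

Splitting by File keeps the two aggregation levels separate: charge states are combined within a replicate first, and only then are the replicates averaged.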
Now we have the mass value for the chosen peptide in the chosen state, measured at the chosen time point. The same calculation is done for every other peptide, and the resulting mass and uncertainty values are used to calculate deuterium uptake, as described in the Data processing article.