Statistics
Here, we summarize the statistics used in HaDeX2. Some of the elements are discussed in other articles where relevant, but this article gathers the information in one place.
Uncertainty propagation
The propagation of uncertainty (Puchała et al. 2020; Weis 2021) is necessary whenever we transform measured values. In HDX-MS, we repeat the measurements in triplicate in order to calculate the uncertainty of the mass measurement. However, when transforming mass measurements into deuterium uptake, we need to propagate the mass measurement uncertainty using the Law of Propagation of Uncertainty (Joint Committee for Guides in Metrology 2008):
$$u_c(y) = \sqrt{\sum_{k=1}^{N} \left[ \frac{\partial y}{\partial x_k} u(x_k) \right]^2}$$
Where:
- $u_c(y)$ - combined uncertainty of value $y$, where $y$ is a function of $x_1, \ldots, x_N$,
- $x_k$ - values for which the uncertainty $u(x_k)$ is known.
This is a generic equation that uses the partial derivatives of the function. It is
derived for deuterium uptake in the appropriate forms (as the equations
differ based on the parameters of calculations) and described in detail
in the article vignette("datafiles").
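As a minimal sketch of the generic law for uncorrelated inputs, the R code below approximates the partial derivatives numerically and combines the uncertainties. The helper `propagate_uncertainty`, the mass values, and the simple definition of uptake as a difference of two masses are assumptions made for illustration; they are not the HaDeX2 implementation.

```r
# Hypothetical helper (not part of HaDeX2): combined uncertainty of y = f(x)
# for uncorrelated inputs, sqrt(sum((dy/dx_k * u(x_k))^2)).
propagate_uncertainty <- function(f, x, u, h = 1e-6) {
  # Central finite differences approximate each partial derivative of f at x.
  grad <- vapply(seq_along(x), function(k) {
    step <- replace(numeric(length(x)), k, h)
    (f(x + step) - f(x - step)) / (2 * h)
  }, numeric(1))
  sqrt(sum((grad * u)^2))
}

# Assumed form for illustration: uptake as a mass difference, D = m_t - m_0.
uptake <- function(m) m[[1]] - m[[2]]

masses   <- c(1503.2, 1500.1)  # illustrative masses [Da]: m_t, m_0
u_masses <- c(0.03, 0.04)      # illustrative mass uncertainties [Da]

u_uptake <- propagate_uncertainty(uptake, masses, u_masses)
```

For a plain difference both partial derivatives have absolute value one, so the result reduces to the square root of the sum of squared mass uncertainties.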
Joint Committee for Guides in Metrology. (2008) JCGM 100: evaluation of measurement data – guide to the expression of uncertainty in measurement. Technical report, JCGM
Hybrid testing
Hybrid testing (Hageman and Weis 2019) is a combination of two statistical approaches to ensure that the difference between two biological states is statistically significant. The difference is considered significant only if both tests indicate significance at the same time.
However, this is only possible when the experiment is done at least in triplicate, as this is the condition for performing Student’s t-test.
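The rule that both tests must agree can be sketched in R as follows. The function `is_hybrid_significant`, its arguments, and the example values are hypothetical and only illustrate the logic; the Houde limit and the P-values are assumed to have been computed beforehand.

```r
# Hypothetical helper (not a HaDeX2 function): a difference is significant
# only when it passes both criteria at once.
is_hybrid_significant <- function(diff_uptake, p_value, houde_limit,
                                  confidence_level = 0.98) {
  outside_interval <- abs(diff_uptake) > houde_limit   # Houde interval criterion
  below_alpha <- p_value < (1 - confidence_level)      # t-test criterion
  outside_interval & below_alpha
}

# Illustrative values for three peptides.
diff_uptake <- c(0.90, 0.10, 0.85)
p_value     <- c(0.001, 0.001, 0.300)

hybrid_result <- is_hybrid_significant(diff_uptake, p_value, houde_limit = 0.5)
```

Here only the first peptide passes: the second fails the interval criterion and the third fails the t-test criterion.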
Houde interval
This test is done for the time points chosen for a given plot. For example, the volcano plot presents multiple time points of measurement, so we take the values from all of the presented time points. The Woods plot, however, presents a single time point, so only that time point is taken into account.
Houde interval (Houde, Berkowitz, and Engen 2011) is calculated based on the uncertainty of the measurement or, more precisely, the propagated uncertainty of the deuterium uptake (in the same form as the values presented on the plot), as described in the equation:
$$\pm \frac{t_{\alpha/2}}{\sqrt{k}} \cdot \frac{1}{n} \sum_{i=1}^{n} u_c(x_i)$$
where:
- $n$ - number of peptides,
- $u_c(x_i)$ - uncertainty of the deuterium uptake for the $i$th peptide,
- $t_{\alpha/2}$ - value of the $t$ test for $k$ replicates from the table,
- $k$ - number of the replicates of the experiment.
$t_{\alpha/2}$ is calculated using the R function qt, where the number of degrees of freedom is the number of replicates minus one, and alpha is set by the desired confidence level (usually 0.98).
In short, we take the mean uncertainty of deuterium uptake and scale it by the appropriate value to get an interval. Values inside the interval are too small to be distinguished from the measurement uncertainty, so we are not interested in them.
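A minimal sketch of this calculation in R is shown below. The function name `houde_interval`, the two-sided quantile (alpha split evenly between both tails), and the division by the square root of the number of replicates are assumptions of this example, as are the uncertainty values.

```r
# Hypothetical helper (not the HaDeX2 implementation): symmetric interval
# built from the mean propagated uncertainty of the deuterium uptake.
houde_interval <- function(u_uptake, n_rep = 3, confidence_level = 0.98) {
  alpha <- 1 - confidence_level
  # Two-sided critical value of Student's t with n_rep - 1 degrees of freedom.
  t_value <- qt(1 - alpha / 2, df = n_rep - 1)
  limit <- mean(u_uptake, na.rm = TRUE) * t_value / sqrt(n_rep)
  c(lower = -limit, upper = limit)
}

# Illustrative uptake uncertainties for three peptides [Da].
u_uptake <- c(0.1, 0.2, 0.3)
interval <- houde_interval(u_uptake, n_rep = 3, confidence_level = 0.98)
```

Differences in deuterium uptake that fall between `interval[["lower"]]` and `interval[["upper"]]` would be treated as indistinguishable from the measurement uncertainty.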
Student’s t-test
In order to use Student’s t-test, we need at least three values from each group. In the case of differential analysis, this means at least three replicate values at a given time point for each biological state.
This test shows us whether the values come from two different distributions (the desired option) or from one distribution, in which case they are considered the same. We are not interested in the latter case.
We use the unpaired Student’s t-test to calculate the P-value. The null hypothesis is that the two distributions are the same. If the calculated P-value is below the significance level implied by the chosen confidence level (one minus the confidence level), we reject the null hypothesis and assume that the distributions are different.
To calculate the P-value we use the base R function t.test:
```r
t.test(x = st_1, y = st_2,
       paired = FALSE,
       alternative = "two.sided",
       conf.level = confidence_level)$p.value
```
where st_1 is a set of values from the first state, and st_2 from the second.
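For illustration, the call can be run on made-up replicate values for a single peptide at a single time point. The numbers below are assumptions, not measured data. Note that with its default `var.equal = FALSE`, `t.test` applies the Welch approximation for unequal variances.

```r
# Illustrative deuterium uptake values [Da] from three replicates per state.
st_1 <- c(10.1, 10.3, 10.2)
st_2 <- c(11.0, 11.2, 11.1)
confidence_level <- 0.98

p_value <- t.test(x = st_1, y = st_2,
                  paired = FALSE,
                  alternative = "two.sided",
                  conf.level = confidence_level)$p.value

# Reject the null hypothesis when the P-value is below 1 - confidence_level.
reject_null <- p_value < (1 - confidence_level)
```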
If this option is chosen, we adjust the P-value using the selected adjustment method (with three options: none, BH and bonferroni):
```r
p.adjust(p_dat[["P_value"]], method = p_adjustment_method)
```
P-value is usually presented in the form of $-\log(P)$, e.g. on the volcano plot.
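The effect of the three adjustment options can be compared on a small set of made-up P-values. The data frame below only mimics the `P_value` column used above, and the natural logarithm in the last step is an assumption of this sketch.

```r
# Illustrative P-values for four peptides (made up for this example).
p_dat <- data.frame(P_value = c(0.001, 0.01, 0.03, 0.5))

p_none <- p.adjust(p_dat[["P_value"]], method = "none")        # unchanged
p_bh   <- p.adjust(p_dat[["P_value"]], method = "BH")          # Benjamini-Hochberg
p_bonf <- p.adjust(p_dat[["P_value"]], method = "bonferroni")  # multiplied by n, capped at 1

# Transformation used for plotting; the logarithm base is assumed here.
log_p <- -log(p_bh)
```

Bonferroni is the most conservative of the three, so fewer peptides remain below the significance level after adjustment.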