This function adds a new column to the variable_info slot of a mass_dataset object. For each variable, the column holds the frequency of NA (Not Available) values, expressed as a fraction between 0 and 1, across the samples specified.

mutate_variable_na_freq(object, according_to_samples = "all")

Arguments

object

A mass_dataset object.

according_to_samples

A character vector specifying the sample IDs to consider when calculating the frequency of NA values. Default is "all", which considers all samples.

Value

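A modified mass_dataset object whose variable_info slot contains the new na_freq column. If a na_freq column already exists, the new column is added as na_freq.1, as the Examples output shows.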
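
As a rough mental model, the added value is the per-variable fraction of missing entries across the chosen sample columns. The sketch below reproduces that idea in base R on a small made-up table. The names toy_expression and na_fraction() are hypothetical and exist only for illustration; this is not the package's actual implementation.

```r
# Toy expression table: rows are variables, columns are samples (made-up values).
toy_expression <- data.frame(
  QC_1 = c(10, NA, 5),
  QC_2 = c(12, NA, NA),
  S_1  = c(NA, 3, 7),
  S_2  = c(11, 4, 8),
  row.names = c("var_1", "var_2", "var_3")
)

# Hypothetical helper: fraction of NA values per row over the selected samples.
na_fraction <- function(expression_data, according_to_samples = "all") {
  # "all" is treated as a shortcut for every sample column.
  if (identical(according_to_samples, "all")) {
    according_to_samples <- colnames(expression_data)
  }
  selected <- expression_data[, according_to_samples, drop = FALSE]
  # is.na() gives a logical matrix; its row means are the NA fractions.
  rowMeans(is.na(selected))
}

na_all <- na_fraction(toy_expression)
na_qc  <- na_fraction(toy_expression, c("QC_1", "QC_2"))
```

Restricting the calculation to a subset of samples changes the denominator, which is why the same variable can have different values for "all" and for a QC-only selection.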

Author

Xiaotao Shen shenxt1990@outlook.com

Examples

data("expression_data")
data("sample_info")
data("variable_info")

object =
  create_mass_dataset(
    expression_data = expression_data,
    sample_info = sample_info,
    variable_info = variable_info
  )

object
#> -------------------- 
#> massdataset version: 1.0.28 
#> -------------------- 
#> 1.expression_data:[ 1000 x 8 data.frame]
#> 2.sample_info:[ 8 x 4 data.frame]
#> 8 samples:Blank_3 Blank_4 QC_1 ... PS4P3 PS4P4
#> 3.variable_info:[ 1000 x 3 data.frame]
#> 1000 variables:M136T55_2_POS M79T35_POS M307T548_POS ... M232T937_POS M301T277_POS
#> 4.sample_info_note:[ 4 x 2 data.frame]
#> 5.variable_info_note:[ 3 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> -------------------- 
#> Processing information
#> 1 processings in total
#> create_mass_dataset ---------- 
#>       Package         Function.used                Time
#> 1 massdataset create_mass_dataset() 2023-10-01 23:24:35

## Calculate the NA frequency across all samples
object2 =
  mutate_variable_na_freq(object = object)

head(extract_variable_info(object))
#>     variable_id        mz        rt
#> 1 M136T55_2_POS 136.06140  54.97902
#> 2    M79T35_POS  79.05394  35.36550
#> 3  M307T548_POS 307.14035 547.56641
#> 4  M183T224_POS 183.06209 224.32777
#> 5   M349T47_POS 349.01584  47.00262
#> 6  M182T828_POS 181.99775 828.35712
head(extract_variable_info(object2))
#>     variable_id        mz        rt na_freq
#> 1 M136T55_2_POS 136.06140  54.97902   0.250
#> 2    M79T35_POS  79.05394  35.36550   0.250
#> 3  M307T548_POS 307.14035 547.56641   0.375
#> 4  M183T224_POS 183.06209 224.32777   0.750
#> 5   M349T47_POS 349.01584  47.00262   0.250
#> 6  M182T828_POS 181.99775 828.35712   0.125

## Calculate the NA frequency using only the QC samples
object3 =
  mutate_variable_na_freq(
    object = object2,
    according_to_samples =
      get_sample_id(object)[extract_sample_info(object)$class == "QC"]
  )

object3
#> -------------------- 
#> massdataset version: 1.0.28 
#> -------------------- 
#> 1.expression_data:[ 1000 x 8 data.frame]
#> 2.sample_info:[ 8 x 4 data.frame]
#> 8 samples:Blank_3 Blank_4 QC_1 ... PS4P3 PS4P4
#> 3.variable_info:[ 1000 x 5 data.frame]
#> 1000 variables:M136T55_2_POS M79T35_POS M307T548_POS ... M232T937_POS M301T277_POS
#> 4.sample_info_note:[ 4 x 2 data.frame]
#> 5.variable_info_note:[ 5 x 2 data.frame]
#> 6.ms2_data:[ 0 variables x 0 MS2 spectra]
#> -------------------- 
#> Processing information
#> 2 processings in total
#> create_mass_dataset ---------- 
#>       Package         Function.used                Time
#> 1 massdataset create_mass_dataset() 2023-10-01 23:24:35
#> mutate_variable_na_freq ---------- 
#>       Package             Function.used                       Time
#> 1 massdataset mutate_variable_na_freq() 2023-10-01 23:24:35.439725
#> 2 massdataset mutate_variable_na_freq() 2023-10-01 23:24:35.449135

head(extract_variable_info(object3))
#>     variable_id        mz        rt na_freq na_freq.1
#> 1 M136T55_2_POS 136.06140  54.97902   0.250       0.0
#> 2    M79T35_POS  79.05394  35.36550   0.250       0.0
#> 3  M307T548_POS 307.14035 547.56641   0.375       0.0
#> 4  M183T224_POS 183.06209 224.32777   0.750       1.0
#> 5   M349T47_POS 349.01584  47.00262   0.250       0.0
#> 6  M182T828_POS 181.99775 828.35712   0.125       0.5
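
A common follow-up is to keep only the variables whose missing-value fraction stays under a cutoff. The sketch below works on a plain data frame copied from the six rows printed above. The 0.5 cutoff and the names variable_info_head, na_cutoff, and kept_ids are assumptions chosen for illustration, and filtering the mass_dataset object itself is not shown.

```r
# First six rows of variable_info as printed above (mz and rt omitted).
variable_info_head <- data.frame(
  variable_id = c("M136T55_2_POS", "M79T35_POS", "M307T548_POS",
                  "M183T224_POS", "M349T47_POS", "M182T828_POS"),
  na_freq   = c(0.250, 0.250, 0.375, 0.750, 0.250, 0.125),
  na_freq.1 = c(0.0, 0.0, 0.0, 1.0, 0.0, 0.5),
  stringsAsFactors = FALSE
)

# Assumed cutoff: keep variables missing in fewer than 50% of samples,
# both overall (na_freq) and within the QC samples (na_freq.1).
na_cutoff <- 0.5
keep <- variable_info_head$na_freq < na_cutoff &
  variable_info_head$na_freq.1 < na_cutoff

kept_ids <- variable_info_head$variable_id[keep]
```

With this cutoff, M183T224_POS is dropped on both criteria, while M182T828_POS is dropped only because of its QC-based value.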