Condense Measure Log-Ratios — CondenseLogs • zoolog

This function condenses the calculated log-ratio values into a reduced number of features by grouping them and selecting or calculating one feature value per group. By default, each group represents a single dimension: Length, Width, or Depth. Two built-in methods are currently available: priority (default) and average.

CondenseLogs(
  data,
  grouping = list(Length = c("GL", "GLl", "GLm", "HTC"), Width = c("BT", "Bd", "Bp",
    "SD", "Bfd", "Bfp"), Depth = c("Dd", "DD", "BG", "Dp")),
  method = "priority"
)

Arguments

data: A dataframe with the input measurements.
grouping: A named list of character vectors, one vector per group. Each vector lists the measurements of the group in order of priority. By default the groups are Length = c("GL", "GLl", "GLm", "HTC"), Width = c("BT", "Bd", "Bp", "SD", "Bfd", "Bfp"), and Depth = c("Dd", "DD", "BG", "Dp"). The order is irrelevant for method = "average".
method: Character string indicating which method to use for extracting the condensed features. Currently accepted methods: "priority" (default) and "average".

Value

A dataframe including the input dataframe and additional columns, one for each extracted condensed feature, with the corresponding name given in grouping.

Details

This operation is motivated by two circumstances. First, not all measurements are available for every bone specimen, which obstructs their direct comparison and statistical analysis. Second, several measurements can be strongly correlated (e.g. SD and Bd both represent bone width). Thus, considering them as independent would produce an over-representation of bone remains with more measurements per axis. Condensing each group of measurements into a single feature (e.g. one measure per axis) palliates both problems.

An important property of the log-ratios computed with respect to a reference is that they make the different measures comparable. For instance, if a bone is scaled with respect to the reference so that its width is homogeneously doubled, then all width-related measures (BT, Bd, Bp, SD, ...) give the same log-ratio, log(2). In contrast, the absolute measures are not directly comparable.
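
The following minimal sketch illustrates this property in base R. The reference values are hypothetical, chosen only for illustration, and the base-10 logarithm is an assumption here; the property holds for any base.

```r
# Hypothetical reference widths in mm (illustrative values, not taken from zoolog).
reference <- c(BT = 30, Bd = 32, Bp = 35, SD = 15)

# A specimen whose width is homogeneously doubled with respect to the reference.
specimen <- 2 * reference

# The absolute measures differ from one another ...
specimen

# ... but every log-ratio equals log10(2).
logRatios <- log10(specimen / reference)
logRatios
```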

The measurement names in the grouping list are given without the logPrefix, but the selection is made from the corresponding log-ratio columns.
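
As a rough sketch of this naming convention, the snippet below maps the measurement names of a group to log-ratio column names. The prefix "log" is assumed from the column names shown in the examples, the toy data frame is made up, and this is not necessarily how zoolog resolves the columns internally.

```r
# Toy data frame with log-ratio columns (made-up values).
toy <- data.frame(logBd = c(-0.09, NA),
                  logBp = c(-0.08, 0.40),
                  logSD = c(0.26, -0.21))

# Measurement names as they would be written in the grouping list.
widthGroup <- c("BT", "Bd", "Bp", "SD")

# Assumed prefix of the log-ratio columns.
logPrefix <- "log"

# Candidate log-ratio columns, kept in priority order, restricted to
# those that are actually present in the data.
logColumns <- paste0(logPrefix, widthGroup)
available <- intersect(logColumns, names(toy))
available
```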

The default method is "priority", which selects the first available measure log-ratio in each group. The method "average" extracts the mean per group, ignoring the non-available measures.

We provide the following default groups and prioritization:
For lengths, the order of priority is: GL, GLl, GLm, HTC.
For widths, the order of priority is: BT, Bd, Bp, SD, Bfd, Bfp.
For depths, the order of priority is: Dd, DD, BG, Dp.

This order maximises the robustness and reliability of the measurements, as priority is given to the most abundant, most replicable, and least age-dependent measurements.
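
The two methods can be sketched in base R as follows. The helper names condensePriority and condenseAverage and the toy values are hypothetical; this illustrates the idea under those assumptions and is not the package's implementation.

```r
# Toy log-ratios for one group, columns already in priority order (made-up values).
widthLogs <- data.frame(
  logBT = c(NA_real_, NA_real_, NA_real_),
  logBd = c(-0.09,    NA,       NA),
  logBp = c(-0.08,    0.40,     NA),
  logSD = c(0.26,     -0.21,    NA)
)

# "priority": first non-missing value in each row, NA if none is available.
condensePriority <- function(logs) {
  unname(apply(logs, 1, function(row) {
    available <- row[!is.na(row)]
    if (length(available) == 0) NA_real_ else unname(available[1])
  }))
}

# "average": mean of the non-missing values in each row, NA if none is available.
condenseAverage <- function(logs) {
  means <- unname(rowMeans(logs, na.rm = TRUE))
  means[is.nan(means)] <- NA_real_
  means
}

condensePriority(widthLogs)
condenseAverage(widthLogs)
```

In the first row the priority sketch picks logBd because logBT is missing, whereas the average sketch combines logBd, logBp, and logSD.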

This method was first used in: Trentacoste, A., Nieto-Espinet, A., & Valenzuela-Lamas, S. (2018). Pre-Roman improvements to agricultural production: Evidence from livestock husbandry in late prehistoric Italy. PLoS ONE, 13(12), e0208109.

Alternatively, a user-defined method can be provided as a function with a single argument: a data.frame whose columns are the measure log-ratios of the group, as determined by the grouping.
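
A user-defined method might look like the sketch below, which takes the row-wise median of the available log-ratios. The name condenseMedian and the toy data are hypothetical, and the sketch assumes that the function should return one value per row; check the package source before relying on that contract.

```r
# Hypothetical user-defined method: row-wise median of the available
# log-ratios, NA when no log-ratio is available for the case.
condenseMedian <- function(logs) {
  unname(apply(logs, 1, function(row) {
    if (all(is.na(row))) NA_real_ else median(row, na.rm = TRUE)
  }))
}

# Toy log-ratios for one group (made-up values).
toyLogs <- data.frame(logBd = c(-0.09, NA),
                      logBp = c(-0.08, 0.40),
                      logSD = c(0.26, NA))
condenseMedian(toyLogs)

# Assumed usage with zoolog (not run here):
# CondenseLogs(dataExampleWithLogs, method = condenseMedian)
```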

Examples

## Read an example dataset:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package="zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8")
## For illustration purposes we now keep only a subset of cases so that
## the example runs sufficiently fast.
## Skip this step if you want to process the full example dataset.
dataExample <- dataExample[1:1000, ]

## Compute the log-ratios and select the cases with available log ratios:
dataExampleWithLogs <- RemoveNACases(LogRatios(dataExample))
#> Warning: Reference for Sus scrofa used for cases of Sus domesticus.
#>    Reference for Sus scrofa used for cases of Sus.
#>    Set useGenusIfUnambiguous to FALSE if this behaviour is not desired.
#> Warning: Data includes some cases recorded as
#>     * Caprini (which is a Tribe)
#>       for which the reference for Ovis aries or Capra hircus could be used.
#>    Set joinCategories as appropriate if you want to use any of them.
## We can observe the first lines (excluding some columns for visibility):
head(dataExampleWithLogs)[, -c(6:20,32:63)]
#>   Site N.inv    UE Especie       Os    GL    Bp   Dp   SD   DD   Bd   Dd   BT
#> 1  ALP  3453 10410    ovar 1fal ant  27.1   9.9 12.3 17.9  9.0  9.0   NA   NA
#> 2  ALP  3455 10410    ovar 1fal ant  27.6   9.6 12.2  7.6  8.9  8.3   NA   NA
#> 3  ALP  4245  7036    cahi      hum    NA 128.3   NA 12.9   NA 27.4 26.6 23.6
#> 4  ALP  4674 10227    cahi      hum    NA    NA   NA   NA   NA 26.0 25.7 22.3
#> 5  ALP  4085 10253    cahi      hum    NA    NA   NA   NA   NA 27.9 27.3 23.2
#> 6  TFC    24   407    ceel       mc 262.7  41.3 30.8 25.0 21.2 41.1 27.1   NA
#>   GLc BFd Dl HmandM3 logGL       logBp       logDp      logSD       logBd
#> 1  NA  NA NA      NA    NA -0.07991177 -0.07265930  0.2629585 -0.08911977
#> 2  NA  NA NA      NA    NA -0.09327573 -0.07620458 -0.1090810 -0.12428419
#> 3  NA  NA NA      NA    NA  0.40167955          NA -0.2116296 -0.15130497
#> 4  NA  NA NA      NA    NA          NA          NA         NA -0.17408218
#> 5  NA  NA NA      NA    NA          NA          NA         NA -0.14345133
#> 6  NA  NA NA      NA    NA          NA          NA         NA -0.03354115
#>         logDd logBT logGLc logBFd logDl logGB logSLC logGLP logBG logLG logDPA
#> 1          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 2          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 3 -0.06787875    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 4 -0.08282727    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 5 -0.05659774    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 6          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#>   logBPC logLA logLAR logSH logSB logL logH
#> 1     NA    NA     NA    NA    NA   NA   NA
#> 2     NA    NA     NA    NA    NA   NA   NA
#> 3     NA    NA     NA    NA    NA   NA   NA
#> 4     NA    NA     NA    NA    NA   NA   NA
#> 5     NA    NA     NA    NA    NA   NA   NA
#> 6     NA    NA     NA    NA    NA   NA   NA

## Extract the default condensed features with the default "priority" method:
dataExampleWithSummary <- CondenseLogs(dataExampleWithLogs)
head(dataExampleWithSummary)[, -c(6:20,32:63)]
#>   Site N.inv    UE Especie       Os    GL    Bp   Dp   SD   DD   Bd   Dd   BT
#> 1  ALP  3453 10410    ovar 1fal ant  27.1   9.9 12.3 17.9  9.0  9.0   NA   NA
#> 2  ALP  3455 10410    ovar 1fal ant  27.6   9.6 12.2  7.6  8.9  8.3   NA   NA
#> 3  ALP  4245  7036    cahi      hum    NA 128.3   NA 12.9   NA 27.4 26.6 23.6
#> 4  ALP  4674 10227    cahi      hum    NA    NA   NA   NA   NA 26.0 25.7 22.3
#> 5  ALP  4085 10253    cahi      hum    NA    NA   NA   NA   NA 27.9 27.3 23.2
#> 6  TFC    24   407    ceel       mc 262.7  41.3 30.8 25.0 21.2 41.1 27.1   NA
#>   GLc BFd Dl HmandM3 logGL       logBp       logDp      logSD       logBd
#> 1  NA  NA NA      NA    NA -0.07991177 -0.07265930  0.2629585 -0.08911977
#> 2  NA  NA NA      NA    NA -0.09327573 -0.07620458 -0.1090810 -0.12428419
#> 3  NA  NA NA      NA    NA  0.40167955          NA -0.2116296 -0.15130497
#> 4  NA  NA NA      NA    NA          NA          NA         NA -0.17408218
#> 5  NA  NA NA      NA    NA          NA          NA         NA -0.14345133
#> 6  NA  NA NA      NA    NA          NA          NA         NA -0.03354115
#>         logDd logBT logGLc logBFd logDl logGB logSLC logGLP logBG logLG logDPA
#> 1          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 2          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 3 -0.06787875    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 4 -0.08282727    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 5 -0.05659774    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 6          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#>   logBPC logLA logLAR logSH logSB logL logH Length       Width       Depth
#> 1     NA    NA     NA    NA    NA   NA   NA     NA -0.08911977 -0.07265930
#> 2     NA    NA     NA    NA    NA   NA   NA     NA -0.12428419 -0.07620458
#> 3     NA    NA     NA    NA    NA   NA   NA     NA -0.15130497 -0.06787875
#> 4     NA    NA     NA    NA    NA   NA   NA     NA -0.17408218 -0.08282727
#> 5     NA    NA     NA    NA    NA   NA   NA     NA -0.14345133 -0.05659774
#> 6     NA    NA     NA    NA    NA   NA   NA     NA -0.03354115          NA

## Extract only width with "average" method:
dataExampleWithSummary2 <- CondenseLogs(dataExampleWithLogs,
                               grouping = list(Width = c("BT", "Bd", "Bp", "SD")),
                               method = "average")
head(dataExampleWithSummary2)[, -c(6:20,32:63)]
#>   Site N.inv    UE Especie       Os    GL    Bp   Dp   SD   DD   Bd   Dd   BT
#> 1  ALP  3453 10410    ovar 1fal ant  27.1   9.9 12.3 17.9  9.0  9.0   NA   NA
#> 2  ALP  3455 10410    ovar 1fal ant  27.6   9.6 12.2  7.6  8.9  8.3   NA   NA
#> 3  ALP  4245  7036    cahi      hum    NA 128.3   NA 12.9   NA 27.4 26.6 23.6
#> 4  ALP  4674 10227    cahi      hum    NA    NA   NA   NA   NA 26.0 25.7 22.3
#> 5  ALP  4085 10253    cahi      hum    NA    NA   NA   NA   NA 27.9 27.3 23.2
#> 6  TFC    24   407    ceel       mc 262.7  41.3 30.8 25.0 21.2 41.1 27.1   NA
#>   GLc BFd Dl HmandM3 logGL       logBp       logDp      logSD       logBd
#> 1  NA  NA NA      NA    NA -0.07991177 -0.07265930  0.2629585 -0.08911977
#> 2  NA  NA NA      NA    NA -0.09327573 -0.07620458 -0.1090810 -0.12428419
#> 3  NA  NA NA      NA    NA  0.40167955          NA -0.2116296 -0.15130497
#> 4  NA  NA NA      NA    NA          NA          NA         NA -0.17408218
#> 5  NA  NA NA      NA    NA          NA          NA         NA -0.14345133
#> 6  NA  NA NA      NA    NA          NA          NA         NA -0.03354115
#>         logDd logBT logGLc logBFd logDl logGB logSLC logGLP logBG logLG logDPA
#> 1          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 2          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 3 -0.06787875    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 4 -0.08282727    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 5 -0.05659774    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#> 6          NA    NA     NA     NA    NA    NA     NA     NA    NA    NA     NA
#>   logBPC logLA logLAR logSH logSB logL logH       Width
#> 1     NA    NA     NA    NA    NA   NA   NA  0.03130898
#> 2     NA    NA     NA    NA    NA   NA   NA -0.10888030
#> 3     NA    NA     NA    NA    NA   NA   NA  0.01291500
#> 4     NA    NA     NA    NA    NA   NA   NA -0.17408218
#> 5     NA    NA     NA    NA    NA   NA   NA -0.14345133
#> 6     NA    NA     NA    NA    NA   NA   NA -0.03354115