zoologThesaurus.Rd
The thesaurus set defined for the package zoolog.
It makes the methods robust to the different nomenclatures used
in datasets created by different authors. The user can also use other
thesaurus sets, or modify the provided thesaurus set (see
ThesaurusManagement and ThesaurusReaderWriter).
zoologThesaurus
A thesaurus set is a list of thesauri with additional attributes:
names
Character vector with the name of each thesaurus.
applyToColNames
Logical vector indicating whether each thesaurus should be applied to the column names of the data frame.
applyToColValues
Logical vector indicating whether each thesaurus should be applied to the values in the corresponding column of the data frame.
fileName
Character vector with the source file of each thesaurus.
The examples below show the list of four thesauri included in the provided
zoologThesaurus.
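As a rough illustration of this structure, the following sketch builds a toy thesaurus set with base R only. The object `toySet` and its contents are hypothetical; only the attribute names are taken from the description above, and the package may construct its sets differently.

```r
# Toy thesaurus set (hypothetical content): a named list of data frames.
toySet <- list(
  taxon = data.frame(sheep = c("sheep", "ovis"),
                     goat = c("goat", "capra"),
                     stringsAsFactors = FALSE),
  element = data.frame(humerus = c("humerus", "hum"),
                       stringsAsFactors = FALSE)
)

# Set-level attributes, one entry per thesaurus, named as described above.
attr(toySet, "applyToColNames") <- c(FALSE, FALSE)
attr(toySet, "applyToColValues") <- c(TRUE, TRUE)
attr(toySet, "fileName") <- c("taxonThesaurus.csv", "elementThesaurus.csv")

names(toySet)                     # names of the thesauri in the set
attr(toySet, "applyToColValues")  # which thesauri apply to column values
```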
Each thesaurus is a data frame, also with additional attributes. Each column
of the data frame is a category of names with equivalent meaning in the
intended application. The column name identifies the category and is used
as the standard when applying StandardizeNomenclature.
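The role of the column name as the standard can be sketched with a small lookup in base R. The helper `toStandard` is hypothetical and is not the implementation of StandardizeNomenclature; returning unmatched names unchanged is an assumption made for this illustration.

```r
# Small excerpt-like thesaurus: each column is one category of equivalent names.
toyThesaurus <- data.frame(
  Taxon = c("Taxon", "TAX", "species"),
  Element = c("Element", "EL", "anat"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name of the column (category) that lists `x`,
# or `x` itself when no category lists it (assumed behaviour).
toStandard <- function(x, thesaurus) {
  for (category in names(thesaurus)) {
    if (x %in% thesaurus[[category]]) return(category)
  }
  x
}

toStandard("TAX", toyThesaurus)   # "Taxon"
toStandard("anat", toyThesaurus)  # "Element"
toStandard("Site", toyThesaurus)  # "Site" (not listed, left as is)
```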
The names in each column (category) must not be included in any other
column, since this would make the thesaurus ambiguous (see
ThesaurusAmbiguity).
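A minimal check for this condition could look as follows. The helper `findAmbiguousNames` is hypothetical and only illustrates the idea; the package's own checks are documented under ThesaurusAmbiguity.

```r
# Deliberately ambiguous toy thesaurus: "Os" is listed under two categories.
ambiguousThesaurus <- data.frame(
  Taxon = c("Taxon", "species", "Os"),
  Element = c("Element", "bone", "Os"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names that appear in more than one column.
findAmbiguousNames <- function(thesaurus) {
  # Deduplicate within each column so only cross-column repeats are flagged.
  entries <- unlist(lapply(thesaurus, unique), use.names = FALSE)
  # Ignore padding cells (empty strings or NA).
  entries <- entries[!is.na(entries) & entries != ""]
  unique(entries[duplicated(entries)])
}

findAmbiguousNames(ambiguousThesaurus)  # "Os"
```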
Each thesaurus has the following attributes:
names
The standard name for the categories.
class
"data.frame"
row.names
Irrelevant.
caseSensitive
Logical indicating whether the names in the thesaurus should be considered case-sensitive.
accentSensitive
Logical indicating whether the names in the thesaurus should be differentiated by the presence of accent marks.
punctuationSensitive
Logical indicating whether the names in the thesaurus should be differentiated by the presence of punctuation marks.
The examples below show the content and characteristics of the first
thesaurus in zoologThesaurus.
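To make the three sensitivity attributes concrete, the sketch below reduces a name to a comparison key. The helper `normalizeName` and its small accent table are hypothetical and are not the zoolog implementation; they only show how insensitive matching could treat variants of a name as equal.

```r
# Hypothetical helper: build a comparison key according to the three flags.
normalizeName <- function(x, caseSensitive = FALSE,
                          accentSensitive = FALSE,
                          punctuationSensitive = FALSE) {
  if (!caseSensitive) x <- tolower(x)
  if (!accentSensitive) {
    # Small illustrative accent table; a real one would cover more characters.
    accented <- "\u00e1\u00e9\u00ed\u00f3\u00fa\u00e8\u00c1\u00c9\u00cd\u00d3\u00da\u00c8"
    plain <- "aeioueAEIOUE"
    x <- chartr(accented, plain, x)
  }
  if (!punctuationSensitive) x <- gsub("[[:punct:]]", "", x)
  x
}

# With all flags FALSE, these variants share the same key.
normalizeName("Esp\u00e8ce") == normalizeName("espece")
normalizeName("Genus.Species") == normalizeName("GenusSpecies")

# With punctuation sensitivity enabled, the dot is kept.
normalizeName("Genus.Species", punctuationSensitive = TRUE)
```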
zoologThesaurus is an exported variable automatically loaded in
memory. In addition, the source files generating it are included in the
zoolog extdata folder. There is one file for the main structure of the
thesaurus set and one file for each included thesaurus. All of them are in
semicolon-separated format, so they can be examined in any text editor
or imported into any spreadsheet application. The files are:
zoologThesaurusSet.csv
Defines the main structure of the thesaurus set. It has a row for each thesaurus and seven columns (ThesaurusName, FileName, CaseSensitive, AccentSensitive, PunctuationSensitive, ApplyToColNames, and ApplyToColValues). Their meaning coincides with the description above. Observe that the case, accent, and punctuation sensitivity settings are stored here, instead of in each thesaurus file.
identifierThesaurus.csv
Thesaurus for the identifiers used in LogRatios to identify the bone types and the measure names in the data and the references. It has four columns: Taxon, Element, Measure, and Standard.
taxonThesaurus.csv
Thesaurus for the taxa. There is one column for each category of taxon considered.
elementThesaurus.csv
Thesaurus for the skeletal elements. One column for each category.
measureThesaurus.csv
Thesaurus for the measure names. One column for each category.
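Since the files are plain semicolon-separated text, they can also be inspected from R. The sketch below assumes that zoolog is installed and that the files sit directly in its extdata folder; it uses only base R functions and does nothing if the file is not found.

```r
# Locate the extdata folder of an installed zoolog ("" if not installed).
thesaurusDir <- system.file("extdata", package = "zoolog")
setFile <- file.path(thesaurusDir, "zoologThesaurusSet.csv")

if (nzchar(thesaurusDir) && file.exists(setFile)) {
  # read.csv2() uses ";" as the field separator by default.
  thesaurusSetTable <- read.csv2(setFile, stringsAsFactors = FALSE)
  print(thesaurusSetTable)
}
```

The same call can be pointed at any of the individual thesaurus files listed above.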
## List of thesaurus names and characteristics in the thesaurus set:
attributes(zoologThesaurus)
#> $names
#> [1] "identifier" "taxon" "element" "measure"
#>
#> $applyToColNames
#> [1] TRUE FALSE FALSE TRUE
#>
#> $applyToColValues
#> [1] FALSE TRUE TRUE TRUE
#>
#> $fileName
#> [1] "identifierThesaurus.csv" "taxonThesaurus.csv"
#> [3] "elementThesaurus.csv" "measureThesaurus.csv"
#>
## Content of the first thesaurus:
zoologThesaurus$identifier
#>          Taxon         Element      Measure Standard
#> 1        Taxon         Element      Measure Standard
#> 2          TAX              EL  measurement estandar
#> 3      species            anat         Mass
#> 4       animal            bone measurements
#> 5       Specie            Osso     measures
#> 6 GenusSpecies BoneBoneElement       Medida
#> 7      Especie     Skelettteil         Mida
#> 8       Espece        elemento
#> 9                           Os
attributes(zoologThesaurus$identifier)
#> $names
#> [1] "Taxon" "Element" "Measure" "Standard"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9
#>
#> $caseSensitive
#> [1] FALSE
#>
#> $accentSensitive
#> [1] FALSE
#>
#> $punctuationSensitive
#> [1] FALSE
#>