Thesaurus Set for zoolog: zoologThesaurus

The thesaurus set defined for the package zoolog. This is used to make the methods robust to different nomenclatures used in datasets created by different authors. The user can also use other thesaurus sets, or can modify the provided thesaurus set (see ThesaurusManagement and ThesaurusReaderWriter).

zoologThesaurus

Format

A thesaurus set is a list of thesauri with additional attributes:

names: Character vector with the name of each thesaurus.
applyToColNames: Logical vector indicating whether each thesaurus should be applied to the column names of the data frame.
applyToColValues: Logical vector indicating whether each thesaurus should be applied to the values in the corresponding column of the data frame.
fileName: Character vector with the source file of each thesaurus.
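
As a rough illustration of this structure, the following sketch builds a small thesaurus set by hand in base R. The object toySet, its two thesauri, and their contents are made up for illustration and are not the package data; only the attribute names are taken from the description above.

```r
# Toy stand-ins (invented for illustration, not the zoolog data).
# As in the package thesauri, shorter columns are padded with "".
taxon <- data.frame(
  "Bos taurus" = c("Bos taurus", "cattle", "cow"),
  "Ovis aries" = c("Ovis aries", "sheep", ""),
  check.names = FALSE, stringsAsFactors = FALSE
)
element <- data.frame(
  humerus = c("humerus", "hum"),
  femur   = c("femur", "fem"),
  stringsAsFactors = FALSE
)

# A thesaurus set is a plain list: `names` comes from the list itself,
# and the remaining attributes are attached with attr().
toySet <- list(taxon = taxon, element = element)
attr(toySet, "applyToColNames")  <- c(FALSE, FALSE)
attr(toySet, "applyToColValues") <- c(TRUE, TRUE)
attr(toySet, "fileName") <- c("taxonThesaurus.csv", "elementThesaurus.csv")

# Inspect the attribute names of the toy set.
names(attributes(toySet))
```

Each attribute has one entry per thesaurus, in the same order as the list elements.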

The examples below show the list of four thesauri included in the provided zoologThesaurus.

Each thesaurus is a data frame, also with additional attributes. Each column of the data frame is a category of names with equivalent meaning in the intended application. The column name identifies the category and is used as the standard name when applying StandardizeNomenclature.
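
The column-as-category idea can be sketched in base R as follows. The helper standardize is hypothetical and is not the implementation of StandardizeNomenclature; it matches names exactly and ignores the sensitivity attributes. The toy thesaurus reuses a few entries from the identifier thesaurus.

```r
# Toy thesaurus: each column is one category of equivalent names.
thesaurus <- data.frame(
  Taxon   = c("Taxon", "TAX", "species"),
  Element = c("Element", "EL", "bone"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each name by the name of the column
# (category) that contains it; unknown names are returned unchanged.
standardize <- function(x, thesaurus) {
  result <- x
  for (category in names(thesaurus)) {
    # Ignore the "" padding used to fill shorter columns.
    entries <- setdiff(thesaurus[[category]], "")
    result[x %in% entries] <- category
  }
  result
}

standardize(c("TAX", "bone", "Site"), thesaurus)
#> [1] "Taxon"   "Element" "Site"
```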

The names in each column (category) must not be included in any other column, since this would make the thesaurus ambiguous (see ThesaurusAmbiguity).
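
A minimal check for this condition could look like the sketch below. The function ambiguousNames is a hypothetical helper written for illustration, not a zoolog function; it compares names exactly, whereas a full check would first normalize them according to the thesaurus sensitivity settings.

```r
# Hypothetical helper: return the names that occur in more than one column.
ambiguousNames <- function(thesaurus) {
  # Collect the non-empty entries of each column, dropping repeats
  # within a column so that only cross-column duplicates remain.
  entries <- unlist(
    lapply(thesaurus, function(col) unique(col[col != ""])),
    use.names = FALSE
  )
  unique(entries[duplicated(entries)])
}

ok  <- data.frame(Taxon = c("Taxon", "TAX"), Element = c("Element", "EL"),
                  stringsAsFactors = FALSE)
bad <- data.frame(Taxon = c("Taxon", "id"), Element = c("Element", "id"),
                  stringsAsFactors = FALSE)

ambiguousNames(ok)   # character(0): no name is shared between columns
ambiguousNames(bad)  # "id" appears under both Taxon and Element
```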

Each thesaurus has the following attributes:

names: The standard name for the categories.
class: "data.frame"
row.names: Irrelevant
caseSensitive: Logical indicating whether the names in the thesaurus should be considered case-sensitive.
accentSensitive: Logical indicating whether the names in the thesaurus should be differentiated by the presence of accent marks.
punctuationSensitive: Logical indicating whether the names in the thesaurus should be differentiated by the presence of punctuation marks.
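
One way to picture the effect of the three sensitivity attributes is as a normalization applied to names before they are compared. The sketch below is an assumption about how such flags could be honored, not the package's actual matching code; normalizeName is a hypothetical helper, and its accent map covers only a handful of lowercase letters.

```r
# Hypothetical helper: reduce a name to a comparable form according
# to the three sensitivity flags.
normalizeName <- function(x, caseSensitive = FALSE, accentSensitive = FALSE,
                          punctuationSensitive = FALSE) {
  if (!caseSensitive) x <- tolower(x)
  if (!accentSensitive) {
    # Small explicit map of lowercase accented letters (illustrative only;
    # uppercase accented letters are not covered when caseSensitive = TRUE).
    accented <- paste0(
      "\u00e0\u00e1\u00e2\u00e4", "\u00e8\u00e9\u00ea\u00eb",
      "\u00ec\u00ed\u00ee\u00ef", "\u00f2\u00f3\u00f4\u00f6",
      "\u00f9\u00fa\u00fb\u00fc", "\u00e7\u00f1"
    )
    plain <- "aaaaeeeeiiiioooouuuucn"
    x <- chartr(accented, plain, x)
  }
  if (!punctuationSensitive) x <- gsub("[[:punct:]]", "", x)
  x
}

# With all flags FALSE, these spellings reduce to the same string.
normalizeName("Esp\u00e8ce") == normalizeName("espece")
normalizeName("Bos-taurus.") == normalizeName("bostaurus")
```

Setting a flag to TRUE skips the corresponding step, so names that differ only in that respect stay distinct.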

The examples below show the content and characteristics of the first thesaurus in zoologThesaurus.

File Structure

zoologThesaurus is an exported variable automatically loaded in memory. In addition, the source files generating it are included in the zoolog extdata folder. There is one file for the main structure of the thesaurus set and one file for each included thesaurus. All of them are in semicolon-separated format, so they can be examined in any text editor or imported into any spreadsheet application. The files are:

zoologThesaurusSet.csv: Defines the main structure of the thesaurus set. It has a row for each thesaurus and seven columns (ThesaurusName, FileName, CaseSensitive, AccentSensitive, PunctuationSensitive, ApplyToColNames, and ApplyToColValues). Their meaning coincides with the description above. Observe that the case, accent, and punctuation sensitivity settings are stored here, instead of in each thesaurus file.
identifierThesaurus.csv: Thesaurus for the identifiers used in LogRatios to identify the bone types and the measure names in the data and the references. It has four columns: Taxon, Element, Measure, and Standard.
taxonThesaurus.csv: Thesaurus for the taxa. There is one column for each category of taxon considered.
elementThesaurus.csv: Thesaurus for the skeletal elements. One column for each category.
measureThesaurus.csv: Thesaurus for the measure names. One column for each category.
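
Because the files are plain semicolon-separated text, they can be read with base R. The sketch below writes a made-up stand-in for the thesaurus set file to a temporary location and reads it back; the column names follow the description above, while the rows, the taxon sensitivity values, and the TRUE/FALSE spelling of the logical values are assumptions. For the packaged files, see ThesaurusReaderWriter.

```r
# Made-up stand-in for a thesaurus set file (contents are assumptions).
setFile <- tempfile(fileext = ".csv")
writeLines(c(
  paste("ThesaurusName", "FileName", "CaseSensitive", "AccentSensitive",
        "PunctuationSensitive", "ApplyToColNames", "ApplyToColValues",
        sep = ";"),
  "identifier;identifierThesaurus.csv;FALSE;FALSE;FALSE;TRUE;FALSE",
  "taxon;taxonThesaurus.csv;FALSE;FALSE;FALSE;FALSE;TRUE"
), setFile)

# Read it back: one row per thesaurus, seven columns.
setInfo <- read.csv(setFile, sep = ";", stringsAsFactors = FALSE)
str(setInfo)

# If zoolog is installed, the packaged file can be located in extdata;
# system.file() returns "" when the file is not found.
path <- system.file("extdata", "zoologThesaurusSet.csv", package = "zoolog")
if (nzchar(path)) {
  print(head(read.csv(path, sep = ";", stringsAsFactors = FALSE)))
}
```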

Examples

## List of thesaurus names and characteristics in the thesaurus set:
attributes(zoologThesaurus)
#> $names
#> [1] "identifier" "taxon"      "element"    "measure"   
#> 
#> $applyToColNames
#> [1]  TRUE FALSE FALSE  TRUE
#> 
#> $applyToColValues
#> [1] FALSE  TRUE  TRUE  TRUE
#> 
#> $fileName
#> [1] "identifierThesaurus.csv" "taxonThesaurus.csv"     
#> [3] "elementThesaurus.csv"    "measureThesaurus.csv"   
#> 
## Content of the first thesaurus:
zoologThesaurus$identifier
#>          Taxon         Element      Measure Standard
#> 1        Taxon         Element      Measure Standard
#> 2          TAX              EL  measurement estandar
#> 3      species            anat         Mass         
#> 4       animal            bone measurements         
#> 5       Specie            Osso     measures         
#> 6 GenusSpecies BoneBoneElement       Medida         
#> 7      Especie     Skelettteil         Mida         
#> 8       Espece        elemento                      
#> 9                           Os                      
attributes(zoologThesaurus$identifier)
#> $names
#> [1] "Taxon"    "Element"  "Measure"  "Standard"
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9
#> 
#> $caseSensitive
#> [1] FALSE
#> 
#> $accentSensitive
#> [1] FALSE
#> 
#> $punctuationSensitive
#> [1] FALSE
#>