The thesaurus set defined for the package zoolog. It makes the methods robust to the different nomenclatures used in datasets created by different authors. Users can also supply other thesaurus sets or modify the provided one (see ThesaurusManagement and ThesaurusReaderWriter).

zoologThesaurus

Format

A thesaurus set is a list of thesauri with additional attributes:

names

Character vector with the name of each thesaurus.

applyToColNames

Logical vector indicating whether each thesaurus should be applied to the column names of the data frame.

applyToColValues

Logical vector indicating whether each thesaurus should be applied to the values in the corresponding column of the data frame.

fileName

Character vector with the source file of each thesaurus.

The examples below list the four thesauri included in the provided zoologThesaurus.
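As a rough illustration of this structure, the following base R sketch builds a toy thesaurus set by hand. The object toySet and all of its contents are hypothetical and not part of zoolog; only the attribute names follow the description above, with the file attribute spelled fileName as in the example output at the end of this page.

```r
# Toy thesaurus set: a named list of data frames plus extra attributes.
# All names and values here are illustrative, not taken from zoolog.
toySet <- list(
  identifier = data.frame(Taxon = c("Taxon", "species"),
                          Element = c("Element", "bone"),
                          stringsAsFactors = FALSE),
  taxon = data.frame(CategoryA = c("CategoryA", "synonymA"),
                     stringsAsFactors = FALSE)
)
attr(toySet, "applyToColNames")  <- c(TRUE, FALSE)
attr(toySet, "applyToColValues") <- c(FALSE, TRUE)
attr(toySet, "fileName") <- c("identifierThesaurus.csv", "taxonThesaurus.csv")

# The extra attributes run parallel to names(toySet): one entry per thesaurus.
str(attributes(toySet))
```

Keeping the per-thesaurus settings as parallel vectors on the list means that the i-th entry of each attribute describes the i-th thesaurus.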

Each thesaurus is a data frame, also with additional attributes. Each column of the data frame is a category of names with equivalent meaning in the intended application. The column name identifies the category and is used as the standard name when applying StandardizeNomenclature.

The names in each column (category) must not be included in any other column, since this would make the thesaurus ambiguous (see ThesaurusAmbiguity).
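The ambiguity rule can be sketched as a simple check in base R. The helper FindAmbiguousNames and the toy data frame are hypothetical and are not zoolog functions or data. The sketch compares names exactly and assumes that empty strings only pad the shorter columns.

```r
# Toy thesaurus with a deliberate clash: "bone" appears in two categories.
# Empty strings pad the shorter column (an assumption of this sketch).
toyThesaurus <- data.frame(
  Taxon   = c("Taxon", "species", "bone"),
  Element = c("Element", "bone", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a zoolog function):
# returns the names that occur in more than one column.
FindAmbiguousNames <- function(thesaurus) {
  # Unique non-empty names per column, so repeats within a column do not count.
  perColumn <- lapply(thesaurus, function(x) unique(x[!is.na(x) & x != ""]))
  allNames <- unlist(perColumn, use.names = FALSE)
  unique(allNames[duplicated(allNames)])
}

FindAmbiguousNames(toyThesaurus)
#> [1] "bone"
```

A name such as "bone" that sits in both categories could be mapped to either standard, which is why a thesaurus of this kind would be considered ambiguous.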

Each thesaurus has the following attributes:

names

The standard names of the categories.

class

"data.frame"

row.names

Irrelevant

caseSensitive

Logical indicating whether the names in the thesaurus should be considered case-sensitive.

accentSensitive

Logical indicating whether the names in the thesaurus should be differentiated by the presence of accent marks.

punctuationSensitive

Logical indicating whether the names in the thesaurus should be differentiated by the presence of punctuation marks.

The examples below show the content and characteristics of the first thesaurus in zoologThesaurus.
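To make the effect of the sensitivity attributes concrete, the sketch below reduces names to a comparison key when case and punctuation are not significant. NormalizeName is a hypothetical helper written for this illustration, not the matching routine used by zoolog. Accent folding is left out because transliteration in base R is platform-dependent.

```r
# Hypothetical helper (not a zoolog function): reduce names to a comparison key.
NormalizeName <- function(x, caseSensitive = FALSE, punctuationSensitive = FALSE) {
  if (!caseSensitive) x <- tolower(x)                         # ignore letter case
  if (!punctuationSensitive) x <- gsub("[[:punct:]]", "", x)  # drop punctuation marks
  x
}

# With both flags FALSE, these three spellings collapse to the same key.
NormalizeName(c("Genus.Species", "genus_species", "GenusSpecies"))
#> [1] "genusspecies" "genusspecies" "genusspecies"
```

Under this sketch, setting caseSensitive = TRUE would keep "GenusSpecies" and "genusspecies" distinct.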

File Structure

zoologThesaurus is an exported variable automatically loaded in memory. In addition, the source files that generate it are included in the zoolog extdata folder. There is one file for the main structure of the thesaurus set and one file for each included thesaurus. All of them are in semicolon-separated format, so they can be examined in any text editor or imported into any spreadsheet application. The files are:

zoologThesaurusSet.csv

Defines the main structure of the thesaurus set. It has a row for each thesaurus and seven columns (ThesaurusName, FileName, CaseSensitive, AccentSensitive, PunctuationSensitive, ApplyToColNames, and ApplyToColValues). Their meaning coincides with the description above. Observe that the case, accent, and punctuation sensitivity is stored here, instead of in each thesaurus file.

identifierThesaurus.csv

Thesaurus for the identifiers used in LogRatios to identify the bone types and the measure names in the data and the references. It has four columns: Taxon, Element, Measure, and Standard.

taxonThesaurus.csv

Thesaurus for the taxa. There is one column for each category of taxon considered.

elementThesaurus.csv

Thesaurus for the skeletal elements. One column for each category.

measureThesaurus.csv

Thesaurus for the measure names. One column for each category.
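As a minimal sketch of how such a semicolon-separated file can be read in base R, the code below writes a one-row stand-in for zoologThesaurusSet.csv to a temporary file and parses it. The column names come from the description above and the values mirror the identifier entry shown in the examples. The presence of a header line and the TRUE/FALSE encoding are assumptions about the on-disk layout.

```r
# Write a one-row stand-in for zoologThesaurusSet.csv to a temporary file.
# The header line and the TRUE/FALSE encoding are assumptions of this sketch.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  paste("ThesaurusName", "FileName", "CaseSensitive", "AccentSensitive",
        "PunctuationSensitive", "ApplyToColNames", "ApplyToColValues",
        sep = ";"),
  "identifier;identifierThesaurus.csv;FALSE;FALSE;FALSE;TRUE;FALSE"
), tmp)

# read.csv with sep = ";" parses semicolon-separated text;
# TRUE/FALSE fields are converted to logical columns.
setStructure <- read.csv(tmp, sep = ";", stringsAsFactors = FALSE)
str(setStructure)
unlink(tmp)
```

With zoolog installed, the folder holding the real files can presumably be located with system.file("extdata", package = "zoolog").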

Examples

## List of thesaurus names and characteristics in the thesaurus set:
attributes(zoologThesaurus)
#> $names
#> [1] "identifier" "taxon"      "element"    "measure"
#>
#> $applyToColNames
#> [1]  TRUE FALSE FALSE  TRUE
#>
#> $applyToColValues
#> [1] FALSE  TRUE  TRUE  TRUE
#>
#> $fileName
#> [1] "identifierThesaurus.csv" "taxonThesaurus.csv"
#> [3] "elementThesaurus.csv"    "measureThesaurus.csv"
#>
## Content of the first thesaurus:
zoologThesaurus$identifier
#>          Taxon         Element      Measure Standard
#> 1        Taxon         Element      Measure Standard
#> 2          TAX              EL  measurement estandar
#> 3      species            anat         Mass
#> 4       animal            bone measurements
#> 5       Specie            Osso     measures
#> 6 GenusSpecies BoneBoneElement       Medida
#> 7      Especie     Skelettteil         Mida
#> 8       Espece        elemento
#> 9                           Os
attributes(zoologThesaurus$identifier)
#> $names
#> [1] "Taxon"    "Element"  "Measure"  "Standard"
#>
#> $class
#> [1] "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9
#>
#> $caseSensitive
#> [1] FALSE
#>
#> $accentSensitive
#> [1] FALSE
#>
#> $punctuationSensitive
#> [1] FALSE
#>