Functions to map user-provided nomenclature onto a standard one, as defined in a thesaurus.

StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)

StandardizeDataSet(data, thesaurusSet = zoologThesaurus)

Arguments

x

Character vector.

thesaurus

A thesaurus object.

mark.unknown

Logical. If FALSE (default), strings not found in the thesaurus are kept unchanged. If TRUE, strings not found in the thesaurus are set to NA.

data

A data frame.

thesaurusSet

A thesaurus set.

Value

StandardizeNomenclature returns a vector of the same length as the input vector x. The names present in the thesaurus are set to their corresponding category. The names not in the thesaurus are kept unchanged if mark.unknown=FALSE (default) and set to NA if mark.unknown=TRUE.

StandardizeDataSet returns a data frame with the same structure as the input data, with its nomenclature standardized according to a thesaurus set. The thesaurus set includes the appropriate thesauri for the column names and for the values of a set of columns.

Details

StandardizeNomenclature standardizes a character vector according to a given thesaurus.
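The lookup described above can be illustrated with a minimal base R sketch. It is not the zoolog implementation: `toyThesaurus` and `standardizeSketch` are hypothetical names introduced only for illustration, the thesaurus is simplified to a named list of synonyms, and matching is assumed to be case insensitive.

```r
# Hypothetical toy thesaurus: each element is a category (the standard name)
# holding the synonyms accepted for it.
toyThesaurus <- list(
  "Bos taurus"     = c("Bos taurus", "bota", "cattle"),
  "Sus domesticus" = c("Sus domesticus", "sudo", "pig")
)

# Illustrative stand-in for the behaviour documented here,
# not the function provided by the package.
standardizeSketch <- function(x, thesaurus, mark.unknown = FALSE) {
  # Flatten the thesaurus into a lookup vector: lower-cased synonym -> category.
  lookup <- rep(names(thesaurus), lengths(thesaurus))
  names(lookup) <- tolower(unlist(thesaurus, use.names = FALSE))
  # Names without a match come back as NA.
  result <- unname(lookup[tolower(x)])
  unknown <- is.na(result)
  # By default, unknown names are restored to their original value.
  if (!mark.unknown) result[unknown] <- x[unknown]
  result
}

standardizeSketch(c("bota", "PIG", "unicorn"), toyThesaurus)
#> [1] "Bos taurus"     "Sus domesticus" "unicorn"
standardizeSketch(c("bota", "PIG", "unicorn"), toyThesaurus, mark.unknown = TRUE)
#> [1] "Bos taurus"     "Sus domesticus" NA
```

The result always has the same length as the input, and only the treatment of unmatched names changes with mark.unknown.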

StandardizeDataSet standardizes column names and values of a data frame according to a thesaurus set.
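As a rough base R sketch of the idea, a data frame can be standardized in two steps: the column names first, then the values of selected columns. Everything below is an assumption made for illustration: `mapNames`, `columnLookup`, `taxonLookup` and `toyData` are hypothetical, the thesauri are reduced to plain named vectors, matching is case sensitive, and the order of the two steps is not confirmed to be the one used by the package.

```r
# Hypothetical helper: map each string to its category through a named
# lookup vector (synonym -> category); unmatched strings are kept unchanged.
mapNames <- function(x, lookup) {
  x <- as.character(x)
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

# Hypothetical lookups standing in for the thesauri of a thesaurus set.
columnLookup <- c(Especie = "Taxon", Os = "Element")
taxonLookup  <- c(bota = "Bos taurus", ovar = "Ovis aries")

toyData <- data.frame(Site = c("ALP", "ALP"), Especie = c("bota", "ovar"))

# Standardize the column names, then the values of the "Taxon" column.
standardized <- toyData
names(standardized) <- mapNames(names(standardized), columnLookup)
standardized$Taxon <- mapNames(standardized$Taxon, taxonLookup)
standardized
```

The structure of the data frame (number of rows and columns) is preserved; only names and values change.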

See also

zoologThesaurus for a description of the thesaurus and thesaurus set structure,

ThesaurusReaderWriter, ThesaurusManagement

Examples

## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
#> Bos taurus Bos primigenius Ovis aries Ovis orientalis
#> 1 Bos taurus Bos primigenius Ovis aries Ovis orientalis
#> 2 bota BP ovar ovis musimon
#> 3 bos bos prim ovis muflon
#> 4 cattle auroch sheep mouflon
#> 5 BT urus OA
#> 6 bovino ure oveja
#> 7 Grands Bovides Boeuf uro Ovicaprines Mouton
#> 8 vaca auroque ov
#> 9 B ovella
#> 10 OVA
#> 11
#> Capra hircus Capra aegagrus Ovis/Capra Sus domesticus
#> 1 Capra hircus Capra aegagrus Ovis/Capra Sus domesticus
#> 2 cahi wild goat oc sudo
#> 3 capra cabra salvatge caprine sus
#> 4 goat cabra selvagem s/g pig
#> 5 CH cabra salvaje sheep/goat SS
#> 6 cabra chevre sauvage Sh/G cerdo
#> 7 pecora ovicaprino Suides Porc
#> 8 cabra O/C porc
#> 9 CAH Ovicaprines Ovis-Capra Suides
#> 10 Ovicaprines Chevre ovicapri
#> 11 O
#> Sus scrofa Cervus elaphus Dama mesopotamica Gazella gazella
#> 1 Sus scrofa Cervus elaphus Dama mesopotamica Gazella gazella
#> 2 wild boar ceel gazelle
#> 3 senglar cervus
#> 4 porc senglar red deer
#> 5 jabali CE
#> 6 javali ciervo
#> 7 sanglier Cervides Cerf
#> 8 wss CEE
#> 9 cervol
#> 10
#> 11
#> Equus asinus Equus caballus Oryctolagus cuniculus
#> 1 Equus asinus Equus caballus Oryctolagus cuniculus
#> 2 Equus africanus asinus Equus ferus caballus O cuniculus
#> 3 E asinus E caballus European rabbit
#> 4 donkey horse rabbit
#> 5 ass caballo conejo
#> 6 asno cavall conill
#> 7 ase cheval lapin
#> 8 ane cavalo coelho
#> 9
#> 10
#> 11
## Standardize a heterodox vector of taxa:
StandardizeNomenclature(c("bota", "rabbit", "pig", "cattle"), thesaurus)
#> [1] "Bos taurus"            "Oryctolagus cuniculus" "Sus domesticus"
#> [4] "Bos taurus"
## Names not included in any thesaurus category are kept unchanged.
## If mark.unknown is set to TRUE, such names are marked as NA instead.
## Here every name is found in the thesaurus, so the result is the same:
StandardizeNomenclature(c("bota", "rabbit", "pig", "cattle"), thesaurus,
                        mark.unknown = TRUE)
#> [1] "Bos taurus"            "Oryctolagus cuniculus" "Sus domesticus"
#> [4] "Bos taurus"
## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") # == FALSE
#> [1] FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"), thesaurus)
#> [1] "Bos taurus" "Bos taurus" "Bos taurus" "Bos taurus"
## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8",
                                stringsAsFactors = TRUE)
## Observe mainly the first columns:
head(dataExample[,1:5])
#>   Site N.inv    UE Especie       Os
#> 1  ALP  4918 10364    bota    1 fal
#> 2  ALP  4919 10364    bota    1 fal
#> 3  ALP  3453 10410    ovar 1fal ant
#> 4  ALP  3455 10410    ovar 1fal ant
#> 5  ALP  4245  7036    cahi      hum
#> 6  ALP  4674 10227    cahi      hum
## Standardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[,1:5])
#>   Site N.inv    UE        Taxon                Element
#> 1  ALP  4918 10364   Bos taurus          first phalanx
#> 2  ALP  4919 10364   Bos taurus          first phalanx
#> 3  ALP  3453 10410   Ovis aries anterior first phalanx
#> 4  ALP  3455 10410   Ovis aries anterior first phalanx
#> 5  ALP  4245  7036 Capra hircus                humerus
#> 6  ALP  4674 10227 Capra hircus                humerus