Functions to map user-provided nomenclature onto a standard one, as defined in a thesaurus.
StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)
StandardizeDataSet(data, thesaurusSet = zoologThesaurus)
x: Character vector.
thesaurus: A thesaurus object.
mark.unknown: Logical. If FALSE (default), the strings not found in the
thesaurus are kept unchanged. If TRUE, the strings not found in the
thesaurus are set to NA.
data: A data frame.
thesaurusSet: A thesaurus set.
StandardizeNomenclature returns a vector of the same length as the
input vector x. Names present in the thesaurus are replaced by their
corresponding category. Names not in the thesaurus are kept unchanged if
mark.unknown = FALSE (default) and set to NA if mark.unknown = TRUE.
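The lookup behaviour described above can be illustrated with a minimal base R sketch. It is not the zoolog implementation: the toy thesaurus, the helper standardize_sketch and its case.sensitive argument are hypothetical names introduced only for this illustration.

```r
# Toy thesaurus (hypothetical, not zoolog's internal structure): each element
# name is a standard category and its values are the accepted synonyms.
toyThesaurus <- list(
  "Bos taurus"     = c("Bos taurus", "bota", "cattle", "BT"),
  "Sus domesticus" = c("Sus domesticus", "sudo", "pig", "SS")
)

# Hypothetical helper, not part of zoolog: map each name to its category.
standardize_sketch <- function(x, thesaurus, mark.unknown = FALSE,
                               case.sensitive = FALSE) {
  # Flatten the thesaurus into two parallel vectors: synonym and category.
  synonyms   <- unlist(thesaurus, use.names = FALSE)
  categories <- rep(names(thesaurus), lengths(thesaurus))
  # Compare in lower case unless case-sensitive matching is requested.
  norm <- if (case.sensitive) identity else tolower
  idx <- match(norm(x), norm(synonyms))
  out <- categories[idx]            # NA wherever no synonym matched
  if (!mark.unknown) {
    unknown <- is.na(idx)
    out[unknown] <- x[unknown]      # keep unmatched names unchanged
  }
  out
}

kept   <- standardize_sketch(c("bota", "giraffe", "PIG"), toyThesaurus)
marked <- standardize_sketch(c("bota", "giraffe", "PIG"), toyThesaurus,
                             mark.unknown = TRUE)
kept
#> [1] "Bos taurus"     "giraffe"        "Sus domesticus"
marked
#> [1] "Bos taurus"     NA               "Sus domesticus"
```

The output always has the same length as the input, so it can be assigned back to a data frame column directly.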
StandardizeDataSet returns a data frame with the same structure as
the input data, with its nomenclature standardized according to a thesaurus
set. The thesaurus set includes the appropriate thesauri for the column names
and for the values of a set of columns.
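As a rough illustration of the two steps involved (renaming the columns, then standardizing the values of selected columns), the following base R sketch works on a toy data frame. The structure of toySet and the helpers lookup and standardize_df_sketch are assumptions made for this example and do not reproduce the actual structure of a zoolog thesaurus set.

```r
# Hypothetical helper: replace each name by its category, keep unknown names.
lookup <- function(x, thesaurus) {
  synonyms   <- unlist(thesaurus, use.names = FALSE)
  categories <- rep(names(thesaurus), lengths(thesaurus))
  idx <- match(tolower(x), tolower(synonyms))
  ifelse(is.na(idx), x, categories[idx])
}

# Hypothetical toy thesaurus set: one thesaurus for the column names and one
# thesaurus per column whose values should be standardized.
toySet <- list(
  identifier = list(Taxon   = c("Taxon", "Especie"),
                    Element = c("Element", "Os")),
  Taxon      = list("Bos taurus" = c("bota", "cattle"),
                    "Ovis aries" = c("ovar", "sheep")),
  Element    = list(humerus = c("humerus", "hum"))
)

standardize_df_sketch <- function(data, thesaurusSet) {
  # Step 1: standardize the column names.
  names(data) <- lookup(names(data), thesaurusSet$identifier)
  # Step 2: standardize the values of every column that has its own thesaurus.
  for (column in intersect(names(data), names(thesaurusSet))) {
    data[[column]] <- lookup(as.character(data[[column]]),
                             thesaurusSet[[column]])
  }
  data
}

toyData <- data.frame(Site = c("ALP", "ALP"),
                      Especie = c("bota", "ovar"),
                      Os = c("hum", "hum"),
                      stringsAsFactors = FALSE)
toyStandardized <- standardize_df_sketch(toyData, toySet)
toyStandardized
```

Columns without a matching thesaurus (Site in this toy case) pass through unchanged, which is what keeps the output structure identical to the input.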
StandardizeNomenclature standardizes a character vector
according to a given thesaurus.
StandardizeDataSet standardizes column names and values of
a data frame according to a thesaurus set.
zoologThesaurus for a description of the thesaurus and
thesaurus set structure.
## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
#> Bos taurus Bos primigenius Bos Ovis aries Ovis orientalis
#> 1 Bos taurus Bos primigenius Bos Ovis aries Ovis orientalis
#> 2 bota BP BoSP ovar ovis musimon
#> 3 cattle bos prim B sheep muflon
#> 4 BT auroch OA mouflon
#> 5 bovino urus oveja
#> 6 Grands Bovides Boeuf ure Ovicaprines Mouton
#> 7 vaca uro ov
#> 8 auroque ovella
#> 9 OVA
#> 10
#> 11
#> 12
#> Ovis Capra hircus Capra aegagrus Capra Caprini
#> 1 Ovis Capra hircus Capra aegagrus Capra Caprini
#> 2 OvSP cahi wild goat CaSP Ovis/Capra
#> 3 goat cabra salvatge oc
#> 4 CH cabra selvagem caprine
#> 5 cabra cabra salvaje s/g
#> 6 pecora chevre sauvage sheep/goat
#> 7 cabra Sh/G
#> 8 CAH ovicaprino
#> 9 Ovicaprines Chevre O/C
#> 10 Ovicaprines Ovis-Capra
#> 11 ovicapri
#> 12 O
#> Sus domesticus Sus scrofa Sus Cervus elaphus Cervus Dama mesopotamica
#> 1 Sus domesticus Sus scrofa Sus Cervus elaphus Cervus Dama mesopotamica
#> 2 sudo wild boar SuSP ceel Persian fallow deer
#> 3 pig senglar red deer
#> 4 SS porc senglar CE
#> 5 cerdo jabali ciervo
#> 6 Suides Porc javali Cervides Cerf
#> 7 porc sanglier CEE
#> 8 Suides wss cervol
#> 9 susc
#> 10
#> 11
#> 12
#> Dama Gazella gazella Gazella Equus asinus
#> 1 Dama Gazella gazella Gazella Equus asinus
#> 2 fallow deer mountain gazelle gazelle Equus africanus asinus
#> 3 E asinus
#> 4 donkey
#> 5 ass
#> 6 asno
#> 7 ase
#> 8 ane
#> 9 eqas
#> 10
#> 11
#> 12
#> Equus caballus Equus Oryctolagus cuniculus Oryctolagus
#> 1 Equus caballus Equus Oryctolagus cuniculus Oryctolagus
#> 2 Equus ferus caballus EqSP O cuniculus oryct
#> 3 E caballus EquSP European rabbit
#> 4 horse rabbit
#> 5 caballo conejo
#> 6 cavall conill
#> 7 cheval lapin
#> 8 cavalo coelho
#> 9 eqca orcu
#> 10
#> 11
#> 12
#> Canis familiaris Canis lupus Canis
#> 1 Canis familiaris Canis lupus Canis
#> 2 Canis lupus familiaris grey wolf canine
#> 3 dog wolf
#> 4 domestic dog gray wolf
#> 5 perro lobo
#> 6 gos llop
#> 7 chien loup
#> 8 cao
#> 9
#> 10
#> 11
#> 12
## Standardize a heterodox vector of taxa:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus)
#> [1] "Bos taurus" "giraffe" "Sus domesticus" "Bos taurus"
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.
## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus, mark.unknown = TRUE)
#> [1] "Bos taurus" NA "Sus domesticus" "Bos taurus"
## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") # == FALSE
#> [1] FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"),
                        thesaurus)
#> [1] "Bos taurus" "Bos taurus" "Bos taurus" "Bos taurus"
## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8")
## Observe mainly the first columns:
head(dataExample[,1:5])
#> Site N.inv UE Especie Os
#> 1 ALP 4918 10364 bota 1 fal
#> 2 ALP 4919 10364 bota 1 fal
#> 3 ALP 3453 10410 ovar 1fal ant
#> 4 ALP 3455 10410 ovar 1fal ant
#> 5 ALP 4245 7036 cahi hum
#> 6 ALP 4674 10227 cahi hum
## Standardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[,1:5])
#> Site N.inv UE Taxon Element
#> 1 ALP 4918 10364 Bos taurus first phalanx
#> 2 ALP 4919 10364 Bos taurus first phalanx
#> 3 ALP 3453 10410 Ovis aries anterior first phalanx
#> 4 ALP 3455 10410 Ovis aries anterior first phalanx
#> 5 ALP 4245 7036 Capra hircus humerus
#> 6 ALP 4674 10227 Capra hircus humerus
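Before standardizing a whole data set, it can be useful to list the names that the thesaurus does not recognize. One possible way, sketched here using only the calls shown in the examples above, is to run StandardizeNomenclature with mark.unknown = TRUE and keep the inputs that come back as NA. The variable names taxa, standardized and unknownTaxa are illustrative only.

```r
library(zoolog)

## Names to check against the taxon thesaurus:
taxa <- c("bota", "giraffe", "pig", "cattle")

## Unknown names are returned as NA when mark.unknown = TRUE:
standardized <- StandardizeNomenclature(taxa, zoologThesaurus$taxon,
                                        mark.unknown = TRUE)

## Keep the original spelling of the names that were not recognized:
unknownTaxa <- unique(taxa[is.na(standardized)])
unknownTaxa
#> [1] "giraffe"
```

Note that this check also reports genuine missing values in the input, since they too appear as NA in the standardized vector.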