StandardizeNomenclature.Rd
Functions to map user-provided nomenclature to the standard one defined in a thesaurus.
StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)
StandardizeDataSet(data, thesaurusSet = zoologThesaurus)
x: Character vector.
thesaurus: A thesaurus object.
mark.unknown: Logical. If FALSE (default), the strings not found in the thesaurus are kept unchanged. If TRUE, the strings not in the thesaurus are set to NA.
data: A data frame.
thesaurusSet: A thesaurus set.
StandardizeNomenclature returns a vector of the same length as the input vector x. The names present in the thesaurus are set to their corresponding category. The names not in the thesaurus are kept unchanged if mark.unknown=FALSE (default) and set to NA if mark.unknown=TRUE.
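The mapping rule described above can be illustrated with a minimal base-R sketch. It is not the zoolog implementation: the helper `standardize_sketch` and the `toy_thesaurus` are hypothetical, and the data-frame layout (one column per standard category, synonyms as values, an optional `caseSensitive` attribute) is an assumption inferred from the printed `zoologThesaurus$taxon` in the examples.

```r
# Toy thesaurus (hypothetical): one column per standard category,
# with the accepted synonyms as the column values.
toy_thesaurus <- data.frame(
  "Bos taurus"     = c("Bos taurus", "bota", "cattle"),
  "Sus domesticus" = c("Sus domesticus", "sudo", "pig"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the documented behaviour;
# NOT the function shipped with zoolog.
standardize_sketch <- function(x, thesaurus, mark.unknown = FALSE) {
  # Flatten the thesaurus into parallel vectors: synonym and its category.
  synonyms   <- unlist(thesaurus, use.names = FALSE)
  categories <- rep(names(thesaurus), each = nrow(thesaurus))
  keep       <- !is.na(synonyms) & nzchar(synonyms)
  synonyms   <- synonyms[keep]
  categories <- categories[keep]

  # Assumption: a missing "caseSensitive" attribute means case-insensitive.
  key <- if (isTRUE(attr(thesaurus, "caseSensitive"))) identity else tolower

  idx <- match(key(x), key(synonyms))
  out <- categories[idx]              # NA where no synonym matched
  if (!mark.unknown) {
    out[is.na(idx)] <- x[is.na(idx)]  # keep unknown names unchanged
  }
  out
}

standardize_sketch(c("bota", "giraffe", "pig", "CATTLE"), toy_thesaurus)
standardize_sketch(c("bota", "giraffe"), toy_thesaurus, mark.unknown = TRUE)
```

The result always has the same length as `x`, because `match()` returns one index (or NA) per input element.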
StandardizeDataSet returns a data frame with the same structure as the input data, with its nomenclature standardized according to a thesaurus set that includes appropriate thesauri for its column names and for the values of a set of columns.
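The two-stage idea (first the column names, then the values of selected columns) can be sketched in base R as follows. This is only an illustration under simplifying assumptions: `toy_set`, `map_known`, and `standardize_dataset_sketch` are hypothetical names, and the thesaurus set is reduced to plain named vectors (synonym to standard name), which is not the structure of `zoologThesaurus`.

```r
# Toy "thesaurus set" (hypothetical, simplified): named vectors mapping
# a synonym (the name) to its standard form (the value).
toy_set <- list(
  columns = c(Especie = "Taxon", Os = "Element"),
  values  = list(
    Taxon   = c(bota = "Bos taurus", ovar = "Ovis aries", cahi = "Capra hircus"),
    Element = c("1 fal" = "first phalanx", hum = "humerus")
  )
)

# Replace the entries found in `map`; keep everything else unchanged.
map_known <- function(x, map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

standardize_dataset_sketch <- function(data, set) {
  # 1. Standardize the column names.
  names(data) <- map_known(names(data), set$columns)
  # 2. Standardize the values of each column that has its own thesaurus.
  for (col in intersect(names(data), names(set$values))) {
    data[[col]] <- map_known(as.character(data[[col]]), set$values[[col]])
  }
  data
}

raw <- data.frame(Site = "ALP",
                  Especie = c("bota", "cahi"),
                  Os = c("1 fal", "hum"))
standardize_dataset_sketch(raw, toy_set)
```

Renaming the columns first lets the value thesauri be looked up by the standardized column names, so the same set applies to data frames that label their columns differently.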
StandardizeNomenclature standardizes a character vector according to a given thesaurus.
StandardizeDataSet standardizes column names and values of a data frame according to a thesaurus set.
zoologThesaurus for a description of the thesaurus and thesaurus set structure,
## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
#> Bos taurus Bos primigenius Bos Ovis aries Ovis orientalis
#> 1 Bos taurus Bos primigenius Bos Ovis aries Ovis orientalis
#> 2 bota BP BoSP ovar ovis musimon
#> 3 cattle bos prim B sheep muflon
#> 4 BT auroch OA mouflon
#> 5 bovino urus oveja
#> 6 Grands Bovides Boeuf ure Ovicaprines Mouton
#> 7 vaca uro ov
#> 8 auroque ovella
#> 9 OVA
#> 10
#> 11
#> 12
#> Ovis Capra hircus Capra aegagrus Capra Caprini
#> 1 Ovis Capra hircus Capra aegagrus Capra Caprini
#> 2 OvSP cahi wild goat CaSP Ovis/Capra
#> 3 goat cabra salvatge oc
#> 4 CH cabra selvagem caprine
#> 5 cabra cabra salvaje s/g
#> 6 pecora chevre sauvage sheep/goat
#> 7 cabra Sh/G
#> 8 CAH ovicaprino
#> 9 Ovicaprines Chevre O/C
#> 10 Ovicaprines Ovis-Capra
#> 11 ovicapri
#> 12 O
#> Sus domesticus Sus scrofa Sus Cervus elaphus Cervus Dama mesopotamica
#> 1 Sus domesticus Sus scrofa Sus Cervus elaphus Cervus Dama mesopotamica
#> 2 sudo wild boar SuSP ceel Persian fallow deer
#> 3 pig senglar red deer
#> 4 SS porc senglar CE
#> 5 cerdo jabali ciervo
#> 6 Suides Porc javali Cervides Cerf
#> 7 porc sanglier CEE
#> 8 Suides wss cervol
#> 9 susc
#> 10
#> 11
#> 12
#> Dama Gazella gazella Gazella Equus asinus
#> 1 Dama Gazella gazella Gazella Equus asinus
#> 2 fallow deer mountain gazelle gazelle Equus africanus asinus
#> 3 E asinus
#> 4 donkey
#> 5 ass
#> 6 asno
#> 7 ase
#> 8 ane
#> 9 eqas
#> 10
#> 11
#> 12
#> Equus caballus Equus Oryctolagus cuniculus Oryctolagus
#> 1 Equus caballus Equus Oryctolagus cuniculus Oryctolagus
#> 2 Equus ferus caballus EqSP O cuniculus oryct
#> 3 E caballus EquSP European rabbit
#> 4 horse rabbit
#> 5 caballo conejo
#> 6 cavall conill
#> 7 cheval lapin
#> 8 cavalo coelho
#> 9 eqca orcu
#> 10
#> 11
#> 12
#> Canis familiaris Canis lupus Canis
#> 1 Canis familiaris Canis lupus Canis
#> 2 Canis lupus familiaris grey wolf canine
#> 3 dog wolf
#> 4 domestic dog gray wolf
#> 5 perro lobo
#> 6 gos llop
#> 7 chien loup
#> 8 cao
#> 9
#> 10
#> 11
#> 12
## Standardize a heterodox vector of taxa:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus)
#> [1] "Bos taurus" "giraffe" "Sus domesticus" "Bos taurus"
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.
## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus, mark.unknown = TRUE)
#> [1] "Bos taurus" NA "Sus domesticus" "Bos taurus"
## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") # == FALSE
#> [1] FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"),
                        thesaurus)
#> [1] "Bos taurus" "Bos taurus" "Bos taurus" "Bos taurus"
## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8")
## Observe mainly the first columns:
head(dataExample[,1:5])
#> Site N.inv UE Especie Os
#> 1 ALP 4918 10364 bota 1 fal
#> 2 ALP 4919 10364 bota 1 fal
#> 3 ALP 3453 10410 ovar 1fal ant
#> 4 ALP 3455 10410 ovar 1fal ant
#> 5 ALP 4245 7036 cahi hum
#> 6 ALP 4674 10227 cahi hum
## Standardize the dataset:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[,1:5])
#> Site N.inv UE Taxon Element
#> 1 ALP 4918 10364 Bos taurus first phalanx
#> 2 ALP 4919 10364 Bos taurus first phalanx
#> 3 ALP 3453 10410 Ovis aries anterior first phalanx
#> 4 ALP 3455 10410 Ovis aries anterior first phalanx
#> 5 ALP 4245 7036 Capra hircus humerus
#> 6 ALP 4674 10227 Capra hircus humerus
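Returning to the `mark.unknown = TRUE` example above, the NA marks make it easy to list which input names a thesaurus does not cover. The sketch below is base R only: the `standardized` vector is copied from the output shown earlier so that the snippet runs without zoolog, and the variable names are illustrative.

```r
x <- c("bota", "giraffe", "pig", "cattle")

# Result shown earlier for StandardizeNomenclature(x, thesaurus,
# mark.unknown = TRUE), hard-coded here to keep the snippet self-contained.
standardized <- c("Bos taurus", NA, "Sus domesticus", "Bos taurus")

# Names that were not recognized (assumes x itself contains no NA).
unrecognized <- unique(x[is.na(standardized)])
unrecognized
```

A list like this may help decide whether the thesaurus needs new entries before a data set is standardized.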