Functions to map user-provided nomenclature to the standard nomenclature defined in a thesaurus.

Usage

StandardizeNomenclature(x, thesaurus, mark.unknown = FALSE)

StandardizeDataSet(data, thesaurusSet = zoologThesaurus)

Arguments

x

Character vector.

thesaurus

A thesaurus object.

mark.unknown

Logical. If FALSE (default), strings not found in the thesaurus are kept unchanged. If TRUE, they are set to NA.

data

A data frame.

thesaurusSet

A thesaurus set.

Value

StandardizeNomenclature returns a vector of the same length as the input vector x. Names found in the thesaurus are replaced by their corresponding standard category. Names not found in the thesaurus are kept unchanged if mark.unknown = FALSE (default) and set to NA if mark.unknown = TRUE.
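To illustrate these semantics, here is a minimal base-R sketch. It assumes a thesaurus laid out like the printed zoologThesaurus$taxon in the examples (one column per standard category, synonyms in the cells, empty strings as padding) and ignores case handling. The names toyThesaurus and standardizeSketch are hypothetical; this is not zoolog's actual implementation.

```r
# Hypothetical toy thesaurus modelled on the printed zoologThesaurus$taxon:
# each column name is a standard category, each cell a synonym of it.
toyThesaurus <- data.frame(
  "Bos taurus"     = c("Bos taurus", "bota", "cattle"),
  "Sus domesticus" = c("Sus domesticus", "sudo", "pig"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Sketch of the documented lookup behaviour (not zoolog's code).
standardizeSketch <- function(x, thesaurus, mark.unknown = FALSE) {
  # unlist() walks the data frame column by column, so repeating each
  # category nrow() times aligns categories with their synonyms.
  categories <- rep(names(thesaurus), each = nrow(thesaurus))
  synonyms <- unlist(thesaurus, use.names = FALSE)
  keep <- synonyms != ""                      # drop empty padding cells
  lookup <- setNames(categories[keep], synonyms[keep])
  result <- unname(lookup[x])                 # unknown names give NA
  if (!mark.unknown) {
    unknown <- is.na(result)
    result[unknown] <- x[unknown]             # keep unknown names as given
  }
  result
}

standardizeSketch(c("bota", "giraffe", "pig"), toyThesaurus)
# -> "Bos taurus" "giraffe" "Sus domesticus"
standardizeSketch(c("bota", "giraffe", "pig"), toyThesaurus, mark.unknown = TRUE)
# -> "Bos taurus" NA "Sus domesticus"
```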

StandardizeDataSet returns a data frame with the same structure as the input data, but with its nomenclature standardized according to a thesaurus set, which includes thesauri for the column names and for the values of selected columns.
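The following base-R sketch illustrates the idea under a hypothetical thesaurus-set layout: a list holding one thesaurus for column names and a named list of value thesauri keyed by standardized column name. The real thesaurus sets are described in zoologThesaurus and may be organized differently; lookupSketch, toySet and standardizeDataSketch are illustrative names only, and matching here is simply case-insensitive.

```r
# Case-insensitive lookup that keeps unknown strings unchanged (sketch).
lookupSketch <- function(x, thesaurus) {
  synonyms <- unlist(thesaurus, use.names = FALSE)
  categories <- rep(names(thesaurus), each = nrow(thesaurus))
  keep <- synonyms != ""
  hit <- categories[keep][match(tolower(x), tolower(synonyms[keep]))]
  ifelse(is.na(hit), x, hit)
}

# Hypothetical thesaurus set: NOT zoolog's actual layout.
toySet <- list(
  columns = data.frame(Taxon = c("Taxon", "Especie"), stringsAsFactors = FALSE),
  values = list(
    Taxon = data.frame("Bos taurus" = c("Bos taurus", "bota"),
                       "Ovis aries" = c("Ovis aries", "ovar"),
                       check.names = FALSE, stringsAsFactors = FALSE)
  )
)

standardizeDataSketch <- function(data, set) {
  # 1. Standardize the column names.
  names(data) <- lookupSketch(names(data), set$columns)
  # 2. Standardize the values of columns that have a value thesaurus.
  for (col in intersect(names(set$values), names(data))) {
    data[[col]] <- lookupSketch(as.character(data[[col]]), set$values[[col]])
  }
  data
}

toyData <- data.frame(Site = c("ALP", "ALP"), Especie = c("bota", "ovar"),
                      stringsAsFactors = FALSE)
standardizeDataSketch(toyData, toySet)
# columns become Site and Taxon; Taxon holds "Bos taurus" and "Ovis aries"
```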

Details

StandardizeNomenclature standardizes a character vector according to a given thesaurus.
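Matching can honour the thesaurus' caseSensitive attribute, shown in the examples. The sketch below assumes that this attribute is read from the thesaurus and, as a further assumption, treats a missing attribute as case-insensitive. matchSynonyms is a hypothetical helper that returns NA for unknown names.

```r
toyThesaurus <- data.frame("Bos taurus" = c("Bos taurus", "bota", "cattle"),
                           check.names = FALSE, stringsAsFactors = FALSE)
attr(toyThesaurus, "caseSensitive") <- FALSE

matchSynonyms <- function(x, thesaurus) {
  synonyms <- unlist(thesaurus, use.names = FALSE)
  categories <- rep(names(thesaurus), each = nrow(thesaurus))
  keep <- synonyms != ""
  # Assumption: a missing attribute (NULL) means case-insensitive matching.
  caseSensitive <- isTRUE(attr(thesaurus, "caseSensitive"))
  key <- if (caseSensitive) identity else tolower
  categories[keep][match(key(x), key(synonyms[keep]))]
}

matchSynonyms(c("bota", "BOTA", "Bota", "boTa", "giraffe"), toyThesaurus)
# returns "Bos taurus" four times, then NA
```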

StandardizeDataSet standardizes column names and values of a data frame according to a thesaurus set.

See also

zoologThesaurus for a description of the structure of thesauri and thesaurus sets;

ThesaurusReaderWriter, ThesaurusManagement

Examples

## Select the thesaurus for taxa present in the thesaurus set
## zoolog::zoologThesaurus:
thesaurus <- zoologThesaurus$taxon
thesaurus
#>              Bos taurus Bos primigenius  Bos         Ovis aries Ovis orientalis
#> 1            Bos taurus Bos primigenius  Bos         Ovis aries Ovis orientalis
#> 2                  bota              BP BoSP               ovar    ovis musimon
#> 3                cattle        bos prim    B              sheep          muflon
#> 4                    BT          auroch                      OA         mouflon
#> 5                bovino            urus                   oveja                
#> 6  Grands Bovides Boeuf             ure      Ovicaprines Mouton                
#> 7                  vaca             uro                      ov                
#> 8                               auroque                  ovella                
#> 9                                                           OVA                
#> 10                                                                             
#> 11                                                                             
#> 12                                                                             
#>    Ovis       Capra hircus  Capra aegagrus Capra                Caprini
#> 1  Ovis       Capra hircus  Capra aegagrus Capra                Caprini
#> 2  OvSP               cahi       wild goat  CaSP             Ovis/Capra
#> 3                     goat  cabra salvatge                           oc
#> 4                       CH cabra selvagem                       caprine
#> 5                    cabra   cabra salvaje                          s/g
#> 6                   pecora  chevre sauvage                   sheep/goat
#> 7                    cabra                                         Sh/G
#> 8                      CAH                                   ovicaprino
#> 9       Ovicaprines Chevre                                          O/C
#> 10                                               Ovicaprines Ovis-Capra
#> 11                                                             ovicapri
#> 12                                                                    O
#>    Sus domesticus   Sus scrofa  Sus Cervus elaphus Cervus   Dama mesopotamica
#> 1  Sus domesticus   Sus scrofa  Sus Cervus elaphus Cervus   Dama mesopotamica
#> 2            sudo    wild boar SuSP           ceel        Persian fallow deer
#> 3             pig      senglar            red deer                           
#> 4              SS porc senglar                  CE                           
#> 5           cerdo       jabali              ciervo                           
#> 6     Suides Porc       javali       Cervides Cerf                           
#> 7            porc     sanglier                 CEE                           
#> 8          Suides          wss              cervol                           
#> 9                         susc                                               
#> 10                                                                           
#> 11                                                                           
#> 12                                                                           
#>           Dama  Gazella gazella Gazella           Equus asinus
#> 1         Dama  Gazella gazella Gazella           Equus asinus
#> 2  fallow deer mountain gazelle gazelle Equus africanus asinus
#> 3                                                     E asinus
#> 4                                                       donkey
#> 5                                                          ass
#> 6                                                         asno
#> 7                                                          ase
#> 8                                                          ane
#> 9                                                         eqas
#> 10                                                            
#> 11                                                            
#> 12                                                            
#>          Equus caballus Equus Oryctolagus cuniculus Oryctolagus
#> 1        Equus caballus Equus Oryctolagus cuniculus Oryctolagus
#> 2  Equus ferus caballus  EqSP           O cuniculus       oryct
#> 3            E caballus EquSP       European rabbit            
#> 4                 horse                      rabbit            
#> 5               caballo                      conejo            
#> 6                cavall                      conill            
#> 7                cheval                       lapin            
#> 8                cavalo                      coelho            
#> 9                  eqca                        orcu            
#> 10                                                             
#> 11                                                             
#> 12                                                             
#>          Canis familiaris Canis lupus  Canis
#> 1        Canis familiaris Canis lupus  Canis
#> 2  Canis lupus familiaris   grey wolf canine
#> 3                     dog        wolf       
#> 4            domestic dog   gray wolf       
#> 5                   perro        lobo       
#> 6                     gos        llop       
#> 7                   chien        loup       
#> 8                     cao                   
#> 9                                           
#> 10                                          
#> 11                                          
#> 12                                          
## Standardize a heterogeneous vector of taxon names:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus)
#> [1] "Bos taurus"     "giraffe"        "Sus domesticus" "Bos taurus"    
## Observe that "giraffe" is kept unchanged since it is not included in
## any thesaurus category.
## But if mark.unknown is set to TRUE, it is marked as NA:
StandardizeNomenclature(c("bota", "giraffe", "pig", "cattle"),
                        thesaurus, mark.unknown = TRUE)
#> [1] "Bos taurus"     NA               "Sus domesticus" "Bos taurus"    

## This thesaurus is not case sensitive:
attr(thesaurus, "caseSensitive") #  == FALSE
#> [1] FALSE
## Thus, names are recognized independently of their case:
StandardizeNomenclature(c("bota", "BOTA", "Bota", "boTa"),
                        thesaurus)
#> [1] "Bos taurus" "Bos taurus" "Bos taurus" "Bos taurus"

## Load an example data frame:
dataFile <- system.file("extdata", "dataValenzuelaLamas2008.csv.gz",
                        package = "zoolog")
dataExample <- utils::read.csv2(dataFile,
                                na.strings = "",
                                encoding = "UTF-8")
## Inspect the first five columns:
head(dataExample[, 1:5])
#>   Site N.inv    UE Especie       Os
#> 1  ALP  4918 10364    bota    1 fal
#> 2  ALP  4919 10364    bota    1 fal
#> 3  ALP  3453 10410    ovar 1fal ant
#> 4  ALP  3455 10410    ovar 1fal ant
#> 5  ALP  4245  7036    cahi      hum
#> 6  ALP  4674 10227    cahi      hum
## Standardize the data set:
dataStandardized <- StandardizeDataSet(dataExample, zoologThesaurus)
head(dataStandardized[, 1:5])
#>   Site N.inv    UE        Taxon                Element
#> 1  ALP  4918 10364   Bos taurus          first phalanx
#> 2  ALP  4919 10364   Bos taurus          first phalanx
#> 3  ALP  3453 10410   Ovis aries anterior first phalanx
#> 4  ALP  3455 10410   Ovis aries anterior first phalanx
#> 5  ALP  4245  7036 Capra hircus                humerus
#> 6  ALP  4674 10227 Capra hircus                humerus