A.7 Data Type

The following data types are commonly used in R.

A.7.1 Types of elements of arrays

logical   # TRUE or FALSE
integer   # whole numbers, not much used
numeric   # numbers (usually real numbers stored as double)
character # character strings
complex   # complex numbers
raw       # raw bytes (rarely used)
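
As a small illustration of the element types listed above, the following sketch creates one value of each type and inspects it with typeof() and class(). The variable names are chosen only for this example.

```r
# typeof() reports the internal storage type; class() reports the high-level class.
x_lgl <- TRUE
x_int <- 1L           # the L suffix makes an integer
x_num <- 1.5          # plain numbers are stored as double
x_chr <- "a"
x_cpl <- 1 + 2i
x_raw <- as.raw(255)  # a single byte

typeof(x_lgl)  # "logical"
typeof(x_int)  # "integer"
typeof(x_num)  # "double"
class(x_num)   # "numeric"
typeof(x_chr)  # "character"
typeof(x_cpl)  # "complex"
typeof(x_raw)  # "raw"
```

Note that numeric is the class name, while typeof() reports the storage type double for the same value.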

With these as elements, R provides the following more complex data types.

vector      # 1-dimensional array, homogeneous
matrix      # 2-dimensional array, homogeneous
array       # n-dimensional array (typically 3 or more dimensions), homogeneous
list        # of heterogeneous elements
data.frame  # a kind of list (of equal-length columns)

These are names of data types, but they are also the names of the functions (constructors) that create objects of those types.
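
A minimal sketch of the constructors in use, assuming arbitrary example values; each call below uses the function that shares its name with the data type.

```r
v <- vector("numeric", 3)             # numeric vector of length 3, filled with 0
m <- matrix(1:6, nrow = 2, ncol = 3)  # 2 x 3 matrix
a <- array(1:24, dim = c(2, 3, 4))    # 3-dimensional array
l <- list(1, "a", TRUE)               # list with elements of different types
d <- data.frame(id = 1:2, name = c("a", "b"))

is.vector(v)      # TRUE
is.matrix(m)      # TRUE
is.array(a)       # TRUE
is.list(l)        # TRUE
is.data.frame(d)  # TRUE
dim(a)            # 2 3 4
```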

R has no separate scalar type. A single number is treated as a vector of length 1.
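
The point can be checked directly. In this short sketch, x is just an example value.

```r
x <- 3.14
is.vector(x)           # TRUE
length(x)              # 1
identical(x, c(3.14))  # TRUE: a bare number and a length-1 vector are the same object
x[1]                   # 3.14, indexing works as on any vector
```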

In R, the elements of a vector, matrix, or array are homogeneous, whereas the elements of a list need not be.
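
A minimal sketch of what homogeneity means in practice: when values of different types are combined with c(), R coerces them to one common type, while list() keeps each element as it is. The names mixed and kept are used only for this example.

```r
mixed <- c(1, "a", TRUE)    # all elements are coerced to character
mixed                       # "1" "a" "TRUE"
typeof(mixed)               # "character"

kept <- list(1, "a", TRUE)  # each element keeps its own type
sapply(kept, typeof)        # "double" "character" "logical"
```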

Categorical data are represented with the factor type.

factor      # including ordered

A factor looks like a character vector on the surface, but internally it is stored as integers.

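Ordinal data are represented as a factor (by default a nominal categorical variable) with an added ordered attribute, but conceptually ordered (an ordinal categorical variable) can also be regarded as a type of its own.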

The functions factor and ordered create objects of the factor type.
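
The following sketch illustrates the integer storage behind a factor and the difference between factor() and ordered(). The level names are arbitrary examples.

```r
size <- factor(c("small", "large", "small"))
levels(size)      # "large" "small" (sorted by default)
typeof(size)      # "integer"
as.integer(size)  # 2 1 2, the internal codes that index into levels(size)

grade <- ordered(c("low", "high", "mid"), levels = c("low", "mid", "high"))
class(grade)         # "ordered" "factor"
is.ordered(grade)    # TRUE
grade[1] < grade[2]  # TRUE: "low" comes before "high" in the level order
```

Comparisons with < and > are meaningful only for ordered factors. For a plain factor they return NA with a warning.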

The following functions check what type a data object is, or whether its values are missing or finite.

is         # What is it?
is.na      # is NA?
is.nan     # is NaN?
is.finite  # is finite number?

Many functions whose names begin with is are of this kind.
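
A short sketch of these checks on an example vector containing a missing value, a NaN, and an infinite value. is.numeric() and is.character() are two further members of the is family.

```r
x <- c(1, NA, NaN, Inf)

is.na(x)         # FALSE  TRUE  TRUE FALSE (NaN also counts as missing)
is.nan(x)        # FALSE FALSE  TRUE FALSE
is.finite(x)     # TRUE  FALSE FALSE FALSE
is.numeric(x)    # TRUE
is.character(x)  # FALSE
methods::is(x, "numeric")  # TRUE, the two-argument form tests for a class
```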

One function does not check the data type at all, but checks whether the input vector is sorted.

is.unsorted # Check if it is not sorted

Note that base R has no is.sorted function.
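
Since only the negated check exists, a sortedness test is written by negating is.unsorted(). In the sketch below, is_sorted is a hypothetical helper defined for this example, not a base R function.

```r
is.unsorted(c(1, 2, 3))  # FALSE
is.unsorted(c(3, 1, 2))  # TRUE

# Hypothetical convenience wrapper (not part of base R)
is_sorted <- function(x) !is.unsorted(x)

is_sorted(c(1, 2, 2, 3))  # TRUE, ties are allowed by default
is_sorted(c(3, 1, 2))     # FALSE
```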

In some cases a data type can be converted (cast) to another. The following functions are commonly used for this.

as.character # to character
as.numeric   # to numeric (usually real numbers)
as.factor    # to nominal categorical
as.ordered   # to ordinal categorical
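
A minimal sketch of these conversion functions on example values. The last two lines show a common pitfall: as.numeric() applied to a factor returns its internal integer codes, so converting through as.character() first is the usual way to recover the numbers shown as labels.

```r
as.character(c(3, 1, 2))  # "3" "1" "2"
as.numeric("2.5")         # 2.5

f <- as.factor(c("b", "a", "b"))
levels(f)                 # "a" "b"

o <- as.ordered(c("b", "a", "b"))
is.ordered(o)             # TRUE

f2 <- factor(c("10", "20", "10"))
as.numeric(f2)                 # 1 2 1, the internal codes
as.numeric(as.character(f2))   # 10 20 10, the values shown as labels
```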

Internally, the types rank from simple to complex in the order logical - integer - real (double) - complex - character. A cast from a simpler type to a more complex one always succeeds.

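The opposite direction may or may not work, depending on the value. For example, the character "1" can be converted to the number 1, but the character "L" cannot: as.numeric("L") returns NA with a warning. (utf8ToInt("L") does return a number, 76, but that is the Unicode code point of the character "L", not a numeric reading of it.)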
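
The coercion order and both directions of conversion can be sketched as follows. Combining two types with c() moves the result to the more complex of the two, and suppressWarnings() is used here only to keep the failed conversion quiet.

```r
# Simple to complex: always works
typeof(c(TRUE, 2L))      # "integer"
typeof(c(1L, 2.5))       # "double"
typeof(c(1.5, 1 + 0i))   # "complex"
typeof(c(1 + 0i, "a"))   # "character"

# Complex to simple: depends on the value
as.numeric("1")                    # 1
suppressWarnings(as.numeric("L"))  # NA, "L" is not a number
utf8ToInt("L")                     # 76, the code point of the character
```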