data integration; integrated dataset; record linkage; statistical matching
Publication Year
2022
Publisher
Korea Institute for Health and Social Affairs
Abstract
In line with the open data trend, large amounts of data are being released. However, these datasets are distributed and provided separately, so they are rarely linked or integrated with one another. This study examines how multi-source data scattered across different providers can be linked and integrated to increase their value, and proposes measures to overcome the difficulties involved and to promote such use.

The main results of the study are as follows. Data integration and analysis were conducted using real data: the Korea Health Panel Survey and the Life Time Use Survey were integrated using the nearest-neighbor hot deck method based on an exact distance function. The two datasets were integrated to assess the feasibility of creating integrated data from multi-source surveys and to analyze the relationship between health behavior and daily time use. Analysis of the integrated data revealed statistically significant differences in time use (essential time, duty time, free time, and exercise time) by gender and age group. Time use also differed significantly by subjective health status: in essential, duty, and free time for men and women aged 60 and older; in duty, free, and exercise time for men in their 40s and 50s; and in duty time for women in their 40s and 50s. Middle-aged people commonly face health problems such as the onset of chronic diseases, and these differences appear to reflect either less time spent on work, housework, and other activities because of disease treatment, or activities restricted by disease. These groups tended to rate their subjective health status as poor because of such health problems. On the other hand, the group with poor subjective health status spent more time exercising to manage their health.
These results are similar to those of previous studies on the relationship between subjective health status and daily time use. This suggests that data integration based on statistical matching can considerably increase the usefulness of the data. If more diverse data can be integrated in the future, information that has so far been difficult to consider in the health field could be used, making it possible to identify a wider range of factors for measures to protect public health.
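The abstract names the nearest-neighbor hot deck but not how it was configured. A minimal sketch of the general technique follows, assuming (1) exact matching on categorical variables that define donation classes, (2) Manhattan distance on numeric matching variables, and (3) unconstrained matching, in which a donor may be reused. The function name `nnd_hotdeck` and all field names (`sex`, `age_group`, `age`, `health`, `exercise_min`) are hypothetical, and the toy records are invented for illustration; they are not survey data.

```python
from typing import Dict, List, Sequence, Union

# A survey record: variable name -> value (all field names are hypothetical)
Record = Dict[str, Union[float, str]]


def nnd_hotdeck(
    recipients: List[Record],
    donors: List[Record],
    class_vars: Sequence[str],
    match_vars: Sequence[str],
    donated_vars: Sequence[str],
) -> List[Record]:
    """Nearest-neighbor distance hot deck (unconstrained, donors reusable).

    For each recipient, only donors with identical values on `class_vars`
    (exact matching within donation classes) are considered; among them the
    donor with the smallest Manhattan distance on `match_vars` is chosen and
    its `donated_vars` are copied into a new, fused record.
    """
    fused: List[Record] = []
    for rec in recipients:
        # Donation class: donors that agree exactly on every class variable
        pool = [d for d in donors if all(d[v] == rec[v] for v in class_vars)]
        if not pool:
            raise ValueError(f"no donor in class {[rec[v] for v in class_vars]}")
        # Manhattan (L1) distance on the numeric matching variables;
        # ties go to the first donor in the pool
        best = min(
            pool,
            key=lambda d: sum(abs(float(d[v]) - float(rec[v])) for v in match_vars),
        )
        out = dict(rec)  # keep the recipient's own variables
        out.update({v: best[v] for v in donated_vars})  # add donated variables
        fused.append(out)
    return fused


# Hypothetical toy data: health panel records (recipients) and
# time-use records (donors)
health = [
    {"id": "h1", "sex": "F", "age_group": "40-59", "age": 45, "health": "poor"},
    {"id": "h2", "sex": "M", "age_group": "60+", "age": 67, "health": "good"},
]
time_use = [
    {"sex": "F", "age_group": "40-59", "age": 52, "exercise_min": 20},
    {"sex": "F", "age_group": "40-59", "age": 44, "exercise_min": 35},
    {"sex": "M", "age_group": "60+", "age": 70, "exercise_min": 50},
]
fused = nnd_hotdeck(health, time_use, ["sex", "age_group"], ["age"], ["exercise_min"])
for row in fused:
    print(row["id"], row["exercise_min"])  # prints "h1 35", then "h2 50"
```

Each fused record keeps the recipient's own health variables and receives the donor's time-use variable, so the two can then be analyzed jointly. This relies on the usual statistical-matching assumption that the donated and recipient-specific variables are conditionally independent given the matching variables. A constrained variant would instead limit how often each donor can be used.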
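Table of Contents
Abstract; Summary
Chapter 1. Introduction: 1. Background and Purpose of the Study; 2. Content and Methods of the Study
Chapter 2. Concepts of Data Linkage and Integration, and Statistical Methodology: 1. Concepts of Data Linkage and Integration; 2. Record Linkage Methodology; 3. Statistical Matching Methodology
Chapter 3. Case Studies of Data Integration in Korea and Abroad: 1. Current Status of Data Integration in Korea and Abroad; 2. Cases of Integrated Data Use in Korea and Abroad; 3. Summary
Chapter 4. Current Status of Available Data: 1. Survey Data Provided by Our Institute; 2. Current Status of Domestic Data Portals and Platforms; 3. Summary
Chapter 5. Empirical Analysis of Data Integration: 1. Simulation of Data Integration Using Statistical Matching Methods; 2. Data Integration Using Statistical Matching; 3. Summary
Chapter 6. Conclusion: 1. Summary and Implications of Findings; 2. Directions for Future Research
References
Appendices: 1. Blocking; 2. National Approved Statistics of Our Institute and the Ministry of Health and Welfare; 3. Main Survey Contents by Survey Dataset; 4. Basic Analysis of Private-Sector Data; 5. Results of an Expert Survey on Research Topics Using Integrated Data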