TY - JOUR
T1 - A new data science trajectory for analysing multiple studies
T2 - a case study in physical activity research
AU - Tummers, Simone C.M.W.
AU - Hommersom, Arjen
AU - Bolman, Catherine
AU - Lechner, Lilian
AU - Bemelmans, Roger
PY - 2025/6
Y1 - 2025/6
N2 - The analysis of complex mechanisms within population data, and within sub-populations, can be empowered by combining datasets, for example to gain more understanding of change processes of health-related behaviours. Because of the complexity of this kind of research, it is valuable to provide more specific guidelines for such analyses than those given in standard data science methodologies. To this end, we propose a generic procedure for applied data science research in which data from multiple studies are included. Furthermore, we describe its steps and associated considerations in detail to guide other researchers. Moreover, we illustrate the application of the described steps in our proposed procedure (presented in the graphical abstract) by means of a case study, i.e., a physical activity (PA) intervention study, in which we provided new insights into PA change processes by analysing an integrated dataset using Bayesian networks. The strengths of our proposed methodology are subsequently illustrated by comparing this data science trajectories protocol to the classic CRISP-DM procedure. Finally, some possibilities to extend the methodology are discussed.
AB - The analysis of complex mechanisms within population data, and within sub-populations, can be empowered by combining datasets, for example to gain more understanding of change processes of health-related behaviours. Because of the complexity of this kind of research, it is valuable to provide more specific guidelines for such analyses than those given in standard data science methodologies. To this end, we propose a generic procedure for applied data science research in which data from multiple studies are included. Furthermore, we describe its steps and associated considerations in detail to guide other researchers. Moreover, we illustrate the application of the described steps in our proposed procedure (presented in the graphical abstract) by means of a case study, i.e., a physical activity (PA) intervention study, in which we provided new insights into PA change processes by analysing an integrated dataset using Bayesian networks. The strengths of our proposed methodology are subsequently illustrated by comparing this data science trajectories protocol to the classic CRISP-DM procedure. Finally, some possibilities to extend the methodology are discussed.
KW - Applied data science
KW - Data Science Trajectories
KW - Multiple studies
U2 - 10.1016/j.mex.2024.103104
DO - 10.1016/j.mex.2024.103104
M3 - Article
SN - 2215-0161
VL - 14
JO - MethodsX
JF - MethodsX
M1 - 103104
ER -