A few months ago, I started a Research Associate position on a project whose ultimate goal is to build a digital tool (e.g., an app) to detect the most relevant factors that protect against developing depression in young people. The project has been immensely satisfying at every stage because of the nature of the study, its purpose, and the richness of the data. As a PhD student whose research focuses on young populations at risk of developing mental illnesses, I truly believe in early intervention research, which is still a blind spot in mental health care.


Depression is a major contributor to disability across the lifespan. It has its onset in adolescence and early adulthood, and there are many biological, psychosocial, environmental, and clinical reasons for the onset. The best way, in my opinion, to understand the factors leading to the development of a mental health disorder is to work with longitudinal data. That is why the rich, detailed data from the Avon Longitudinal Study of Parents and Children (ALSPAC) birth cohort on the protective and risk factors (i.e., active ingredients) for depression was a great fit for our research. We were able to analyse data from many different time points, starting as early as 3 months of age. The depth and volume of the data let us look at many risk and protective factors recommended by the Wellcome Trust and by the vast literature on depression, such as sleep disturbances, loneliness, parenting, cognitive skills (e.g., attention), school connectedness and enjoyment, friendship, physical activity, IQ, childhood abuse, and diet, among others. Overall, this was a great opportunity to understand which factors increase or decrease the risk of developing depression later in young people’s lives, and to improve primary prevention strategies.


However, I found that using this type of longitudinal cohort data can also bring some challenges, especially for the first time. One of them is that the data might not answer your specific questions to the degree that you would have hoped, because it was collected in a specific way to answer other specific questions. For example, in some cases a single item is used to characterise a variable (e.g., religion), which might not accurately capture the complex nature of that variable. Additionally, since we did not collect the data, we ultimately had no control over what the secondary data set contained or how it was measured. For example, childhood abuse, parenting, infant interactions, and bullying were all reported by the parents, and contrary to our expectations, none of these variables were associated with depression in young people. We believe that this might be partially due to the fact that the questionnaires were completed solely by the parents, instead of by the children. Parents may tend to underreport the adverse experiences that the child was going through, so a more accurate approach would have been to ask the child and/or the teachers directly. In fact, in other studies where childhood abuse is reported by the child, it appears to be consistently associated with a range of mental health problems (e.g., depression, borderline personality disorder, psychotic symptoms). Further, some of the factors that we were interested in were measured after the age of 13, which is above the age limit that we set for our study, and for this reason we were not able to include those variables. Finally, another challenge of using a longitudinal cohort was that the sheer size of the data made it hard to handle and comprehend at first, especially because I was not familiar with ALSPAC’s measures (e.g., how they were assessed and scored). I therefore had to dedicate a few weeks to getting familiar with the dataset and learning the specific characteristics of this cohort data in more detail.


Overall, what I have learned in this process is that, although there are some drawbacks to using secondary data and it can be somewhat overwhelming at first if you are not familiar with the dataset, longitudinal studies with large cohorts are one of the best methods for understanding the aetiology of a mental health disorder. Using a large data set like ALSPAC created many opportunities for us, such as discovering patterns in depression in young people and developing a digital tool. I believe this is one of the many advantages of working with cohort data of this kind. Finally, I hope that the results can also promote the use of longitudinal data in mental health, by highlighting what training and capacity building can best help ensure that these valuable cohort studies are used as much as possible.


Written by: Buse Durdurak

Research Associate on the PREVENTA study, at the University of Birmingham