Introduction: This study examined the long-term effectiveness of cognitive behavioral therapy (CBT) (≥ 2 years after the end of therapy) in the routine care of youth (mean 11.95 years; SD = 3.04 years) with primary anxiety disorder (AD). Methods: Two hundred and ten children with any AD as a primary diagnosis and with any comorbidity were included in the “Kids Beating Anxiety (KibA)” clinical trial and received evidence-based CBT. Diagnoses, severity of diagnoses, and further dimensional outcome variables of symptoms and functioning were assessed before (baseline), after the last treatment session (POST), and at two follow-up (FU) assessments in the child and caregiver report: 6 months (6MONTHS-FU) and >2 years (mean 4.31; SD = 1.07 years) after the last treatment session (long-term FU). Results: At POST, 61.38% showed total remission of all and any ADs. At long-term FU, the remission rate was 63.64%. Compared to baseline, ratings of severity, anxiety, impairment/burden, and life quality improved significantly after CBT in child and caregiver report. All pre-post/FU improvements and global success ratings were stable in child (Pre-Post: Hedges’ g = 3.57; Pre-6MONTHS-FU: Hedges’ g = 3.43; Pre-LT-FU: Hedges’ g = 2.34) and caregiver report (Pre-Post: Hedges’ g = 2.00; Pre-6MONTHS-FU: Hedges’ g = 2.31; Pre-LT-FU: Hedges’ g = 2.31) across all POST- and FU-assessment points. Some outcomes showed further significant improvement, and no deterioration was found over the course of time. Effect sizes calculated in the present study correspond to, or even exceed, effect sizes reported in previous meta-analyses. Conclusions: Stable long-term effects of “KibA” CBT for youth with ADs, comparable to results from efficacy studies, were achieved in a routine practice setting by applying treatment manuals tested in randomized controlled trials.
These findings are remarkable, as the patient group studied here consisted of an age group within the main risk phase of developing further mental disorders, and therefore an increase in new-onset anxiety and further mental disorders would be expected over the long time span studied here.

In childhood and adolescence, anxiety disorders (ADs) are the earliest [1] and most common mental disorders [2], with an average point-to-one-year prevalence rate of 6.5% worldwide [3]. ADs have a median onset age of 6 years [4] and are associated with high levels of burden, usually for multiple family members [5]. As ADs are associated with lower psychosocial functioning, they are considered pacemakers for further psychopathological development [6‒8], predict adult mental disorders [4, 9], and are thought to be associated with greater costs than any other mental health disorder [10, 11]. All of this makes early and long-term effective interventions for childhood ADs critical from both a developmental psychopathology and health economics perspective.

Initial meta-analyses of 24 [12] and 48 [13] randomized controlled trials (RCTs), which either exclusively [12] or mostly [13] investigated cognitive behavioral therapy (CBT) as an active condition, reported average remission rates between 56% and 69% and mean pre-post effect sizes of 0.58 (intent-to-treat, ITT) and 0.86 (completers). With a mean overall pre-post-treatment effect size across all treatments of d = 0.86, CBT for ADs was clearly superior to control conditions (d = 0.13 for the waiting list control group) [12]. James et al. [14] examined 87 trials involving 5,964 children and adolescents and reconfirmed that CBT leads to greater remission of primary and all ADs than waitlist/no treatment in the short term, with remission rates (ITT analyses) of 49.4% for primary AD for CBT versus 17.8% for waitlist/no treatment controls (OR 5.45, 95% CI 3.90–7.60). Only four studies included in this meta-analysis reported FU data, with mixed results providing evidence for the stability of remission rates at 6-month FUs but not at 12-month FUs.

A systematic qualitative analysis of the long-term effects of CBT-RCTs in adolescents with AD was provided by Gibby and colleagues [15], who naturalistically followed up patients with AD from RCTs over an average of 6 years. They concluded that approximately 50% of adolescents treated for AD were free of their disorder after this long FU period. However, the representativeness of the long-term FU samples studied was limited, as they primarily included only completers. Thus, these results reflect more of a “best-case scenario.”

The Child/Adolescent Anxiety Multimodal Extended Long-Term Study (CAMELS) [16] was the first study to systematically examine the long-term effects of CBT in a large multi-center RCT (n = 319 youths, 65.3% of the original RCT sample, age range 10.9–25.2 years). This methodologically sophisticated study used independent evaluators over a 4-year period, beginning four to 12 years after randomization, and within-person analyses and multiple-time-point assessments. CBT success was found to be maintained for several years after treatment, but only 22% of adolescents were in stable remission at long-term FU and 48% were relapsers, while 30% were chronically ill [16].

While RCTs in so-called “efficacy trials” are the gold standard for evaluating the effects of treatments under ideal, highly controlled circumstances (i.e., limited diagnoses and highly monitored treatment adherence), “effectiveness trials,” in contrast, measure the degree of beneficial effect in “real-world” clinical settings. Distinguishing between “efficacy” and “effectiveness” is an important aspect of analyzing clinical evidence. For childhood ADs, there is an urgent need for more “effectiveness studies” examining whether treatments shown to be efficacious in RCTs can be successfully applied in routine care. Efficacy and effectiveness studies exist on a continuum [17]. The pragmatic explanatory continuum indicator summary (PRECIS-2) tool proposes a dimensional mapping of treatment studies on 9 relevant dimensions. PRECIS thus allows the continuum of efficacy and effectiveness studies to be mapped multidimensionally in all its complexity [18, 19].

Similar to efficacy trials, effectiveness studies can examine short-term (≤2 years) and long-term FU (>2 years) effects. Several short-term FU effectiveness studies yielded recovery rates comparable to those reported in the meta-analyses of RCTs described above [20]. However, considerable methodological heterogeneity limits the conclusions that can be drawn from these findings. Few studies follow standardized treatment protocols, so there is no guarantee that evidence-based, guideline-compliant therapy has been implemented [21]; therapist training in the applied treatment varies widely; studies were conducted with small samples [21] or subclinical anxiety [22]; and few studies combine dimensional measures of symptoms and functioning with categorical measures using structured diagnostic interviews [23]. Similar to efficacy studies, effectiveness studies usually follow only those patients who completed the treatment [23]. Most significantly, few studies cover longer FU intervals (>2 years). One of the largest long-term effectiveness studies (n = 139) showed that after an FU period of 4 years, 63% of patients no longer met criteria for the primary diagnosis and 53% no longer met criteria for any of the inclusion diagnoses [24]. Outcomes were improved at nearly 4 years post-treatment, and recovery rates at long-term FU were similar to those reached in efficacy trials. Although this study features notable strengths, functional measures were not included in the outcomes. Overall, the first long-term FUs showed clear evidence of the long-term effectiveness of CBT for ADs in youth. However, several limitations restrict our knowledge regarding the maintenance of treatment gains. To summarize, after more than two decades of RCTs for ADs, studies have mounted impressive evidence for short-term and initial convincing evidence for long-term (>2 years post-treatment) efficacy of CBT.
The question of whether the long-term success of CBT in RCTs translates fully to routine care remains open. The present study aimed to close this gap by evaluating the long-term effectiveness of CBT for mixed ADs in youth in a naturalistic setting. In doing so, our study aimed to address significant methodological weaknesses of existing studies in the following manner: Not only completers but the whole sample was followed at long-term FU, using a graduated, multi-dimensional, multi-informant approach to collect categorical and dimensional data on symptoms and psychosocial functioning from the child, caregiver, and clinician (see OSM). Categorical diagnostics assessed the remission status of all ADs, not only the primary AD, and were conducted by certified diagnosticians blinded to the individual’s prior diagnoses and treatment status. Treatment followed well-established, standardized CBT protocols. Further mental health service use after treatment termination was partly standardized (booster-session protocol). We expected that treatment success would be stable over time, independent of diagnosis and person-related characteristics.

Study Design

The present study was conducted as a “non-controlled, observational trial” (no wait-list and no control group) at the Mental Health Research and Treatment Center (MHRTC) of the Ruhr University Bochum, Germany. Based on the PRECIS approach [18, 19], the current study qualifies as an effectiveness study under the following criteria: (1) lack of further inclusion/exclusion criteria, (2) treatment under routine conditions, (3) large number of practitioners (N = 54) with different treatment experience, (4) inclusion of functional outcome measures, and (5) use of an intent-to-treat analysis [18, 19].

The study was reviewed and approved by the Local Medical Research Ethics Committee (ClinicalTrials.gov ID NCT02077205). We used a multi-informant, multi-dimensional approach, including reports from children, caregivers, and blinded diagnosticians. Trained and certified diagnosticians obtained diagnoses via a structured clinical diagnostic interview (Kinder-DIPS-OA) [25], conducting both the caregiver and child versions under the supervision of experienced and licensed supervisors, at baseline, after the last treatment session (POST), and 6 months later (6MONTHS-FU). At long-term follow-up (long-term FU, >2 years up to 7 years after POST assessment), only the AD section of Kinder-DIPS was conducted due to feasibility and time constraints. Severity of diagnoses was rated on an 8-point scale with a severity ≥4 judged as clinically relevant at all assessment time points. Severity of current anxiety symptoms was assessed via caregivers’ and children’s ratings using the German version of the “Spence Children’s Anxiety Scale (SCAS)” [26]. Caregivers and children completed a German-adapted version of the “Sheehan Disability Scale (SDS)” [27] to assess burden (“Belastungsrating [BEL]”) [28] and functional impairment (“Beeinträchtigungsrating [BEE]”) due to anxiety in school, family, and social life [28]. The “Strengths and Difficulties Questionnaire (SDQ)” [29] supplemented the diagnostics in all children’s and caregivers’ assessments. Finally, the following instruments were added to post-treatment assessments: “Clinical Global Impressions (CGI)” [30] at POST, 6MONTHS-FU, and long-term FU and “Treatment Assessment Questionnaire” (“Fragebogen zur Beurteilung der Behandlung [FBB]”) [31] at POST. Data collection to investigate long-term effects of the treatment (long-term FU) covered a range of 2.67–7.26 years (mean 4.3 years; SD = 1.07 years) from end of therapy.

Treatment and Therapists

CBT was carried out between January 2011 and the end of 2015. Treatment was provided by therapists in post-graduate psychotherapy training. Therapists (N = 54) had at least a master’s degree and a minimum of 1-year post-graduate full-time training in CBT, which included a minimum standard 2-day workshop in anxiety CBT, including training in using the applied manuals [32, 33]. Therapists were instructed to conduct manual-based CBT to ensure that guideline-oriented, evidence-based CBT was implemented. Supervision by licensed supervisors took place at least every 4th session to ensure treatment integrity. Children received either family-based CBT (Training for Separation Anxiety for Families [TAFF], 16 sessions, [32]), for children with a primary diagnosis of Separation AD (SepAD) (n = 31; 14.6%), or child-focused CBT (Coping Cat [CC], 17 sessions, [33]), for children with any other primary AD. See online supplementary material at https://doi.org/10.1159/000537932 for details about both treatments.

If the diagnostic interview [25] still showed a clinically relevant AD (severity level ≥4) at regular treatment termination (POST), families could receive up to 3 additional sets of a maximum of 4 booster sessions each (for a total of 12 booster sessions maximum) after a therapy break of 4 weeks between sets. Through these booster sessions, the manual content of the previous treatment was deepened according to the individual needs of the patient. In addition, therapy could be continued to address comorbidity after the termination of AD treatment.

Recruitment

Participants were recruited from local health centers, youth welfare organizations, and other psychological services, including inpatient settings. Further, newspaper advertisements and flyers (displayed and posted in public institutions, schools, and on public transportation) were used to announce the treatment offer. To participate, children had to meet diagnostic criteria for any AD – exclusive of OCD and PTSD – according to the Diagnostic and Statistical Manual for Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) [34], with a severity level of ≥4, and any comorbidity. Further inclusion criteria were age between 6 and 16 years, sufficient knowledge of the German language, absence of cognitive impairment, and the written informed consent of the caregivers and the child. Figure 1 shows the flow chart of participation according to CONSORT guidelines [35, 36]. After the last treatment session, all patients and their caregivers were invited to POST; all patients enrolled at baseline were invited to 6MONTHS-FU and long-term FU assessment sessions together with their caregivers, using a graduated assessment approach that included first obtaining at least CGI, BEE, and BEL, and then progressing to obtaining up to all of the measures. At all assessment timepoints, diagnosticians attempted to obtain data from patients and from caregivers. Attempts were not successful for all patients and all assessments, such that in some cases, only partial data are available. Loss of participants occurred because of active refusal to reassess or because they could not be reached or did not respond to repeated postal contact. See online supplementary material for details about staging methods of recruitment.

Fig. 1.

CONSORT participation flowchart.

Measures

Primary Outcomes

Kinder-DIPS-OA, Child and Caregiver Version. Primary outcome measures consisted of clinical diagnoses via Kinder-DIPS-OA, Child and Caregiver Version [25]. Test-retest reliability (κ = 0.85–0.94; all DSM-IV diagnoses) and validity in past research are good [36], as are inter-rater reliability estimates for overall diagnoses of an AD (κ = 0.85) and other axis I disorders (κ = 0.85–0.94) [37, 38]. The treatment effects were examined in terms of remission of primary AD and total remission of any AD. Severity of diagnoses was rated on an 8-point scale with a severity ≥4 judged as clinically relevant at all assessments. Moreover, Kinder-DIPS-OA [25] was used to monitor side effects (e.g., suicidality and other problems) at all assessments.
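The two remission criteria described above can be illustrated with a minimal Python sketch. It assumes only the rule stated in the text (a severity rating of 4 or more counts as clinically relevant); the `Diagnosis` class, the function names, and the example patient are hypothetical and are not part of the study's materials.

```python
from dataclasses import dataclass

CLINICAL_THRESHOLD = 4  # severity >= 4 is judged clinically relevant (from the text)


@dataclass
class Diagnosis:
    name: str              # hypothetical label for the anxiety disorder
    severity: int          # rating on the 8-point severity scale
    primary: bool = False  # True for the primary AD


def primary_remitted(diagnoses):
    """True if the primary AD is below the clinical threshold."""
    return all(d.severity < CLINICAL_THRESHOLD for d in diagnoses if d.primary)


def totally_remitted(diagnoses):
    """True if no AD at all reaches the clinical threshold."""
    return all(d.severity < CLINICAL_THRESHOLD for d in diagnoses)


# Hypothetical patient at POST: primary AD remitted, one further AD still clinical.
post = [
    Diagnosis("specific phobia", severity=2, primary=True),
    Diagnosis("social phobia", severity=5),
]
print(primary_remitted(post))  # True
print(totally_remitted(post))  # False
```

The example shows why the two rates can diverge: a child may be free of the primary AD while another AD remains above threshold, so total remission is the stricter criterion.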

Secondary Outcomes

The extent of anxiety symptoms was measured via the “Spence Children’s Anxiety Scale (SCAS)” [39, 40]. The “Strengths and Difficulties Questionnaire (SDQ)” [29], parent- and self-report (from the age of 11), was used to assess the patients’ general behavioral strengths and difficulties. Only the SDQ total problem scale achieved satisfactory internal consistency [39], which is why we included only the SDQ total problem score [41]. Caregivers and children completed a German version of the “Sheehan Disability Scale” [27] to assess burden (“Belastungsrating [BEL]”) and functional impairment (“Beeinträchtigungsrating [BEE]”) due to anxiety in school, family, and social life [28]. Improvement or deterioration in global functioning was assessed using the “Clinical Global Impressions Scale (CGI)” [30]. The short version of the “Treatment Assessment Questionnaire” (“Fragebogen zur Beurteilung der Behandlung [FBB]”) [31, 42] was used to record the subjective quality of care directly after the end of therapy in children’s as well as caregivers’ assessments. More details on secondary outcome measures can be found in the online supplementary material.

Statistical Analyses

Data were analyzed in several steps. First, descriptive statistics were used to describe participant demographics, primary diagnosis and comorbidity, information about treatment (number of therapy sessions up to and after post-treatment assessment), and intake of medication during the therapy period. Second, we used t tests to examine whether participation in long-term FU was related to severity of primary diagnoses, severity of anxiety (SCAS) [39], impairment and burden (BEE, BEL) [28], and strengths and difficulties (SDQ sum score) [29] at baseline, or to CGI [30] at POST. Third, effectiveness of the CBT intervention was investigated after treatment termination (POST) and at FUs. For this, absolute and relative frequencies of patients with remission of their primary AD and of those with total remission (absence of any AD) were calculated. We analyzed the severity of the primary and any other anxiety diagnoses, separately for child and caregiver reports, across all assessments. Further, we examined changes in secondary outcomes via questionnaires over all assessments using descriptive statistics (mean; SD). A series of mixed-effect models for unbalanced repeated measures was conducted to examine differences in the primary (severity) and secondary outcome measurements (CGI, BEE, BEL, SCAS, SDQ) across four timepoints: baseline (as reference for BEE, BEL, SCAS, SDQ), POST (as reference for CGI), 6MONTHS-FU, and long-term FU, while controlling for age and gender. Mixed models were fitted using REML, and p values were derived using the Satterthwaite approximation. A p value less than 0.05 was considered statistically significant. All analyses were conducted with SPSS 27. Effectiveness of the CBT intervention after termination of treatment (POST), at 6MONTHS-FU, and at long-term FU was analyzed using effect sizes (Hedges’ g) for SCAS (child and caregiver report) to quantify the magnitude of change from baseline to POST and from baseline to the FU assessments (6MONTHS-FU and LT-FU).
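For readers unfamiliar with Hedges’ g, the following Python sketch shows one common way to compute it. The text does not state which variant was used (for example, an independent-groups or a repeated-measures formula), so the sketch assumes the independent-groups version with a pooled standard deviation and the usual small-sample correction; it should not be expected to reproduce the values reported here. The function name and the input numbers are hypothetical.

```python
import math


def hedges_g(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (a minus b) with small-sample correction.

    Assumption: the two sets of scores are treated as independent groups.
    """
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    cohens_d = (mean_a - mean_b) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # common approximation of the correction factor
    return cohens_d * correction


# Hypothetical scores, not taken from the study.
g = hedges_g(mean_a=30.0, sd_a=10.0, n_a=50, mean_b=20.0, sd_b=10.0, n_b=50)
print(round(g, 3))  # 0.992
```

With large samples the correction factor is close to 1, so g and Cohen's d are nearly identical; the correction matters mainly for small groups.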

Individual/Group Statistics

Two hundred and ten children (117 girls = 55.70%) with a mean age of 11.94 years (SD = 3.04) who were diagnosed with any AD as the primary diagnosis according to child (n = 204) or caregiver report (n = 206) at the start of treatment were included in the analyses. At baseline, most patients presented with specific phobia as their primary diagnosis (child report: n = 82, 40.20%; caregiver report: n = 71, 34.47%), followed by social phobia (child report: n = 50, 24.15%; caregiver report: n = 58, 28.16%), separation AD (child report: n = 30, 14.71%; caregiver report: n = 42, 20.39%), or generalized AD (child report: n = 16, 7.84%; caregiver report: n = 23, 11.17%). Seventy-nine patients, according to the caregiver version of Kinder-DIPS (37.62%), and 56 patients, according to the child version of Kinder-DIPS (26.67%), out of 210 patients (100%) initially showed comorbidity (one up to four comorbid diagnoses). For more details about comorbidity, see online supplementary material.

Of these children, 148 (70.48%) did not take any medication during the therapy period; 26 (12.38%) did not answer this question; 12 (5.7%) took psychotropic drugs; and 24 (11.43%) took different kinds of medication to treat something other than ADs (e.g., asthmatic conditions). More details on medication during CBT can be found in the online supplementary material.

Of the caregivers, 74.7% were married; 77% of mothers and 94.4% of fathers were employed. Neither the gender or age of the children nor the marital status or occupation of the caregivers significantly influenced the severity of the primary diagnosis at baseline. Sixty-five of 210 families (30.95%) received additional booster sessions after regular treatment termination. Booster sessions (mean 8; median 5 sessions) addressed further anxiety (1 up to 12 sessions) or comorbidity (up to 41 sessions). More details about booster sessions after regular CBT can be found in the online supplementary material.

FU Sample: Group Differences at Baseline and in Long-Term FU

No significant group differences in severity of primary diagnoses at baseline, in presence/absence of comorbidity (assessed with Kinder-DIPS), in severity of anxiety (SCAS), in impairment and burden (BEE, BEL), or in child report of strengths and difficulties (SDQ) at baseline were found between those who participated and those who did not participate in long-term FU (all p values >0.05). For the caregiver report of strengths and difficulties (SDQ), we found a significant difference by participation in long-term FU (p = 0.001, smaller than the Bonferroni-adjusted alpha level of 0.05/10): Participants in long-term FU had lower SDQ sum scores in caregiver reports at baseline (mean 15.75; SD 3.98) than those who did not participate in long-term FU (mean 17.73; SD 3.63), with a medium effect size (Hedges’ g = 0.516). Participants in long-term FU showed higher CGI scores at POST in child report (mean 6.25; SD 0.77) than those who refused (mean 5.99; SD 0.86), with a small effect size (Hedges’ g = 0.322; p = 0.045). Since the t tests for CGI scores at POST did not use the same sample as the baseline variables, no Bonferroni adjustment of alpha was applied to them. Since there were no systematic differences between long-term FU participants and those who refused, but only isolated differences that did not occur consistently in the child and caregiver reports, these can be neglected. At long-term FU, participants were between 9.81 and 23.95 years old (mean 16.57; SD = 2.90 years).
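The Bonferroni adjustment mentioned above amounts to dividing the nominal alpha level by the number of comparisons. A minimal Python sketch, assuming ten baseline comparisons as implied by “0.05/10” in the text (the variable and function names are hypothetical):

```python
ALPHA = 0.05
N_TESTS = 10  # number of baseline comparisons, as implied by "0.05/10" in the text

adjusted_alpha = ALPHA / N_TESTS  # Bonferroni-adjusted threshold, about 0.005


def significant(p_value, alpha=adjusted_alpha):
    """True if the p value falls below the given alpha level."""
    return p_value < alpha


# Caregiver-report SDQ at baseline: tested against the adjusted level.
print(significant(0.001))  # True

# Child-report CGI at POST: tested at the unadjusted level, as described in the text.
print(significant(0.045, alpha=ALPHA))  # True
```

Note that p = 0.045 would not pass the adjusted threshold, which is why the text specifies that the CGI comparison was evaluated without adjustment.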

Short- and Long-Term Effectiveness of Treatment: Primary Outcomes

Compared to baseline (210 children = 100%), 73.79% of the children (107 of 145) were free of their primary AD (severity <4) after treatment termination (POST). The proportion of children free of their primary AD further increased at 6MONTHS-FU (79.76%; n = 67 of 84 children). At long-term FU, 77.27% (n = 34 of 44 children) were free of their primary anxiety diagnosis. No differences in the distribution of diagnoses (yes/no) were found between POST and long-term FU, as demonstrated by a χ² test (χ² = 0.073, p = 0.568). Stability of therapy success was shown by analyzing total remission rates: At treatment termination, 61.38% of children were free of any AD (n = 89 of 145 children). At the 6MONTHS-FU, the proportion of children free of any AD was 75.00% (n = 63 of 84 children), and at the long-term FU, it was 63.64% (n = 28 of 44 children).
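The remission rates above can be recomputed from the reported counts. The short Python sketch below does so; the counts are taken from the text, while the dictionary layout and the helper name `rate` are hypothetical. It also makes explicit that each rate refers to the children assessed at that time point (145, 84, and 44), not to the 210 children enrolled at baseline.

```python
# (remitted, assessed) counts as reported in the text
primary = {"POST": (107, 145), "6MONTHS-FU": (67, 84), "long-term FU": (34, 44)}
total = {"POST": (89, 145), "6MONTHS-FU": (63, 84), "long-term FU": (28, 44)}


def rate(remitted, assessed):
    """Remission rate in percent, rounded to two decimals."""
    return round(100 * remitted / assessed, 2)


for label, counts in (("primary AD", primary), ("any AD", total)):
    for timepoint, (remitted, assessed) in counts.items():
        print(f"{label:10s} {timepoint:12s} {rate(remitted, assessed):.2f}% of {assessed}")
```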

A descriptive examination of changes in severity of primary ADs across all assessments (Fig. 2) shows a clear reduction from baseline (mean 6.20; SD 1.02) to post-treatment assessment (mean 1.84; SD 2.15; Hedges’ g of 2.00), as well as a significant further decrease at 6MONTHS-FU (mean 1.3; SD 2.23; Hedges’ g of 2.31) and at long-term FU (mean 1.0; SD 1.91; Hedges’ g of 2.31) in the caregiver report. For the child report, analyses indicated a significant reduction in severity from baseline (mean 6.19; SD 1.00) to POST (mean 0.44; SD 1.30; Hedges’ g of 3.57). Compared to pre-treatment assessment, severity remained reduced at 6MONTHS-FU (mean 0.52; SD 1.39; Hedges’ g of 3.44) and at long-term FU (mean 0.95; SD 1.91; Hedges’ g of 2.34). According to the mixed model results, all comparisons between severity at baseline and POST/6MONTHS-FU/long-term FU are significant for both child and caregiver reports; thus, the treatment success in the primary outcome measure is confirmed and remains stable over time, and no deterioration was found.

Fig. 2.

a–f Means (SDs) of primary and secondary outcomes in child and caregiver ratings for baseline, POST, 6MONTHS-FU, and LT-FU.

Short- and Long-Term Effectiveness of Treatment: Secondary Outcomes

An analysis of CGI scores as a global measure of therapy success indicates that therapy effects remain stable over time within mixed models for both caregiver and child report (Fig. 2). No significant differences among POST, 6MONTHS-FU, and long-term FU were found for CGI (child and caregiver report). Analyses of the severity of anxiety (SCAS) [39], the impairment and burden ratings (BEE, BEL) [28], and SDQ [29] child and caregiver reports yield mixed model results showing significant differences in all secondary outcome measures between baseline and POST/6MONTHS-FU/long-term FU. These results underline the stability of effects in all measurements of symptoms and functioning over time; no deterioration was found. For the FU period, even further improvements were indicated in the available data. An overview of the mixed model results is displayed in Figure 2 and in online supplementary Table 1. The caregivers’ therapy evaluation (FBB) had a mean of 3.42 (Min = 1.91; Max = 4.0; SD 0.46), and the children’s evaluation had a mean of 3.57 (Min = 2.14; Max = 4.0; SD 0.46), indicating that both caregivers and children reported high satisfaction with therapy.

Using a large community sample (n = 210), the present study demonstrated the short-term as well as the long-term success of manualized, evidence-based CBT for youth with mixed ADs and comorbidity in routine care, using a multi-informant, multi-dimensional approach. All patients included in the study were invited to participate in the long-term FU session, rather than only completers, as in past studies. In contrast to other studies, we investigated differences at baseline between subgroups of those who participated in long-term FU and those who declined. No systematic group differences were detected in the severity of diagnoses, the presence/absence of comorbidity, or in severity of anxiety (SCAS), impairment, and burden (BEE, BEL) at baseline (all p > 0.05), which supports the long-term effects of treatment and generalizability of findings. Importantly, this study aimed to overcome a weakness of previous pragmatic studies, which did not ensure that the evidence-based interventions tested in RCTs were actually implemented. Therefore, we trained the therapists in evidence-based manuals before the trial started. Therapy was conducted by therapists-in-training with different amounts of experience in CBT in general as well as in the applied manuals. Regular supervision ensured adherence to the treatment. Not only was remission of the primary AD taken into account, but also total remission of any and all ADs. Moreover, significant improvements in secondary outcomes, including ratings of anxiety and other symptoms and areas of functioning (SCAS, SDQ, BEE, BEL, CGI) in child and caregiver reports, illustrate the improved psychosocial functioning and reduced impairment and burden after receiving evidence-based CBT in routine care, even years after treatment termination (>2 up to 7 years). Patients and caregivers agreed in their generally positive evaluation of the CBT (FBB).
As agreement between caregiver and child reports generally tends to be low, and we know that caregivers tend to underestimate burden and impairment of their children’s anxiety [43], the consistent pattern of effects among raters in the present study is notable. Other strengths of the study include (1) the sample size at baseline (n = 210), (2) the heterogeneity of the sample (including any ADs and any comorbidity, wide age range), and (3) the long time span covered by the LT-FU (2.67 up to 7.26 years; mean 4.31; SD 1.07). Remission rates in the present study, with 61.38% (POST), 75% (6MONTHS-FU), and 63.64% (long-term FU), are close to the findings from meta-analyses based on the results of RCTs [14]. When the pre-post/LT-FU effect sizes (Hedges’ g) for SCAS scores, calculated separately for child and caregiver reports, are compared with the effect sizes reported in the meta-analysis [14] as a benchmark [44], the present findings are consistent. Compared with similar studies conducted with adult populations with a variety of mental disorders [45] or adult samples with specific ADs [46, 47], our effect sizes and remission rates are consistent with or even exceed those reported. This is remarkable, as the patient group with mixed ADs and comorbidity studied here consisted of an age group within the main risk phase of developing further mental disorders, and therefore an increase in new-onset anxiety and further mental disorders would be expected over the long time span.

A standard protocol with booster sessions was used to allow optional extension of therapy after the regular number of sessions, according to clearly operationalized criteria. Only 31% of patients received booster sessions, addressing further anxiety and/or comorbidity (mean of 8 booster sessions). In summary, the present study demonstrates the possibility of disseminating RCT results in routine care when treatment is based on reviewed, evidence-based manuals. Overall, the current study provides strong evidence of long-term effectiveness of CBT for youth suffering from ADs.

This study shares a number of limitations with previous effectiveness studies of CBT in routine care. First, because of the lack of a control group, the clinical response is not synonymous with an effect that can be attributed to our treatment [48]. The naturalistic, uncontrolled design of the LT-FUs limits the interpretation of findings, as numerous factors can confound conclusions about treatment durability. We followed our sample for up to 7 years, and it would not have been ethically justifiable to use a control group over such a long period of time. As the treatment was carried out with well-tested, evidence-based manuals, it was no longer a question of proving their efficacy but of proving their long-term effectiveness. Since we know from the relevant epidemiological research that an increase in psychopathology is highly probable with increasing age in the age group studied here [49], our results are all the more remarkable. Second, we collected data at only 4 measurement points (baseline, POST, 6MONTHS-FU, and long-term FU); we therefore could not clearly differentiate between patients who experienced a relapse and those who were continuously free from ADs until and through the FU assessments. Third, for reasons of feasibility, after initial experience, instead of the complete Kinder-DIPS [25], only the AD section was performed, so that further comorbidity could not be recorded over the course of time. However, because the evaluation of SDQ in child and caregiver report shows significant differences between baseline and POST/6MONTHS-FU/long-term FU and even further improvements in long-term FU, the results suggest that treatment effects are stable over time and generalize beyond anxiety symptoms. Finally, due to loss of participants, only 115 families were recruited for long-term FU (50% of enrolled patients). Because only a subgroup of the original sample took part, selection may have biased the estimated long-term effects of treatment.
But since long-term FUs are challenging, the present results encourage further investigation into these issues.

Remarkably, our data show a clinically relevant improvement in primary and secondary outcome measures even years after treatment termination in both child and caregiver ratings. These findings are all the more remarkable because our patients are in an age group at high risk of developing further mental disorders, so an increase in new-onset anxiety and further mental disorders would be expected over the long time span studied here. It seems that better long-term effects can be achieved if the dosage of CBT is adjusted more individually than has been done so far in RCTs. In further studies, the effects of optional booster therapy sessions should be examined more closely. Further studies also need to clarify how patients with such additional needs can be identified in time and exactly which support they will benefit from. Our long-term results show that youth in routine care can continue to benefit from CBT and achieve improvements even beyond the end of therapy. The treatment offered was rated very positively overall by both patients and caregivers. Further research should further establish that evidence-based, manualized therapy is feasible and effective in routine care, even in the long run. Efforts should be made to translate the ample evidence for CBT from RCTs into routine psychotherapeutic care for children.

We thank Kristen Lavallee for language editing. We also thank the participants and their families, as well as the research assistants for their assistance in data collection and management.

All patients and their caregivers gave their written informed consent to participate in the clinical trial and all assessments. We obtained written informed consent from parents/legal guardians for all participants aged under 18. The study was approved by the Ethical Committee of the Faculty of Psychology at the Ruhr University Bochum in Germany (reference No. 431) (ClinicalTrials.gov ID: NCT02077205).

The authors have no conflicts of interest to declare.

No funding sources were given.

K.K. and S.S. contributed to the planning and designing of the study. K.K. wrote the manuscript and reviewed it with the assistance of S.S. K.K. supervised the data collection and supervised the therapists during the “KibA trial” and the research assistants who conducted the structured interviews of the LT-FU. X.C.Z. conducted the statistical analysis.

Research data are available on request, with certain restrictions as mandated by data protection law. Further inquiries can be directed to the corresponding author.

1. Kessler RC, Chiu WT, Demler O, Merikangas KR, Walters EE. Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the national comorbidity survey replication. Arch Gen Psychiatry. 2005;62(6):617-27.
2. Beesdo K, Knappe S, Pine DS. Anxiety and anxiety disorders in children and adolescents: developmental issues and implications for DSM-V. Psychiatr Clin North Am. 2009;32(3):483-524.
3. Polanczyk GV, Salum GA, Sugaya LS, Caye A, Rohde LA. Annual research review: a meta-analysis of the worldwide prevalence of mental disorders in children and adolescents. J Child Psychol Psychiatry. 2015;56(3):345-65.
4. Merikangas KR, He JP, Burstein M, Swanson SA, Avenevoli S, Cui L, et al. Lifetime prevalence of mental disorders in U.S. adolescents: results from the national comorbidity survey replication--adolescent supplement (NCS-A). J Am Acad Child Adolesc Psychiatry. 2010;49(10):980-9.
5. Pine DS, Helfinstein SM, Bar-Haim Y, Nelson E, Fox NA. Challenges in developing novel treatments for childhood disorders: lessons from research on anxiety. Neuropsychopharmacology. 2009;34(1):213-28.
6. Brückl TM, Wittchen HU, Höfler M, Pfister H, Schneider S, Lieb R. Childhood separation anxiety and the risk of subsequent psychopathology: results from a community study. Psychother Psychosom. 2007;76(1):47-56.
7. Lewinsohn PM, Holm-Denoma JM, Small JW, Seeley JR, Joiner TE Jr. Separation anxiety disorder in childhood as a risk factor for future mental illness. J Am Acad Child Adolesc Psychiatry. 2008;47(5):548-55.
8. Kossowsky J, Pfaltz MC, Schneider S, Taeymans J, Locher C, Gaab J. The separation anxiety hypothesis of panic disorder revisited: a meta-analysis. Am J Psychiatry. 2013;170(7):768-81.
9. Salum GA, Desousa DA, Rosario MC, Pine DS, Manfro GG. Pediatric anxiety disorders: from neuroscience to evidence-based clinical practice. Braz J Psychiatry. 2013;35(Suppl 1):S03-21.
10. Fineberg NA, Haddad PM, Carpenter L, Gannon B, Sharpe R, Young AH, et al. The size, burden and cost of disorders of the brain in the UK. J Psychopharmacol. 2013;27(9):761-70.
11. Bodden DH, Dirksen CD, Bögels SM. Societal burden of clinically anxious youth referred for treatment: a cost-of-illness study. J Abnorm Child Psychol. 2008;36(4):487-97.
12. In-Albon T, Schneider S. Psychotherapy of childhood anxiety disorders: a meta analysis. Psychother Psychosom. 2007;76(1):15-24.
13. Reynolds S, Wilson C, Austin J, Hooper L. Effects of psychotherapy for anxiety in children and adolescents: a meta-analytic review. Clin Psychol Rev. 2012;32(4):251-62.
14. James AC, Reardon T, Soler A, James G, Creswell C. Cognitive behavioural therapy for anxiety disorders in children and adolescents. Cochrane Database Syst Rev. 2020;11(11):CD013162.
15. Gibby BA, Casline EP, Ginsburg GS. Long-term outcomes of youth treated for an anxiety disorder: a critical review. Clin Child Fam Psychol Rev. 2017;20(2):201-25.
16. Ginsburg GS, Becker-Haimes EM, Keeton C, Kendall PC, Iyengar S, Sakolsky D, et al. Results from the child/adolescent anxiety multimodal extended long-term study (CAMELS): primary anxiety outcomes. J Am Acad Child Adolesc Psychiatry. 2018;57(7):471-80.
17. Gartlehner G, Hansen RA, Nissman D, Lohr KN, Carey TS. Criteria for distinguishing effectiveness from efficacy trials in systematic reviews. Technical Review 12. Rockville (MD): Agency for Healthcare Research and Quality; 2006.
18. Thorpe KE, Zwarenstein M, Oxman AD, Treweek S, Furberg CD, Altman DG, et al. A pragmatic-explanatory continuum indicator summary (PRECIS): a tool to help trial designers. J Clin Epidemiol. 2009;62(5):464-75.
19. Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350:h2147.
20. Barrington J, Prior M, Richardson M, Allen K. Effectiveness of CBT versus standard treatment for childhood anxiety disorders in a community clinic setting. Behav Change. 2005;22(1):29-43.
21. Southam-Gerow MA, Weisz JR, Chu BC, McLeod BD, Gordis EB, Connor-Smith JK. Does cognitive behavioral therapy for youth anxiety outperform usual care in community clinics? An initial effectiveness test. J Am Acad Child Adolesc Psychiatry. 2010;49(10):1043-52.
22. Lau WY, Chan CK, Li JC, Au TK. Effectiveness of group cognitive behavioral treatment for childhood anxiety in community clinics. Behav Res Ther. 2010;48(11):1067-77.
23. Brown A, Creswell C, Barker C, Butler S, Cooper P, Hobbs C, et al. Guided parent-delivered cognitive behaviour therapy for children with anxiety disorders: outcomes at 3- to 5-year follow-up. Br J Clin Psychol. 2017;56(2):149-59.
24. Kodal A, Fjermestad K, Bjelland I, Gjestad R, Öst LG, Bjaastad JF, et al. Long-term effectiveness of cognitive behavioral therapy for youth with anxiety disorders. J Anxiety Disord. 2018;53:58-67.
25. Weersing VR, Iyengar S, Kolko DJ, Birmaher B, Brent DA. Effectiveness of cognitive-behavioral therapy for adolescent depression: a benchmarking investigation. Behav Ther. 2006;37(1):36-48.
26. Schneider S, Pflug V, In-Albon T, Margraf J. Kinder-DIPS Open Access: Diagnostisches Interview bei psychischen Störungen im Kindes- und Jugendalter. Bochum: Forschungs- und Behandlungszentrum für psychische Gesundheit, Ruhr-Universität Bochum; 2017.
27. Spence SH. Structure of anxiety symptoms among children: a confirmatory factor analytic study. J Abnorm Psychol. 1997;106(2):280-97.
28. Sheehan DV. The anxiety disease. New York: Scribners; 1983.
29. Schneider S, In-Albon T. Beeinträchtigungs- und Belastungsratings (BEE/BEL): Verlaufsdiagnostik und Therapieevaluation. Unpublished manuscript. Universität Basel; 2003.
30. Goodman R. The strengths and difficulties questionnaire: a research note. J Child Psychol Psychiatry. 1997;38(5):581-6.
31. Guy W. ECDEU assessment manual for psychopharmacology. Rockville (MD): US Dept of Health, Education, and Welfare, Public Health Service, Alcohol, Drug Abuse, and Mental Health Administration, National Institute of Mental Health, Psychopharmacology Research Branch, Division of Extramural Research Programs; 1976.
32. Mattejat F, Remschmidt H. Evaluation von Therapien mit psychisch kranken Kindern und Jugendlichen: Entwicklung und Überprüfung eines Fragebogens zur Beurteilung der Behandlung (FBB). Z Klin Psychol. 1993;22(2):192-233.
33. Schneider S, Lavallee KL. Separation anxiety disorder. In: Essau CA, Ollendick TH, editors. The Wiley-Blackwell handbook of the treatment of childhood and adolescent anxiety. Chichester, West Sussex, UK: John Wiley & Sons, Ltd.; 2013.
34. Kendall PC. Treating anxiety disorders in children: results of a randomized clinical trial. J Consult Clin Psychol. 1994;62(1):100-10.
35. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. Fourth edition, text revision (DSM-IV-TR). Washington (DC): American Psychiatric Association; 2000.
36. Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, et al. The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med. 2001;134(8):663-94.
37. Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357(9263):1191-4.
38. Adornetto C, In-Albon T, Schneider S. Diagnostik im Kindes- und Jugendalter anhand strukturierter Interviews: Anwendung und Durchführung des Kinder-DIPS. Klinische Diagnostik und Evaluation. 2008;1(4):363-77.
39. Margraf J, Cwik JC, Pflug V, Schneider S. Strukturierte klinische Interviews zur Erfassung psychischer Störungen über die Lebensspanne: Gütekriterien und Weiterentwicklungen der DIPS-Verfahren. Zeitschrift für Klinische Psychologie und Psychotherapie: Forschung und Praxis. 2017;46(3):176-86.
40. Spence SH. A measure of anxiety symptoms among children. Behav Res Ther. 1998;36(5):545-66.
41. Essau CA, Muris P, Ederer EM. Reliability and validity of the Spence children’s anxiety scale and the screen for child anxiety related emotional disorders in German children. J Behav Ther Exp Psychiatry. 2002;33(1):1-18.
42. Lohbeck A, Schultheiß J, Petermann F, Petermann U. Die deutsche Selbstbeurteilungsversion des SDQ-Psychometrische Eigenschaften, Faktorenstruktur und Grenzwerte. Diagnostica. 2015;61(4):222-35.
43. Mattejat F, Remschmidt H. Fragebogen zur Beurteilung der Behandlung. 1st ed. Hogrefe; 1999.
44. Ollendick TH, King NJ. Empirically supported treatments for children with phobic and anxiety disorders: current status. J Clin Child Psychol. 1998;27(2):156-67.
45. von Brachel R, Hirschfeld G, Berner A, Willutzki U, Teismann T, Cwik JC, et al. Long-term effectiveness of cognitive behavioral therapy in routine outpatient care: a 5- to 20-year follow-up study. Psychother Psychosom. 2019;88(4):225-35.
46. Fava GA, Grandi S, Rafanelli C, Ruini C, Conti S, Belluardo P. Long-term outcome of social phobia treated by exposure. Psychol Med. 2001;31(5):899-905.
47. Fava GA, Rafanelli C, Grandi S, Conti S, Ruini C, Mangelli L, et al. Long-term outcome of panic disorder with agoraphobia treated by exposure. Psychol Med. 2001;31(5):891-8.
48. Guidi J, Brakemeier EL, Bockting CLH, Cosci F, Cuijpers P, Jarrett RB, et al. Methodological recommendations for trials of psychological interventions. Psychother Psychosom. 2018;87(5):276-84.
49. Uhlhaas PJ, Davey CG, Mehta UM, Shah J, Torous J, Allen NB, et al. Towards a youth mental health paradigm: a perspective and roadmap. Mol Psychiatry. 2023;28(8):3171-81.