Research has identified an underappreciated relationship between intimate partner violence (IPV) and symptoms of traumatic brain injury (TBI). Among IPV survivors, resulting TBIs are not always identified during emergency room visits. This demonstrates a need for a prescreening tool that identifies IPV survivors who should receive TBI screening. We present a model that measures similarity to clinical reports from confirmed TBI cases to determine whether a patient should be screened for TBI. The model is an ensemble of three supervised learning classifiers that work in two distinct feature spaces. The individual classifiers are trained on clinical reports and then combined into an ensemble that requires only one positive label to indicate that a patient should be screened for TBI.
Intimate partner violence (IPV) can be defined as behavior that “causes physical, sexual or psychological harm, including physical aggression, sexual coercion, psychological abuse and controlling behaviors” in an intimate relationship. 1 This sort of harm is very common: one in three women and one in four men have experienced severe physical IPV at least once in their lifetime. 2 It is estimated that 35% to 90% of IPV survivors have experienced at least one head-related injury. 3 These head injuries can be broadly classified as traumatic brain injury (TBI), which can be described as an “alteration in brain function caused by external force.” 4 Symptoms of TBI include altered mental state, loss of consciousness, and post-traumatic amnesia. 5 Loss of consciousness is regarded as an important symptom of TBI but is not present in all brain injury cases. 6 TBI resulting from IPV is often underreported. 7 Because IPV itself is underreported, IPV-induced brain injuries remain undetected. Some studies have found that 72% of domestic violence victims presenting to the emergency department were not identified because they lacked visible external injuries. 8 Similarly, the incidence of IPV-related TBI is estimated to be 11 to 12 times greater than the published incidence for other forms of TBI. 9 The vast majority of the literature on IPV centers on women, with minimal data related to men. 10 Thus, the field has demonstrated a need to address the underreporting of IPV and IPV-related TBI and the limited representation of men in datasets.
The World Health Organization has validated several screening tools to identify TBIs, but none of these tools has been adapted to screen for TBI in the context of IPV. 3 Validated TBI screening tools phrase their survey questions in the context of the situation where a brain injury might have occurred. A modified version of the Brain Injury Screening Questionnaire (BISQ), BISQ-IPV, has been proposed to screen for TBI in the context of IPV. Initial testing of BISQ-IPV indicates that screening in the context of IPV reveals additional brain injuries when compared to BISQ. 11 Because this initial testing has not been validated, BISQ-IPV cannot yet be used as a validated screening approach. Similarly, a modified version of the HELPS screening tool was used to estimate how often women were at risk for a brain injury. 12 This method provided stringent criteria for identifying brain injury by asking about blows to the head, treatments received at the emergency room, loss of consciousness, and problems related to head injuries. 12 Like BISQ-IPV, it is not a validated screening tool. The Boston Assessment of TBI-Lifetime (BAT-L), a validated screening tool used to identify lifetime TBI in post-9/11 veterans, was adapted to IPV patients. 13 Its results were compared to those of the well-validated Ohio State University TBI Identification Method (OSU-TBI-ID) and indicated good performance. 13 This screening tool relies on a forensic approach that requires a patient to remember the events of a brain injury. Because post-traumatic amnesia is a symptom of brain injury, patients may not remember the event, which affects the screening results. As mentioned before, patients do not always report their symptoms, so a screening tool that relies heavily on chronological order and on differentiating symptom etiology may not be entirely useful. A tool called CHATS from the Ohio Domestic Violence Network is in the process of being validated. 14
Diagnosis or identification of diseases through the use of clinical text has been done in other medical disciplines. One study identified keywords in electronic health records (EHRs) to discern patients at risk of HIV. 15 This study reduced a list of terms with high Term Frequency-Inverse Document Frequency (TF-IDF) scores through univariate chi-square testing to create a set of statistically significant keywords. 15 A manual selection from this set created the keyword list used in the predictive model. 15 The keywords identified include “hiv”, “homosexual”, and “tested”. 15 Using keywords derived from EHRs to assess HIV risk is not insightful when one of the words associated with high risk is “hiv”, which indicates that the healthcare professional has already identified the disease itself. Another study designed a custom dictionary to extract terms relevant to schizophrenia from a set of clinical notes. 16 This was done by building a matrix that indicates the presence or negation of terms and then using Latent Dirichlet Allocation (LDA) to identify topics and reduce features. 16 The final selection was based on the topic weights. 16 Although this methodology relies more heavily on statistical correlations, the manual selection step reduces the accuracy of the statistics and can make the result less relevant. Finally, another study used a manual dictionary of phrases related to cognitive decline to identify symptoms of mild cognitive impairment in order to train a prediction model. 17 The manual step in each of these studies relies heavily on the expertise of the individual and is contingent on every relevant word being identified, including spelling errors, synonyms, and alternate phrases. Similarly, if additional EHRs were added to the dataset, the same manual process would need to be repeated for each new record. Manual dictionaries therefore generalize poorly. Although they, like black box methods, can achieve high performance on a specific dataset, it is difficult to apply the same methodology to new cases or to continually identify risk.
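To make the keyword-reduction idea discussed above concrete, the following is a minimal sketch of ranking terms by TF-IDF and then keeping only those whose presence is associated with the label under a univariate chi-square test. It is not the cited study's implementation: the function names, the whitespace tokenizer, the `top_k` cutoff, and the toy documents (neutral placeholder tokens, not clinical text) are all assumptions made for illustration, and the manual selection step described above is omitted.

```python
from collections import Counter
from math import log

def tfidf_scores(docs):
    """Return the highest TF-IDF score each term reaches in any document."""
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    best = {}
    for tokens in tokenized:
        for term, count in Counter(tokens).items():
            score = (count / len(tokens)) * log(n_docs / doc_freq[term])
            best[term] = max(best.get(term, 0.0), score)
    return best

def chi_square(docs, labels, term):
    """Pearson chi-square statistic for the 2x2 table of term presence vs. label."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = term in doc.lower().split()
        if present and label:
            a += 1          # term present, positive label
        elif present:
            b += 1          # term present, negative label
        elif label:
            c += 1          # term absent, positive label
        else:
            d += 1          # term absent, negative label
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE = 3.841

def select_keywords(docs, labels, top_k=10):
    """Keep the top_k TF-IDF terms that also pass the chi-square threshold."""
    scores = tfidf_scores(docs)
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return sorted(t for t in candidates
                  if chi_square(docs, labels, t) > CRITICAL_VALUE)

# Toy corpus with placeholder tokens (hypothetical, for illustration only).
toy_docs = ["alpha note", "alpha visit", "alpha record", "alpha chart",
            "beta note", "beta visit", "beta record", "beta chart"]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(select_keywords(toy_docs, toy_labels))  # ['alpha', 'beta']
```

In the toy corpus, "alpha" and "beta" separate the two classes perfectly and pass the test, while terms such as "note" occur equally in both classes and are dropped. With samples this small the chi-square approximation is not reliable; the example only shows the mechanics of the filter.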
As stated before, methodologies reliant on manual dictionaries are difficult to generalize to new texts. The following studies create dictionaries automatically or do not use a dictionary at all to identify labels. One study used vectorized clinical notes and clustering to find distinguishing characteristics of heart disease EHRs. 18 The clustering approach makes the method unsupervised, and, as noted in its limitations, its avoidance of diagnosis codes makes the labels descriptive rather than definitive. An alternative to dictionaries is the use of Word2Vec and a bag-of-words approach to generate ICD-9 codes related to rheumatology from EHRs. 19 EHR data depend on investigation by the health professional, so where documentation is sparse, extracted ICD-9 codes may be incomplete or inaccurate. 19 Another study used large-scale support vector machine (SVM) based classifiers to extract a diagnosis status from intensive care unit clinical reports. 20 This points to a generalizable method by which diagnosis can occur without dictionary abstraction. It uses, however, only one classifier, which is subject to overfitting and to the volatility of the notes. 20 There is also an example in which black box modeling, in the form of an artificial neural network, was used to identify clinically relevant TBI cases in children from computed tomography and demographic data. 21 As with most black boxes, high accuracy is achievable, but understanding how it is achieved is not. To the best of our knowledge, TBI diagnosis through clinical text specifically in the case of IPV patients has not been done. However, prior work has analyzed electronic health records to establish health effects and key associations between IPV and TBI. 22 This methodology revolved around extracting clinical terms from EHRs to establish a relationship between IPV and TBI. 22 This analysis revealed that IPV-induced TBI has a relationship with other acute conditions, including concussion, chronic post-traumatic headache, hematoma, and delirium. 22
Existing literature indicates that free-text clinical notes can be used to identify illnesses. Existing TBI screening tools can be used to identify TBI in the context of IPV, although these tools have not been validated. IPV-induced TBI symptoms are often difficult to identify, so healthcare professionals do not always have all the information they need to diagnose a TBI. In addition, during an Emergency Department visit related to an IPV incident, many dimensions of the situation are being addressed, including immediate safety, the need to find shelter, and sometimes law enforcement. We aim to create a way to flag cases so that screening for TBI can be prioritized, by employing an ensemble of multiple supervised learning classifiers trained on clinical IPV reports.
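As an illustration only, the flagging rule described in the abstract, in which a single positive label from any classifier is enough to recommend TBI screening, can be sketched as a logical OR over the individual predictions. The `Classifier` interface, the function name `flag_for_screening`, and the three stand-in classifiers below are hypothetical placeholders and do not represent the trained models or feature spaces used in this work.

```python
from typing import Callable, Sequence

# Assumption of this sketch: a classifier is any function that maps a
# report's text to True (screen for TBI) or False (do not flag).
Classifier = Callable[[str], bool]

def flag_for_screening(report: str, classifiers: Sequence[Classifier]) -> bool:
    """Return True if at least one classifier labels the report positive."""
    return any(clf(report) for clf in classifiers)

# Hypothetical stand-ins for three trained classifiers. Real models would be
# fitted on labeled clinical reports rather than hard-coded rules.
def always_negative(report: str) -> bool:
    return False

def long_report(report: str) -> bool:
    return len(report.split()) > 5

def mentions_placeholder(report: str) -> bool:
    return "placeholder" in report.lower()

ensemble = [always_negative, long_report, mentions_placeholder]

print(flag_for_screening("short note", ensemble))             # False
print(flag_for_screening("note with placeholder", ensemble))  # True
```

Under an any-positive rule, the ensemble flags every report that at least one member flags, so it is at least as sensitive as its most sensitive member, at the cost of also inheriting each member's false positives. That trade-off suits a prescreening step, where a missed case is costlier than an extra screening.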
The dataset was acquired from the Emergency Department of an urban Midwest hospital and represents patients from June 2017 through June 2021. It was collected and analyzed in accordance with DePaul University and the hospital’s IRB approval. The last approved date was 11/29/2023. Figure 1 shows the demographic breakdown of this dataset. The median and mean ages of the patients are 34 and 36.7 years, respectively, and the distribution of these ages can be seen in Figure 1a. There are 522 female patients and 162 male patients, as seen in Figure 1b. The racial breakdown of these patients is 225 Latino, 224 White, 98 African American, 75 Asian, 55 Other, 5 Native American, and 2 Not Reported. Each patient is an IPV survivor and was designated as such by the hospital. The dataset also contains an Initial Clinical Reports section, which includes the first set of clinical notes for patients who have been identified as IPV survivors, and a TBI Reports section, which contains an additional set of clinical notes indicating that an IPV survivor disclosed a head injury.