Insaas predicts side effects of drugs based on patient feedback

First results of the collaboration between LMU Munich and Insaas on the relevance of fatigue.

Personalized medicine has been a megatrend for many years. The central challenge is that personalization requires high-quality data processed specifically for this purpose: the data must be specific and available for defined cohorts (for example, by age, gender, or data from electronic medical records (EMR)).

In addition, global healthcare generates more data every year. The total volume now exceeds 2,000 exabytes and is difficult to process and store.

If drugs and therapies are to be personalized, automation is the only way to take advantage of this growing volume of data. Personalized therapies and medicines require continuous analysis of newly arriving patient data to provide valid insights for the healthcare industry.

With this challenge in mind, Insaas and LMU Munich launched a research project in spring 2021. Together with Prof. Christian Heumann, Matthias Aßenmacher, and Prof. Dr. Michael Schoenberg, the team developed an approach and built a first dashboard to demonstrate the potential of automated data processing.

The research question for this proof of concept (POC) was: How relevant are fatigue and related symptoms in a public forum on colorectal cancer?

Fatigue syndrome or chronic fatigue syndrome (CFS) is described as a long-term illness with a variety of symptoms. The most common symptom is extreme fatigue.

Fatigue often occurs as a side effect of chemotherapy and frequently goes undiagnosed. It reduces patients’ quality-adjusted life years (QALYs). It is therefore valuable to analyze the impact of fatigue both quantitatively and qualitatively.
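QALYs weight each period of life by a utility value between 0 (death) and 1 (full health) and add up the results. The minimal sketch below illustrates why a symptom such as fatigue, by lowering that utility weight, reduces QALYs. The function name `qalys` and all utility values are hypothetical and are not taken from this study.

```python
def qalys(periods):
    """Sum quality-adjusted life years over (duration_years, utility) periods.

    Utility is a weight between 0.0 (death) and 1.0 (full health).
    """
    total = 0.0
    for duration_years, utility in periods:
        if not 0.0 <= utility <= 1.0:
            raise ValueError(f"utility must be in [0, 1], got {utility}")
        total += duration_years * utility
    return total


# Hypothetical numbers for illustration only: two years after treatment,
# with the first year rated 0.8 without fatigue vs. 0.6 with fatigue.
without_fatigue = qalys([(1.0, 0.8), (1.0, 0.8)])
with_fatigue = qalys([(1.0, 0.6), (1.0, 0.8)])
print(round(without_fatigue - with_fatigue, 2))  # 0.2
```

In this toy case, one year of fatigue costs 0.2 QALYs; real utility weights come from validated questionnaires, not from forum posts.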

What is Insaas’s approach to personalized medicine?

Public patient support groups contain many posts, especially on diseases such as colorectal cancer. For the POC, the Insaas team decided to collect roughly 30,000 entries from a public support group on colorectal cancer.

Together with the experts, the team defined a list of side effects to study in addition to fatigue. A random sample of the data was drawn using Insaas SUM and manually annotated by four raters. Each entry was scored three times against the defined criteria, for a total of more than 9,000 data points.
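With three ratings per entry, more than 9,000 data points correspond to roughly 3,000 sampled entries. The sketch below shows one way to draw such a sample and give every entry to three of the four raters with a balanced workload. It does not model Insaas SUM, whose internals are not described here: Python's `random.sample` and a round-robin over rater combinations are stand-ins, and `assign_annotations`, the rater labels, and the sample size are hypothetical.

```python
import itertools
import random


def assign_annotations(entry_ids, sample_size, raters, ratings_per_entry=3, seed=0):
    """Randomly sample entries and assign each one to distinct raters.

    Returns a list of (entry_id, rater) pairs, one per planned annotation.
    """
    if ratings_per_entry > len(raters):
        raise ValueError("need at least as many raters as ratings per entry")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(entry_ids, sample_size)
    # Cycle through every combination of raters so the workload stays balanced.
    rater_groups = itertools.cycle(itertools.combinations(raters, ratings_per_entry))
    assignments = []
    for entry_id in sample:
        for rater in next(rater_groups):
            assignments.append((entry_id, rater))
    return assignments


# Hypothetical sizes: 30,000 forum entries, 3,000 sampled, 4 raters, 3 ratings each.
entries = list(range(30_000))
plan = assign_annotations(entries, 3_000, ["A", "B", "C", "D"])
print(len(plan))  # 9000
```

With four raters there are four possible groups of three, so each rater ends up with 2,250 of the 9,000 annotations.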

The result was an ontology with several dimensions covering the context of each entry, the side effects, and the tonality of the text. Prof. Dr. Schoenberg validated the results, which were then used to train machine learning models for further classification.
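A minimal sketch of how one annotation along the three ontology dimensions (context, side effect, tonality) could be represented and checked against a fixed label set. The class names, the three tonality values, and the label sets are hypothetical; the real lists were defined together with the expert team and are not published here.

```python
from dataclasses import dataclass
from enum import Enum


class Tonality(Enum):
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"


@dataclass(frozen=True)
class Annotation:
    """One annotated statement, covering the three ontology dimensions."""
    entry_id: int
    context: str       # e.g. the treatment being discussed
    side_effect: str   # concept from the side-effect list
    tonality: Tonality


# Hypothetical label sets for illustration only.
CONTEXTS = {"chemotherapy", "surgery", "other"}
SIDE_EFFECTS = {"fatigue", "pain", "nausea"}


def validate(annotation: Annotation) -> Annotation:
    """Reject labels that are not part of the (hypothetical) ontology."""
    if annotation.context not in CONTEXTS:
        raise ValueError(f"unknown context: {annotation.context}")
    if annotation.side_effect not in SIDE_EFFECTS:
        raise ValueError(f"unknown side effect: {annotation.side_effect}")
    return annotation


example = validate(Annotation(42, "chemotherapy", "fatigue", Tonality.NEGATIVE))
print(example.tonality.value)  # negative
```

Fixing the label sets up front keeps manual annotations consistent, which matters when they later serve as training data for a classifier.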

As a result, a dashboard with more than 695,500 aspect-based sentiment analysis (ABSA) data points was published at the end of September 2021. The number is this high because most texts are long, typically 1,000 characters or more, so each text yields many data points.
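ABSA splits a text into aspects (here: side effects) and assigns each one a sentiment, so a single long post produces several data points. The rule-based sketch below only illustrates the shape of that output. The word lists and the function name `absa_points` are hypothetical, and the project itself relied on machine learning models trained on the annotated data rather than word lists.

```python
import re

# Hypothetical, tiny lexicons for illustration; not the project's model.
ASPECT_TERMS = {"tired": "fatigue", "exhausted": "fatigue", "pain": "pain", "nausea": "nausea"}
NEGATIVE_WORDS = {"terrible", "awful", "worse", "bad"}
POSITIVE_WORDS = {"better", "good", "improved"}


def absa_points(text):
    """Return (aspect, polarity) pairs, one per aspect mention per sentence."""
    points = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Sentence polarity: positive words minus negative words.
        score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
        polarity = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for word in words:
            if word in ASPECT_TERMS:
                points.append((ASPECT_TERMS[word], polarity))
    return points


post = ("After the third cycle I was exhausted and felt awful. The pain got worse at night. "
        "My nausea is better since the new medication.")
print(absa_points(post))
# [('fatigue', 'negative'), ('pain', 'negative'), ('nausea', 'positive')]
```

Three sentences already yield three data points; a post of 1,000 characters or more can easily yield dozens.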

What is the added value of this analysis?

The added value of this use case is to provide physicians with data-driven insights into the effectiveness and efficiency of cancer treatments. This enables personalized treatments and better prediction of side effects.

Based on 31,163 entries, Insaas was able to show that fatigue is the most common side effect, along with pain (10,902 entries). Fatigue is particularly pronounced in the context of treatments such as FOLFOX (a combination of 5-FU, folinic acid, and oxaliplatin). Strikingly, patients tend to report either fatigue or other symptoms such as pain, rather than both. This suggests that patients perceive fatigue as a serious side effect, as distressing as pain.
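One simple way to quantify both findings is to count how many entries mention each side effect and how often two side effects appear in the same entry, then compare that with the co-occurrence expected if they were independent. The data below are hypothetical toy values, not the study's figures, and the source does not state which statistic was actually used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the set of side effects annotated in each forum entry.
entries = [
    {"fatigue"},
    {"pain"},
    {"fatigue", "nausea"},
    {"pain"},
    {"fatigue"},
    {"fatigue", "pain"},
]

# How many entries mention each side effect (an entry counts once per effect).
mention_counts = Counter(effect for effect_set in entries for effect in effect_set)

# How often two side effects appear in the same entry.
pair_counts = Counter(
    pair for effect_set in entries for pair in combinations(sorted(effect_set), 2)
)

# Co-occurrence expected if fatigue and pain were independent.
expected = mention_counts["fatigue"] * mention_counts["pain"] / len(entries)

print(mention_counts.most_common(2))     # [('fatigue', 4), ('pain', 3)]
print(pair_counts[("fatigue", "pain")])  # 1
print(expected)                          # 2.0
```

Observing fewer joint mentions (1) than independence predicts (2.0) is the kind of signal behind an "either fatigue or pain" pattern.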

Patients describe fatigue in many different ways, for example as tiredness, lack of concentration, or malaise. Insaas was able to map these descriptions onto a specific ontology.
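The idea of mapping many patient phrasings onto one ontology concept can be sketched with a simple synonym lexicon. The lexicon entries and `map_to_concepts` are hypothetical; in the project, the mapping came from expert annotation and trained models, which also handle phrasings no word list anticipates.

```python
import re

# Hypothetical lexicon mapping patient wording to ontology concepts.
SYNONYMS = {
    "lack of concentration": "fatigue",
    "malaise": "fatigue",
    "exhausted": "fatigue",
    "tired": "fatigue",
    "aching": "pain",
    "pain": "pain",
}


def map_to_concepts(text):
    """Return the set of ontology concepts whose synonyms occur as whole words."""
    lowered = text.lower()
    concepts = set()
    for phrase, concept in SYNONYMS.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            concepts.add(concept)
    return concepts


print(map_to_concepts("Since FOLFOX I suffer from malaise and a lack of concentration."))
# {'fatigue'}
```

Both phrasings collapse to the single concept `fatigue`, so they are counted together in the analysis.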

Ultimately, the findings should help physicians predict side effects and adjust medications and therapies accordingly. Physicians like Prof. Schoenberg can filter by tonality and context to identify the effects of drugs: “With this method, it is possible to ‘lift a treasure trove’ that is documented in patient forums and could not be systematically processed until now. Similar to a Phase IV study, various patient statements on side effects of the drugs are documented and evaluated.” This is made possible by the multidimensional annotation described above.
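Filtering the multidimensional annotations by context and tonality amounts to a simple query. The sketch below assumes a flat list of annotated records; the record fields, the tonality values, `side_effect_profile`, and the sample data are hypothetical and only illustrate the kind of filter the dashboard offers.

```python
from collections import Counter
from typing import NamedTuple


class Annotation(NamedTuple):
    entry_id: int
    context: str      # e.g. treatment mentioned in the post
    side_effect: str
    tonality: str     # "negative", "neutral" or "positive"


def side_effect_profile(annotations, context, tonality):
    """Count side effects for one context and tonality, most frequent first."""
    counts = Counter(
        a.side_effect
        for a in annotations
        if a.context == context and a.tonality == tonality
    )
    return counts.most_common()


# Hypothetical annotations for illustration only.
data = [
    Annotation(1, "FOLFOX", "fatigue", "negative"),
    Annotation(2, "FOLFOX", "fatigue", "negative"),
    Annotation(3, "FOLFOX", "pain", "negative"),
    Annotation(4, "FOLFOX", "nausea", "neutral"),
    Annotation(5, "surgery", "pain", "negative"),
]
print(side_effect_profile(data, "FOLFOX", "negative"))
# [('fatigue', 2), ('pain', 1)]
```

Because each statement carries context and tonality separately, the same data can answer "which negative side effects come up with FOLFOX?" as well as the same question for any other treatment.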

Because they are based on a large number of entries, these data-driven insights should make it possible to predict side effects better than by trial and error in individual patients.

Denis Kargl
26 October 2021