Artificial intelligence (AI) can automate time-consuming manual work such as summarizing texts. But such a summary is only useful if its results are highly valid, which requires that the AI recognizes the context correctly.
We basically distinguish between “unsupervised” and “supervised” learning. In the first case, the AI detects possible patterns on its own, without being given correct answers; in the second case, it is trained on examples with known correct results. For the automated classification and categorization of text (supervised learning), creating individual training sets is the cornerstone of success.
Why? Compared to image recognition, the biggest challenge is not only the variety of languages and dialects, but also the variety of contexts (different industries with different products) and text forms (e.g., blogs, emails, tickets). Supervised learning provides the best results for the classification of diverse, colloquial content.
For this reason, training on the relevant contexts in a given language is crucial to achieving good results. But exactly this step is very costly, because large numbers of texts have to be annotated. Only when the machine recognizes the context correctly can specific phrases and sentiment be identified. The training sets, i.e. the templates with correct and incorrect results, are key to the quality of the results and are very valuable if they are of high quality. For example, it then becomes possible to identify how many customers complain about the quality of service by correctly assigning and aggregating all the associated phrases and words.
AI is therefore not intelligent until it can make correct decisions independently and at scale.
With Insaas Sum, training data for artificial intelligence, the so-called annotations, can be generated easily and in a targeted way. Insaas Sum is multi-tenant and allows specific sets of texts to be assigned to individual human trainers (also called raters). Raters can classify the texts not only by sentiment, but also along any other dimension. This makes it possible to quickly generate individual training sets of high quality.
Afterwards, the results can easily be evaluated to see how well the raters agree on texts and keywords. This minimizes possible misinterpretations by the AI.
What is the added value of Insaas Sum?
Training the AI can waste a lot of time. Cloud-based tools or crowdsourcing are often used to annotate the text files, and in the worst case even Excel. All these options have the following disadvantages:
- Open source tools are often not flexible enough
- The same phrase cannot be annotated multiple times on one screen
- With crowdsourcing, rater performance varies, so significantly more data needs to be annotated
- The order of annotation usually cannot be influenced
- Automated evaluation and visualization, e.g. inter-rater correlation and deviation, is usually not available
The added value of Insaas Sum lies in a clear workflow for the editor, who sets up the annotations as an admin for the raters. The project setup can be customized depending on how many raters are needed. In the first step, the texts and the respective keywords and phrases are suggested automatically. The raters therefore already have a template for each annotation, consisting of text and keywords.
In this way, more than just dimensions such as positive, negative, and neutral can be trained. The dimensions that capture the context of the texts can be defined freely. Relevant phrases can also be trained easily: raters can simply mark text passages and add further keywords. Raters have the freedom to train texts in terms of relevance and context.
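Finally, the editor can evaluate the results in a dashboard and use only the results with high agreement (Fleiss' kappa > 0.5) for further training. For orientation: on the commonly used Landis and Koch scale, values from 0.41 to 0.60 count as moderate agreement and values from 0.61 to 0.80 as substantial agreement.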
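To make the agreement measure concrete, here is a minimal sketch of how Fleiss' kappa can be computed from raw annotations. It is not the Insaas Sum implementation; the function name `fleiss_kappa`, the input layout (one list of labels per text, one label per rater), and the example labels are assumptions made for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Compute Fleiss' kappa.

    ratings: list of items; each item is a list with one label per rater.
    categories: all labels that raters were allowed to use.
    Every item must have been rated by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Agreement expected by chance, from the overall label proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        # Only one label was ever used, so kappa is undefined.
        return float("nan")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: four texts, three raters each.
example = [
    ["pos", "pos", "neg"],
    ["neg", "neg", "neg"],
    ["pos", "pos", "pos"],
    ["neg", "pos", "neg"],
]
kappa = fleiss_kappa(example, ["pos", "neg"])
use_for_training = kappa > 0.5  # threshold taken from the text above
```

Note that Fleiss' kappa describes a whole set of annotated texts rather than a single text, so a threshold like the one above would be applied per project, per dimension, or per batch.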
How companies work with Insaas Sum
Annotations can be very costly, considering that thousands of data points need to be annotated. Therefore, it is crucial that this process is smooth and transparent.
Projects are set up in the Insaas Sum portal. There, texts are provided in random order together with their keywords. As described earlier, texts are automatically (and randomly) assigned to the individual trainers. The necessary dimensions are defined up front. The editor has full control over the results and can see how well the individual raters have done their job. The editor also sees whether the training set is sufficiently balanced and whether enough data points have been collected for all relevant dimensions.
The external trainers simply log in to a browser-based interface. There, all tasks have already been created for the respective trainer, who only has to perform the individual assessments. The results of the work are saved automatically.
With Insaas Sum, data science and machine learning engineering teams can create high-quality training sets and use them to train their machine learning algorithms. They can measure the quality and ensure the best results. In this way, Insaas Sum provides the cornerstone for high-quality results in text classification.
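The random assignment and the balance check described above could look roughly like the following sketch. It only illustrates the idea and does not reflect how Insaas Sum works internally; `assign_texts`, `underrepresented`, and all parameters and sample values are hypothetical.

```python
import random
from collections import Counter


def assign_texts(text_ids, raters, raters_per_text, seed=None):
    """Randomly assign every text to `raters_per_text` distinct raters."""
    rng = random.Random(seed)
    assignments = {rater: [] for rater in raters}
    for text_id in text_ids:
        # sample() draws without replacement, so no rater gets a text twice.
        for rater in rng.sample(raters, raters_per_text):
            assignments[rater].append(text_id)
    return assignments


def underrepresented(labels, dimensions, min_count):
    """Return the dimensions that have fewer than `min_count` annotations."""
    counts = Counter(labels)
    return [dim for dim in dimensions if counts[dim] < min_count]


# Hypothetical project: four texts, three raters, two raters per text.
assignments = assign_texts(["t1", "t2", "t3", "t4"], ["anna", "ben", "cem"], 2, seed=0)

# Hypothetical labels collected so far, checked against a minimum of 2 each.
missing = underrepresented(
    ["positive", "positive", "negative"],
    ["positive", "negative", "neutral"],
    min_count=2,
)
```

Assigning each text to several raters is what makes an agreement measure computable later, and the balance check tells the editor which dimensions still need more annotated examples.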
Get your demo for Insaas Vector!
We look forward to giving you a demo of Insaas Vector!