Technical-Laymen Corpus (TLC)
The Technical-Laymen Corpus (TLC) is an annotated data set based on texts from Med1.de. Med1 is a German patient forum that covers a large variety of health-related topics. Its users are non-professionals who seek exchange, opinions and advice. Med1 is freely accessible, and discussions can be read without registration; registration is only required to participate in a discussion. The operating team of Med1 does not provide medical consultation, but it guides the community in terms of netiquette. The users are anonymous, and only their user names are known to us.
Two subforums were used: kidney diseases, and stomach and intestines. Each subforum contains a variety of user questions ("threads"), each followed by a varying number of answers ("posts"). We used a web crawler (Scrapy) to collect every post of both subforums, including the time of posting, the author's nickname and the thread title. As the data does not contain any personal information, we have the permission of Med1 to share the corpus with the scientific community.
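The fields collected per post (time of posting, author nickname, thread title, post text) can be pictured as a simple record. The following is a minimal sketch only: the class name, field names and subforum identifiers are hypothetical and do not reflect the actual schema of the released corpus, and the two sample posts are invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Post:
    """One crawled forum post; all field names are hypothetical."""
    subforum: str         # assumed identifier, e.g. "kidney"
    thread_title: str     # title of the thread the post belongs to
    author: str           # forum nickname only, no personal data
    posted_at: datetime   # time of posting
    text: str             # body of the post


def group_by_thread(posts):
    """Group posts by (subforum, thread title), oldest post first."""
    threads = defaultdict(list)
    for post in posts:
        threads[(post.subforum, post.thread_title)].append(post)
    for thread_posts in threads.values():
        thread_posts.sort(key=lambda p: p.posted_at)
    return dict(threads)


# Invented placeholder posts, not taken from the corpus.
posts = [
    Post("kidney", "Example thread", "user_b", datetime(2018, 1, 2, 9, 30), "An answer."),
    Post("kidney", "Example thread", "user_a", datetime(2018, 1, 1, 18, 0), "A question."),
]
threads = group_by_thread(posts)
```

Grouping by thread keeps each question together with its answers, which mirrors the thread/post structure of the forum described above.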
| | Kidney Forum | Stomach and Intestines Forum |
|---|---|---|
| Date of crawling | 5 November 2018 | 10 January 2019 |
| Number of crawled posts | 9,516 | 219,404 |
| Number of corpus entries | 2,000 | 2,000 |
The annotation involves two different concepts: (1) lay expressions and (2) technical terms. Within these concepts we mainly focus on symptoms, diseases, treatments and examinations. However, annotators were free to also label information beyond this focus (e.g. body parts, medication). In addition to the concept label, the counterpart synonym or an explanation is given as free text. The annotation was carried out by two medical students over several iterations using the brat annotation tool.
Figure 1: Text with annotated concepts.
Figure 2: Annotation menu.
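An annotation of the kind described above (a labeled span plus a free-text counterpart) can be read from brat's standoff format roughly as follows. This is a sketch under assumptions: it presumes the annotations are available as brat `.ann` files and that the free-text counterpart is stored as an annotator note, neither of which is confirmed here. The label name `Lay` and the sample sentence are invented for illustration and are not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    ann_id: str      # brat identifier, e.g. "T1"
    label: str       # concept label (hypothetical names such as "Lay")
    start: int       # character offset where the span begins
    end: int         # character offset where the span ends (exclusive)
    text: str        # annotated surface text
    note: str = ""   # free-text counterpart or explanation, if any


def parse_ann(ann_text):
    """Parse text-bound annotations (T lines) and notes (# lines)."""
    concepts = {}
    notes = {}
    for line in ann_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and annotation types not handled here
        if line.startswith("T"):
            ann_id, span, text = parts
            label, offsets = span.split(" ", 1)
            # Discontinuous spans look like "0 4;10 14"; keep the outer bounds.
            numbers = [int(n) for frag in offsets.split(";") for n in frag.split()]
            concepts[ann_id] = Concept(ann_id, label, numbers[0], numbers[-1], text)
        elif line.startswith("#"):
            _, info, note = parts
            _kind, _, target = info.partition(" ")  # e.g. "AnnotatorNotes T1"
            notes[target] = note
    # Attach each note to the annotation it refers to.
    for target, note in notes.items():
        if target in concepts:
            concepts[target].note = note
    return list(concepts.values())


# Invented example: "Hexenschuss" (lay) with "Lumbago" as its counterpart.
sentence = "Ich habe einen Hexenschuss."
sample = "T1\tLay 15 26\tHexenschuss\n#1\tAnnotatorNotes T1\tLumbago\n"
concepts = parse_ann(sample)
```

Keeping the character offsets alongside the note makes it possible to check each annotation against the original post text, for example `sentence[c.start:c.end] == c.text`.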
Dimitrios Kouzis-Loukas. 2016. Learning Scrapy. Packt Publishing Ltd.
Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou and Jun'ichi Tsujii. 2012. brat: a Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations Session at EACL 2012.
Laura Seiffe, Oliver Marten, Michael Mikhailov, Sven Schmeier, Sebastian Möller and Roland Roller. 2020. From Witch's Shot to Music Making Bones - Resources for Medical Laymen to Technical Language and Vice Versa. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France.