Motivation

Evaluating students' answers is part of a teacher's work. During this process it is crucial to understand the answers and to be able to compare them. However, this is usually a time-consuming task, since the teacher faces hundreds of records. Even if these records are ordered by timestamp or alphabetically, they are displayed as an unstructured list, which makes checking them harder. Moreover, it is difficult to uncover patterns or unusual behaviour. In our work, we propose a method for structuring students' answers. The method consists of two main stages. In the first stage, we detect and filter out the correct answers using text classification. The second stage is clustering, which structures the incorrect answers.

Method

Based on our analysis of students' behaviour in online learning systems, we identified several problems to deal with, and we divided our method into steps accordingly.
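As a rough illustration of the two-stage pipeline, the sketch below filters answers with a text classifier and clusters the remainder. It assumes scikit-learn; the concrete model choices (TF-IDF features, LinearSVC, KMeans) and the cluster count are our assumptions for illustration, not details fixed by the method description.

```python
# Minimal sketch of the two-stage answer-structuring pipeline.
# TF-IDF features, LinearSVC and KMeans are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def structure_answers(train_answers, train_labels, new_answers, n_clusters=5):
    """Stage 1: filter out answers classified as correct.
    Stage 2: cluster the remaining (incorrect) answers."""
    vectorizer = TfidfVectorizer()
    X_train = vectorizer.fit_transform(train_answers)
    clf = LinearSVC().fit(X_train, train_labels)  # labels: 1 = correct, 0 = incorrect

    preds = clf.predict(vectorizer.transform(new_answers))
    incorrect = [a for a, p in zip(new_answers, preds) if p == 0]
    if len(incorrect) < n_clusters:  # too few answers to form the requested clusters
        return [(answer, 0) for answer in incorrect]

    # Group similar incorrect answers so the teacher can review them together.
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        vectorizer.transform(incorrect)
    )
    return list(zip(incorrect, cluster_ids))
```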
Evaluation

In our experiments we used data gathered at our university with the ASQ system, a real-time presentation system that provides interactive presentations. Students can connect to them with their smartphones or laptops. In the system we prepared a set of tests for the students.

For classification we used a dataset gathered from the course Principles of software engineering. It consisted of 28 open questions with 1680 labeled answers; the labels, assigned by teachers, mark whether an answer is correct or not. Students answered the questions at school, within a time limit, using computers, and received points for entering correct answers.

The second dataset was gathered at lectures of Principles of software engineering as well, but here the ASQ system was also utilized. Students were again motivated with extra points. More than 100 active students were present at every lecture; besides computers, they also used their smartphones to connect. We had 8 presentations in the course with more than 40 questions in total. We reused questions from the first dataset, choosing only those with shorter answers (3 words on average). This dataset consists of 9 questions with more than 600 answers in total.

Classification of the labeled data reached an F1-micro score of more than 90% using an SVM. In clustering, we reached results comparable with the experts' estimations.
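The reported score can be checked with straightforward cross-validation over the labeled dataset. This is a minimal sketch assuming scikit-learn; the exact SVM configuration and text features used in the experiment are not stated above, so TF-IDF with a linear-kernel SVM is an assumption here.

```python
# Sketch of the classification evaluation (cross-validated F1-micro).
# TF-IDF + linear-kernel SVM is an assumption; the study's exact setup
# is not specified in the text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def evaluate_classifier(answers, labels):
    """Return the mean cross-validated F1-micro score for
    correct/incorrect answer classification."""
    model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
    return cross_val_score(model, answers, labels, cv=5, scoring="f1_micro").mean()
```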