Probing Chain-of-Thought (ProCoT): Stimulating critical thinking and writing of students through engagement with large language models
2025 (English). In: Journal of Pedagogical Sociology and Psychology, E-ISSN 2687-3788, Vol. 7, no 4. Article in journal (Refereed). Published.
Abstract [en]
We introduce a novel writing method called Probing Chain-of-Thought (ProCoT), which potentially prevents students from cheating with a large language model (LLM) while enhancing their critical thinking. Large language models have disrupted education and many other fields. Fearing student cheating, many educationists have resorted to banning their use. We conducted studies in two different courses with 65 students, using a primarily qualitative (i.e., phenomenological) research design complemented by quantitative methods. The students in each course were asked to prompt an LLM of their choice with one question from a set of four (assigned at random) and were required to affirm or refute statements in the LLM output using peer-reviewed references as evidence. In addition, the rubric for assessing the students' writing included five more criteria: focus, logic, content, style, and correctness. The average success rate of the students' writing based on these criteria across the two cases is 79.49% (±12.82%). The rubric assessment shows two things: (1) ProCoT stimulates students' critical thinking and writing through engagement with LLMs, as seen when comparing LLM-only output to ProCoT output, and (2) ProCoT may prevent cheating because of clear limitations in the LLMs concerned, as seen when comparing students' ProCoT output to the LLMs' ProCoT output. In the quantitative analysis, we also find that most students prefer to answer in fewer words than LLMs, which are typically verbose. The average word counts for students, ChatGPT 3.5, and Phind (v8) in the first course are 208, 391, and 383, respectively, while they are 405, 356, and 315 for students, ChatGPT 3.5, and BingAI, respectively, in the second course, where we enforced a minimum word count of 300 for the students.
We provide access to the outputs for possible assessments (available after review).
Place, publisher, year, edition, pages
Şahin Danişman, 2025. Vol. 7, no 4
Keywords [en]
ChatGPT, cheating, education, pedagogy, ProCoT, LLM
National Category
Didactics
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-115792
DOI: 10.33902/jpsp.202536789
OAI: oai:DiVA.org:ltu-115792
DiVA, id: diva2:2021179
Funder
Luleå University of Technology
Note
Full text license: CC BY
2025-12-12