Towards Identifying Unresolved Discussions in Student Online Forums
Jihie Kim, Jia Li, and Taehwan Kim
Information Sciences Institute/
University of Southern California
http://ai.isi.edu/discourse
Pedagogical Discourse Jihie Kim/USC-ISI
“Talk to as many other people as possible.
CS is learned by talking to others, not by reading,
or so it seems to me now.”
-- Advice from an undergraduate computer science student
http://www-scf.usc.edu/~csci402/
Discussion Board and Corpora
15 semesters running…
• CS and Engineering courses
• Undergrad/Graduate
• USC/Non-USC
• Almost 800 students
• Over 8000 messages
Extensible open-source discussion board (phpBB) serves as a platform for bridging ISI research and USC teaching practice
Student Messages in an Undergraduate Operating Systems Course
Text is incoherent and ungrammatical.
Problem description: Non-factoid questions are difficult to identify, dependent on context, and may include multiple sentences or paragraphs.
Answers require explanations.
Thread Length Distribution
[Figure: number of threads (y-axis) by thread length in posts (x-axis). Data from an undergraduate CS course]
Statistics of thread length
[Figure: number of threads (y-axis) by thread length in messages (x-axis). Data from a graduate CS course]
Threads are often very short, many consisting of only 1-2 messages
Students jump into programming details without understanding the larger picture or related concepts
TAs and instructors are not always available to fully guide interactions
Need for Discussion Assessment and Scaffolding
PedDiscourse Research
Discussion Assessment
• Which discussions need instructor attention?
• Who is asking and answering questions?
• What topics are discussed when?
Discussion Scaffolding
• Promote reflection
• Promote collaboration among students
Modeling discussion threads
Individual messages
• Topic, quantity
Relations among messages
• Response/Replies
• Roles that a message plays
Discussion threads
• Thread lengths and quantity
• Discussion topic
• Discussion focus
• …
Related course data
• Notes, web pages, readings
• Assignments and projects
• …
Discussion Assessment
Which discussions need instructor attention?
• Identify roles that individual messages play (ques, ans, ack, etc.)
• Analyze patterns of message roles
• Find discussion threads without an answer for the initial question
Roles of individual messages
Use Searle’s theory of Speech Acts (Searle, 1969) to model threaded discussions
Speech Acts
• Choose SAs to use: Question (QUES), Answer or Suggestion (ANS-SUG), Correction or Objection (Neg-Ack), …
• Provide relationship between a pair of messages
• Multiple SAs per pair of messages in thread
• A single message can be related (via SAs) with multiple messages
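One possible way to hold these relations is as labeled links between message pairs. The sketch below is illustrative only: the `SALink` and `Thread` names and the integer message ids are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SALink:
    """One speech act relating a message to an earlier message (hypothetical type)."""
    source: int  # id of the message performing the speech act
    target: int  # id of the message it responds to
    act: str     # e.g. "QUES", "ANS-SUG", "Neg-Ack"


@dataclass
class Thread:
    links: list = field(default_factory=list)

    def add(self, source, target, act):
        self.links.append(SALink(source, target, act))

    def acts_between(self, source, target):
        """All SAs on one message pair; a pair may carry several."""
        return [l.act for l in self.links
                if l.source == source and l.target == target]

    def targets_of(self, source):
        """All messages that a single message is related to via SAs."""
        return sorted({l.target for l in self.links if l.source == source})


# Invented example thread with four messages.
thread = Thread()
thread.add(2, 1, "ANS-SUG")
thread.add(3, 2, "ISSUE")    # two SAs on the same pair (3 -> 2)
thread.add(3, 2, "QUES")
thread.add(4, 3, "ANS-SUG")  # message 4 relates to two earlier messages
thread.add(4, 1, "ANS-SUG")
```

Storing links rather than one label per message keeps both properties from the list above: several SAs per pair, and one message linked to several others.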
Speech Acts (SAs) in a discussion thread
S1 (QUES): The Professor gave us 2 methods for forking threads from the main program. One was ....... The other was to ......... When you fork a thread where does it get created and take its 8 pages from? Do you have to calculate ......? If so how? Where does it store its PCReg .......? Any suggestions would be helpfule.
S2 (ANS-SUG): If you use the first implementation...., then you'll have a hard limit on the number of threads....If you use the second implementation, you need to.... Either way, you'll need to implement the AddrSpace::NewStack() function and make sure that there is memory available.
S1 (ISSUE, QUES): I am still confused. I understand it is in the same address space as the parent process, where do we allocate the 8 pages of mem for it? And how do we keep track of .....? … I am sure it is a simple concept that I am just missing.
S3 (ANS-SUG): read the student documentation for the Fork syscall
Speech Act categories explored
Code 1 (Code: Name)
QUES: Question
ANNO: Announcement
CANS: Complex Answer
SANS: Simple Answer
SUG: Suggest
ELAB: Elaborate
CORR: Correct
OBJ: Object
CRT: Criticize
SUP: Support
ACK: Acknowledge
COMP: Compliment
Code 2
POS, NEUT, NEG
Code 3
QUES, ANNO, ANS-SUG, ELAB, POS-ACK, NEG-ACK
Code 1, Code 2, Code 3
Kappa: 0.70, 0.54, 0.58
Current Speech Act Categories
SA Category | Description | Kappa | Distribution (% in corpus)
QUES | A question about a problem, including question about a previous message | 0.94 | 50.6
ANS-SUG | A simple or complex answer to a previous question. Suggestion or advice | 0.72 | 41.2
ISSUE | Report misunderstanding, unclear concepts or issues in solving problems | 0.88 | 15.4
Pos-ACK | An acknowledgement, compliment or support in response to a prev. message | 0.87 | 9.1
Neg-ACK | A correction or objection (or complaint) to/on a previous message | 0.85 | 2.6
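The slides report kappa per category but do not say which variant was used. As a minimal sketch, assuming Cohen's kappa for two annotators, agreement could be computed as follows. The annotations are invented for illustration and are not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of items on which the annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented annotations for ten messages.
annotator_1 = ["QUES", "QUES", "ANS-SUG", "ANS-SUG", "ISSUE",
               "QUES", "ANS-SUG", "Pos-ACK", "QUES", "ANS-SUG"]
annotator_2 = ["QUES", "QUES", "ANS-SUG", "ISSUE", "ISSUE",
               "QUES", "ANS-SUG", "Pos-ACK", "QUES", "QUES"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Because a message can carry more than one SA (the distribution column sums to more than 100%), a per-category kappa would more plausibly be computed on binary present/absent labels for each category; the same function accepts those.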
Data cleaning and pre-processing
Discussion data
• Noisy, incoherent
• High variation: messages may contain answers or suggestions in the form of questions
• Informal dialect used by students
Data pre-processing
• Tokenization, stemming, and other filtering steps applied (e.g. removing programming code within messages, pluralized words, etc.)
Data categorization
• Transform/replace commonly occurring words/word sequences with categories
  - Apostrophe words ('re, 've, 'm, …)
  - Technical terms within messages replaced by TECH_TERM (from commonly used technical terms in the course)
  - Don’t replace pronouns (“you can” in ANS vs. “I can”)
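The steps above can be sketched roughly as follows. Everything specific here is an assumption for illustration: the `TECH_TERMS` list, the `APOS_WORD` placeholder, and the crude pattern used to drop code are not given in the slides, and stemming is omitted.

```python
import re

# Hypothetical course vocabulary; the real term list is not given in the slides.
TECH_TERMS = ["address space", "syscall", "thread", "stack", "fork"]
# Longest terms first so multi-word terms win; a trailing "s" covers simple plurals.
_TECH_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in
                      sorted(TECH_TERMS, key=len, reverse=True)) + r")s?\b",
    re.IGNORECASE,
)
# Crude stand-in for code removal: drop C++-style calls such as Foo::Bar().
_CODE_RE = re.compile(r"\b\w+::\w+\(\)")
# Apostrophe words ('re, 've, 'm, ...) attached to a preceding word.
_APOS_RE = re.compile(r"(\w)'(re|ve|m|ll|d|s)\b", re.IGNORECASE)


def preprocess(message):
    """Return tokens with code removed and categories substituted."""
    text = _CODE_RE.sub(" ", message)
    text = _APOS_RE.sub(r"\1 APOS_WORD", text)  # "I've" -> "I APOS_WORD"
    text = _TECH_RE.sub("TECH_TERM", text)
    # Keep words, placeholders, and question marks; pronouns are left untouched.
    return re.findall(r"[A-Za-z_]+|\?", text)


tokens = preprocess(
    "I've read it. Where do threads get their stack? See AddrSpace::NewStack()"
)
```

Question marks are kept as tokens because they serve as a cue for QUES, and pronouns survive so that "you can" and "I can" remain distinguishable.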
Features for SA Classification
F1: Cue phrases and their positions (e.g. “Thank” position)
F2: Message position
F3: Previous message information
F4: Poster class
F5: Poster change
F6: Message length
Example TBL rules
IF cue-phrase = {What} & {“?”} => QUES
IF cue-phrase = {“yes you can”} & poster-info = Instructor & post-length = Medium => ANS
IF cue-phrase = {“yes”} & cue-position = CP_BEGIN & prev-SA = QUES => ANS
IF cue-phrase = {“not know”} & poster-info = student & poster-change = YES => ISSUE
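To make the example rules concrete, here is a minimal sketch that applies the four rules shown above as first-match predicates. It is not a TBL learner (which would learn an ordered list of transformations over an initial labeling); the `Message` fields and the length bucketing are assumptions introduced for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    poster: str                  # "instructor" or "student"
    length: str                  # "short", "medium" or "long" (bucketing assumed)
    prev_sa: str = ""            # SA of the previous message, if any
    poster_change: bool = False  # True if the poster differs from the previous one


def classify(msg):
    """Return the label of the first matching example rule, or None."""
    text = msg.text.lower().strip()
    # Rule 1: cue phrase "what" together with a question mark.
    if re.search(r"\bwhat\b", text) and "?" in text:
        return "QUES"
    # Rule 2: "yes you can" from an instructor in a medium-length post.
    if "yes you can" in text and msg.poster == "instructor" and msg.length == "medium":
        return "ANS"
    # Rule 3: "yes" at the beginning (CP_BEGIN) following a question.
    if text.startswith("yes") and msg.prev_sa == "QUES":
        return "ANS"
    # Rule 4: "not know" from a student after a change of poster.
    if "not know" in text and msg.poster == "student" and msg.poster_change:
        return "ISSUE"
    return None
```

The cue phrase "not know" is matched literally, so this assumes contractions such as "don't know" have already been expanded during pre-processing.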
SA Classification Results
SVM = Support Vector Machine; TBL = Transformation-Based Learning
SA Category | SVM Precision | SVM Recall | SVM F score | TBL Precision | TBL Recall | TBL F score
QUES | 0.95 | 0.90 | 0.94 | 0.96 | 0.91 | 0.95
ANS | 0.87 | 0.80 | 0.85 | 0.83 | 0.64 | 0.78
ISSUE | 0.65 | 0.54 | 0.62 | 0.46 | 0.76 | 0.50
Pos-ACK | 0.57 | 0.44 | 0.54 | 0.58 | 0.56 | 0.57
Neg-ACK | 0 | 0 | 0 | 0.5 | 0.38 | 0.47
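The slides do not state which F measure is reported. The F scores in these results sit closer to a precision-weighted F measure with β = 0.5 than to the balanced F1: for QUES under SVM, F1 would be about 0.92, while β = 0.5 gives about 0.94. That reading is an inference from the numbers, not something the source confirms. The sketch below lets the two be compared for a few SVM rows.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta < 1 favors precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# (precision, recall, reported F) copied from the SVM results above.
svm_rows = {
    "QUES": (0.95, 0.90, 0.94),
    "ISSUE": (0.65, 0.54, 0.62),
    "Pos-ACK": (0.57, 0.44, 0.54),
}

for name, (p, r, reported) in svm_rows.items():
    # Columns: category, F with beta=0.5, balanced F1, value reported on the slide.
    print(name, round(f_beta(p, r, beta=0.5), 2), round(f_beta(p, r), 2), reported)
```

Since the inputs are themselves rounded to two decimals, small mismatches in the last digit are to be expected for some rows.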
Profiling discussion threads with SAs
(Q1) Were all questions answered? (Y/N)
(Q2) Were there any issues or confusion? (Y/N)
(Q3) Were those issues or confusions resolved? (Y/N)
Thread classification with SA classifiers
Feature Set 1: Whether there was an [SA] in the thread
Feature Set 2: Whether the last message in the thread included [SA]
(a) SVM classification results with human-annotated SAs
Question | Precision | Recall | F score
Q1 | 0.93 | 0.93 | 0.93
Q2 | 0.93 | 0.93 | 0.93
Q3 | 0.89 | 0.89 | 0.89
(b) SVM classification results with system-generated SAs
Question | Precision | Recall | F score
Q1 | 0.83 | 0.84 | 0.83
Q2 | 0.77 | 0.74 | 0.76
Q3 | 0.68 | 0.69 | 0.68
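The two feature sets can be sketched as binary vectors over the SA categories. This is a minimal illustration under assumptions: the function name, the ordering of categories, and the representation of a thread as a list of label sets are hypothetical, and the SVM that would consume the vector is not shown.

```python
SA_CATEGORIES = ["QUES", "ANS-SUG", "ISSUE", "Pos-ACK", "Neg-ACK"]


def thread_features(thread):
    """thread: list of sets, one set of SA labels per message, in posting order."""
    seen = set().union(*thread) if thread else set()
    last = thread[-1] if thread else set()
    # Feature Set 1: was each SA present anywhere in the thread?
    set1 = [int(sa in seen) for sa in SA_CATEGORIES]
    # Feature Set 2: did the last message include each SA?
    set2 = [int(sa in last) for sa in SA_CATEGORIES]
    return set1 + set2


# Invented thread: question, answer, follow-up issue and question, answer.
example = [{"QUES"}, {"ANS-SUG"}, {"ISSUE", "QUES"}, {"ANS-SUG"}]
vector = thread_features(example)
```

A thread consisting of a single unanswered question would yield QUES set in both halves of the vector, which is the kind of pattern a classifier for Q1 could pick up.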
Direct thread classification without SA classifiers
F1’: cue phrases and their positions (last message or not) in the thread
(a) With SAs
Question | Precision | Recall | F score
Q1 | 0.83 | 0.84 | 0.83
Q2 | 0.77 | 0.74 | 0.76
Q3 | 0.68 | 0.69 | 0.68
(b) Direct classification
Question | Precision | Recall | F score
Q1 | 0.86 | 0.86 | 0.86
Q2 | 0.81 | 0.62 | 0.70
Q3 | 0.75 | 0.33 | 0.46
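For direct classification, the thread-level cue-phrase features could look roughly like the sketch below: for each cue phrase, one flag for the last message and one for any earlier message. The cue phrases listed are hypothetical, since the slides do not enumerate the ones actually used.

```python
# Hypothetical cue phrases; the slides do not list the ones actually used.
CUE_PHRASES = ["thank", "still confused", "not know", "?"]


def direct_features(messages):
    """messages: list of message strings in posting order.

    For each cue phrase, emit two binary features: present in the last
    message, and present in any earlier message.
    """
    lowered = [m.lower() for m in messages]
    last = lowered[-1] if lowered else ""
    earlier = lowered[:-1]
    features = {}
    for cue in CUE_PHRASES:
        features[f"last:{cue}"] = int(cue in last)
        features[f"earlier:{cue}"] = int(any(cue in m for m in earlier))
    return features


# Invented three-message thread.
feats = direct_features([
    "Where does it store its PCReg?",
    "You need to implement NewStack.",
    "Thanks, that fixed it.",
])
```

Separating the last message from the rest reflects the intuition that a thread ending in thanks reads differently from one ending in a question.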
Summary and Discussion
Identifying unresolved discussions
• Discern speech acts (SAs) in student online discussions
• Classify discussion threads with SAs as features
• Compare SA-based classification and direct thread classification with phrase features
SA-based features may help some difficult cases
• E.g. longer threads with more than one question raised
Related Work
Pedagogical/tutorial dialogue
• Instructional discourse modeling (Yuan et al., 2008; Graesser et al., 2005; McLaren et al., 2007; Boyer et al., 2008; Fossati, 2008; Litman et al., 2003)
Dialogue modeling in email messages or blogs (e.g. AAAI 2008 workshop on Enhanced Messaging)
• Email speech acts
• Requests and commitments
Handling noisy data and high variance in text (Knoblock et al., 2007)
Course topic and task modeling using information extraction techniques (Roy et al., 2008; Jovanovic et al., 2006)
Trace student e-learning activities (Israel and Aiken, 2007; Dringus and Ellis, 2005)
Ongoing Work: Discussion Assessment
Discussion thread pattern and phase analysis
• Question, understanding, solving and closing
Discussion topic analysis
• Coherency of discussion topics
Student profiling
• Information providers (peer mentors) vs. information seekers
• Information flow and influence network among participants
Use of workflows (distributed systems) for large-scale assessment
• E.g. participation changes over several semesters
Supported by National Science Foundation (NSF)
More details available at
http://ai.isi.edu/discourse