MOOC student drop-out

Description

This public dataset records actions (e.g., viewing a video or submitting an answer) performed by students in a MOOC (massive open online course). It contains 7,047 users interacting with 97 items (videos, answers, etc.), for a total of 411,749 interactions. There are 4,066 drop-out events (about 0.99% of all interactions).

Dataset Statistics

Statistic               Value
Users                   7,047
Items                   97
Interactions            411,749
Node Labels             Exist
Node Features           None
Edge Labels             None
Edge Features           Exist
Action Repetition (%)   None
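
The figures above can be recomputed from the raw interactions. The following is a minimal sketch, not part of the dataset's tooling: the helper names `dropout_rate_percent` and `count_entities` are hypothetical, and the four (user, item) pairs are only the first rows of the mooc.tsv example, used here as stand-in data.

```python
def dropout_rate_percent(n_dropouts, n_interactions):
    """Share of interactions that are drop-out events, in percent."""
    return 100.0 * n_dropouts / n_interactions


def count_entities(pairs):
    """Return (distinct users, distinct items, interactions) for (user, item) pairs."""
    users = {user for user, _ in pairs}
    items = {item for _, item in pairs}
    return len(users), len(items), len(pairs)


# Counts taken from the description: 4,066 drop-outs in 411,749 interactions.
rate = dropout_rate_percent(4066, 411749)  # roughly 0.987

# Stand-in data: the first four rows of the mooc.tsv example.
sample_pairs = [(0, 0), (1, 1), (1, 2), (2, 3)]
n_users, n_items, n_interactions = count_entities(sample_pairs)
```

Run on the full mooc.tsv, `count_entities` should reproduce the Users, Items and Interactions values listed in the statistics.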


References

Srijan Kumar, Xikun Zhang, and Jure Leskovec. 2019. Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’19). Association for Computing Machinery, New York, NY, USA, 1269–1278. DOI: https://doi.org/10.1145/3292500.3330895

 @inproceedings{kumar2019predicting,
	title={Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks},
	author={Kumar, Srijan and Zhang, Xikun and Leskovec, Jure},
	booktitle={Proceedings of the 25th ACM SIGKDD international conference on Knowledge discovery and data mining},
	year={2019},
	organization={ACM}
 }

KDD Cup 2015. https://biendata.com/competition/kddcup2015/data/

Files

  1. mooc.tsv
    • Tab-separated file with header (row_id, user_id, item_id, timestamp).
    • User IDs and item IDs are remapped to consecutive integers starting from 0.
    • Rows are sorted by timestamp.
    • All NaN values have been removed from the data.
    • Example
row_id	user_id	item_id	timestamp
0	0	0	0
1	1	1	54
2	1	2	306
3	2	3	479
......
  2. mooc_user_mapping.tsv & mooc_item_mapping.tsv
original_id	mapped_id
0	0
1	1
4	2
7	3
......
  3. mooc_edge_features.tsv
row_id	edge_feature_0	edge_feature_1
0	0.5	1.0
1	-0.5	1.0
......
  4. mooc_node_labels.tsv
user_id	timestamp	state_label
0	0.000	0
1	36.000	0
1	77.000	1
......
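
The file layouts above can be read with Python's standard `csv` module. The sketch below is illustrative only: the function names are hypothetical, the inline strings are the example rows from this README rather than real data, columns are read by position so the exact header names do not matter, and treating `state_label == 1` as a drop-out event is an assumption.

```python
import csv
import io

# Sample rows copied from the examples above; the real files are far larger.
MOOC_TSV = (
    "row_id\tuser_id\titem_id\ttimestamp\n"
    "0\t0\t0\t0\n1\t1\t1\t54\n2\t1\t2\t306\n3\t2\t3\t479\n"
)
MAPPING_TSV = "original_id\tmapped_id\n0\t0\n1\t1\n4\t2\n7\t3\n"
EDGE_FEATURES_TSV = (
    "row_id\tedge_feature_0\tedge_feature_1\n0\t0.5\t1.0\n1\t-0.5\t1.0\n"
)
NODE_LABELS_TSV = (
    "user_id\ttimestamp\tstate_label\n0\t0.000\t0\n1\t36.000\t0\n1\t77.000\t1\n"
)


def read_rows(handle):
    """Return the data rows of a tab-separated file, skipping the header."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # header line
    return [row for row in reader if row]


def load_interactions(handle):
    """Parse mooc.tsv rows into (row_id, user_id, item_id, timestamp) tuples."""
    return [
        (int(r[0]), int(r[1]), int(r[2]), float(r[3])) for r in read_rows(handle)
    ]


def load_mapping(handle):
    """Parse a mapping file into {mapped_id: original_id}."""
    # Original IDs are kept as strings (assumption: they need not be integers).
    return {int(mapped): original for original, mapped in read_rows(handle)}


def load_edge_features(handle):
    """Parse mooc_edge_features.tsv into {row_id: [feature values]}."""
    return {int(r[0]): [float(x) for x in r[1:]] for r in read_rows(handle)}


def load_node_labels(handle):
    """Parse mooc_node_labels.tsv into (user_id, timestamp, state_label) tuples."""
    return [(int(u), float(t), int(s)) for u, t, s in read_rows(handle)]


interactions = load_interactions(io.StringIO(MOOC_TSV))
id_mapping = load_mapping(io.StringIO(MAPPING_TSV))
edge_features = load_edge_features(io.StringIO(EDGE_FEATURES_TSV))
node_labels = load_node_labels(io.StringIO(NODE_LABELS_TSV))

# Check the documented ordering: rows should be sorted by timestamp.
timestamps = [t for _, _, _, t in interactions]
is_sorted = timestamps == sorted(timestamps)

# Assumption: state_label == 1 marks a drop-out event.
dropout_users = sorted({u for u, _, s in node_labels if s == 1})
```

For the real files, the same loaders should work with a file handle in place of the in-memory strings, e.g. `with open("mooc.tsv", newline="") as f: load_interactions(f)`.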

Contacts

Sejoon Oh, soh337@gatech.edu, Georgia Institute of Technology
Srijan Kumar, srijan@gatech.edu, Georgia Institute of Technology