This package is a simple port of the data repository created by @alexanderrobitzsch as part of the CDM R package.
To install this package, enter the Pkg REPL mode (press `]` in the Julia REPL) and run:
add CDMrdata
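The same installation can be scripted outside the REPL with the `Pkg` standard library, which can be convenient in setup scripts. This is a minimal sketch; it assumes the package resolves under the same name, `CDMrdata`, as in the REPL command above.

```julia
using Pkg

# Equivalent to running `add CDMrdata` in the Pkg REPL mode.
Pkg.add("CDMrdata")
```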
Once the package is installed, you can use it like any other Julia package:
using CDMrdata
Currently the package has only two functions:
- `load_data(data_name)`: takes the `data_name` as an argument and loads it as a `Dict`.
- `list_datasets()`: lists all the datasets available in the package.
Suppose we want to load the `ecpe` dataset:
dat = load_data("ecpe");
To know the fields for that specific dataset:
keys(dat)
KeySet for a Dict{String, Int64} with 2 entries. Keys:
"data"
"q.matrix"
And to access one of those fields:
dat["q.matrix"]
28×3 DataFrame
 Row │ skill1  skill2  skill3
     │ Int32   Int32   Int32
─────┼────────────────────────
   1 │      1       1       0
   2 │      0       1       0
   3 │      1       0       1
   4 │      0       0       1
   5 │      0       0       1
  ⋮  │   ⋮       ⋮       ⋮
  24 │      0       1       0
  25 │      1       0       0
  26 │      0       0       1
  27 │      1       0       0
  28 │      0       0       1
              18 rows omitted
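Because each field is an ordinary `DataFrame`, it can be passed straight to standard Julia tooling. The sketch below converts the `ecpe` Q-matrix to a plain matrix and counts how many items measure each skill. It assumes only what is shown above: that `load_data("ecpe")` returns a `Dict` whose `"q.matrix"` entry is a 28×3 `DataFrame` of 0/1 integers. The variable names `Q` and `items_per_skill` are illustrative.

```julia
using CDMrdata

dat = load_data("ecpe")

# Guard against a missing field before indexing into the Dict.
haskey(dat, "q.matrix") || error("ecpe has no \"q.matrix\" field")

q = dat["q.matrix"]          # a DataFrame, per the output above
Q = Matrix(q)                # plain integer matrix, items × skills

# Number of items that measure each skill (column sums of the Q-matrix).
items_per_skill = vec(sum(Q, dims = 1))

# Pair each skill name with its item count.
for (skill, n) in zip(names(q), items_per_skill)
    println(skill, ": ", n, " items")
end
```

Working on a plain `Matrix` keeps later linear-algebra or model-fitting code independent of the `DataFrame` column names.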
Dataset Name | Description (From CDM R Package Dev) |
---|---|
cdm01 | A multiple choice dataset. |
cdm02 | Multiple choice dataset with a Q-matrix designed for polytomous attributes. |
cdm03 | Resimulated dataset from Chiu, Koehn and Wu (2016) where the data generating model is a reduced RUM model. |
cdm04 | Simulated dataset for the sequential DINA model (as described in Ma & de la Torre, 2016). The dataset contains 1000 persons and 12 items which measure 2 skills. |
cdm05 | Example dataset used in Philipp, Strobl, de la Torre and Zeileis (2018). This dataset is a sub-dataset of the probability dataset in the pks package (Heller & Wickelmaier, 2013). |
cdm06 | Resimulated example dataset from Chen and Chen (2017). |
cdm07 | This is a resimulated dataset from the social anxiety disorder data concerning social phobia, which involves 13 dichotomous questions (Fang, Liu & Ling, 2017). The simulation was based on a latent class model with five classes. The dataset was also used in Chen, Li, Liu and Ying (2017). |
cdm08 | This is a simulated dataset involving four skills and three misconceptions for the model for simultaneously identifying skills and misconceptions (SISM; Kuo, Chen & de la Torre, 2018). The Q-matrix follows the specification in their simulation study. |
cdm09 | This is a simulated dataset involving polytomous skills which is adapted from the empirical example (proportional reasoning data) of Chen and de la Torre (2013). |
cdm10 | This is a simulated dataset involving a hierarchical skill structure. Skill A has four levels, skill B has two levels and skill C has three levels. |
dcm | Dataset from the book 'Diagnostic Measurement' by Rupp, Templin and Henson (2010). |
dtmr | DTMR Fraction Data (Bradshaw et al., 2014). |
ecpe | The dataset has been used in Templin and Hoffman (2013), and Templin and Bradshaw (2014). |
fraction1 | The dataset has been used in de la Torre (2009). |
fraction2 | The dataset has been used in de la Torre (2009) and Henson, Templin and Willse (2009). |
fraction3 | The dataset has been used in de la Torre (2011). |
fraction4 | The dataset has been used in de la Torre and Douglas (2004) and Chen, Liu, Xu and Ying (2015). |
fraction5 | This dataset was used as an example for the multiple strategy DINA model in de la Torre and Douglas (2008) and Hou and de la Torre (2014). |
hr | Simulated data according to Ravand et al. (2013). |
jang | Simulated dataset according to the Jang (2005) L2 reading comprehension study. |
melab | This is a simulated dataset according to the MELAB reading study (Li, 2011; Li & Suen, 2013). Li (2011) investigated the Fusion model (RUM model) for calibrating this dataset. The dataset in this package is simulated assuming the reduced RUM model (RRUM). |
mg | Large-scale dataset with multiple groups, survey weights and 11 polytomous items. |
pgdina | Dataset for the estimation of the polytomous GDINA model. |
pisa00R.ct | PISA 2000 data of German students including 26 items of the reading test (Chen and de la Torre, 2014). |
pisa00R.cc | PISA 2000 data of German students including 20 items of the reading test (Chen and Chen, 2016). |
sda6 | This is a simulated dataset of the SDA6 study according to information given in Jurich and Bradshaw (2014). |
Students | This dataset contains item responses of students on scales of cultural activities (act), mathematics self concept (sc) and mathematics enjoyment (mj) from an Austrian survey of 8th grade students. |
timss03.G8.su | This is a dataset with a subset of 23 Mathematics items from TIMSS 2003 used in Su, Choi, Lee, Choi and McAninch (2013). |
timss07.G4.lee | This dataset is a list containing dichotomous item responses (data; information on booklet and gender included), the Q-matrix (q.matrix) and descriptions of the skills (skillinfo) used in Lee et al. (2011). |
timss07.G4.py | This dataset uses the same items as timss07.G4.lee but employs a simplified Q-matrix with 7 skills. This Q-matrix was used in Park and Lee (2014) and Park et al. (2018). |
timss07.G4.Qdomains | This Q-matrix data is a simplification of timss07.G4.py$q.matrix to 3 domains and involves a simple structure of skills. |
timss11.G4.AUT | TIMSS 2011 dataset of 4668 Austrian fourth-graders. |
timss11.G4.AUT.part | Part of timss11.G4.AUT, containing only the first three booklets (with N=1010 students). |
timss11.G4.sa | Contains the Q-matrix used in Sedat and Arican (2015). |
fraction.subtraction.data | Tatsuoka's (1984) fraction subtraction data set comprises responses to J=20 fraction subtraction test items from N=536 middle school students. |
fraction.subtraction.qmatrix | The Q-matrix corresponding to Tatsuoka's (1984) fraction subtraction data set. |