UC_vs_US Statistic Analysis.xlsx
F. (Fabiano) Dalpiaz
10.23644/uu.12631628.v1
https://uu.figshare.com/articles/dataset/UC_vs_US_Statistic_Analysis_xlsx/12631628
Sheet 1 (Raw-Data): The raw data of the study is provided, presenting the tagging
results for the used measures described in the paper. For each subject, it
includes multiple columns:<br>
A. a sequential student ID<br>
B. an ID that defines a random group label and the notation<br>
C. the notation used: User Stories or Use Cases<br>
D. the case they were assigned to: IFA, Sim, or Hos<br>
E. the subject's exam grade (total points out of 100). Empty cells mean
that the subject did not take the first exam<br>
F. a categorical representation of the grade (L/M/H), where H is greater than
or equal to 80, M is from 65 (inclusive) to 80 (exclusive), and L otherwise<br>
G. the total number of classes in the student's conceptual model<br>
H. the total number of relationships in the student's conceptual
model<br>
I. the total number of classes in the expert's conceptual model<br>
J. the total number of relationships in the expert's conceptual model<br>
K-O. the total number of encountered situations of alignment, wrong
representation, system-oriented, omitted, missing (see tagging scheme
below)<br>
P. the researchers' judgement on how well the student explained the
derivation process: well explained (a systematic mapping that can be easily
reproduced), partially explained (vague indication of the mapping), or not
present.<br>
<br>
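The grade thresholds given for column F can be sketched as a small function. This is only an illustration of the stated rule, not the spreadsheet's own formula; the function name `grade_category` and the use of `None` for an empty cell are assumptions.

```python
from typing import Optional


def grade_category(points: Optional[float]) -> Optional[str]:
    """Map an exam grade (points out of 100) to L, M, or H.

    None stands in for an empty cell (the subject did not take the exam);
    this representation is an assumption made for the sketch.
    """
    if points is None:
        return None
    if points >= 80:   # H: greater than or equal to 80
        return "H"
    if points >= 65:   # M: 65 included, 80 excluded
        return "M"
    return "L"         # L: everything below 65


if __name__ == "__main__":
    for value in (92, 80, 79.5, 65, 64, None):
        print(value, grade_category(value))
```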
Tagging scheme:<br>
Aligned (AL) - A concept is represented as a class in both models, either
with the same name or using synonyms or clearly linkable names;<br>
Wrongly represented (WR) - A class in the domain expert model is
incorrectly represented in the student model, either (i) via an attribute,
method, or relationship rather than class, or
(ii) using a generic term (e.g., "user" instead of "urban planner");<br>
System-oriented (SO) - A class in CM-Stud (the student's conceptual model)
that denotes a technical implementation aspect, e.g., access control. Classes
that represent a legacy system or the system under design (portal, simulator)
are legitimate;<br>
Omitted (OM) - A class in CM-Expert (the expert's conceptual model) that does
not appear in any way in CM-Stud;<br>
Missing (MI) - A class in CM-Stud that does not appear in any way in
CM-Expert. <br>
<br>
All the calculations and information provided in the following sheets
originate from that raw data.<br>
<br>
Sheet 2 (Descriptive-Stats): Shows a summary of statistics from the data collection,
including the number of subjects per case, per notation, per process
derivation rigor category, and per exam grade category.<br>
<br>
Sheet 3 (Size-Ratio):
The size ratio is calculated as the number of classes in the student model
divided by the number of classes in the expert model. We provide box plots to
allow a visual comparison of the shape of the distribution, its central value,
and its variability for each group (by case, notation, process, and exam
grade). The primary focus of this study is on the number of classes. However,
we also provide the size ratio for the number of relationships in the student
and expert models.<br>
<br>
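As a minimal sketch of the size ratio described for Sheet 3, the calculation is a plain division of the student count by the expert count. The function name `size_ratio` and the example counts are hypothetical; the inputs correspond to columns G and I (classes) or H and J (relationships) of the raw data.

```python
def size_ratio(student_count: int, expert_count: int) -> float:
    """Return the student model size relative to the expert model size."""
    if expert_count <= 0:
        # An expert model without elements would make the ratio undefined.
        raise ValueError("expert_count must be positive")
    return student_count / expert_count


if __name__ == "__main__":
    # Hypothetical counts: 12 classes in the student model, 16 in the expert model.
    print(size_ratio(12, 16))
```

A value below 1 indicates a student model that is smaller than the expert model, and a value above 1 a larger one.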
Sheet 4 (Overall):
Provides an overview of all subjects regarding the encountered situations,
completeness, and correctness. Correctness is defined as the ratio of classes
in a student model that are fully aligned with the classes in the
corresponding expert model. It is calculated by dividing the number of
aligned concepts (AL) by the sum of the number of aligned concepts (AL),
omitted concepts (OM), system-oriented concepts (SO), and wrong
representations (WR). Completeness, on the other hand, is defined as the ratio
of classes in a student model that are correctly or incorrectly represented
over the number of classes in the expert model. Completeness is calculated by
dividing the sum of aligned concepts (AL) and wrong representations (WR) by
the sum of the number of aligned concepts (AL), wrong representations (WR),
and omitted concepts (OM). The overview is complemented with general
diverging stacked bar charts that illustrate correctness and
completeness.<br>
<br>
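The two formulas described for Sheet 4 can be written out as follows. This is a sketch that follows the wording of the description, not the spreadsheet's own cell formulas; the function names, the `None` return for an empty denominator, and the example counts are assumptions.

```python
from typing import Optional


def correctness(al: int, wr: int, so: int, om: int) -> Optional[float]:
    """AL / (AL + OM + SO + WR), as stated in the description."""
    denominator = al + om + so + wr
    return al / denominator if denominator else None


def completeness(al: int, wr: int, om: int) -> Optional[float]:
    """(AL + WR) / (AL + WR + OM), as stated in the description."""
    denominator = al + wr + om
    return (al + wr) / denominator if denominator else None


if __name__ == "__main__":
    # Hypothetical tagging counts for one student model.
    al, wr, so, om = 5, 3, 0, 2
    print(correctness(al, wr, so, om))
    print(completeness(al, wr, om))
```

Note that missing concepts (MI) do not appear in either formula as described.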
For sheet 4 as well as for the following four sheets, diverging stacked bar
charts are provided to visualize the effect of each of the independent and
mediated variables. The charts are based on the relative numbers of
encountered situations for each student. In addition, a "Buffer" is
calculated, which solely serves the purpose of constructing the diverging
stacked bar charts in Excel. Finally, at the bottom of each sheet, the
significance (t-test) and effect size (Hedges' g) for both completeness and
correctness are provided. Hedges' g was calculated with an online tool:
https://www.psychometrica.de/effect_size.html. The independent and moderating
variables can be found as follows:<br>
<br>
Sheet 5 (By-Notation):
Model correctness and model completeness are compared by notation - UC,
US.<br>
<br>
Sheet 6 (By-Case):
Model correctness and model completeness are compared by case - SIM, HOS,
IFA.<br>
<br>
Sheet 7 (By-Process):
Model correctness and model completeness are compared by how well the
derivation process is explained - well explained, partially explained, not
present.<br>
<br>
Sheet 8 (By-Grade):
Model correctness and model completeness are compared by the exam grades,
converted to categorical values High, Medium, and Low.
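
The effect sizes at the bottom of sheets 4 to 8 were obtained with the online tool mentioned above. For readers who want to approximate them locally, the following sketch uses one common textbook definition of Hedges' g: Cohen's d with a pooled standard deviation, multiplied by the small-sample correction 1 - 3 / (4(n1 + n2) - 9). Whether the online tool uses exactly this variant is an assumption, and the function name and sample values are hypothetical.

```python
import math
import statistics
from typing import Sequence


def hedges_g(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Hedges' g for two independent groups (pooled SD, small-sample correction)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    var_a = statistics.variance(group_a)  # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    if pooled_sd == 0:
        raise ValueError("pooled standard deviation is zero")
    cohens_d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)  # approximate bias correction
    return cohens_d * correction


if __name__ == "__main__":
    # Hypothetical completeness values for two groups (e.g., UC vs. US).
    print(hedges_g([0.5, 0.6, 0.7], [0.3, 0.4, 0.5]))
```

Small differences from the values in the spreadsheet are to be expected if the tool applies a different correction factor or rounding.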
2020-07-09 15:13:42
statistical analysis
dataset
Conceptual Modelling
Software Engineering