Basicly, the way we did this: 1. split the whole dataset of 493 students into two parts: approxmately 50%/50%, training part for training a regression model, the other part for testing. Training data is organized as one row per student; Testing data is organized as one row per student per month, from Sep. 16th to May 16th 2. Train up a regression model, including pretest, online metrics, interaction and quadratic terms as paramters 3. Apply the regression equation on testing data, and calcuated a new variable: Assistance_predicted_score 4. Longitudinally track the change of percent correct on original question and Assistance_predicted_score Result: Sum of squares: 31876.1773 vs 13256.7697 BIC: 9656 vs. 9090 under predicting 5. Compute the predicted score of the last month for each student for both models Do simple linear regression to predict MCAS score using the predicted score from each model seperately. Use the regression-predicted score to calculate MAD Result: MAD: 6.6 vs. 5.76 6. To see how good our prediction is: pick 34 Mulitple-choice and short answer questions from MCAS 2005. Use half of them (17 items to predict the other half. This was done 5 times with randomly picked items in both halves. Average MAD: 1.892382979 (remember the full score is 17 here, which the full score in our prediction is 54) --------------------------------------------------------- Notes for myself of the whole process: original file: 662 students; with pretest: 493 students; 494 students included in files in this folder. Training: 250, Testing: 243 start from student_all.sav filter out students for whom we don't have pretest score --> student_with_pretest_regression.sav randomly select approxmately 50% students from above file and save as full.xls in excel, split full.txt into two files: training.txt (250 rows) and testing.txt (244) rows open training.txt in spss and save it as Training.sav. run stepwise regression to predict MCAS score. (posttest score is not used) Model reached: (Constant) 27.811 INTER_PRE_PERCENT_CORRECT 0.593 AJ INTER_AVG_HINT_REQ_AVG_ATTEMPT -4.122 BM INTER_PRE_AVG_HINT_REQ 0.346 AK QUA_ORIGINAL_CORRECT 28.727 AZ AVG_ATTEMPT -5.302 AB Result: TRAINING_REGRESSION_RESULT.HTM create table assistment_reporting.tmp_user; and import from full.xls user_id/$filter into this table, so that we can select data from testing users. (filter = 1 for testing user) create view (find it also in ../combine-longi-online/db_view.sql): create or replace view assistment_reporting.vStudentProgressTesting as select a.* from assistment_reporting.vStudentProgress a, assistment_reporting.tmp_user b where a.studentID = b.user_id and b.flag = 1; dump data from view, result stored in file: student_progress_test_set.txt.output.txt, 243 students included. open the test_set in SPSS, run ../predictMCASScore-Assessmeng/compute_column.SPS to compute quadratic and interaction terms and save it as Testing_longitudinal.sav and testing_longitudinal.xls. In excel, apply the coefficient to compute a new column called "Assistance_predicted_score" = if (27.811+0.593*AJ2-4.122*BM2+0.346*AK2+28.727*AZ2-5.302*AB2 >0, 27.811+0.593*AJ2-4.122*BM2+0.346*AK2+28.727*AZ2-5.302*AB2, 0) If the result is lower than zero, use zero. Save the finished file as Testing_longitudinal.txt (Just realized that this can be done directly in SPSS, no need to save as xls. Compute Assistance_predicted_score = 27.811+0.593*INTER_PRE_PERCENT_CORRECT-4.122*INTER_AVG_HINT_REQ_AVG_ATTEMPT+0.346*INTER_PRE_AVG_HINT_REQ+28.727*QUA_ORIGINAL_CORRECT-5.302*AVG_ATTEMPT Rum mixed linear model: 1. to predict %correct(called predictedscore in file) on original question. MIXED PredictedScore BY schoolID TeacherID WITH CenteredMonth /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = CenteredMonth schoolID CenteredMonth*schoolID | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION TESTCOV /RANDOM INTERCEPT CenteredMonth | SUBJECT(studentID) COVTYPE(UN) /SAVE = PRED . Result: MIXED_PREDICTEDSCORE.HTM, BIC = 9656 [ MIXED PredictedScore BY schoolID specialed WITH CenteredMonth /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = CenteredMonth schoolID specialed CenteredMonth*schoolID CenteredMonth*specialed | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION TESTCOV /RANDOM INTERCEPT CenteredMonth | SUBJECT(studentID) COVTYPE(UN) . BIC = 9648, all significant MIXED PredictedScore BY schoolID specialed freelunch WITH CenteredMonth /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = CenteredMonth schoolID specialed freelunch CenteredMonth*schoolID CenteredMonth*specialed | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION TESTCOV /RANDOM INTERCEPT CenteredMonth | SUBJECT(studentID) COVTYPE(UN) . BIC = 9653, all significant ] 2. to predicte Assistance_predicted_score. MIXED Assistance_predicted_score BY schoolID WITH CenteredMonth /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = CenteredMonth schoolID | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION TESTCOV /RANDOM INTERCEPT CenteredMonth | SUBJECT(studentID) COVTYPE(UN) /SAVE = PRED . Result: MIXED_ASSISTANCE_PREDICTEDSCORE.HTM, BIC = 9090 Note: convergence not reached if use schoolID as predictor of learning rate [ MIXED Assistance_predicted_score BY schoolID specialed WITH CenteredMonth /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = CenteredMonth schoolID specialed | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION TESTCOV /RANDOM INTERCEPT CenteredMonth | SUBJECT(studentID) COVTYPE(UN) . BIC = 9080 MIXED Assistance_predicted_score BY schoolID specialed freelunch WITH CenteredMonth /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE) /FIXED = CenteredMonth schoolID specialed freelunch | SSTYPE(3) /METHOD = ML /PRINT = SOLUTION TESTCOV /RANDOM INTERCEPT CenteredMonth | SUBJECT(studentID) COVTYPE(UN) . BIC = 9082, all significant ] Predicted value for all months for both model saved in specific as Track_predicted_score and Track_Assistance_predicted_score Save the .sav file back as xls file: Testing_longitudinal_with_predicted_value.xls; student ID and predicted scores by month are extracted to TXT file called Testing_predicted_scores(by month).TXT. compute intercept and slope given the predicted score. first, import predicted scores in file Testing_predicted_scores(by month).TXT into database: tmp_LTA create table tmp_LTA ( user_id number, centered_month number, original_predicted_score number, original_intercept number, original_slope number, assistance_predicted_score number, assistance_intercept number, assistance_slope number); -- compute slope update tmp_LTA d set original_slope = (select (a.original_predicted_score - b.original_predicted_score)/(a.centered_month - b.centered_month) from tmp_LTA a, tmp_LTA b, (select user_id, max(centered_month) max_month, min(centered_month) min_month from tmp_LTA group by user_id) c where a.user_id = c.user_id and b.user_id = c.user_id and a.centered_month = c.max_month and b.centered_month = c.min_month and a.centered_month != b.centered_month and a.user_id = d.user_id); update tmp_LTA d set assistance_slope = (select (a.assistance_predicted_score - b.assistance_predicted_score)/(a.centered_month - b.centered_month) from tmp_LTA a, tmp_LTA b, (select user_id, max(centered_month) max_month, min(centered_month) min_month from tmp_LTA group by user_id) c where a.user_id = c.user_id and b.user_id = c.user_id and a.centered_month = c.max_month and b.centered_month = c.min_month and a.centered_month != b.centered_month and a.user_id = d.user_id); -- compute intercept update tmp_LTA set original_intercept = original_predicted_score - centered_month*original_slope; update tmp_LTA set assistance_intercept = assistance_predicted_score - centered_month*assistance_slope; Compare the two predictions: 1) use last month's score, and compute SS for both of them -- single student grouped select distinct user_id, (original_final - mcasscore)*(original_final- mcasscore), (ASSISTANCE_final - mcasscore) * (ASSISTANCE_final - mcasscore) from (select user_id, avg(original_intercept) +7*avg(original_slope) original_final, avg(ASSISTANCE_intercept) + 7*avg(ASSISTANCE_slope) ASSISTANCE_final from tmp_LTA group by user_id) LTA, vdataframe4_05 where user_id = studentID and original_final is not null; (241 rows selected) -- all together SS select sum((original_final - mcasscore)*(original_final- mcasscore)), sum((ASSISTANCE_final - mcasscore) * (ASSISTANCE_final - mcasscore)) from (select user_id, avg(original_intercept) +7*avg(original_slope) original_final, avg(ASSISTANCE_intercept) + 7*avg(ASSISTANCE_slope) ASSISTANCE_final from tmp_LTA group by user_id) LTA, vdataframe4_05 where user_id = studentID and original_final is not null; SUM((ORIGINAL_FINAL-MCASSCORE)*(ORIGINAL_FINAL-MCASSCORE)) SUM((ASSISTANCE_FINAL-MCASSCORE)*(ASSISTANCE_FINAL-MCASSCORE)) ---------------------------------------------------------- -------------------------------------------------------------- 31876.1773 13256.7697 2) do regression get final predicted scores and MCAS score for regression predict MCAS score using original_final and assistance_final, and calculate MAD file saved from SPSS as: Testing_regression.sav avg(original_MAD) = 6.6, avg(assistance_MAD)=5.76 paired T-test result: Testing_T-test_MAD.htm compare to prediction based on MCAS items themselves: avg: 1.892382979 705 students; 34 Items used (39 excludes 4 open response items to make the two part balanced and each item got full score of 1, just like in the Assistment system) randomly select 17 items out of 34, and apply linear regressio to use students' performance on the selected 17 items to predict their performance on the rest items. This was run 5 times and the average value is 1.892382979.