Hi,
I have been given several large datasets that require a lot of time and effort to clean, because a scanner did not read the data correctly and left incorrect codes. I have completed the first dataset and feel that the route I chose to make the corrections was much too manual; there must be a better way to do this using SAS. Here's what I have done:
1. Ran freq distributions on all variables in dataset;
2. Highlighted with yellow pen all problem codes in every variable;
3. Ran proc print for studyid numbers based on each problem code by every variable;
4. Manually entered all studyid numbers and problem codes, sorted by studyid, into a new Excel document;
5. Manually placed line dividers between studyid numbers in the Excel document for ease of use when working in the paper files;
6. Printed the Excel document;
7. Went to the paper questionnaire files to determine the correct codes and noted them on the printed Excel document;
8. Manually entered the correct code into the Excel document (new variable/new column);
9. Programmed in SAS the correction lines for each studyid number using if/then/do commands;
10. Re-ran freq distributions on all variables to ensure all problem codes were taken care of.
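To make the steps above concrete, here is a rough sketch of what steps 1, 3, and 9 look like in SAS. The dataset name (work.survey), the variable q1, its valid range of 1 to 5, and the studyid value 1001 are all placeholders for illustration, not the real names or values, and both q1 and studyid are assumed to be numeric.

```sas
/* Hypothetical names throughout: WORK.SURVEY, q1, and studyid 1001 */
/* are placeholders, and q1 is assumed to be numeric with valid     */
/* codes 1 through 5.                                               */

/* Step 1: frequency distributions on all variables */
proc freq data=work.survey;
  tables _all_ / missing;
run;

/* Step 3: list the studyid numbers that carry a problem code in q1 */
proc print data=work.survey noobs;
  where q1 not in (1, 2, 3, 4, 5);
  var studyid q1;
run;

/* Step 9: one hand-written correction per studyid */
data work.survey_clean;
  set work.survey;
  if studyid = 1001 then do;
    q1 = 3;  /* placeholder for the value read from the paper file */
  end;
run;
```

Steps 3 and 9 are repeated for every variable and every studyid number, which is where most of the manual effort goes.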
Step #7 is a manual, labor-intensive, time-consuming task. I'm just trying to get some ideas on how to automate this process in a better way. Thanks, and sorry if this was placed in the wrong group. I work completely in SAS (and Excel).