PROC COMPARE Without SAS: Dataset Comparison in StatDataViewer
Dataset comparison is one of the most common tasks in clinical programming: comparing the current delivery of ADSL against the previous one, verifying that a dataset re-run produced identical results, or checking whether a data amendment touched anything unexpected. The standard SAS tool for this is PROC COMPARE. It is powerful, but it requires a SAS session, a bit of boilerplate code every time, and a text report to read through.
StatDataViewer's Compare tool covers the same row-matching and value-level checks interactively, without SAS. This post explains how to use it effectively and where it differs from PROC COMPARE.
What PROC COMPARE does — and what we're replicating
When given an ID statement, PROC COMPARE matches rows between two datasets using one or more ID variables, then reports (among other things):
- Rows in the base dataset but not the compare dataset (and vice versa).
- For matched rows: which variables have different values, and what those values are.
StatDataViewer's Compare tool produces the same two kinds of findings. The key difference is that the results are presented as an interactive table you can sort, filter, and drill into, rather than a text report.
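To make the matching logic concrete, here is a minimal Python sketch of an ID-based comparison. It is an illustration only, not StatDataViewer's or SAS's implementation: the function name `compare_datasets` and the sample rows are made up, and rows are assumed to be plain dictionaries with unique ID values.

```python
def compare_datasets(base, compare, id_vars):
    """Match rows on id_vars; return unmatched keys and cell-level differences."""
    def key(row):
        return tuple(row[v] for v in id_vars)

    # Assumes keys are unique; with duplicates, the last row for a key wins.
    base_by_key = {key(r): r for r in base}
    comp_by_key = {key(r): r for r in compare}

    base_only = sorted(base_by_key.keys() - comp_by_key.keys())
    compare_only = sorted(comp_by_key.keys() - base_by_key.keys())

    diffs = []
    for k in sorted(base_by_key.keys() & comp_by_key.keys()):
        b, c = base_by_key[k], comp_by_key[k]
        # Only variables present on both sides are compared.
        for var in sorted((b.keys() & c.keys()) - set(id_vars)):
            if b[var] != c[var]:
                diffs.append((k, var, b[var], c[var]))
    return base_only, compare_only, diffs


# Hypothetical sample data
base = [
    {"USUBJID": "001", "AGE": 34, "SEX": "F"},
    {"USUBJID": "002", "AGE": 51, "SEX": "M"},
    {"USUBJID": "003", "AGE": 47, "SEX": "F"},
]
compare = [
    {"USUBJID": "001", "AGE": 34, "SEX": "F"},
    {"USUBJID": "002", "AGE": 52, "SEX": "M"},
    {"USUBJID": "004", "AGE": 29, "SEX": "M"},
]

base_only, compare_only, diffs = compare_datasets(base, compare, ["USUBJID"])
```

Here subject 003 ends up in `base_only`, subject 004 in `compare_only`, and `diffs` holds one entry for the AGE change on subject 002.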
Setting up a comparison
Open Tools → Compare Datasets. The dialog has four main inputs:
1. Base dataset
Select from any open library. This is your reference — typically the previous version, the production dataset, or the expected output.
2. Compare dataset
Select the dataset you want to compare against the base. This is typically the new delivery, the re-run output, or the amended version.
The base and compare datasets do not need to be in the same library. For example, add PREV_DELIVERY and CURRENT_DELIVERY as separate libraries and pick one dataset from each.
3. ID variables
Specify the variables that uniquely identify a row. For ADaM, this is typically STUDYID USUBJID for one-row-per-subject datasets such as ADSL, or STUDYID USUBJID ASEQ for datasets with multiple rows per subject. For SDTM: STUDYID USUBJID AESEQ for AE, or STUDYID USUBJID LBTESTCD LBDTC for LB.
Rows are matched across datasets using the ID variables. Unmatched rows appear in the "In base only" and "In compare only" sections of the results.
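If you are unsure whether a candidate set of ID variables really identifies a row uniquely, a quick duplicate check helps before comparing. The following is a small Python sketch under the assumption that rows are available as dictionaries; `duplicate_keys` and the sample AE rows are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, id_vars):
    """Return each ID combination that occurs more than once, with its count."""
    counts = Counter(tuple(r[v] for v in id_vars) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical AE records
ae = [
    {"STUDYID": "S1", "USUBJID": "001", "AESEQ": 1},
    {"STUDYID": "S1", "USUBJID": "001", "AESEQ": 2},
    {"STUDYID": "S1", "USUBJID": "002", "AESEQ": 1},
]

too_coarse = duplicate_keys(ae, ["STUDYID", "USUBJID"])
unique = duplicate_keys(ae, ["STUDYID", "USUBJID", "AESEQ"])
```

With only STUDYID and USUBJID, subject 001 appears twice, so the key is too coarse for AE. Adding AESEQ leaves no duplicates.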
4. Comparison variables (optional)
Leave blank to compare all common variables. Specify a subset to limit the comparison to particular columns — useful for a targeted review of a single amended variable.
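The selection rule can be sketched in a few lines of Python. This is an assumption about how "all common variables" and a subset might be resolved, not a description of the tool's internals; `variables_to_compare` and the variable lists are illustrative.

```python
def variables_to_compare(base_vars, compare_vars, id_vars, subset=None):
    """Return the non-ID variables present in both datasets, optionally restricted."""
    common = [v for v in base_vars if v in compare_vars and v not in id_vars]
    if not subset:  # blank selection: compare every common variable
        return common
    return [v for v in common if v in subset]


# Hypothetical variable lists for two ADSL deliveries
base_vars = ["STUDYID", "USUBJID", "AGE", "SEX", "TRT01P", "RACE"]
compare_vars = ["STUDYID", "USUBJID", "AGE", "SEX", "TRT01P", "ETHNIC"]
id_vars = ["STUDYID", "USUBJID"]

all_common = variables_to_compare(base_vars, compare_vars, id_vars)
targeted = variables_to_compare(base_vars, compare_vars, id_vars, subset=["TRT01P"])
```

In this sketch RACE and ETHNIC drop out because each exists in only one dataset, and the targeted run looks at TRT01P alone.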
Reading the results
After clicking Compare, the results appear in three panels:
Matching rows
A summary count of rows successfully matched by ID. A high proportion of matched rows means the datasets align structurally; a low proportion usually means the ID variables are wrong or the dataset was restructured.
In base only / In compare only
Rows that appear in one dataset but not the other. These are shown as a separate grid you can scroll through. Common causes:
- Subjects added or dropped between deliveries.
- New AE or CM records in an amendment.
- ID variable discrepancies (e.g. trailing spaces, case differences).
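For the last cause, one way to confirm the suspicion is to check whether unmatched keys from the two sides become equal once trailing spaces and case are ignored. Below is a hedged Python sketch; `near_miss_keys` and the sample keys are hypothetical, and the keys are assumed to have been exported from the two unmatched grids as tuples.

```python
def normalize(value):
    """Drop trailing spaces and ignore case for string values."""
    return value.rstrip().upper() if isinstance(value, str) else value


def near_miss_keys(base_only, compare_only):
    """Pair unmatched keys that are equal after normalization."""
    by_norm = {}
    for k in compare_only:
        by_norm.setdefault(tuple(normalize(v) for v in k), []).append(k)
    pairs = []
    for k in base_only:
        for other in by_norm.get(tuple(normalize(v) for v in k), []):
            pairs.append((k, other))
    return pairs


# Hypothetical unmatched keys (STUDYID, USUBJID)
base_only = [("STUDY1", "001-A "), ("STUDY1", "007")]
compare_only = [("study1", "001-A"), ("STUDY1", "009")]

pairs = near_miss_keys(base_only, compare_only)
```

Any pair returned here is likely the same record with a formatting difference in its ID values, rather than a genuinely added or dropped row.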
Value differences
Cell-level differences for matched rows. Each row in this grid shows:
- The variable name.
- The row identifier (your ID variable values).
- The value in the base dataset.
- The value in the compare dataset.
You can sort by variable name to group all changes to a single variable, or sort by row ID to see all changes for a particular subject.
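The two sort orders answer different questions, which a short Python sketch can illustrate. The difference records below are invented, and the dictionary layout is an assumption chosen to mirror the four columns listed above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical difference records: row ID, variable, base value, compare value
diffs = [
    {"id": ("S1", "002"), "variable": "AGE", "base": 51, "compare": 52},
    {"id": ("S1", "001"), "variable": "TRT01P", "base": "A", "compare": "B"},
    {"id": ("S1", "002"), "variable": "TRT01P", "base": "A", "compare": "B"},
]

# Sorted by variable: how widespread is the change to each variable?
by_variable = sorted(diffs, key=itemgetter("variable", "id"))
changes_per_variable = {
    var: len(list(group))
    for var, group in groupby(by_variable, key=itemgetter("variable"))
}

# Sorted by row ID: what changed for each subject?
by_row = sorted(diffs, key=itemgetter("id", "variable"))
first_subject_changes = [d["variable"] for d in by_row if d["id"] == ("S1", "001")]
```

Grouping by variable shows that TRT01P changed on two rows and AGE on one. Ordering by row ID puts all of a subject's changes next to each other.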
Filtering before comparing
Apply a dataset filter to either the base or compare dataset before running Compare. This is useful for a scoped review — for example, compare only the adverse events for treatment arm A, or compare only a specific visit.
Filter with D, confirm the filter in the status bar, then open Tools → Compare Datasets. The comparison will use only the filtered rows.
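One thing to keep in mind when scoping a comparison: if only one side is filtered, the rows excluded from it have no partner and would be expected to surface as unmatched on the other side. The Python sketch below illustrates that reasoning with made-up data; it models the idea only and makes no claim about the tool's internals.

```python
def id_keys(rows, id_vars):
    """Collect the set of ID tuples present in a list of rows."""
    return {tuple(r[v] for v in id_vars) for r in rows}


def arm_a(rows):
    """Keep only rows from treatment arm A."""
    return [r for r in rows if r["ARM"] == "A"]


# Hypothetical data: both deliveries contain the same three subjects
base = [
    {"USUBJID": "001", "ARM": "A"},
    {"USUBJID": "002", "ARM": "B"},
    {"USUBJID": "003", "ARM": "A"},
]
compare = [dict(r) for r in base]
id_vars = ["USUBJID"]

# Filter applied to the base only: arm B rows look like "in compare only"
one_sided = id_keys(compare, id_vars) - id_keys(arm_a(base), id_vars)

# Same filter on both sides: nothing is unmatched
both_sides = id_keys(arm_a(compare), id_vars) - id_keys(arm_a(base), id_vars)
```

Applying the same filter to both datasets keeps the unmatched lists focused on real differences.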
How it differs from PROC COMPARE
| Feature | PROC COMPARE | StatDataViewer Compare |
|---|---|---|
| Requires SAS license | Yes | No |
| Output format | Text report (listing / ODS), optional OUT= dataset | Interactive table (sortable, filterable) |
| Numeric tolerance (CRITERION=) | Yes | Exact match only (no tolerance setting yet) |
| Cross-library comparison | Requires LIBNAME statements | Point-and-click |
| Filter before compare | WHERE= dataset option | Apply dataset filter, then run Compare |
| Time to run | Write code → submit → read report | Open dialog → click Compare → browse results |
For most day-to-day comparison tasks in clinical programming, StatDataViewer's Compare tool gets you to an answer in fewer steps. PROC COMPARE remains the better choice when you need numeric tolerance, custom output datasets, or integration with a validation pipeline.
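The numeric tolerance point matters most for derived floating-point values, where a re-run can differ in the last bits without any real change. The Python sketch below shows the effect, using `math.isclose` as a stand-in for a tolerance-based check. It is not the formula PROC COMPARE applies with CRITERION=, and `values_differ` is a hypothetical helper.

```python
import math


def values_differ(base_value, compare_value, rel_tol=None):
    """Exact comparison by default; relative tolerance for floats if requested."""
    both_float = isinstance(base_value, float) and isinstance(compare_value, float)
    if rel_tol is not None and both_float:
        return not math.isclose(base_value, compare_value, rel_tol=rel_tol)
    return base_value != compare_value


original = 0.3
rerun = 0.1 + 0.2  # differs from 0.3 in the last bits of the float

flagged_exact = values_differ(original, rerun)
flagged_tolerant = values_differ(original, rerun, rel_tol=1e-9)
```

An exact comparison flags this pair as a difference, while the tolerant one does not. If a comparison shows many tiny differences in derived numeric variables, this is a likely explanation.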
Try it now
The Compare tool is available in the free version — no license required. See the Compare datasets documentation for the complete reference, or try StatDataViewer in your browser with sample data right now.