In my opinion, software validators should read the key variables from the define.xml file and decide whether a record is a duplicate based on those key variables.
After all, define.xml contains the metadata of the submission files and should therefore be the governing source.
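As a rough illustration of this suggestion, here is a minimal Python sketch. It assumes Define-XML 2.0, where the key variables of a dataset are the `ItemRef` elements of its `ItemGroupDef` that carry a `KeySequence` attribute. The define.xml fragment, its OIDs, and the chosen key order are hypothetical and heavily trimmed. The helper names `key_variables` and `duplicate_keys` are made up for this example and do not come from any validator.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Define-XML 2.0 content sits in the ODM 1.3 default namespace.
ODM = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, heavily trimmed define.xml fragment (a real file has many more
# required elements and attributes). The key order here is only an assumption.
DEFINE_XML = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S.EXAMPLE">
    <MetaDataVersion OID="MDV.EXAMPLE" Name="Example">
      <ItemGroupDef OID="IG.TR" Name="TR" Repeating="Yes">
        <ItemRef ItemOID="IT.TR.STUDYID" Mandatory="Yes" KeySequence="1"/>
        <ItemRef ItemOID="IT.TR.USUBJID" Mandatory="Yes" KeySequence="2"/>
        <ItemRef ItemOID="IT.TR.TRTESTCD" Mandatory="Yes" KeySequence="3"/>
        <ItemRef ItemOID="IT.TR.TRLNKID" Mandatory="No" KeySequence="4"/>
        <ItemRef ItemOID="IT.TR.TRDTC" Mandatory="No" KeySequence="5"/>
        <ItemRef ItemOID="IT.TR.TRORRES" Mandatory="No"/>
      </ItemGroupDef>
      <ItemDef OID="IT.TR.STUDYID" Name="STUDYID" DataType="text"/>
      <ItemDef OID="IT.TR.USUBJID" Name="USUBJID" DataType="text"/>
      <ItemDef OID="IT.TR.TRTESTCD" Name="TRTESTCD" DataType="text"/>
      <ItemDef OID="IT.TR.TRLNKID" Name="TRLNKID" DataType="text"/>
      <ItemDef OID="IT.TR.TRDTC" Name="TRDTC" DataType="datetime"/>
      <ItemDef OID="IT.TR.TRORRES" Name="TRORRES" DataType="text"/>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def key_variables(define_xml: str, dataset: str) -> list[str]:
    """Return the dataset's key variable names, ordered by KeySequence."""
    root = ET.fromstring(define_xml.strip())
    # Map each ItemDef OID to its variable name.
    names = {d.get("OID"): d.get("Name") for d in root.iterfind(".//odm:ItemDef", ODM)}
    group = root.find(f".//odm:ItemGroupDef[@Name='{dataset}']", ODM)
    if group is None:
        raise KeyError(f"dataset {dataset!r} not found in define.xml")
    keyed = [
        (int(ref.get("KeySequence")), names[ref.get("ItemOID")])
        for ref in group.iterfind("odm:ItemRef", ODM)
        if ref.get("KeySequence") is not None  # only ItemRefs marked as keys
    ]
    return [name for _, name in sorted(keyed)]


def duplicate_keys(records: list[dict], keys: list[str]) -> dict[tuple, int]:
    """Return key-value combinations that occur in more than one record."""
    counts = Counter(tuple(rec.get(k) for k in keys) for rec in records)
    return {combo: n for combo, n in counts.items() if n > 1}


print(key_variables(DEFINE_XML, "TR"))
```

Under these assumptions, the validator would take its duplicate criterion from the submitted metadata rather than from a fixed list of variables. A dataset whose define.xml lists TRLNKID as a key would then be checked on that combination.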
Hi Sergey and XML4Pharma, I understand that an SDTM validation check is being discussed here. I just want to note, in case a similar approach is applied to ADaM, that ADaM has no strict requirement that the key variables uniquely identify a record in a dataset:
ADaM 2.1 page 14:
KEY VARIABLES OF DATASET
A list of variable names that parallels the structure, ideally uniquely identifies and indexes each record in the dataset.
I have a question about creating and validating the SDTM TU and TR domains.
For example, I have screening tumor data for three target lesions in the liver, at SEGMENT 3, SEGMENT 4 and SEGMENT 7. The corresponding three longest diameters are also recorded.
Therefore, I should create three rows in each of the TU and TR domains, one each for liver SEGMENT 3, SEGMENT 4 and SEGMENT 7.
In the TR domain, TRTEST=‘Longest Diameter’/TRTESTCD=‘LDIAM’ for TRLNKID ‘T1’, ‘T2’ and ‘T3’.
In the TU domain, TUTEST=‘Tumor Identification’/TUTESTCD=‘TUMIDENT’ for TULNKID ‘T1’, ‘T2’ and ‘T3’
(TULOC=‘Liver’, and TUPORTOT=‘SEGMENT 3’, ‘SEGMENT 4’ and ‘SEGMENT 7’).
However, the validator reported ‘Duplicate records’, with the explanation that there should be one record per finding result per subject: no findings results with the same Test Short Name (--TESTCD) for the same subject (USUBJID) and the same collection date (--DTC) are expected.
I think my representation is correct, because --LNKID is provided to distinguish the three lesions.
May I ask how to resolve this issue? Any feedback or any references on this topic would be appreciated.
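To illustrate the collision described above, here is a small Python sketch with three hypothetical TR rows for one subject at screening. The USUBJID and date values are made up, and the diameters are omitted. The sketch counts rows per key combination twice. The first pass uses USUBJID, TRTESTCD and TRDTC, the variables named in the validation message. The second pass adds TRLNKID. It says nothing about how any particular validator is actually configured.

```python
from collections import Counter

# Hypothetical TR rows for one subject at screening (identifiers and date are made up).
TR_ROWS = [
    {"USUBJID": "STUDY1-001", "TRTESTCD": "LDIAM", "TRLNKID": "T1", "TRDTC": "2020-01-15"},
    {"USUBJID": "STUDY1-001", "TRTESTCD": "LDIAM", "TRLNKID": "T2", "TRDTC": "2020-01-15"},
    {"USUBJID": "STUDY1-001", "TRTESTCD": "LDIAM", "TRLNKID": "T3", "TRDTC": "2020-01-15"},
]


def duplicate_groups(rows, keys):
    """Return key combinations shared by more than one row, with their row counts."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return {combo: n for combo, n in counts.items() if n > 1}


# Variables named in the validation message.
message_keys = ["USUBJID", "TRTESTCD", "TRDTC"]
# The same variables plus the link identifier that distinguishes the lesions.
lesion_keys = ["USUBJID", "TRTESTCD", "TRLNKID", "TRDTC"]

print(duplicate_groups(TR_ROWS, message_keys))  # all three rows share one combination
print(duplicate_groups(TR_ROWS, lesion_keys))   # no shared combinations
```

With the first key set, all three rows fall into one group, which matches the ‘Duplicate records’ message. Once TRLNKID is part of the key, each row is unique.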