@Jozef,
I agree other forums may be appropriate, but the co-opting of this discussion to other issues began and continues with your posts, posts that cloud the issues of today with arguments about tomorrow.
Regarding:
b) I cannot agree on the second. I understand “original” as “as collected originally”. Anything else does not make sense to me. Otherwise I propose that we rename --ORRESU in --MORESU (mapped from original result unit). The comparison with “1=male” is not applicable, as the variable (DM.SEX) does not claim to be “original”.
While it seems reasonable to interpret “original” this way, it is DIRECTLY contradicted by the IG. Beyond the explicit language around LBORRESU, see SDTM IG-3.2, Section 4.1.5.1.1 Original and Standardized Results (bolding my own):
When the original measurement or finding is a selection from a defined codelist, in general, the --ORRES and --STRESC variables contain results in decoded format, that is, the textual interpretation of whichever code was selected from the codelist. In some cases where the code values in the codelist are statistically meaningful standardized values or scores, which are defined by sponsors or by valid methodologies such as SF36 questionnaires, the --ORRES variables will contain the decoded format, whereas, the --STRESC variables as well as the --STRESN variables will contain the standardized values or scores.
You may feel that the above was the incorrect approach, or you may argue for a different interpretation, but the bottom line is that CDISC materials must be the basis for defining the standards, period.
The standard identifies a codelist associated with the original unit variable, and so you are required to process your data as subject to a codelist. I fundamentally disagree with the idea that this is not established in the standard. Perhaps it could be more clear, or better justified, but these are separate considerations.
That is not to say there are no gray areas and challenges. I completely agree that the standards are often ambiguous or inconsistent, and reasonable people can disagree on the best approach in a bad situation. But that is not the case with some of the issues you raise, and in my opinion not with this one specifically.
Regarding:
e) The example about blood pressure is real. If it was collected in cm[Hg] what should I do? Wait until the CT team accepts it as a new term? And if it was rejected (as is happening)? 6 months lost and no solution? Or extend the codelist and still get a validation error or warning in the validation software used by the FDA (panic in the RA department). I do check submissions of customers and what I usually see is that they then do a conversion.
I would love to see some clear and exact statements from the SDTM team how to deal with such situations.
I disagree that this is all that controversial, and am unsure to what extent the industry as a whole finds this challenging. To my mind, you should:
- Identify which version of CDISC terminology is to be used. More (or most) recent is encouraged, but is not required, and there may be business/operational needs that drive the use of an earlier version. There is no prohibition either way.
- Ensure the term (unit in this case) is not a synonym of an existing term.
- Assuming it is not, represent the term in the data (original unit variable). There is an open question as to whether it is best to leave it as collected (“cm[Hg]”) or to align it to the published CDISC unit term and symbol conventions (“cmHg”), but ultimately there are no strict conformance considerations in that question, and it is often decided by operational concerns.
- While encouraged, it is at a company’s discretion whether to submit the term to the CDISC terminology team, and doing so doesn’t truly impact the process here. Even if CDISC were to add the term to a future version, you may decide not to upversion, in which case the term is still an extended one.
- Whether the codelist is extensible or non-extensible is important, but as you often say, it is also important to represent the truth of your data. If faced with having collected data in a way that violates a non-extensible codelist, it is often best to explain that violation rather than modify the collection concept of the data to adhere to the constraint. In short, it is already too late, and you need to take responsibility for the way in which the data were collected.
- Document any validation findings as appropriate. It should be understood that messages along the lines of “you have extended an extensible codelist” are largely informational, and should be documented that way in the SDRG. For example, “The UNIT codelist was extended to accommodate collected terms not represented in the established terms.”
- If the RA department panics over the presence of check violations, provide them a variety of industry references to show that not all violations are the same, and that it is impossible under the current ruleset to have a completely “clean” report. That said, it is their call as the sponsor to determine how to proceed.
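To make the steps above a bit more concrete, here is a minimal Python sketch of the decision flow. It is only an illustration: the term set is a tiny stand-in rather than the actual CDISC UNIT codelist, the synonym map is a hypothetical sponsor-maintained lookup, and the function name and status labels are my own. Mapping a synonym to the published term is also an assumption about how a sponsor might handle that step.

```python
# Illustrative subset only; NOT the actual CDISC UNIT codelist.
UNIT_TERMS = {"mmHg", "kPa", "cm", "kg"}

# Hypothetical sponsor-maintained synonym map (collected text -> published term).
UNIT_SYNONYMS = {"mm[Hg]": "mmHg", "mm Hg": "mmHg"}


def classify_unit(collected, terms=UNIT_TERMS, synonyms=UNIT_SYNONYMS, extensible=True):
    """Return (value for the original unit variable, status, note for the SDRG)."""
    if collected in terms:
        # Already a published term in the chosen terminology version.
        return collected, "published", ""
    if collected in synonyms:
        # A synonym of an existing term: use the published term (assumed handling).
        return synonyms[collected], "synonym", ""
    if extensible:
        # Not published and not a synonym: extend the codelist and document it.
        note = f"The UNIT codelist was extended to accommodate collected term '{collected}'."
        return collected, "extended", note
    # Non-extensible codelist: keep the collected value and explain the violation.
    note = f"Collected term '{collected}' violates a non-extensible codelist; explained in the SDRG."
    return collected, "violation", note


value, status, note = classify_unit("cm[Hg]")
print(value, status)
print(note)
```

With the stand-in data, "cm[Hg]" is kept as collected and flagged as an extension with an SDRG note, while "mm[Hg]" would resolve to the published "mmHg". Whether to keep "cm[Hg]" or restyle it as "cmHg" remains the operational choice described above; the sketch simply leaves it as collected.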
There are edge conditions, of course, but at a certain point this needs to be about generally robust processes, and edge conditions can be resolved on a case-by-case basis.
Regards,
Carlo