Clinical ADaM Derivations with sasif

Introduction

Clinical programmers working in R often face a common challenge when migrating from SAS: in SAS, a single IF ... THEN DO block can assign multiple variables at once under one condition. In R, traditional approaches like case_when() or fifelse() force you to repeat the same condition for every variable — increasing QC risk and reducing readability.

sasif solves this by bringing SAS-style IF / ELSE IF / ELSE control flow into R’s data.table ecosystem. One condition governs all assignments in a block — just like SAS.

This vignette walks through seven derivation scenarios across three ADaM datasets:

  1. ADSL — Population flags and treatment variables
  2. ADLB — Laboratory value categorisation
  3. ADAE — Treatment-emergent adverse event flags

Setup

library(sasif)
library(data.table)

Scenario 1 — ADSL: Population Flags

The Problem

In a typical ADSL derivation, when a subject is in the treatment arm, multiple variables need to be assigned simultaneously — population flags, treatment labels, numeric codes, and treatment dates.

In traditional R, every variable requires its own repeated condition:

# ❌ Traditional R (dplyr): condition repeated for every variable
library(dplyr)  # provides %>%, mutate() and case_when()
adsl <- adsl %>% mutate(
  SAFFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  SAFFLN  = case_when(ACTARMCD == "TRTA" ~ 1),
  TRT01A  = case_when(ACTARMCD == "TRTA" ~ ACTARMCD),
  TRT01AN = case_when(ACTARMCD == "TRTA" ~ 1),
  ITTFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  FASFL   = case_when(ACTARMCD == "TRTA" ~ "Y"),
  RANDFL  = case_when(ACTARMCD == "TRTA" ~ "Y"),
  PPFL    = case_when(ACTARMCD == "TRTA" ~ "Y")
  # Same condition written 8 times — high QC risk
)

If the condition ever changes, you must update it in 8 places. Miss one and your derivation silently diverges — a real risk in regulated environments.

The sasif Solution

# Create sample ADSL data
adsl <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04"),
  ACTARMCD = c("TRTA", "TRTA", "SCRNFAIL", "TRTA"),
  RFSTDTC  = c("2024-01-10", "2024-01-15", NA, "2024-01-20"),
  RFENDTC  = c("2024-06-10", "2024-06-15", NA, "2024-06-20")
)

# ✅ sasif — condition written ONCE, governs all assignments
ADSL <- data_step(adsl,
  if_do(ACTARMCD == "TRTA",
    SAFFL   = "Y",
    SAFFLN  = 1,
    TRT01A  = "Treatment A",
    TRT01AN = 1,
    TRTSDT  = as.Date(RFSTDTC, "%Y-%m-%d"),
    TRTEDT  = as.Date(RFENDTC, "%Y-%m-%d"),
    ITTFL   = "Y",
    FASFL   = "Y",
    RANDFL  = "Y",
    PPFL    = "Y"
  )
)

print(ADSL[, .(USUBJID, ACTARMCD, SAFFL, TRT01A, TRT01AN, ITTFL, FASFL)])
#>    USUBJID ACTARMCD  SAFFL      TRT01A TRT01AN  ITTFL  FASFL
#>     <char>   <char> <char>      <char>   <num> <char> <char>
#> 1:     S01     TRTA      Y Treatment A       1      Y      Y
#> 2:     S02     TRTA      Y Treatment A       1      Y      Y
#> 3:     S03 SCRNFAIL   <NA>        <NA>      NA   <NA>   <NA>
#> 4:     S04     TRTA      Y Treatment A       1      Y      Y

All 10 variables are derived from a single condition block. Clean, readable, and audit-friendly — exactly like SAS IF ... THEN DO.
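
For QC by independent programming, the same expected values can be reproduced without sasif. The sketch below is a base R illustration under that assumption, not a description of how data_step() works internally. The helper name in_trta is hypothetical, and only three of the ten variables are shown.

```r
adsl_qc <- data.frame(
  USUBJID  = c("S01", "S02", "S03", "S04"),
  ACTARMCD = c("TRTA", "TRTA", "SCRNFAIL", "TRTA"),
  RFSTDTC  = c("2024-01-10", "2024-01-15", NA, "2024-01-20"),
  stringsAsFactors = FALSE
)

# Evaluate the condition once; a missing arm code counts as "no match".
in_trta <- !is.na(adsl_qc$ACTARMCD) & adsl_qc$ACTARMCD == "TRTA"

# Start every derived column as missing, then fill only the matching rows.
adsl_qc$SAFFL  <- NA_character_
adsl_qc$SAFFLN <- NA_real_
adsl_qc$TRTSDT <- as.Date(NA)

adsl_qc$SAFFL[in_trta]  <- "Y"
adsl_qc$SAFFLN[in_trta] <- 1
adsl_qc$TRTSDT[in_trta] <- as.Date(adsl_qc$RFSTDTC[in_trta], format = "%Y-%m-%d")

print(adsl_qc)
```

Comparing these columns against ADSL, for example with all.equal(), gives a quick cross-check that the block assigned what was intended.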


Scenario 2 — ADSL: Multi-Arm Treatment Assignment (IF / ELSE IF / ELSE)

When a study has multiple treatment arms, use the full IF / ELSE IF / ELSE chain. The first matching condition wins — all others are skipped:

adsl2 <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  ACTARMCD = c("TRTA", "TRTB", "TRTC", "TRTA", "TRTB"),
  AGE      = c(35, 52, 67, 44, 58)
)

ADSL2 <- data_step(adsl2,
  if_do(ACTARMCD == "TRTA",
    TRT01A  = "Treatment A",
    TRT01AN = 1
  ),
  else_if_do(ACTARMCD == "TRTB",
    TRT01A  = "Treatment B",
    TRT01AN = 2
  ),
  else_do(
    TRT01A  = "Placebo",
    TRT01AN = 99
  )
)

print(ADSL2[, .(USUBJID, ACTARMCD, TRT01A, TRT01AN)])
#>    USUBJID ACTARMCD      TRT01A TRT01AN
#>     <char>   <char>      <char>   <num>
#> 1:     S01     TRTA Treatment A       1
#> 2:     S02     TRTB Treatment B       2
#> 3:     S03     TRTC     Placebo      99
#> 4:     S04     TRTA Treatment A       1
#> 5:     S05     TRTB Treatment B       2

Notice that both TRT01A (character label) and TRT01AN (numeric code) are derived together under each condition — no repetition needed.
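
The first-match rule can be pictured as finding, for each row, the position of the first condition that holds, and then using that one position to fill every variable of the branch. The base R sketch below illustrates the idea only and makes no claim about sasif internals. The names conds, branch, trt_labels and trt_codes are hypothetical.

```r
arm <- c("TRTA", "TRTB", "TRTC", "TRTA", "TRTB")

# One column per branch, in chain order; the final TRUE column plays the role of ELSE.
conds <- cbind(arm == "TRTA", arm == "TRTB", TRUE)

# For each row, take the position of the first TRUE condition.
branch <- apply(conds, 1, function(row) which(row)[1])

# One branch index drives both variables, so label and code stay paired.
trt_labels <- c("Treatment A", "Treatment B", "Placebo")
trt_codes  <- c(1, 2, 99)

qc <- data.frame(
  ACTARMCD = arm,
  TRT01A   = trt_labels[branch],
  TRT01AN  = trt_codes[branch],
  stringsAsFactors = FALSE
)
print(qc)
```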


Scenario 3 — ADSL: Age Categorisation

Derive both the age category label and its numeric code in one chain:

adsl3 <- data.table(
  USUBJID = c("S01", "S02", "S03", "S04", "S05"),
  AGE     = c(32, 45, 58, 71, 80)
)

ADSL3 <- data_step(adsl3,
  if_do(AGE <= 45,
    AGECAT  = "YOUNG",
    AGECATN = 1
  ),
  else_if_do(AGE <= 70,
    AGECAT  = "MIDDLE",
    AGECATN = 2
  ),
  else_do(
    AGECAT  = "OLD",
    AGECATN = 3
  )
)

print(ADSL3[, .(USUBJID, AGE, AGECAT, AGECATN)])
#>    USUBJID   AGE AGECAT AGECATN
#>     <char> <num> <char>   <num>
#> 1:     S01    32  YOUNG       1
#> 2:     S02    45  YOUNG       1
#> 3:     S03    58 MIDDLE       2
#> 4:     S04    71    OLD       3
#> 5:     S05    80    OLD       3
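
Because the chain is evaluated in order, a 45-year-old matches AGE <= 45 first and never reaches AGE <= 70. As an independent cross-check, the same right-closed age bands can be sketched in base R with cut(). This is an illustration for QC, not part of sasif, and the object names are hypothetical.

```r
age <- c(32, 45, 58, 71, 80)

# Right-closed bands: (-Inf, 45], (45, 70], (70, Inf], matching the chain order.
band <- cut(age,
            breaks = c(-Inf, 45, 70, Inf),
            labels = c("YOUNG", "MIDDLE", "OLD"),
            right  = TRUE)

qc <- data.frame(
  AGE     = age,
  AGECAT  = as.character(band),  # category label
  AGECATN = as.numeric(band),    # position of the factor level: 1, 2 or 3
  stringsAsFactors = FALSE
)
print(qc)
```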

Scenario 4 — ADLB: Laboratory Value Categorisation

A common ADaM derivation — categorise lab values as LOW, NORMAL, or HIGH based on reference ranges, and derive both the character and numeric category together:

adlb <- data.table(
  USUBJID  = c("S01", "S01", "S02", "S02", "S03"),
  LBTESTCD = c("ALB", "ALB", "ALB", "ALB", "ALB"),
  AVAL     = c(2.8, 4.2, 5.6, 3.5, 1.9),
  ANRLO    = c(3.5, 3.5, 3.5, 3.5, 3.5),
  ANRHI    = c(5.0, 5.0, 5.0, 5.0, 5.0)
)

ADLB <- data_step(adlb,
  if_do(LBTESTCD == "ALB" & AVAL < ANRLO,
    ALBCAT  = "LOW",
    ALBCATN = 1
  ),
  else_if_do(LBTESTCD == "ALB" & AVAL > ANRHI,
    ALBCAT  = "HIGH",
    ALBCATN = 2
  ),
  else_do(
    ALBCAT  = "NORMAL",
    ALBCATN = 3
  )
)

print(ADLB[, .(USUBJID, LBTESTCD, AVAL, ANRLO, ANRHI, ALBCAT, ALBCATN)])
#>    USUBJID LBTESTCD  AVAL ANRLO ANRHI ALBCAT ALBCATN
#>     <char>   <char> <num> <num> <num> <char>   <num>
#> 1:     S01      ALB   2.8   3.5     5    LOW       1
#> 2:     S01      ALB   4.2   3.5     5 NORMAL       3
#> 3:     S02      ALB   5.6   3.5     5   HIGH       2
#> 4:     S02      ALB   3.5   3.5     5 NORMAL       3
#> 5:     S03      ALB   1.9   3.5     5    LOW       1

Both ALBCAT and ALBCATN are always consistent — they are derived from the same condition, so they can never diverge.
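
A label and its numeric code can still be checked mechanically during QC. The base R sketch below re-enters the ALBCAT and ALBCATN values printed above and confirms that each label maps to exactly one code. The object names are hypothetical.

```r
# Values copied from the printed ADLB output above.
out <- data.frame(
  ALBCAT  = c("LOW", "NORMAL", "HIGH", "NORMAL", "LOW"),
  ALBCATN = c(1, 3, 2, 3, 1),
  stringsAsFactors = FALSE
)

# Distinct label/code pairs that occur in the data.
pair_tab <- unique(out)

# One-to-one: no label and no code appears in more than one pair.
one_to_one <- anyDuplicated(pair_tab$ALBCAT) == 0 &&
  anyDuplicated(pair_tab$ALBCATN) == 0

print(pair_tab)
print(one_to_one)
```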


Scenario 5 — ADAE: Treatment-Emergent Flag (TRTEMFL)

Flag adverse events that started within the treatment period, that is, on or after the treatment start date (TRTSDT) and on or before the treatment end date (TRTEDT):

adae <- data.table(
  USUBJID = c("S01", "S01", "S02", "S02", "S03"),
  AEDECOD = c("Headache", "Nausea", "Fatigue", "Dizziness", "Rash"),
  ASTDT   = as.Date(c("2024-01-15", "2023-12-01",
                       "2024-01-20", "2024-02-10", "2024-01-25")),
  TRTSDT  = as.Date(c("2024-01-10", "2024-01-10",
                       "2024-01-15", "2024-01-15", "2024-01-20")),
  TRTEDT  = as.Date(c("2024-06-10", "2024-06-10",
                       "2024-06-15", "2024-06-15", "2024-06-20"))
)

ADAE <- data_step(adae,
  if_do(ASTDT >= TRTSDT & ASTDT <= TRTEDT,
    TRTEMFL = "Y",
    TRTEMA  = AEDECOD
  )
)

print(ADAE[, .(USUBJID, AEDECOD, ASTDT, TRTSDT, TRTEMFL)])
#>    USUBJID   AEDECOD      ASTDT     TRTSDT TRTEMFL
#>     <char>    <char>     <Date>     <Date>  <char>
#> 1:     S01  Headache 2024-01-15 2024-01-10       Y
#> 2:     S01    Nausea 2023-12-01 2024-01-10    <NA>
#> 3:     S02   Fatigue 2024-01-20 2024-01-15       Y
#> 4:     S02 Dizziness 2024-02-10 2024-01-15       Y
#> 5:     S03      Rash 2024-01-25 2024-01-20       Y
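
The window condition can be re-derived independently for QC. The base R sketch below uses the same dates as the example and treats any missing date as "not treatment-emergent". That treatment of missing dates is an assumption made here for illustration; how data_step() handles missing values in a condition should be confirmed in the package documentation.

```r
astdt  <- as.Date(c("2024-01-15", "2023-12-01", "2024-01-20", "2024-02-10", "2024-01-25"))
trtsdt <- as.Date(c("2024-01-10", "2024-01-10", "2024-01-15", "2024-01-15", "2024-01-20"))
trtedt <- as.Date(c("2024-06-10", "2024-06-10", "2024-06-15", "2024-06-15", "2024-06-20"))

# TRUE only when all three dates are present and the AE start falls inside the window.
in_window <- !is.na(astdt) & !is.na(trtsdt) & !is.na(trtedt) &
  astdt >= trtsdt & astdt <= trtedt

# "Y" inside the window, missing otherwise.
trtemfl <- ifelse(in_window, "Y", NA_character_)
print(trtemfl)
```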

Scenario 6 — DELETE: Remove Unwanted Records

Use delete_if() to remove rows explicitly — mirrors the SAS DELETE statement and makes the intent clear in the code:

adlb2 <- data.table(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  LBTESTCD = c("ALB", NA,    "ALB", "ALB", NA),
  VISIT    = c("WEEK 1", "WEEK 1", "UNSCHEDULED", "WEEK 2", "WEEK 4"),
  AVAL     = c(4.2, 3.8, 5.1, 4.0, 3.5)
)

ADLB2 <- data_step(adlb2,
  delete_if(is.na(LBTESTCD)),
  delete_if(VISIT == "UNSCHEDULED")
)

print(ADLB2)
#>    USUBJID LBTESTCD  VISIT  AVAL
#>     <char>   <char> <char> <num>
#> 1:     S01      ALB WEEK 1   4.2
#> 2:     S04      ALB WEEK 2   4.0

Only records with valid test codes and scheduled visits are retained.
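
For comparison, the same filter can be written as a single keep condition in base R. This is a hedged cross-check rather than a sasif feature. The missing test code is handled explicitly with is.na(), since a comparison with NA returns NA in R.

```r
adlb2_qc <- data.frame(
  USUBJID  = c("S01", "S02", "S03", "S04", "S05"),
  LBTESTCD = c("ALB", NA, "ALB", "ALB", NA),
  VISIT    = c("WEEK 1", "WEEK 1", "UNSCHEDULED", "WEEK 2", "WEEK 4"),
  AVAL     = c(4.2, 3.8, 5.1, 4.0, 3.5),
  stringsAsFactors = FALSE
)

# Keep a row only if neither delete condition applies.
keep <- !is.na(adlb2_qc$LBTESTCD) & adlb2_qc$VISIT != "UNSCHEDULED"
kept <- adlb2_qc[keep, ]

print(kept$USUBJID)
```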


Scenario 7 — Independent Flags (if_independent)

Use if_independent() when conditions are not mutually exclusive — each condition is evaluated on its own, so multiple flags can apply to the same row simultaneously:

adsl4 <- data.table(
  USUBJID = c("S01", "S02", "S03", "S04"),
  AGE     = c(30, 68, 45, 72),
  WEIGHTKG = c(48, 72, 55, 43),
  DIABFL  = c("N", "Y", "N", "Y")
)

ADSL4 <- data_step(adsl4,
  if_independent(AGE > 65,       SENIORFL  = "Y"),
  if_independent(WEIGHTKG < 50,  LOWWTFL   = "Y"),
  if_independent(DIABFL == "Y",  COMORBFL  = "Y")
)

print(ADSL4)
#>    USUBJID   AGE WEIGHTKG DIABFL SENIORFL LOWWTFL COMORBFL
#>     <char> <num>    <num> <char>   <char>  <char>   <char>
#> 1:     S01    30       48      N     <NA>       Y     <NA>
#> 2:     S02    68       72      Y        Y    <NA>        Y
#> 3:     S03    45       55      N     <NA>    <NA>     <NA>
#> 4:     S04    72       43      Y        Y       Y        Y

Subject S04 (age 72, weight 43, diabetic) receives all three flags — because all three conditions are TRUE for that row simultaneously.
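
Independence means each flag is a separate yes/no question about the row. The base R sketch below recomputes the three flags one at a time and counts how many apply per subject. The helper flag_if() is hypothetical and exists only for this illustration.

```r
age      <- c(30, 68, 45, 72)
weightkg <- c(48, 72, 55, 43)
diabfl   <- c("N", "Y", "N", "Y")

# "Y" where the condition is TRUE, missing otherwise (including missing conditions).
flag_if <- function(cond) ifelse(!is.na(cond) & cond, "Y", NA_character_)

flags <- cbind(
  SENIORFL = flag_if(age > 65),
  LOWWTFL  = flag_if(weightkg < 50),
  COMORBFL = flag_if(diabfl == "Y")
)

# Number of flags set per subject; each column was evaluated on its own.
n_flags <- rowSums(!is.na(flags))
print(n_flags)
```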


Key Principle: When to Use Which Function

Situation                                    Use
-------------------------------------------  ----------------------------------
First matching condition should win          if_do() + else_if_do() + else_do()
Multiple conditions can apply to same row    if_independent()
Remove rows from dataset                     delete_if()

Important: Do not mix if_do() chains with if_independent() on the same variable. if_independent() runs after the chain and will overwrite earlier assignments. Use one approach consistently per variable.
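
The overwrite risk is the ordinary consequence of assigning to the same variable twice. The base R sketch below mimics the order of operations only (an exclusive chain first, an independent step afterwards). It is not sasif code, and the variable names are hypothetical.

```r
age <- c(30, 68, 72)

# Step 1: an exclusive chain assigns exactly one value per row.
agegr_chain <- ifelse(age < 65, "ADULT", "SENIOR")

# Step 2: a later, independent condition writes to the same variable.
agegr_after <- agegr_chain
agegr_after[age > 70] <- "ELDERLY"

# The third row no longer holds the value the chain gave it.
print(data.frame(AGE = age, CHAIN = agegr_chain, AFTER = agegr_after))
```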


Summary

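sasif brings SAS-style conditional blocks to clinical R programming. A condition is written once and governs every assignment in its block, paired variables such as a label and its numeric code are derived together, and row deletion is stated explicitly with delete_if().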

For more information, see the package documentation.
