OpenPayments Data #3: Subsidiary Filings Prt. 2

First, some context:

The requirement for direct-to-doctor (D2D) marketing transparency was enacted by the Physician Payments Sunshine Act as part of the ACA reforms in 2010. The call for transparency was a response to concerns that D2D marketing practices by pharmaceutical & medical device companies may exert undue influence on physician prescription practices.

These concerns can be illustrated using an extremely over-simplified example:
Drug A and drug B are used to treat circusitis (list of fake diseases, here). The drugs are almost identical - they have very similar methods of action and clinical trials have shown nearly identical patient response rates and side-effects, but they are made by different manufacturers. The typical cost of a weekly regimen of drug B is $300, while drug A typically costs $75 per week.

Due to the price differences and drug similarities, it would be expected that physicians prescribe drug A in almost all cases.

However, Dr. Smith has been wined and dined by the manufacturer of drug B on a number of occasions and, as a result, he unconsciously developed a prescription preference for drug B – needlessly causing the parents of his young patients to pay an additional $225 per week.
Although the example is an exaggeration, it is a very real patient concern – especially for patients suffering from chronic incurable diseases that may have multiple treatment options. The idea that your doctor may have derived value from D2D marketing for your prescribed drug may cause you to question the basis of their recommendation. Is it really your best option, or has undue influence by the drug maker caused the doctor's prescription behavior to change?

As noted in my OPD#1 post, D2D marketing can take many forms and the typical value derived by physicians can vary significantly based on the type of activity. Without appropriate context, it is difficult for consumers to judge whether the marketing activities were benign and truly informational or whether they crossed the line of acceptable behavior by the manufacturer.

Although D2D marketing analysis can be viewed entirely from the perspective of patient impact, there are serious implications for product manufacturers, as well.

Pharmaceutical and medical device marketing practices are heavily regulated by the federal government and improper activities can result in heavy fines & corporate integrity agreements with the Office of the Inspector General that may limit M&A and other key business activities until the problem has been sufficiently remediated.

This, too, can be illustrated using an over-simplified example:
Company A is a major drug manufacturer seeking to expand its product portfolio. Its product management team noticed that a small start-up recently developed and began selling a cure for a debilitating disease.

Seeking to corner the market and expand the availability of the medication, Company A made a rapid acquisition of the start-up firm. Unfortunately, Company A had a fairly lax due-diligence process and failed to notice the abnormally high and questionable payments the start-up made to key physicians for speaking events when it first brought its product to market.

The start-up became a subsidiary of Company A and continued to operate with a fair level of autonomy. It noticed the massive sales returns on its initial doctor-led speaking events and decided to continue the practice. Before long, speaking events were being used to discuss numerous off-label uses for the medication.

The DOJ eventually noticed and reprimanded Company A for failure to meet due-diligence standards and for the start-up's irresponsible marketing practices.

If only company A had noticed irregular marketing activities and taken steps to investigate them and implement appropriate controls to prevent off-label marketing.

Although this example is also entirely fictional, E. Erdos (Principal EY, Life Sciences Fraud Investigation) noted the following:

“It is possible that Open Payments data could be leveraged by government agencies, such as the Department of Justice and state Attorneys General offices, during enforcement actions by these agencies…[Companies] need to understand what the data is saying about their practices regarding interactions with physicians and teaching hospitals, and use this data as a means to assess compliance controls associated with these practices” (pharmexec).

Now, the meat:

It’s impossible to assess compliance controls without first understanding the variability in D2D marketing activities. This variability is not limited to differences between individual companies and the market; there is also a substantial amount of variability between parent companies and their subsidiaries.

Variation can be benign – perhaps a subsidiary makes large royalty payments to doctors who helped develop its product, relative to the parent organization and its other subsidiaries. However, the root cause of the variation cannot be assessed if the variation is not first identified.

Following on from my last post, subsidiary-parent connections can be mapped based on internal payer-filer relationships present in the CMS data. These relationships can be further refined using 10-K SEC filings for public companies.
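The mapping step can be sketched on toy data. This is only an illustration under assumptions: the `corel_toy` and `pay_toy` tables, their column names and all IDs are invented here, and the only technique shown is the same `match()`-based lookup the script below uses to attach an ultimate parent to each payment record.

```r
# Toy payer-to-parent lookup; every ID and name here is hypothetical
corel_toy <- data.frame(
  payer_id  = c("P1", "P2", "P3", "P4"),
  ultParent = c("ParentCo", "ParentCo", "ParentCo", "OtherCo"),
  stringsAsFactors = FALSE
)

# Toy payment records keyed by the entity that made the payment
pay_toy <- data.frame(
  payer_id = c("P1", "P3", "P4", "P2", "P9"),
  amount   = c(100, 250, 40, 75, 10),
  stringsAsFactors = FALSE
)

# match() gives the lookup row for each payer (NA when the payer is unmapped)
pay_toy$ultParent <- corel_toy$ultParent[match(pay_toy$payer_id, corel_toy$payer_id)]

# Number of reporting entities under each parent
n_subs <- table(corel_toy$ultParent)
```

Unmapped payers (like the made-up `P9`) come back as `NA`, which makes gaps in the relationship table easy to spot before any aggregation.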

To illustrate the variation in marketing behavior between a parent company and its various subsidiaries, I have plotted payment distributions for subsidiary companies and contrasted them with the overall organizational distribution.

Payments were aggregated at the doctor level (probability of a doctor receiving X dollars of value or payment in a given year). A gamma distribution was fit to the log-transformed, annualized sums and the fitted densities were plotted using ggplot.
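The fitting step can be sketched in isolation. This is a minimal illustration on simulated data, not output from the real payments table: the `txn` data frame, its column names and the lognormal payment amounts are assumptions made up for the example, while `fitdistr()` is the maximum-likelihood fitter from the MASS package.

```r
library(MASS)  # fitdistr()

set.seed(42)
# Simulated transactions for 300 hypothetical doctors (not real CMS data)
txn <- data.frame(
  doctor = sample(1:300, 1000, replace = TRUE),
  amount = rlnorm(1000, meanlog = 4, sdlog = 1.5)
)

# Total value per doctor, annualized over 3.5 reporting years, then logged
doc_sum <- aggregate(amount ~ doctor, data = txn, FUN = sum)
log_sum <- log(doc_sum$amount / 3.5)

# The gamma distribution needs positive values, so drop totals of $1/yr or less
log_sum <- log_sum[log_sum > 0]

# Maximum-likelihood gamma fit; estimates are named "shape" and "rate"
fit <- fitdistr(log_sum, densfun = "gamma")

# Fitted density evaluated over a grid of log-dollar values
grid <- seq(min(log_sum), max(log_sum), length.out = 100)
dens <- dgamma(grid, shape = fit$estimate["shape"], rate = fit$estimate["rate"])
```

Working on the log scale keeps a handful of very large payments from dominating the fit; the x-axis can be converted back to dollars with `exp()` when labelling the plot.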

To reiterate the point of my previous post, parent-subsidiary mapping is essential for accurate marketing data analysis. Genentech is reported by both CMS and ProPublica as having the highest level of D2D marketing between 2013 and 2016. However, raw CMS payments data only allows aggregation at the payer level. Therefore, their statements of activity levels do not include payments made by various Genentech subsidiaries.
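A toy example shows why the level of aggregation matters. The company names and dollar amounts are invented for illustration, and the point is only the mechanics: summing by payer treats each subsidiary as a separate company, while summing by ultimate parent rolls them up.

```r
# Hypothetical payments; "BigCo" pays both directly and through two subsidiaries
pay_toy <- data.frame(
  payer     = c("BigCo", "SubA", "SubB", "Rival"),
  ultParent = c("BigCo", "BigCo", "BigCo", "Rival"),
  amount    = c(500, 300, 400, 700),
  stringsAsFactors = FALSE
)

# Payer-level totals: what the raw data supports out of the box
by_payer <- aggregate(amount ~ payer, data = pay_toy, FUN = sum)

# Parent-level totals: requires the parent-subsidiary mapping
by_parent <- aggregate(amount ~ ultParent, data = pay_toy, FUN = sum)

top_payer  <- as.character(by_payer$payer[which.max(by_payer$amount)])
top_parent <- as.character(by_parent$ultParent[which.max(by_parent$amount)])
```

In this made-up case the payer-level view understates BigCo's activity by more than half, enough to change which company ranks first.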

Hope you enjoy these pretty plots ~



# Start-up ----
rm(list = ls()); gc()

# Import Libraries (inferred from the functions used below)
library(plyr)     # count()
library(MASS)     # fitdistr()
library(reshape2) # melt()
library(ggplot2)  # ggplot(), ggsave()
library(stringr)  # str_pad()

# Import Data
opdat = readRDS("Payment Data/GNRL_PGYR1316_US_PHYS.rds") # Primary Payments Table
corel = readRDS("Company Data/coRel02_small.RDS") # Payer-to-parent relationship table
corel.parent = count(corel$ultParent)
corel.parent = corel.parent[order(corel.parent$freq, decreasing = TRUE),] # Count number of companies under each parent

# Data-prep -----
# Match over 'ultParent' to opdat
opdat$ultParent = corel$ultParent[match(opdat$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID, corel$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID)]

# Reference for Generating plots
# ...> Parent companies making payments through subsidiaries
target = as.character(corel.parent$x[corel.parent$freq >= 3]) # target companies with 3+ subsidiaries doing doctor-based promotion
# ...> 62 Parent companies

# Define Functions ----
scotts = function(x){
  # Scott's normal reference rule calculation (bin width)

  n = length(x)
  std = sd(x)

  bw = (3.5*std)/n^(1/3)
  return(round(bw, 2))
}

# Generate Plot
mind = 25 # 'min-doc' Cut-off for the number of doctors a company has to make payments to, to be included in visualizations

options(warn = -1)
for(i in seq_along(target)){
  # Get Parent
  parent = target[i]

  # Count number of doctors each subsidiary has marketed to
  opdat.sub = opdat[opdat$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID %in% corel$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID[corel$ultParent == parent],]
  dcnt = unique(opdat.sub[,c("Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID","Physician_Profile_ID")])
  dcnt = count(dcnt$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID)
  dcnt = dcnt[dcnt$freq >= mind,] # apply doctor count cut-off
  dcnt$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name = corel$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name[match(dcnt$x, corel$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID)]

  if(dim(dcnt)[1] < 3){
    # Skip iteration if at least 3 subsidiaries don't meet doctor count cut-off
    next
  }

  # Readjust population of subsidiary transactions to companies that made the cut-off
  opdat.sub = opdat.sub[opdat.sub$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID %in% dcnt$x, ]

  # Perform gamma-dist fit for parent
  parent.agg = aggregate(opdat.sub$Total_Amount_of_Payment_USDollars,
                         by = list(opdat.sub$Physician_Profile_ID), FUN = sum); names(parent.agg) = c("Physician_ID", "Sum")
  parent.agg$Sum = log(parent.agg$Sum/3.5) # Annually adjusted log-form, 3.5 years b/c 2013 has only .5yr of data
  parent.fit = fitdistr(parent.agg$Sum[parent.agg$Sum > 0], densfun = "gamma")

  # Perform gamma-dist fits for subsidiaries
  sub.fit = vector(mode = "list", length = dim(dcnt)[1])
  for(j in seq_len(nrow(dcnt))){
    # Aggregate subsidiary transactions
    sub.txn = opdat.sub[opdat.sub$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_ID == dcnt$x[j],]
    sub.agg = aggregate(sub.txn$Total_Amount_of_Payment_USDollars, by = list(sub.txn$Physician_Profile_ID), FUN = sum)
    sub.agg$x = log(sub.agg$x/3.5); rm(sub.txn) # Annually adjusted log-form
    sub.fit[[j]] = fitdistr(sub.agg$x[sub.agg$x > 0], densfun = "gamma")
  }
  names(sub.fit) = as.character(dcnt$Applicable_Manufacturer_or_Applicable_GPO_Making_Payment_Name)

  # Create base-line plot object using parent data
  # Pre-define plot traits
  bw = scotts(parent.agg$Sum) # bin-width calculation for breaks
  p.breaks = seq(min(parent.agg$Sum[parent.agg$Sum > 0]),max(parent.agg$Sum), by = bw) # Points to calculate plot distribution over, base for labels
  # Force resizing of bins if 'bw' is too small for good visualization
  if(length(p.breaks) > 15){
    # Compute the thinning step once, before p.breaks is shortened, so 'bw' scales by the same factor
    p.step = floor(length(p.breaks)/15)
    p.breaks = p.breaks[seq(1, length(p.breaks), by = p.step)]
    bw = bw * p.step
  }

  p.labels = paste("$", prettyNum(round(exp(p.breaks), 0), big.mark = ","), sep = "")
  biglabs = which(nchar(gsub("[^[:digit:]]","",p.labels)) > 5) # Labels of $100,000+ are shortened to thousands
  p.labels[biglabs] = paste("$", prettyNum(round(as.numeric(gsub("[^[:digit:]]", "", p.labels[biglabs]))/1e3), big.mark = ","),"K",sep = "")

  p.title = list(parent,
                 element_text(hjust = 0, size = 15, colour = "#0072B2"))

  p.subtitle = list("Parent company distribution in PURPLE, subsidiary distributions in BLUE.",
                    element_text(size = 10))

  # Output file name: zero-padded loop index plus a sanitized parent name
  p.file = paste("Plots and Analyses/Payment Behavior by Parent Company/",
                 str_pad(as.character(i), width = 3, side = "left", pad = "0"),"_",
                 gsub("[^[:alnum:]]{1,}","_",substr(parent, 1, 20)),".png",sep = "")

  # Use gamma-distribution parameters to create curve values
  k = 10 # Point-density multiplier, controls line smoothness
  val = seq(min(parent.agg$Sum[parent.agg$Sum > 0]),max(parent.agg$Sum), by = (bw/k))
  p.dat = data.frame(matrix(NA, nrow = length(val), ncol = (dim(dcnt)[1] + 2)))
  names(p.dat) = c("val","parent",paste("sub.",as.character(seq(1,dim(dcnt)[1])), sep = ""))
  p.dat$val = val
  p.dat$parent = dgamma(p.dat$val, shape = parent.fit$estimate[1], rate = parent.fit$estimate[2]) # Calculate parent distribution point-values
  for(j in 3:dim(p.dat)[2]){
    # Calculate subsidiary distributions point-values
    p.dat[,j] = dgamma(p.dat$val, shape = sub.fit[[j-2]]$estimate[1], rate = sub.fit[[j-2]]$estimate[2])
  }
  p.dat = melt(p.dat, id = "val")

  # # Plot with parent-plot on BOTTOM
  # p = ggplot(data = p.dat[p.dat$variable == "parent",], aes(x = val, y = value)) +
  #   scale_x_continuous(name = "Marketing Payments to Doctors [2013-2016]",
  #                      breaks = p.breaks, labels = p.labels) +
  #   ylab("Distribution Density") +
  #   labs(title = p.title[[1]], subtitle = p.subtitle[[1]]) +
  #   theme(plot.title = p.title[[2]], plot.subtitle = p.subtitle[[2]],
  #         legend.position = "none") +
  #   geom_line(colour = "#CC79A7", size = 3, alpha = 1)
  # p + geom_line(data = p.dat[p.dat$variable != "parent",], aes(x = val, y = value, colour = variable),
  #               size = 1, alpha = .2, linetype = 1) +
  #   scale_color_manual(values = c(rep("#56B4E9", dim(dcnt)[1])))

  # Plot with parent-plot on TOP
  p = ggplot(data = p.dat[p.dat$variable != "parent",], aes(x = val, y = value, colour = variable)) +
    geom_line(size = 1, alpha = .2, linetype = 1) +
    scale_color_manual(values = c(rep("#56B4E9", dim(dcnt)[1])))

  p = p + geom_line(data = p.dat[p.dat$variable == "parent",], aes(x = val, y = value),
                    colour = "#CC79A7", size = 3, alpha = 1) +
    scale_x_continuous(name = "Average Annual Direct-to-Doctor Marketing [2013-2016]",
                       breaks = p.breaks, labels = p.labels) +
    ylab("Probability Density") +
    labs(title = p.title[[1]], subtitle = p.subtitle[[1]],
         caption = "Annually adjusted by 3.5 yrs (2013 has only 6mos of data); Aggregated at per-doctor level.") +
    theme(plot.title = p.title[[2]], plot.subtitle = p.subtitle[[2]],
          legend.position = "none")

  # Save Plot
  ggsave(filename = p.file, plot = p, width = 15, height = 7)
}
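The bin-width and axis-label logic in the script can be tried on its own. The sketch below is a simplified stand-in rather than the script's exact code: the data are simulated, the names `scott_bw` and `max_breaks` are made up for the example, and it uses `ceiling()` so the number of breaks never exceeds the cap. Scott's rule sets the bin width to roughly 3.5 times the standard deviation divided by the cube root of the sample size.

```r
# Scott's normal reference rule for histogram bin width
scott_bw <- function(x) {
  3.5 * sd(x) / length(x)^(1/3)
}

set.seed(1)
x <- rnorm(1000, mean = 6, sd = 1.5)  # simulated stand-in for log-dollar totals

bw     <- scott_bw(x)
breaks <- seq(min(x), max(x), by = bw)

# Cap the number of axis breaks; compute the thinning step once and
# apply it to both the breaks and the bin width so they stay in sync
max_breaks <- 15
if (length(breaks) > max_breaks) {
  step   <- ceiling(length(breaks) / max_breaks)
  breaks <- breaks[seq(1, length(breaks), by = step)]
  bw     <- bw * step
}

# Convert log-dollar breaks back to dollar labels without scientific notation
labels <- paste0("$", format(round(exp(breaks)), big.mark = ",",
                             scientific = FALSE, trim = TRUE))
```

Passing `scientific = FALSE` to `format()` avoids labels such as `$1e+05` when a break happens to land on a round number.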


Future Work (?):

Here’s what to expect over the next couple months:

  1. Visualizations without typos! Sorry, the script takes a while to run – waiting for the plots to be regenerated, synced over to my local machine and re-uploaded to WordPress would mean I wouldn’t get this post done tonight. Given my schedule lately, I probably wouldn’t have time to wrap up publishing for another few days. Either way, I’ll pay more attention to labels, and RStudio should introduce a native typo-highlighting feature for developer-defined strings.
  2. Code mark-downs, so it’s easier to read.
  3. More complex analysis:
    One of the things in the works is a network analysis of major companies based on their marketing behavior using non-parametric classification methods.
    “…What method to use? …What factors to consider?” These are questions I think about too often these days. Other upcoming visualizations include distribution plots by pharmaceutical class and product life-cycle stage.
  4. An app? I have harped pretty hard on the lack of contextual information in the current CMS & ProPublica OpenPayments apps. How are patients supposed to differentiate benign activity from true outliers without an understanding of what is normal practice for a doctor with a specific specialty, in a specific region, for a specific type of drug, etc.? These patient-focused models would then be extended to manufacturer- and hospital-centric risk assessment models, so those parties could evaluate and explore their exposure to abnormal activity.
  5. More music videos! Because there is no better way to close out a long post.
    On that note, guess who has tickets to see Philip Glass perform in Carnegie Hall? This guy!
    Thank the lord for small miracles, like being only a few hours away from the city when Glass is selected as the Debs Composer of the year. Full schedule of performances of his work, here. It’s going to be a wonderful year.

And now, a better Four Seasons.



