Saturday, April 25, 2015

Unemployment in Europe in 2014 by NUTS 2 region

During the Christmas break I worked on some code to show unemployment by NUTS 2 region. At that point no 2014 data was available. When I noticed the 2014 data had become available, I dug up the code and plotted again.

Data and Code

The code was originally written at the beginning of this year. At that point Eurostat's directories had all moved, so SmarterPoland could not be used, and consequently it is not used here either. That is a pity, since it would have been much easier than doing the data extraction work myself.
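If SmarterPoland works again, getting the data should be close to a one-liner. A hypothetical sketch, written from memory of that package's API and untested here:
# library(SmarterPoland)
# myunempl <- getEurostatRCV(kod = 'lfst_r_lfu3rt')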
The mapping is based on code found at rpubs.com: Mapping Data from Eurostat using R. The main change I made was using as continuous a scale as possible. I wanted to see details in southern Germany, but also let the scale show details in the worst regions. It appears that not Greece but southern Spain has the worst unemployment; still, since Spain also has some better regions, as a country it seems a bit better off than Greece. The colors, then, are a trade-off between red is bad, green is good, and an ordering in which the colors stay unique. Hence orange sits between yellow and blue, to avoid yellow and blue mixing into green again. I have the nagging feeling that it will be horrific for colorblind people.
In addition, I removed France's overseas parts and added a legend.
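To illustrate the color mapping in isolation: colorRamp() returns a function that interpolates between the listed colors on [0,1], so the statistic is first rescaled to [0,1], and the 0-255 RGB matrix it returns is divided by 255 for rgb(). A toy version with made-up unemployment percentages:
x <- c(2.5, 10, 25) # hypothetical percentages
pal <- colorRamp(c('darkGreen','springgreen','Yellow','Orange',
        'Dark Blue','Light Blue','Purple','Red'))
rgb(pal((x-min(x))/(max(x)-min(x)))/255) # one hex color per value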
library("rgdal")
library("RColorBrewer")
library("classInt")
#library("SmarterPoland")
library(fields)

# create a new empty object called 'temp' in which to store a zip file
# containing boundary data
# temp <- tempfile(fileext = ".zip")
# now download the zip file from its location on the Eurostat website and
# put it into the temp object
# new Eurostat website
# old: http://epp.eurostat.ec.europa.eu
# new: http://ec.europa.eu/eurostat

# download.file(
# "http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/NUTS_2010_60M_SH.zip",temp)
# now unzip the boundary data
# unzip(temp)

EU_NUTS <- readOGR(dsn = "./NUTS_2010_60M_SH/data", layer = "NUTS_RG_60M_2010")
ToRemove <- EU_NUTS@data$STAT_LEVL!=2 | grepl('FR9',EU_NUTS@data$NUTS_ID)
EUN <- EU_NUTS[!ToRemove,]

## OGR data source with driver: ESRI Shapefile 
## Source: "./NUTS_2010_60M_SH/data", layer: "NUTS_RG_60M_2010"
## with 1920 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions
#plot(EU_NUTS)

myunempl <- read.csv('lfst_r_lfu3rt.csv', na.strings=':', skip=10)

#EU_NUTS <- spTransform(EU_NUTS, CRS("+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +no_defs"))

EUN@data = data.frame(EUN@data[,1:4], myunempl[
        match(EUN@data[, "NUTS_ID"],myunempl[, "GEO.TIME"]),   ])

EUN <- EUN[!is.na(EUN@data$X2014),]

plot(EUN, col = rgb(colorRamp(
            c('darkGreen','springgreen','Yellow','Orange',
                'Dark Blue','Light Blue','Purple','Red'))
 ((EUN@data$X2014-min(EUN@data$X2014))/
     (max(EUN@data$X2014)-min(EUN@data$X2014)))/255), 
 axes = FALSE, border = NA)    

image.plot(add=TRUE,
    zlim=c(min(EUN@data$X2014),max(EUN@data$X2014)),
    col=rgb(colorRamp(
        c('darkGreen','springgreen','Yellow','Orange',
            'Dark Blue','Light Blue','Purple','Red'))(seq(0,1,length.out=50))/255),
    legend.only=TRUE,
    smallplot=c(.02,.06,.15,.85))

Sunday, April 19, 2015

PK analysis: Theoph again

This week I wanted to repeat the Theoph PK analysis of two weeks ago in Stan. While doing so, something suddenly dawned on me. For a non-linear model in classical statistics we think it normal to provide good initial values. In contrast, for MCMC samplers I remember reading that one should start with highly dispersed initial values. So for non-linear models in a Bayesian context we have two contrasting views on initialization. From my experience, I should probably abandon the highly dispersed initializations, since they do not lead to convergence.
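For reference, the two styles differ only in the init argument of stan(); a sketch, with the real call and full list of initial values further down:
# dispersed (the rstan default): init='random' draws every unconstrained
# parameter from uniform(-2,2)
# fstan <- stan(model_code=smodel, data=datain, init='random')
# informed: a function returning a named list of starting values per chain
# fstan <- stan(model_code=smodel, data=datain,
#     init=function() list(lka=rnorm(1,log(2),.3))) # etcetera, see below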

Model and data

The data is the same as two weeks ago. The only difference is that I realized the dose in mg/kg should be converted to total dose in order to get a sensible estimate of clearance (CL). The model is a slightly modified version of the earlier one, and runs through Stan.
The problem with this model is not so much the formulation but, as indicated, the initialization. And even then, it seems to produce unacceptable samples at a high rate. I don't think this is a critique of the model; the estimates seem quite reasonable.

library(MEMSS)
library(ggplot2)

library(rstan)

Theoph$dose <- Theoph$Dose*Theoph$Wt
datain <-  list(
    Time=Theoph$Time,
    conc=Theoph$conc,
    n=nrow(Theoph),
    subject =c(1:nlevels(Theoph$Subject))[Theoph$Subject],
    nsubject = nlevels(Theoph$Subject),
    dose=unique(subset(Theoph,, c(Subject,dose)))$dose
)

smodel <- '
data {
    int <lower=0> n;
    int <lower=0> nsubject;
    int <lower=1,upper=nsubject>  subject[n];
    real  dose[nsubject];
    real  Time[n];
    real  conc[n];
}
parameters {
    real <lower=0> sdr;
    real <lower=0> kas[nsubject];
    real <lower=0> kes[nsubject];
    real <lower=0> CLs[nsubject];
    real lke;
    real lka;
    real <lower=0> CL;
    real kesd;
    real kasd;
    real CLsd;
}
transformed parameters {
    real pred[n];
    real <lower=0> c0star[nsubject];
    for (i in 1:nsubject) {
       c0star[i] <- dose[i]*((kes[i]*kas[i])/CLs[i])/
      (kas[i]-kes[i]);
    }
    for (i in 1:n) {
       pred[i] <- c0star[subject[i]]*
          (exp(-kes[subject[i]]*Time[i])-exp(-kas[subject[i]]*Time[i]));
    }
}
model {
    conc ~ normal(pred,sdr);
    kes ~ lognormal(lke,kesd);
    kas ~ lognormal(lka,kasd);
    lke ~ uniform(-3,3);
    lka ~ uniform(-3,3);
    CL ~ uniform(0.01,300);
    CLs ~ normal(CL,CLsd);
    kesd ~ uniform(0.01,2);
    kasd ~ uniform(0.01,2);
    CLsd ~ uniform(0.01,10);
}
generated quantities {
    real Ka;
    real Ke;
    Ka <- exp(lka);
    Ke <- exp(lke);
}'
# (kes[i]<kas[i])*
fstan <- stan(model_code = smodel,
    data = datain,
    pars=c('CL','kes','kas','CLs',
        'c0star','Ka','Ke'),
    init= function() {
      list(lka = rnorm(1,log(2),.3),
          kas=   rnorm(12,2,.2),
          lke=   rnorm(1,log(.08),.01),
          kes=   rnorm(12,.08,.01),
          CLs = rnorm(12,1,.1),
          CL = rnorm(1,1,.1),
          kesd=runif(1,0.04,0.1),
          kasd=runif(1,0.2,2),
          sdr= runif(1,1,3),
          CLsd=runif(1,.1,1)
      )})
fstan

Inference for Stan model: smodel.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean    sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
CL          2.80    0.01  0.20  2.41  2.66  2.80  2.92  3.23   990 1.01
kes[1]      0.08    0.00  0.01  0.06  0.07  0.08  0.09  0.09   324 1.01
kes[2]      0.08    0.00  0.01  0.07  0.08  0.09  0.09  0.10   904 1.01
kes[3]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10  1314 1.00
kes[4]      0.09    0.00  0.01  0.08  0.09  0.09  0.10  0.12   675 1.01
kes[5]      0.09    0.00  0.01  0.08  0.08  0.09  0.09  0.11   995 1.00
kes[6]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10   914 1.00
kes[7]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10  1231 1.00
kes[8]      0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.11  1326 1.01
kes[9]      0.08    0.00  0.01  0.07  0.08  0.08  0.09  0.10  1085 1.01
kes[10]     0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.11  1129 1.00
kes[11]     0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.11  1044 1.01
kes[12]     0.09    0.00  0.01  0.07  0.08  0.09  0.09  0.10   835 1.01
kas[1]      1.48    0.01  0.22  1.08  1.32  1.46  1.61  1.94   338 1.01
kas[2]      0.66    0.00  0.09  0.51  0.60  0.66  0.72  0.86  1262 1.00
kas[3]      4.00    0.03  0.90  2.67  3.40  3.87  4.41  6.16  1214 1.00
kas[4]      0.95    0.00  0.12  0.73  0.87  0.94  1.03  1.21  1476 1.00
kas[5]      2.12    0.01  0.32  1.56  1.90  2.10  2.34  2.78  1021 1.01
kas[6]      2.41    0.02  0.47  1.58  2.08  2.37  2.67  3.42   570 1.00
kas[7]      1.21    0.00  0.17  0.90  1.10  1.20  1.31  1.58  1277 1.00
kas[8]      1.50    0.01  0.18  1.17  1.39  1.50  1.61  1.87   739 1.01
kas[9]      1.27    0.01  0.24  0.88  1.10  1.26  1.41  1.81  1108 1.00
kas[10]     0.80    0.00  0.13  0.57  0.71  0.79  0.88  1.10  1264 1.00
kas[11]     1.45    0.01  0.24  1.03  1.27  1.43  1.60  1.98  1243 1.01
kas[12]     8.14    0.13  4.24  4.31  5.79  7.22  8.84 18.78  1001 1.00
CLs[1]      2.07    0.01  0.17  1.71  1.97  2.09  2.17  2.36   462 1.01
CLs[2]      2.07    0.00  0.13  1.80  1.98  2.06  2.15  2.34  1187 1.00
CLs[3]      3.38    0.01  0.25  2.91  3.21  3.38  3.54  3.91   860 1.00
CLs[4]      2.37    0.01  0.16  2.09  2.26  2.35  2.46  2.74   635 1.00
CLs[5]      2.99    0.01  0.22  2.61  2.84  2.97  3.11  3.46   819 1.00
CLs[6]      2.88    0.01  0.21  2.48  2.75  2.87  3.01  3.32   785 1.00
CLs[7]      2.73    0.01  0.19  2.37  2.60  2.73  2.84  3.11  1020 1.01
CLs[8]      2.39    0.01  0.16  2.09  2.28  2.38  2.49  2.73   680 1.02
CLs[9]      3.60    0.01  0.30  3.04  3.40  3.59  3.78  4.23  1085 1.00
CLs[10]     3.07    0.01  0.23  2.62  2.92  3.06  3.21  3.57   810 1.01
CLs[11]     3.15    0.01  0.23  2.70  2.99  3.15  3.30  3.62   944 1.01
CLs[12]     2.82    0.01  0.21  2.42  2.69  2.81  2.95  3.27   889 1.00
c0star[1]  12.85    0.07  0.84 11.26 12.26 12.84 13.45 14.45   148 1.01
c0star[2]  15.18    0.04  1.36 12.74 14.28 15.15 15.93 18.21  1225 1.00
c0star[3]   8.43    0.01  0.49  7.52  8.10  8.40  8.74  9.49  1415 1.01
c0star[4]  13.76    0.04  1.15 11.92 13.02 13.58 14.30 16.57   873 1.00
c0star[5]   9.96    0.02  0.62  8.78  9.57  9.91 10.32 11.32  1128 1.00
c0star[6]   9.87    0.03  0.63  8.73  9.43  9.85 10.27 11.25   603 1.00
c0star[7]  10.99    0.02  0.76  9.64 10.50 10.93 11.45 12.57  1373 1.00
c0star[8]  12.54    0.03  0.74 11.20 12.02 12.50 12.99 14.11   720 1.00
c0star[9]   8.08    0.02  0.68  6.82  7.62  8.06  8.46  9.45   793 1.01
c0star[10] 10.24    0.02  1.01  8.53  9.57 10.19 10.77 12.61  1730 1.00
c0star[11]  9.37    0.02  0.69  8.13  8.92  9.32  9.76 10.87  1187 1.00
c0star[12]  8.40    0.01  0.40  7.63  8.14  8.39  8.65  9.22  1757 1.00
Ka          1.63    0.02  0.42  0.95  1.34  1.58  1.87  2.57   516 1.01
Ke          0.09    0.00  0.01  0.08  0.08  0.09  0.09  0.10   516 1.02
lp__       17.34    1.53 10.27  0.41 10.66 16.13 22.63 45.26    45 1.07

Samples were drawn using NUTS(diag_e) at Sun Apr 19 09:57:07 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Plot

This is just what I used last week, adapted for Stan. In addition, to get the correct scale, the mean dose in mg is used in the calculations.

stansampls <- extract(fstan,c('CL','Ke','Ka'))
Time <- seq(1,max(datain$Time))

la2 <- sapply(1:length(stansampls$CL),
    function(i) {
      CL <- stansampls$CL[i]
      Ka <- stansampls$Ka[i]
      Ke <- stansampls$Ke[i]
      c0star <- mean(datain$dose)*((Ke*Ka)/CL)/(Ka-Ke)
      y<- c0star* (exp(-Ke*Time)-exp(-Ka*Time))
    })

res2 <- data.frame(
    Time=Time,
    Conc=apply(la2,1,mean),
    C05=apply(la2,1,FUN=function(x) quantile(x,0.05)),
    C95=apply(la2,1,FUN=function(x) quantile(x,0.95)))

ggplot(res2, aes(x=Time)) +
    geom_line(aes(y=Conc))+
    scale_y_log10('theophylline (mg/L)')+
    geom_line(aes(y=conc,x=Time,col=Subject),alpha=1,data=Theoph)+
    geom_ribbon(aes(ymin=C05, ymax=C95),alpha=.2)+
    theme(legend.position='bottom')

Sunday, April 12, 2015

Cefamandole: PK after IV with six subjects in JAGS

After last week's post on Theoph I noticed that the MEMSS library contains more PK data sets. The Cefamandole data is an IV data set with six subjects. It is somewhat challenging, since there seem to be several elimination rates.

Data

Data are displayed below. Notice that the lines seem somewhat curved; there is clearly more going on than just simple elimination.

Models

The first model fits concentration as a function of two eliminations. The problem here is that the eliminations need to be ordered: all faster eliminations should have one prior, all slower ones another. After much fiddling around I found that I could not force the ordering at subject level with priors or limits. Hence I chose a different way out: the model gets an extra component which makes it predict 0 if the parameters are incorrectly ordered, as shown in the toy example below. This indeed gave the desired result in terms of model behavior.
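A toy version of that component in plain R, with made-up values; multiplying by the logical (ke1 < ke2) zeroes the prediction whenever the rates are in the wrong order, so such draws fit the data badly:
c1 <- 90; c2 <- 300; time <- 10
ke1 <- 0.02; ke2 <- 0.14 # correctly ordered: ke1 < ke2
(ke1 < ke2)*(c1*exp(-ke1*time)+c2*exp(-ke2*time)) # the usual prediction
ke1 <- 0.14; ke2 <- 0.02 # wrongly ordered
(ke1 < ke2)*(c1*exp(-ke1*time)+c2*exp(-ke2*time)) # prediction becomes 0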

The model itself, though, was less to my liking. It goes way too low at the later times. Upon consideration this makes sense: since the model assumes homoscedastic error, a poor fit at the lower end hardly matters to the likelihood. After all, an error of 1 is nothing compared to errors of 10 or 20 at the upper end of the scale. Besides, it can be argued that the error is not homoscedastic in the first place; if it were, the underlying data would show much more variation at the lower end of the scale than they do. It would need confirmation from the measuring team, but for now I will assume proportional error.

Model 2: Proportional error

The proportional error is implemented by using a lognormal distribution for the concentration. This was pretty simple to code: the log of the expected concentration is used in combination with dlnorm(). The only remaining issue is that the expectation of 0 for incorrectly ordered parameters has to be shifted a bit, since log(0) is a bit of an issue.
This model looks pretty decent. There are still some things to do, similar to last week: translate the parameters c1star and c2star into parameters which can be interpreted from a PK perspective.
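As a quick sanity check that a lognormal behaves like proportional error, simulate around three expected concentrations and watch the standard deviation scale with the mean (the sdlog of 0.2 is an arbitrary demonstration value):
set.seed(2015)
sapply(c(1,10,100), function(pred) sd(rlnorm(1e5, log(pred), 0.2)))
# roughly 0.2, 2 and 20: proportional to the expected concentration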

Model outputs

Model 1

Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/RtmpSe5nAn/modele207c187661.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 10
 n.sims = 2000 iterations saved
          mu.vect sd.vect    2.5%     25%     50%     75%    97.5%  Rhat n.eff
C[1]       91.473  24.508  51.690  76.444  88.708 103.600  144.987 1.016   160
C[2]      324.947 132.978 147.584 253.433 304.913 366.175  626.711 1.007   460
Ke[1]       0.020   0.002   0.016   0.018   0.019   0.021    0.024 1.045    76
Ke[2]       0.141   0.192   0.081   0.116   0.132   0.147    0.202 1.072   180
c1star[1]  58.783   8.381  43.892  53.222  58.514  63.406   76.601 1.047    64
c1star[2]  59.841  18.962  32.887  47.487  56.324  67.606  108.087 1.048    72
c1star[3]  92.337  10.569  70.647  85.720  92.805  99.469  112.338 1.041    74
c1star[4]  89.554  12.223  67.975  81.086  88.831  96.932  116.699 1.027   110
c1star[5] 146.419  21.504 106.395 132.153 145.199 159.149  193.558 1.032    84
c1star[6] 121.228  14.872  87.712 112.706 122.379 130.819  148.106 1.092    42
c2star[1] 426.241 107.777 278.572 355.121 406.845 471.583  703.121 1.038    92
c2star[2] 201.118  35.861 149.478 179.179 197.081 217.084  278.248 1.033   830
c2star[3] 489.294 218.162 268.493 349.208 419.178 537.325 1125.062 1.088    42
c2star[4] 449.402  68.164 340.627 402.410 440.666 487.652  603.535 1.038    75
c2star[5] 402.518  43.391 335.760 373.724 397.703 423.902  498.253 1.017   380
c2star[6] 140.893  46.737  79.548 114.343 133.240 157.967  237.080 1.024   650
ke1[1]      0.020   0.002   0.016   0.018   0.020   0.021    0.025 1.042    74
ke1[2]      0.021   0.004   0.016   0.019   0.020   0.023    0.032 1.060    57
ke1[3]      0.018   0.002   0.014   0.017   0.018   0.019    0.022 1.031    99
ke1[4]      0.020   0.002   0.017   0.019   0.020   0.021    0.025 1.023   120
ke1[5]      0.020   0.002   0.016   0.019   0.020   0.021    0.024 1.036    74
ke1[6]      0.019   0.002   0.015   0.018   0.019   0.020    0.022 1.071    46
ke2[1]      0.167   0.025   0.128   0.150   0.164   0.181    0.225 1.050    64
ke2[2]      0.101   0.026   0.067   0.085   0.096   0.111    0.164 1.050    91
ke2[3]      0.181   0.042   0.122   0.151   0.173   0.200    0.283 1.076    43
ke2[4]      0.143   0.019   0.110   0.130   0.141   0.155    0.181 1.040    74
ke2[5]      0.112   0.017   0.086   0.100   0.110   0.121    0.151 1.027   130
ke2[6]      0.118   0.038   0.062   0.093   0.114   0.137    0.205 1.051    61
deviance  464.708   7.996 451.264 458.919 464.029 469.608  482.206 1.006   550

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 31.8 and DIC = 496.6
DIC is an estimate of expected predictive error (lower deviance is better).

Model 2

Inference for Bugs model at "C:/Users/Kees/AppData/Local/Temp/RtmpSe5nAn/modele2045997fac.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 10
 n.sims = 2000 iterations saved
          mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
C[1]       23.273  10.942   8.344  16.933  21.737  27.259  48.399 1.004   710
C[2]      159.545  32.361 106.916 141.343 157.165 172.618 230.957 1.004  1200
Ke[1]       0.010   0.001   0.009   0.010   0.010   0.011   0.012 1.010   270
Ke[2]       0.040   0.005   0.032   0.036   0.039   0.042   0.050 1.006   660
c1star[1]  14.141   3.164   8.481  11.970  13.923  16.049  20.826 1.012   230
c1star[2]  13.177   4.777   6.178   9.622  12.276  16.064  24.394 1.020   210
c1star[3]  34.030   6.876  20.963  29.102  33.778  38.458  47.854 1.006   450
c1star[4]  12.399   4.038   6.525   9.482  11.712  14.524  22.656 1.019   130
c1star[5]  40.359   8.756  24.398  34.248  40.002  45.836  58.609 1.007   360
c1star[6]  38.415   9.194  21.683  32.043  37.885  44.595  57.469 1.006   440
c2star[1] 123.292  17.872  93.250 110.418 121.625 134.114 164.930 1.000  2000
c2star[2] 134.174  18.465 103.312 121.677 132.580 145.038 176.803 1.004   680
c2star[3] 138.541  24.539  97.522 122.254 135.654 151.958 195.143 1.000  2000
c2star[4] 184.399  20.868 148.465 170.310 183.027 197.048 229.782 1.002  1500
c2star[5] 247.359  38.113 178.403 222.054 244.317 269.412 330.096 1.003  2000
c2star[6] 150.684  22.359 111.345 135.290 149.545 164.626 198.568 1.002  2000
ke1[1]      0.010   0.001   0.008   0.010   0.010   0.011   0.012 1.013   210
ke1[2]      0.012   0.001   0.009   0.011   0.011   0.012   0.014 1.022   180
ke1[3]      0.010   0.001   0.008   0.009   0.010   0.010   0.011 1.005   540
ke1[4]      0.011   0.001   0.009   0.010   0.010   0.011   0.013 1.020   130
ke1[5]      0.010   0.001   0.008   0.010   0.010   0.011   0.012 1.007   350
ke1[6]      0.010   0.001   0.009   0.010   0.010   0.011   0.012 1.005   540
ke2[1]      0.042   0.005   0.034   0.038   0.041   0.044   0.055 1.004   690
ke2[2]      0.041   0.006   0.032   0.037   0.040   0.044   0.056 1.013   230
ke2[3]      0.042   0.008   0.032   0.037   0.040   0.045   0.062 1.006   440
ke2[4]      0.037   0.003   0.031   0.034   0.036   0.038   0.044 1.008   320
ke2[5]      0.040   0.005   0.032   0.037   0.040   0.043   0.053 1.003  1100
ke2[6]      0.036   0.006   0.027   0.033   0.036   0.040   0.048 1.003  1200
deviance  372.564   7.382 359.811 367.354 372.140 376.777 388.833 1.001  2000

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 27.3 and DIC = 399.8
DIC is an estimate of expected predictive error (lower deviance is better).

Code used

library(MEMSS)
library(ggplot2)
library(R2jags)
png('Cefamandole.png')
qplot(y=conc,x=Time,col=Subject,data=Cefamandole) +
    scale_y_log10('cefamandole (mcg/ml)')+
    geom_line()+
    theme(legend.position='bottom')
dev.off()

library(R2jags)
datain <-  list(
    time=Cefamandole$Time,
    conc=Cefamandole$conc,
    n=nrow(Cefamandole),
    subject =c(1:nlevels(Cefamandole$Subject))[Cefamandole$Subject],
    nsubject = nlevels(Cefamandole$Subject)
)

model1 <- function() {
  tau <- 1/pow(sigma,2)
  sigma ~ dunif(0,100)
  for (i in 1:n) {
    pred[i] <- (ke1[subject[i]]<ke2[subject[i]])*
        (c1star[subject[i]]*exp(-ke1[subject[i]]*time[i])+
          c2star[subject[i]]*exp(-ke2[subject[i]]*time[i]))
    conc[i] ~ dnorm(pred[i],tau)
  }
  
  for(i in 1:nsubject) {
    ke1[i] ~ dlnorm(ke[1],kemtau[1])
    ke2[i] ~ dlnorm(ke[2],kemtau[2]) 
    c1star[i] ~ dlnorm(cm[1],ctau[1])
    c2star[i] ~ dlnorm(cm[2],ctau[2])
  }
  for (i in 1:2) {
    kem[i] ~ dunif(-10,10) 
    kemtau[i] <- 1/pow(kesd[i],2) 
    kesd[i] ~ dunif(0,10)
    Ke[i] <- exp(ke[i])
    cm[i] ~ dunif(-10,20)
    ctau[i] <- 1/pow(csd[i],2)
    csd[i] ~ dunif(0,100)
    C[i] <- exp(cm[i])    
  }
  ke <- sort(kem)
}

parameters <- c('Ke','ke1','ke2','c1star','c2star','C')
inits <- function() {
  list(ke1 = rnorm(6,0.13,.03),
      ke2=  rnorm(6,0.0180,.005), 
      kem=  log(c(rnorm(1,0.13,.03),rnorm(1,0.0180,.005))),
      kesd =runif(2,.1,.4),
      cm = log(rnorm(2,25,5)),
      c1star=rnorm(6,25,3),
      c2star=rnorm(6,25,3)
  )}
jagsfit1 <- jags(datain, model=model1, 
    inits=inits,
    parameters=parameters,progress.bar="gui",
    n.iter=10000,
    n.chains=4,n.thin=10)
jagsfit1
#plot(jagsfit1)
#traceplot(jagsfit1)

Time <- seq(0,max(Cefamandole$Time))
la1 <- sapply(1:nrow(jagsfit1$BUGSoutput$sims.matrix),
    function(i) {
      C1 <- jagsfit1$BUGSoutput$sims.matrix[i,'C[1]']
      C2 <- jagsfit1$BUGSoutput$sims.matrix[i,'C[2]']
      k1 <- jagsfit1$BUGSoutput$sims.matrix[i,'Ke[1]']
      k2 <- jagsfit1$BUGSoutput$sims.matrix[i,'Ke[2]']
      y=C1*exp(-k1*Time)+C2*exp(-k2*Time)
    })
res1 <- data.frame(
    Time=Time,
    Conc=apply(la1,1,mean),
    C05=apply(la1,1,FUN=function(x) quantile(x,0.05)),
    C95=apply(la1,1,FUN=function(x) quantile(x,0.95)))
png('cef1.png')
ggplot(res1, aes(x=Time)) +
    geom_line(aes(y=Conc))+
    scale_y_log10('cefamandole (mcg/ml)')+
    geom_line(aes(y=conc,x=Time,col=Subject),alpha=1,data=Cefamandole)+
    geom_ribbon(aes(ymin=C05, ymax=C95),alpha=.2)+
    theme(legend.position='bottom')
dev.off()
#########
#
# model 2: proportional error
#
#########
model2 <- function() {
  tau <- 1/pow(sigma,2)
  sigma ~ dunif(0,100)
  for (i in 1:n) {
    pred[i] <- log(
        (ke1[subject[i]]<ke2[subject[i]])*
            (c1star[subject[i]]*exp(-ke1[subject[i]]*time[i])+
              c2star[subject[i]]*exp(-ke2[subject[i]]*time[i]))
            +.001*(ke1[subject[i]]>ke2[subject[i]]))
    conc[i] ~ dlnorm(pred[i],tau)
  }
  
  
  for(i in 1:nsubject) {
    ke1[i] ~ dlnorm(ke[1],kemtau[1])
    ke2[i] ~ dlnorm(ke[2],kemtau[2]) 
    c1star[i] ~ dlnorm(cm[1],ctau[1])
    c2star[i] ~ dlnorm(cm[2],ctau[2])
  }
  for (i in 1:2) {
    kem[i] ~ dunif(-10,10) 
    kemtau[i] <- 1/pow(kesd[i],2) 
    kesd[i] ~ dunif(0,10)
    Ke[i] <- exp(ke[i])
    cm[i] ~ dunif(-10,20)
    ctau[i] <- 1/pow(csd[i],2)
    csd[i] ~ dunif(0,100)
    C[i] <- exp(cm[i])    
  }
  ke <- sort(kem)
}


parameters <- c('Ke','ke1','ke2','c1star','c2star','C')
inits <- function() {
  list(ke1 = rnorm(6,0.13,.03),
      ke2=  rnorm(6,0.0180,.005), 
      kem=  log(c(rnorm(1,0.13,.03),rnorm(1,0.0180,.005))),
      kesd =runif(2,.1,.4),
      cm = log(rnorm(2,25,5)),
      c1star=rnorm(6,25,3),
      c2star=rnorm(6,25,3)
  )}
jagsfit2 <- jags(datain, model=model2, 
    inits=inits,
    parameters=parameters,progress.bar="gui",
    n.iter=10000,
    n.chains=4,n.thin=10)
jagsfit2
la2 <- sapply(1:nrow(jagsfit2$BUGSoutput$sims.matrix),
    function(i) {
      C1 <- jagsfit2$BUGSoutput$sims.matrix[i,'C[1]']
      C2 <- jagsfit2$BUGSoutput$sims.matrix[i,'C[2]']
      k1 <- jagsfit2$BUGSoutput$sims.matrix[i,'Ke[1]']
      k2 <- jagsfit2$BUGSoutput$sims.matrix[i,'Ke[2]']
      y=C1*exp(-k1*Time)+C2*exp(-k2*Time)
    })


res2 <- data.frame(
    Time=Time,
    Conc=apply(la2,1,mean),
    C05=apply(la2,1,FUN=function(x) quantile(x,0.05)),
    C95=apply(la2,1,FUN=function(x) quantile(x,0.95)))

png('cef2.png')
ggplot(res2, aes(x=Time)) +
    geom_line(aes(y=Conc))+
    scale_y_log10('cefamandole (mcg/ml)')+
    geom_line(aes(y=conc,x=Time,col=Subject),alpha=1,data=Cefamandole)+
    geom_ribbon(aes(ymin=C05, ymax=C95),alpha=.2)+
    theme(legend.position='bottom')
dev.off()

Sunday, April 5, 2015

Hierarchical Two-Compartment PK Model

In this post I am running the Theoph dataset from MEMSS (Data sets and sample analyses from Pinheiro and Bates, "Mixed-effects Models in S and S-PLUS" (Springer, 2000)) in a JAGS model. To quote the MEMSS manual:
'Boeckmann, Sheiner and Beal (1994) report data from a study by Dr. Robert Upton of the kinetics of the anti-asthmatic drug theophylline. Twelve subjects were given oral doses of theophylline then serum concentrations were measured at 11 time points over the next 25 hours.
These data are analyzed in Davidian and Giltinan (1995) and Pinheiro and Bates (2000) using a two-compartment open pharmacokinetic model, for which a self-starting model function, SSfol, is available.'

Model

The model used in the example for these data is SSfol. Its value is:
Dose * exp(lKe+lKa-lCl) * (exp(-exp(lKe)*input) - exp(-exp(lKa)*input))
    / (exp(lKa) - exp(lKe))
Here input is the time variable. This looks very complex, but there is a trick: what is estimated is not Ke (the elimination rate) but its natural logarithm, and the same holds for Ka and CL. This is probably done to keep the parameters positive. Written in terms of the ordinary parameters (note that exp(lKe+lKa-lCl) is Ke*Ka/CL):
Dose * (Ke*Ka/CL) * (exp(-Ke*input) - exp(-Ka*input)) / (Ka - Ke)
This looks much simpler.
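A quick numeric check with made-up parameter values shows the two forms agree:
lKe <- log(0.08); lKa <- log(1.6); lCl <- log(0.04)
Dose <- 320; input <- 5
Dose * exp(lKe+lKa-lCl) * (exp(-exp(lKe)*input) - exp(-exp(lKa)*input)) /
    (exp(lKa) - exp(lKe))
Ke <- exp(lKe); Ka <- exp(lKa); CL <- exp(lCl)
Dose * (Ke*Ka/CL) * (exp(-Ke*input) - exp(-Ka*input)) / (Ka - Ke)
# both lines print the same concentration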

Estimates using nls

The example code uses nls() to fit the model. Below, the fit is wrapped in lapply() over the subjects. In addition, the exponential of the coefficients is taken to get back to the natural parameters.
library(MEMSS)
la <- lapply(levels(Theoph$Subject), function(iSubject)
    {Theoph.i <- subset(Theoph, Subject == iSubject)
      fm1 <- nls(conc ~ SSfol(Dose, Time, lKe, lKa, lCl),
          data = Theoph.i)
      ee <- exp(coef(fm1))
      dd <- data.frame(Ke=ee[1],Ka=ee[2],CL=ee[3])
      row.names(dd) <- iSubject
      dd
    }
)
nlsres <- do.call(rbind,la)
nlsres
          Ke        Ka         CL
A 0.05395450 1.7774170 0.01992348
B 0.07396616 0.6955019 0.03244300
C 0.09812331 3.8490404 0.05724601
D 0.10557584 0.8328979 0.04199695
E 0.10166133 1.9426573 0.04476553
F 0.08142497 2.4535654 0.03955890
G 0.08746697 1.1714752 0.03739991
H 0.08843514 1.4715045 0.04360427
I 0.09952647 1.1637219 0.05113727
J 0.10224639 0.6797358 0.05159475
K 0.09195675 1.3755229 0.04646244

Model in JAGS

In JAGS there are more direct mechanisms for keeping parameters positive. However, there is one problem: an alternative set of parameters for subject A, Ka=0.053, Ke=1.77, does equally well. Combining MCMC samples from these two solutions just results in nonsense. To avoid this, I have placed the whole scaling in a parameter c0star, which is positive. In subsequent code the individual CL and its hyperparameter are estimated from c0star. To keep the model a directed acyclic graph (DAG), an additional variable z and parameters CLr are introduced; the sketch after this paragraph shows why that works. In addition, dose has been made a subject-level variable, rather than being associated with time points.
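The reasoning behind the z trick, in brief: CL[i] is a deterministic node, so JAGS will not accept a prior on it directly. Observing z[i]=0 with mean CL[i]-CLm contributes the same likelihood term as a direct prior would:
# in the model below:
#   CLr[i] <- CL[i]-CLm
#   z[i] ~ dnorm(CLr[i],CLtau) # z[i] = 0 is supplied in the data
# the density of 0 under dnorm(CL[i]-CLm,CLtau) equals the density of CL[i]
# under dnorm(CLm,CLtau), by the symmetry of the normal distribution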

model1 <- function() {
  tau <- 1/pow(sigma,2)
  sigma ~ dunif(0,100)
  for (i in 1:n) {
    pred[i] <- c0star[subject[i]]*
        (exp(-ke[subject[i]]*time[i])-exp(-ka[subject[i]]*time[i]))
    conc[i] ~ dnorm(pred[i],tau)
  }
  for(i in 1:nsubject) {
    ka[i] ~ dlnorm(kam,katau) 
    ke[i] ~ dlnorm(kem,ketau) 
    c0star[i] ~ dunif(0,50)
# c0star=dose*((ke*ka)/CL)/(ka-ke)    
    CL[i] <- Dose[i]*ka[i]*ke[i]/((ka[i]-ke[i])*c0star[i])
    CLr[i] <- CL[i]-CLm
    z[i] ~ dnorm(CLr[i],CLtau)
  }
  kam ~ dunif(-10,10) 
  katau <- 1/pow(kasd,2) 
  kasd ~ dunif(0,100)
  Kam <- exp(kam)
#  ta0_5 <- log(2)/Kam
  
  kem ~ dunif(-10,10)
  ketau <- 1/pow(kesd,2)
  kesd ~ dunif(0,100)
  Kem <- exp(kem)
#  te0_5 <- log(2)/Kem
  
  CLm ~ dunif(0,100)
  CLtau <- 1/pow(CLsd,2)
  CLsd ~ dunif(0,100)
}

The usual code for running the model follows. I have noticed quite some sensitivity to the inits while writing this post.
library(R2jags)
datain <-  list(
    time=Theoph$Time,
    conc=Theoph$conc,
    n=nrow(Theoph),
    subject =c(1:nlevels(Theoph$Subject))[Theoph$Subject],
    nsubject = nlevels(Theoph$Subject),
    z=rep(0, nlevels(Theoph$Subject)),
    Dose=unique(subset(Theoph,, c(Subject,Dose)))$Dose
)
parameters <- c('Kem','Kam','ke','ka','CL','CLm')
inits <- function() {
  list(kam = rnorm(1,log(1),.3),
      ka=   rnorm(12,1,.2), 
      kem=   rnorm(1,log(.08),.01),
      ke=    rnorm(12,.08,.01),
      c0star = runif(12,9,11)
  )}

jagsfit <- jags(datain, model=model1, 
    inits=inits,
    parameters=parameters,progress.bar="gui",
    n.iter=10000,
    n.chains=4,n.thin=2)

Results

It seems the estimated parameters are close to the parameters from nls(), which is sufficient for this blog post. The estimates show quite some shrinkage: almost all subjects have Ke close to 0.086. CL and Ka also show shrinkage, but much less so than Ke.
Inference for Bugs model at "C:/Users/.../modeld8059bf37f0.txt", fit using jags,
 4 chains, each with 10000 iterations (first 5000 discarded), n.thin = 2
 n.sims = 10000 iterations saved
         mu.vect sd.vect    2.5%     25%     50%     75%   97.5%  Rhat n.eff
CL[1]      0.026   0.002   0.022   0.025   0.026   0.028   0.030 1.003  1100
CL[2]      0.035   0.002   0.031   0.034   0.035   0.037   0.039 1.001  4200
CL[3]      0.051   0.004   0.044   0.048   0.051   0.053   0.058 1.002  2300
CL[4]      0.038   0.002   0.034   0.037   0.038   0.040   0.044 1.006   460
CL[5]      0.041   0.003   0.036   0.039   0.041   0.043   0.047 1.005   620
CL[6]      0.041   0.003   0.035   0.039   0.041   0.042   0.046 1.002  2200
CL[7]      0.037   0.003   0.033   0.036   0.037   0.039   0.043 1.003   990
CL[8]      0.043   0.003   0.038   0.041   0.043   0.044   0.048 1.004   960
CL[9]      0.046   0.004   0.039   0.044   0.046   0.048   0.053 1.002  1600
CL[10]     0.046   0.003   0.040   0.044   0.046   0.049   0.053 1.002  1600
CL[11]     0.044   0.003   0.038   0.042   0.044   0.046   0.051 1.002  2000
CL[12]     0.033   0.002   0.029   0.031   0.033   0.034   0.038 1.007   430
CLm        0.040   0.003   0.035   0.038   0.040   0.042   0.046 1.004  1200
Kam        1.673   0.450   0.998   1.386   1.616   1.885   2.700 1.001  6400
Kem        0.086   0.005   0.076   0.082   0.085   0.089   0.096 1.010   420
ka[1]      1.482   0.205   1.127   1.338   1.466   1.607   1.934 1.001  9300
ka[2]      0.656   0.089   0.507   0.593   0.649   0.710   0.854 1.006   510
ka[3]      3.964   0.860   2.657   3.372   3.836   4.407   5.973 1.001  5700
ka[4]      0.954   0.125   0.730   0.869   0.948   1.032   1.219 1.009   310
ka[5]      2.110   0.325   1.555   1.882   2.080   2.302   2.827 1.003  1100
ka[6]      2.385   0.441   1.671   2.073   2.336   2.637   3.411 1.002  2200
ka[7]      1.208   0.171   0.910   1.092   1.195   1.309   1.584 1.002  2000
ka[8]      1.502   0.183   1.171   1.376   1.493   1.615   1.890 1.003  1100
ka[9]      1.287   0.245   0.878   1.112   1.261   1.429   1.834 1.002  1600
ka[10]     0.787   0.139   0.551   0.690   0.774   0.871   1.094 1.004   910
ka[11]     1.446   0.248   1.021   1.274   1.422   1.593   2.006 1.002  2400
ka[12]     8.409   5.846   4.277   5.818   7.078   8.998  20.996 1.001  7100
ke[1]      0.080   0.008   0.061   0.075   0.081   0.085   0.093 1.004   990
ke[2]      0.084   0.007   0.069   0.080   0.084   0.089   0.099 1.003  1100
ke[3]      0.085   0.007   0.072   0.081   0.085   0.089   0.101 1.002  2500
ke[4]      0.089   0.009   0.076   0.083   0.088   0.094   0.110 1.017   210
ke[5]      0.089   0.008   0.076   0.083   0.087   0.093   0.109 1.007   440
ke[6]      0.085   0.007   0.071   0.081   0.085   0.089   0.100 1.003  1300
ke[7]      0.086   0.007   0.073   0.082   0.086   0.090   0.104 1.006   530
ke[8]      0.086   0.007   0.073   0.081   0.085   0.090   0.101 1.008   420
ke[9]      0.086   0.008   0.071   0.081   0.085   0.090   0.103 1.005   970
ke[10]     0.086   0.008   0.071   0.081   0.085   0.090   0.104 1.009   480
ke[11]     0.086   0.008   0.072   0.081   0.085   0.090   0.103 1.005   630
ke[12]     0.088   0.008   0.075   0.083   0.087   0.092   0.106 1.008   340
deviance 200.433   9.015 184.713 193.957 199.825 206.049 220.328 1.002  2200

For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)
pD = 40.6 and DIC = 241.0
DIC is an estimate of expected predictive error (lower deviance is better).