Abstracts & Slides

February 7, 2012

"Generalized Procrustes Analysis with an Application"
by Siddik Keskan, Yuzuncu Yil University
In this talk, we present a dimension reduction method
called Generalized Procrustes Analysis (GPA), which is often
used in sensory analysis and shape analysis. When, for
example, similar food products are evaluated by multiple
experts/judges according to a number of attributes for
quality or consumer preferences, this method can be applied
to examine the relationships among products, attributes, and
experts. GPA is an exploratory technique that enables one to
project results from multiple dimensions onto two-dimensional
maps for graphical display. The GPA algorithm involves three
steps: translation, rotation/reflection, and isotropic
scaling. We will describe these steps and their
corresponding mathematical formulations, after a brief
general introduction to the method. An application to
agriculture research will be discussed to illustrate the
practical use of GPA.
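
As a rough illustration of the three steps named above, the following Python sketch aligns several configurations with NumPy. It is a minimal sketch, not the speaker's implementation: the function name `generalized_procrustes`, the unit-norm starting scale, and the stopping rule are assumptions made for this example.

```python
import numpy as np

def generalized_procrustes(configs, n_iter=100, tol=1e-10):
    """Align k configurations (each an n x p array) to a common consensus."""
    # Step 1, translation: centre every configuration at the origin.
    X = [np.asarray(c, dtype=float) for c in configs]
    X = [x - x.mean(axis=0) for x in X]
    # Assumption: start from a common size (unit Frobenius norm).
    X = [x / np.linalg.norm(x) for x in X]
    consensus = X[0].copy()
    aligned = X
    for _ in range(n_iter):
        aligned = []
        for x in X:
            # Step 2, rotation/reflection: orthogonal Procrustes via SVD.
            u, s, vt = np.linalg.svd(x.T @ consensus)
            q = u @ vt
            # Step 3, isotropic scaling: least-squares scale factor.
            scale = s.sum() / np.sum(x ** 2)
            aligned.append(scale * x @ q)
        new_consensus = np.mean(aligned, axis=0)
        new_consensus /= np.linalg.norm(new_consensus)
        converged = np.linalg.norm(new_consensus - consensus) < tol
        consensus = new_consensus
        if converged:
            break
    return aligned, consensus

# Hypothetical data: 3 judges scoring 5 products on 4 attributes.
rng = np.random.default_rng(0)
scores = [rng.normal(size=(5, 4)) for _ in range(3)]
aligned, consensus = generalized_procrustes(scores)
print(consensus.shape)
```

The first two columns of a principal component analysis of the consensus would then give the kind of two-dimensional map described in the abstract.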
Talk Slides

January 31, 2012

"Heterogeneity Bias in Logistic and Cox Regression"
by Tim Ramsay, Ottawa Hospital Research Institute & University of Ottawa
It is a surprisingly little-known fact that omitting
important predictors from either a logistic regression or
a Cox proportional hazards model leads to biased effect
estimates. Importantly, this is true even when the omitted
predictors are unassociated with the exposure of interest,
so the bias is completely unrelated to confounding.
While some statisticians argue that it is wrong to refer to
this phenomenon as bias, it will be argued that for
practical purposes that is the right way to think of it.
This presentation will describe this phenomenon and discuss
its relevance to the analysis and interpretation of data
from randomized clinical trials. Results from a systematic
review of published RCTs will be presented to suggest that
the implications may be important.
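
The logistic case can be checked with a short exact calculation. The sketch below is an illustration only, with hypothetical coefficient values: it averages the outcome risk over an omitted binary covariate `z` that is independent of the exposure `x`, then compares the resulting marginal log odds ratio with the conditional one. The Cox case is not shown.

```python
import math

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_log_odds_ratio(b0, b1, b2, p_z=0.5):
    """Log odds ratio for x after averaging over an omitted covariate z.

    Conditional model: logit P(y=1 | x, z) = b0 + b1*x + b2*z,
    where z is binary with P(z=1) = p_z and independent of x,
    so z is not a confounder.
    """
    def risk(x):
        return ((1.0 - p_z) * expit(b0 + b1 * x)
                + p_z * expit(b0 + b1 * x + b2))
    return logit(risk(1)) - logit(risk(0))

# Hypothetical values chosen only for illustration.
conditional = 1.0
marginal = marginal_log_odds_ratio(b0=-1.5, b1=conditional, b2=3.0)
print(f"conditional log OR = {conditional:.3f}")
print(f"marginal log OR    = {marginal:.3f}")
```

With these values the marginal estimate is pulled toward zero even though `z` is balanced across exposure groups, which is the behaviour the abstract describes.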
Talk Slides

January 24, 2012

"Systematic Random Sampling after Serpentine Ordering to Select
Neighbourhoods in Toronto: the Neighbourhood Effects on
Health and Well-Being (NEHW) Project"
by Rosane Nisenbaum, St. Michael’s Hospital
The Neighbourhood Effects on Health and Well-Being
(NEHW) Project in Toronto was designed to identify
neighbourhood, in addition to individual, factors associated
with mental health outcomes. Data collection started in 2009
and ended in 2011, and data analysis is currently underway.
A three-stage sampling design was considered: 50 of 140
neighbourhoods were selected at Stage 1; 2 census tracts
within each neighbourhood were selected at Stage 2; and 30
households within each census tract were selected at Stage
3. In this talk, we focus on describing the selection of the
neighbourhoods using systematic random sampling after
serpentine ordering to provide implicit geographic
stratification of neighbourhoods (Leslie Kish 1995, Survey
Sampling; Jeff Geuder, US Department of Agriculture,
Statistical Reporting Service, SF&SRB Report No. 79, February
1984, “Paper Stratification in SRS Area Sampling Frames”).
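
A minimal Python sketch of the Stage 1 idea follows. The abstract does not say how the neighbourhoods were laid out for ordering, so the grid of (row, column) positions here is purely hypothetical, as are the function names. The sketch orders units in a serpentine path (alternating direction on each row) and then draws a systematic sample with a random start and a fractional interval of 140/50.

```python
import random

def serpentine_order(units):
    """Order (unit_id, row, col) tuples row by row, reversing every other row."""
    rows = sorted({row for _, row, _ in units})
    ordered = []
    for i, row in enumerate(rows):
        in_row = [u for u in units if u[1] == row]
        # Even-numbered rows run left to right, odd-numbered rows right to left.
        in_row.sort(key=lambda u: u[2], reverse=(i % 2 == 1))
        ordered.extend(in_row)
    return ordered

def systematic_sample(ordered, n, rng):
    """Systematic sample of size n with a random start and fractional interval."""
    k = len(ordered) / n              # sampling interval, e.g. 140 / 50 = 2.8
    start = rng.random() * k          # random start in [0, k)
    last = len(ordered) - 1
    return [ordered[min(int(start + i * k), last)] for i in range(n)]

# Hypothetical frame: 140 neighbourhoods on a 10 x 14 grid.
frame = [(r * 14 + c, r, c) for r in range(10) for c in range(14)]
ordered = serpentine_order(frame)
sample = systematic_sample(ordered, 50, random.Random(2009))
print(len(sample), "neighbourhoods selected")
```

Because neighbours in the serpentine path are also neighbours on the map, the evenly spaced draws spread the sample across the city, which is the implicit geographic stratification mentioned above.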
Talk Slides
Recommended Reading

Fall 2011 Seminars

December 6, 2011

"An Introduction to the Kalman Filter and its Applications"
by Ian Moore, Quatrametrics Inc.
In this lecture, we present a recursive solution to the
discrete-data linear filtering problem, which was developed
by Kalman in the early 1960s to address trajectory
estimation problems that are commonly found in aerospace
engineering. This recursive system of linear algebraic
equations, popularly known as the Kalman filter, is
used to remove noise or error from repeated measurements
taken over time. If we assume that the error (the part being
removed) within an arbitrary set of discrete input
measurements is Gaussian, the Kalman filter is optimal in the
sense of minimizing the mean-squared error between the
filtered estimates and the true underlying values. The Kalman
filter is used in a wide range of applications in various
fields of engineering. It is also used in economics, as it
can be applied to the estimation of structural macroeconomic
models. Using several examples, we demonstrate its
usefulness to health economics and health policy research.
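
To make the recursion concrete, here is a minimal one-dimensional sketch in Python. It assumes a scalar random-walk state observed with Gaussian noise; the function name, the variance parameters, and the simulated series are illustrative assumptions, not taken from the talk.

```python
import random

def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    x, p = x0, p0                  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is carried forward, its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates

# Hypothetical example: noisy repeated measurements of a constant level.
rng = random.Random(42)
truth = 10.0
noisy = [truth + rng.gauss(0.0, 2.0) for _ in range(200)]
smoothed = kalman_1d(noisy, process_var=0.0, meas_var=4.0, p0=100.0)
print(f"last raw value: {noisy[-1]:.2f}, last filtered value: {smoothed[-1]:.2f}")
```

Each step needs only the previous estimate and its variance, which is why the filter is described as recursive.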
Talk Slides

November 29, 2011

"Confidence Interval Estimation for Continuous Outcomes in Cluster
Randomization Trials"
by Julia Taleban, Samuel Lunenfeld Research Institute
Cluster randomization trials are experiments where
intact social units (e.g. hospitals, schools, communities,
and families) are randomized to the arms of the trial rather
than individuals. The popularity of this design among health
researchers can be explained by many reasons, including
reduced contamination of treatment effects and convenience.
However, the advantages of cluster randomization trials do
not come without a price. Due to the dependence of
individuals within a cluster, cluster randomization trials
suffer from reduced statistical efficiency and often require a
more complex analysis of study outcomes. The primary purpose of this work
is to propose new confidence intervals for effect measures
commonly of interest for continuous outcomes arising from
cluster randomization trials. Specifically, we construct new
confidence intervals for the difference between two normal
means, the difference between two lognormal means, and the
exceedance probability. The proposed confidence intervals,
which use the method of variance estimates recovery (MOVER),
do not make certain assumptions that existing procedures
make on the data. For instance, symmetry is not forced when
the sampling distribution of the parameter is skewed and the
assumption of homoscedasticity is not made. Simulation
studies are used to investigate the small sample properties
of the MOVER as compared with existing procedures.
Unbalanced cluster sizes are simulated, with average cluster
sizes ranging from 50 to 200 individuals and with 6 to 24
clusters per arm. The effects of various degrees of
dependence between individuals within the same cluster are
also investigated. In comparisons of empirical coverage,
tail errors, and median widths across confidence interval
procedures, the MOVER has coverage close to the nominal level,
relatively balanced tail errors, and narrow widths
compared with existing procedures for the majority of the
parameter combinations investigated. Existing data from
cluster randomization trials are then used to illustrate
each of the methods.
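
The general MOVER recipe for a difference of two parameters can be sketched in a few lines of Python. This shows only the generic combination step, taking each arm's point estimate and confidence limits as given; it does not reproduce the cluster-adjusted limits developed in the talk, and the function name and numbers are illustrative.

```python
import math

def mover_difference(est1, ci1, est2, ci2):
    """MOVER confidence limits for est1 - est2 from the two separate intervals."""
    l1, u1 = ci1
    l2, u2 = ci2
    diff = est1 - est2
    # Each limit recovers variance information from the relevant side of each interval.
    lower = diff - math.sqrt((est1 - l1) ** 2 + (u2 - est2) ** 2)
    upper = diff + math.sqrt((u1 - est1) ** 2 + (est2 - l2) ** 2)
    return lower, upper

# Hypothetical arm-level estimates; the first interval is asymmetric.
lower, upper = mover_difference(10.0, (9.0, 13.0), 7.0, (6.0, 8.0))
print(f"difference = 3.0, interval = ({lower:.3f}, {upper:.3f})")
```

Because the two limits draw on different sides of the input intervals, a skewed interval for one arm yields an asymmetric interval for the difference, in line with the abstract's remark that symmetry is not forced.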
Talk Slides

November 22, 2011

"The Cumulative Effects of Time-varying Covariates in Survival
Analysis: Flexible Modeling and Applications"
by Michal Abrahamowicz, McGill University
Many exposures and risk factors evaluated in prospective
studies of survival (or, more generally, of time to a
specific event) show important variation over time. We focus
on 2 challenges that need to be addressed to ensure accurate
testing and modeling of the potential cumulative effects of
such time-varying exposures. (i) Firstly, while the
etiological relevance of an exposure may vary considerably
depending on how long ago it occurred, little is known about
the relative importance of exposures that occurred in different
intervals in the past. (ii) Secondly, the form of the
dose-response relationship, describing the impact of
increasing exposure intensity at a fixed point in time on
the hazard, is typically unknown. For example, (i)
cardiovascular (CVD) risk factors, such as serum cholesterol
or blood pressure, change considerably over the person’s
lifetime, with different individuals showing different
patterns of change, and there is a continuing debate about
the relative impact of their recent vs much earlier values
on the current CVD risk. On the other hand, (ii) several CVD
risk factors have non-linear relationships with the logit or
log-hazard of CVD risk, and the form of the dose-response
curve varies depending on the risk factor. Whereas there is
a vast statistical literature on flexible modeling of
dose-response curves, the only published method for flexible
modeling of the function assigning relative importance
weights to past exposures is limited to logistic regression
analyses of case-control studies. We recently proposed a
flexible spline-based model to estimate the weighted
cumulative exposure (WCE) effects in the framework of a
generalized Cox’s model for censored survival data. The
value of the WCE at time τ is modelled as a function of the
past exposure history, described by the time-dependent
exposure intensity x(t), for 0 < t < τ:
WCE(τ | x(t), t < τ) = ∑ w(τ − t) x(t),
where the sum is over t < τ, and w(τ − t) is modeled using
low-dimension cubic regression splines and assigns
importance weights to past exposures as a function of the
time elapsed since the exposure (τ − t). The estimated WCE is
then included as a time-dependent covariate in Cox’s
proportional hazards model. To assess the accuracy of the
proposed estimates we rely on simulations and use our
versatile ‘permutational algorithm’ to generate event times
conditional on time-varying WCE. No currently available
method handles flexible modeling of both (i) the weight
function for cumulative effects modeling and (ii) a non-linear
dose-response curve. Thus, we have recently extended the above
model to:
WCE(τ | x(t), t < τ) = ∑ w(τ − t) s[x(t)],
where
s[x(t)] represents a smooth dose-response curve describing
the relationship between exposure intensity (dose) at a
given point in time and the logarithm of the hazard. To
illustrate a real-life application, we use a large long-term
cohort study with repeated measures of blood pressure, to
re-assess their effects on CVD mortality and morbidity.
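
The WCE sum itself is simple to compute once a weight function is fixed. The Python sketch below assumes discrete time points t = 0, 1, 2, ... and uses a hypothetical exponentially decaying weight function in place of the estimated cubic regression splines; the optional `s` argument stands in for the smooth dose-response transformation of the extended model.

```python
def weighted_cumulative_exposure(x, w, tau, s=lambda dose: dose):
    """Return the sum over t < tau of w(tau - t) * s(x[t]).

    x   : exposure intensities at times t = 0, 1, 2, ...
    w   : weight as a function of time elapsed since exposure (tau - t)
    s   : dose-response transformation (identity in the basic model)
    """
    return sum(w(tau - t) * s(x[t]) for t in range(min(tau, len(x))))

# Hypothetical exposure history and weight function (not estimated from data).
exposure = [1.0, 2.0, 3.0, 4.0]
recent_matters_more = lambda elapsed: 0.5 ** elapsed

for tau in range(1, 5):
    wce = weighted_cumulative_exposure(exposure, recent_matters_more, tau)
    print(f"tau = {tau}: WCE = {wce}")
```

In the model described above, a value like this would be recomputed at each event time and entered into the Cox model as a time-dependent covariate.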
Talk Slides
Recommended Readings I, II

November 15, 2011

"The Case-Crossover Study Design: An Approach for Evaluating
Associations Between Daily Air Pollution Concentrations
and Hospitalization for Stroke"
by Paul J. Villeneuve, Health Canada
Over the past decade, ambient air pollution has emerged
as a global risk factor for stroke. For the most part,
associations between air pollution and stroke have been
found in studies that have evaluated day-to-day changes in
air pollution concentrations rather than with long term
exposure. The case-crossover study design is a frequently
used approach to characterize the risk of adverse health
outcomes in relation to short-term changes in environmental
exposures. In this seminar, an overview of the
case-crossover design will be presented, and recent findings
from ongoing research evaluating the associations between
ambient air pollution and stroke in Edmonton will be
discussed. Examples of SAS programming techniques used to
derive the risk estimates will be reviewed, and presented in
class. Sample data and programs will also be provided so
participants can practice conducting case-crossover
analysis.
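
One step of a case-crossover analysis, choosing referent (control) days for each case day, can be sketched briefly. The abstract does not state which referent scheme was used; the Python sketch below assumes the commonly used time-stratified scheme, in which referents are the other days in the same calendar month that fall on the same weekday. It is separate from the SAS and R programs distributed with the seminar.

```python
from datetime import date, timedelta

def time_stratified_referents(case_day):
    """Other days in the same month and on the same weekday as case_day."""
    referents = []
    day = date(case_day.year, case_day.month, 1)
    while day.month == case_day.month:
        if day.weekday() == case_day.weekday() and day != case_day:
            referents.append(day)
        day += timedelta(days=1)
    return referents

# Hypothetical hospitalization date.
case = date(2011, 11, 15)
for ref in time_stratified_referents(case):
    print(ref.isoformat())
```

Pollution concentrations on the case day are then compared with those on the referent days, typically with conditional logistic regression, so that each person serves as their own control.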
Talk Slides
Source Codes SAS & R, and Data

November 8, 2011

"Disease Mapping with Log Gaussian Cox Processes"
by Lennon Li, University of Toronto
In disease mapping, when target diseases have low
prevalences, the study usually covers a long time period to
accumulate sufficient cases. However, during this period,
the census regions on which population is reported may
undergo numerous irregular changes, which complicates
inference. A new model was developed for the case when the
exact location of the cases is available, consisting of a
continuous random spatial surface and fixed effects for time
and ages of individuals. The process is modeled on a fine
grid, approximating the underlying continuous risk surface
with a Gaussian Markov random field, and Bayesian inference is
performed using integrated nested Laplace approximations.
Further, when the exact location of the cases is not known,
inference is complicated by the uncertainty of case
locations due to data aggregation on census regions for
confidentiality. Conventional modeling relies on the census
boundaries that are unrelated to the biological process
being modeled, and may result in stronger spatial dependence
in less populated regions which dominate the map. A new
model was developed consisting of a continuous random
spatial surface with aggregated responses and fixed
covariate effects on census region levels. The continuous
spatial surface was approximated by a Markov random field,
which greatly reduced the computational complexity. The process
was again modeled on a lattice of fine grid cells and
Bayesian inference was performed using Markov Chain Monte
Carlo with data augmentation.
Simulation studies
were carried out to assess the performance of the proposed model and
to compare it with the conventional Besag-York-Mollié model as
well as a model assuming exact locations are known. Receiver
operating characteristic curves and Mean Integrated Squared
Errors were used as measures of performance. The exact
location model is applied to clinical data on the location
of residence at the time of diagnosis of new Lupus cases in
Toronto for the 40 years to 2007 and the aggregated model is
applied to clinical data of syphilis cases in North
Carolina, for the 9 years to 2007, with the aim of finding
areas of abnormally high risk. Maps of predicted risk and posterior
exceedance probabilities are produced and the results are
interpreted.
Talk Slides

November 1, 2011

"Construction of Bivariate Distributions via Principal Components"
by Amparo Casanova, University of Toronto
The diagonal expansion of a bivariate distribution
(Lancaster, 1958) has been used as a tool to construct
bivariate distributions; this method has been generalized
using principal dimensions of random variables (Cuadras,
2002). Necessary and sufficient conditions are given for
uniform, exponential, logistic and Pareto marginals in the
one- and two-dimensional cases. The corresponding copulas are
obtained.
Talk Slides
Recommended Readings I, II, III
Additional Readings IV, V

October 25, 2011

"Determining Optimal Sample Sizes for Multi-stage Randomized Clinical
Trials from an Industry Perspective Using Value of
Information Methods"
by Maggie Chen, University of Toronto
A model is proposed for the expected total profit that
includes consideration of per-patient profit, disease
incidence, time horizon, trial duration, market share, and
the relationship between the trial results and probability
of regulatory approval. The proposed value of information (VOI)
method covers multi-stage adaptive designs, with a solution for
the two-stage design. An example demonstrates that
significant increases in expected net gain can be achieved
by using a multi-stage design, which also requires a smaller
expected total sample size and lower cost.
Talk Slides

October 18, 2011

"Sample Weights in Bayesian Regression: A Case Study"
by Tim Guimond, Anky Lai, Josh Murray, Kristen O'Brien
& Case Studies Team, University of Toronto
This presentation will discuss the process of
participating in the SSC case studies as well as the results
from an analysis of the income gap between young Canadian
men and women from 1996 to 2008. A Bayesian approach was
used as a sophisticated attempt at dealing with sampling
weights in the prediction of income. We examined an
unweighted model, a continuously weighted model and a model
subdividing sample weights into classes. The gender gap in
income between young Canadians is expanding over time and,
controlling for other covariates, men earn significantly more
than women. We will discuss the ideal approach to completing
an SSC case study while highlighting the dilemmas that are
encountered along the way.
Talk Slides: Part I & Part II

October 11, 2011

"Efficient Analysis of Case-Control Studies with Sample Weights (with
Application to U.S. Kidney Cancer Study)"
by Vicky Landsman, St. Michael's Hospital & NIH/NCI
I will talk about analysis of case-control studies from
complex sampling designs. For these studies, the sample
selection mechanism is usually informative, which requires
incorporating sample weights in the analysis to obtain
consistent estimates of the population parameters. The
conventional weighted estimators, obtained by solving
weighted estimating equations, are known to be inefficient
when the weights are highly variable as is typical for
case-control designs. I will present an alternative
semi-parametric weighted estimator, obtained by solving
weighted estimating equations using the model-adjusted
rather than the original sample weights. The adjustment of
the weights helps to reduce their variability and, as a
result, improve the efficiency and reduce the bias of the
estimators that use the adjusted weights. I will discuss
benefits and limitations of the proposed estimator
emphasizing efficiency and robustness. I will show some
interesting results from the simulation study and the
application of the methods to the U.S. Kidney Cancer
Case-Control Study which motivated this work.
Talk Slides
Recommended Readings I, II

October 4, 2011

"Determinants of the Presence and Volume of Brown Fat in Humans"
by Joanne Quan, Lutong Zhou & Case Studies Team, University of Toronto
Brown fat is typically found in hibernating animals but
recent technological advances have detected the presence of
brown fat in humans as well. The goal of this project is to
identify the factors determining the existence and the
volume of brown fat in humans. The data contains 4842
observations from cancer patients, of which approximately 6%
have brown fat. Generalized linear regression models and the
Box-Cox transformation were used to build a 4-stage model to
investigate the relationship between the covariates and
presence and volume of brown fat. The results showed that
age, sex and external temperature were significant
predictors; women have higher volumes of brown fat than men,
and brown fat volume decreases with increasing age and lower
external temperature.
Talk Slides

September 27, 2011

"Biomedical Applications of Runs and Patterns"
by Wendy Lou, University of Toronto
Sequences of categorical outcomes arise frequently in
biomedical research, representing, for example, DNA segments
or outcomes of healthcare evaluations. They are typically
analyzed using problem-specific statistics
involving runs and patterns. The distributions of such
statistics are often unknown except in the simplest cases,
and the simplifying assumption of iid outcomes is in many
cases not realistic. The method of finite Markov chain
imbedding is an effective tool for deriving and studying the
distributions of complex statistics defined on categorical
sequences. In this talk, a review of the approach will be
presented, and its usage will be illustrated via selected
practical applications drawn from studies of HIV patient
care and DNA tandem repeats, among others.
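
A small Python sketch can show the imbedding idea for one simple statistic: the probability that a sequence of n binary outcomes contains a run of at least k successes. For brevity it assumes iid outcomes with success probability p, which is the simplifying case the abstract cautions about; the same construction carries over to Markov-dependent outcomes by changing the transition probabilities. The function name is an assumption of this example.

```python
import numpy as np

def prob_run_at_least(n, k, p):
    """P(a success run of length >= k occurs in n iid Bernoulli(p) trials)."""
    # States 0..k-1 record the current success run length; state k is absorbing.
    T = np.zeros((k + 1, k + 1))
    for state in range(k):
        T[state, 0] = 1.0 - p          # a failure resets the run
        T[state, state + 1] = p        # a success extends the run
    T[k, k] = 1.0                      # once the run has occurred, stay there
    start = np.zeros(k + 1)
    start[0] = 1.0
    final = start @ np.linalg.matrix_power(T, n)
    return final[k]

# Hypothetical example: a run of 5 successes in 100 trials with p = 0.5.
print(prob_run_at_least(100, 5, 0.5))
```

The distribution of the run statistic is read off from the state probabilities of the chain after n steps, so no combinatorial enumeration of sequences is needed.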