| |
Abstracts &
Slides
|
 |
April 1 and
8,
2008 [3-5pm] (HS 108) |
|
| |
"Data
Integration Methods in the Life Sciences"
by
Joseph Beyene
and Jemila Hamid,
The Hospital for Sick Children
The importance of data integration has been widely
recognized in the health sciences as a critical component of
evidence-based and well-informed decisions in health-care
delivery. Scientists need to be able to access, analyze and
interpret a wide range of information in order to answer
important clinical questions, understand biological systems,
elucidate the impact of drug interventions on diseases etc.,
and this requires data to be integrated. In recent years,
there has been an exponential growth in the amount of life
science data generated by high throughput experiments (e.g.,
microarray gene expression data, mass spectrometry protein
data, sequence variations, etc.), each data type with its own
level of complexity and varying quality. One of the research
focuses of our group is the development and application of
methods for ‘data integration’.
There are two sessions for our presentation.
Part I (April 1, Joseph Beyene):
In this first part, we will provide a conceptual framework
of key data integration tasks and methods and will focus on
meta-analytic approaches that are widely used for
integrating similar data types, primarily in clinical
medicine. We will describe effect measures typically used
with different outcome variables and discuss fixed versus
random effects models and their modeling assumptions. Contentious
issues such as heterogeneity and publication bias will be
discussed briefly.
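As a rough illustration of the fixed versus random effects distinction mentioned above, the following sketch pools study-level effect estimates by inverse-variance weighting and then with the DerSimonian-Laird moment estimator of between-study variance. It is not taken from the talk; the function names and the example numbers are hypothetical.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def dersimonian_laird(effects, variances):
    """Random effects pooled estimate using the DerSimonian-Laird tau^2."""
    weights = [1.0 / v for v in variances]
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - pooled_fe) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    re_weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau2

# Hypothetical log odds ratios and their variances from three studies.
effects = [0.0, 0.5, 1.0]
variances = [0.04, 0.04, 0.04]
fe_estimate, fe_se = fixed_effect(effects, variances)
re_estimate, re_se, tau2 = dersimonian_laird(effects, variances)
```

When the studies disagree more than their within-study variances can explain, tau2 is positive and the random effects standard error is wider than the fixed-effect one.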
Part II (April 8, Jemila Hamid):
In the second part of the talk, we will cover kernel-based
methods for integrating heterogeneous data. We will
briefly discuss kernel matrices and their application in
cluster analysis (unsupervised learning) and discriminant
analysis (supervised learning). We will focus on Fisher
Discriminant Analysis (FDA) followed by its nonlinear
extension, Kernel Fisher Discriminant Analysis (KFDA).
We will then present our ongoing work on combining
heterogeneous data using weighted KFDA. We will illustrate
our results using the well-known Fisher’s iris data. We will
also show some preliminary results using breast cancer
microarray and clinical data.
Talk Slide
Part I,
Part II
|
|
|
 |
March 25,
2008 [4-5pm] (HS 108) |
|
| |
"Introduction
to Disease Mapping Using Bayesian Models"
by
Virgilio
Gómez-Rubio, Imperial College London, UK
Bayesian models have been used very successfully in recent
years to study the risk of mortality in many different
contexts. In my talk I will introduce different methods of
estimating the mortality risk. Starting with the
Standardised Mortality Ratio, I will discuss a series of
Empirical Bayes (EB) estimators that have been proposed in
recent years. EB estimators are based on Bayesian
hierarchical models in which the hyperparameters are estimated
from the data and the posterior distribution of the
parameters of interest is then derived.
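To make the idea concrete, here is a minimal sketch of the Standardised Mortality Ratio and of one common EB-style estimator, the posterior mean under a Poisson-gamma model. It assumes observed counts O ~ Poisson(theta * E) with theta ~ Gamma(shape, rate), and it assumes the hyperparameters have already been estimated from the data; the function names and numbers are hypothetical and not taken from the talk.

```python
def smr(observed, expected):
    """Standardised Mortality Ratio: observed over expected deaths."""
    return observed / expected

def poisson_gamma_posterior_mean(observed, expected, shape, rate):
    """Posterior mean of the relative risk theta when
    observed ~ Poisson(theta * expected) and theta ~ Gamma(shape, rate).
    shape and rate are assumed to have been estimated beforehand."""
    return (shape + observed) / (rate + expected)

# Hypothetical small area: 4 deaths observed, 2.0 expected.
raw_risk = smr(4, 2.0)
shrunk_risk = poisson_gamma_posterior_mean(4, 2.0, shape=2.0, rate=2.0)
```

With a prior mean of shape / rate = 1, the EB estimate is pulled from the raw SMR towards 1, and the pull is stronger for areas with small expected counts.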
Fully Bayesian approaches to some of these models will be
discussed as well. In addition, the model by Besag, York and
Mollié will be fully described. This model can account for a
spatial structure in the data as well as relevant
covariates.
All these methods will be illustrated using real data on
male lung cancer mortality in Toronto at the tract level.
Talk Slides,
R-Code/Files,
Assignment
|
|
|
 |
March 18,
2008 [3-5pm] (HS 108) |
|
| |
"Spatial
Point Processes"
by
Patrick Brown, Cancer Care Ontario
Locations of disease incidence can be thought of as random
points in the plane. One important question is whether these
points are located independently of each other, or if they
tend to cluster together. As cases are more likely in areas
of high population, measuring clustering needs to take
population density into account. The Inhomogeneous
K-function is a tool for assessing clustering in spatial
point processes, and this lecture will explain how the
K-function is related to mathematical properties of the
underlying spatial point process.
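For readers unfamiliar with the inhomogeneous K-function, the sketch below computes a naive estimate without edge correction: each ordered pair of distinct points within distance r contributes the reciprocal of the product of the intensities at the two points, and the sum is divided by the area of the study region. The function name, the omission of edge correction, and the toy data are assumptions made for illustration only.

```python
import math

def inhomogeneous_k(points, intensities, r, area):
    """Naive inhomogeneous K estimate at distance r (no edge correction).

    points      : list of (x, y) case locations
    intensities : intensity (e.g. population-based) at each point
    area        : area of the study region
    """
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                total += 1.0 / (intensities[i] * intensities[j])
    return total / area

# Toy example: four points on the corners of a unit square,
# with constant intensity 4 points per unit area.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_at_1 = inhomogeneous_k(corners, [4.0] * 4, r=1.0, area=1.0)
k_at_1_5 = inhomogeneous_k(corners, [4.0] * 4, r=1.5, area=1.0)
```

For a Poisson process with the stated intensity, the theoretical value is pi * r**2, so estimates well above that benchmark suggest clustering beyond what population density explains.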
Talk Slides
|
|
|
 |
March 4 and
11,
2008 [3-5pm] (HS 108) |
|
| |
"Time
Dependent Covariates in Parametric Survival Models"
by Sandra Gardner, Sunnybrook Health Sciences Centre
The semi-parametric Cox regression model for survival data
can incorporate time-varying covariates. There are some
examples in the literature where time-varying covariates are
incorporated into parametric survival models (for example,
Petersen T. Fitting Parametric Survival Models with
Time-Dependent Covariates. Applied Statistics-Journal of the
Royal Statistical Society Series C, 35 (3): 281-288 1986).
In Part I, we will compare the Cox model to the parametric
survival model when we have a proportional hazards model or
a non-proportional hazards model. Examples using sample or
simulated data will be presented along with the
corresponding SAS code.
In Part II, we will compare and contrast these models with the
Poisson regression model, which can also incorporate time-varying
covariates. Other methods for analyzing survival data with
time-dependent covariates found in the literature will be
discussed.
Talk Slides Part I,
Part II
|
|
|
 |
February 26,
2008 [3-5pm] (HS 108) |
|
| |
"All Is Not Well in the House of Statistics: A
Competing Approach to the Analysis of Genetic Association"
by Lisa Strug, Hospital
for Sick Children
The "multiple testing problem" currently bedevils genetic
association studies, especially for genome wide studies
where often >500,000 Single Nucleotide Polymorphism tests
are conducted across the genome. Briefly stated, this
problem arises when we perform more than one statistical
test, which leads to increased probabilities of committing
at least one type I error. The conventional solution to
this problem relies on the classical Neyman-Pearson
statistical paradigm, since that is the paradigm used to
analyze the data for association, and involves adjusting
one's error probabilities. This adjustment is, however,
problematic because in the process of doing that, one is
also adjusting one's measure of evidence. Investigators
have actually become wary of looking at their data, for fear
of having to adjust the strength of the evidence they
observed at a given locus on the genome every time they
conduct an additional test.
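The size of the problem is easy to see numerically. The sketch below assumes m independent tests, each at level alpha, which is a simplification since SNP tests are generally correlated; the function names are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one type I error in m independent tests."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test level that keeps the familywise error rate at most alpha."""
    return alpha / m

fwer_10_tests = familywise_error_rate(0.05, 10)
per_test_level_gwas = bonferroni_threshold(0.05, 500_000)
```

With 500,000 tests the Bonferroni per-test level is 1e-7, which illustrates how strongly the conventional adjustment changes the threshold applied at any single locus.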
The evidential paradigm uses the likelihood ratio (as
opposed to a p-value) as the measure of evidence for
association, and provides new, alternatively defined error
probabilities (analogous to Type I and Type II error rates),
i.e., probabilities of being misled. We have shown how this
paradigm separates or decouples the two concepts of error
probabilities and strength of the evidence. Here we apply
the evidential paradigm to genetic association studies and
the associated multiple testing problem. We advocate using
the likelihood ratio as the sole measure of the strength of
evidence; we then derive the corresponding probabilities of
being misled by the data under different multiple-testing
scenarios.
We distinguish two situations: performing multiple tests of
a single hypothesis, vs. performing a single test of
multiple hypotheses. For the first situation the
probability of being misled remains small regardless of the
number of times one tests the single hypothesis, as we
show. For the second situation, we provide a rigorous
argument outlining how replication samples themselves
(analyzed in conjunction with the original sample) provide
appropriate adjustments for testing multiple hypotheses on a
data set.
Talk Slides
Recommended Readings
I,
II,
III
|
|
|
 |
February 12,
2008 [3-5pm] (HS 108) |
|
| |
"Competing
Risks Analysis"
by Melania
Pintilie, Princess Margaret Hospital
In time-to-event analysis, more than one type of event may be
observed. A competing risks
situation arises when the observation of the event of
interest is hindered by the occurrence of another type of
event. In the presence of competing risks the probability of
the event of interest cannot be estimated using the usual
product-limit (Kaplan-Meier) method. Kalbfleisch and
Prentice introduced a non-parametric method to estimate the
probability of the event of interest, referred to as the
cumulative incidence function. To facilitate the
understanding of these two methods, the estimates from the
cumulative incidence function will be compared with the
estimates obtained from the Kaplan-Meier method, both in a
theoretical framework and through examples. There are two types
of hazard that can be modeled, each with its own
interpretation. The Cox proportional hazards model can be
applied to one of the hazards, while the second type is
modeled using a partial likelihood introduced by Fine and
Gray. Although some theoretical details will be given, this
talk will focus on applied issues. Examples will be shown,
mostly drawn from cancer research. The methodology can
easily be extended to other areas where competing risks are
present. In the second part, the use of a dedicated
R package for competing risks will be illustrated.
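The difference between the two estimators can be seen on a toy data set. The sketch below is written in Python rather than R and is not the package used in the talk; it assumes cause code 0 means censored, 1 the event of interest and 2 a competing event, and all names and data are hypothetical.

```python
def cumulative_incidence(times, causes, cause_of_interest=1):
    """Nonparametric cumulative incidence for one cause (0 = censored)."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    overall_surv = 1.0  # probability of no event of any type so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for tt, c in data if tt == t]
        d_interest = sum(1 for c in tied if c == cause_of_interest)
        d_any = sum(1 for c in tied if c != 0)
        cif += overall_surv * d_interest / n_at_risk
        overall_surv *= 1.0 - d_any / n_at_risk
        curve.append((t, cif))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve

def one_minus_kaplan_meier(times, causes, cause_of_interest=1):
    """1 - KM, treating competing events as censored observations."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [c for tt, c in data if tt == t]
        d_interest = sum(1 for c in tied if c == cause_of_interest)
        surv *= 1.0 - d_interest / n_at_risk
        curve.append((t, 1.0 - surv))
        n_at_risk -= len(tied)
        i += len(tied)
    return curve

times = [1, 2, 3, 4]
causes = [1, 2, 1, 0]
cif_final = cumulative_incidence(times, causes)[-1][1]
km_final = one_minus_kaplan_meier(times, causes)[-1][1]
```

On these four observations the cumulative incidence ends at 0.5 while 1 - KM ends at 0.625, showing how treating competing events as censored inflates the estimated probability of the event of interest.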
Talk Slides
Part I,
Part II
|
|
|
 |
January 29
and February 5,
2008 [3-5pm] (HSB 790) |
|
| |
"Longitudinal Data Analyses of Cohorts Created
Through Record Linkage to Canadian Mortality and Cancer
Databases"
by
Paul Villeneuve, Health
Canada
This presentation will provide an overview of two recent
studies that have made use of Statistics Canada’s
capabilities to link administrative data to national
mortality and cancer incidence data. A description of the
methods used by Statistics Canada to conduct record linkage
will be given. Thereafter, I will describe methods of
longitudinal data analyses that were applied to evaluate the
relationship between long-term exposure to radon and lung
cancer in a cohort of Newfoundland fluorspar miners. These
methods include the estimation of person-years of follow-up,
internal and external cohort analyses, and the evaluation of
modifiers of radon related lung cancer risk, including
cigarette smoking. The second study to be discussed is a
cohort study of transplant patients identified from CIHI's
Canadian Organ Replacement Registry database. This study
population will be described, and findings from preliminary
analyses of this cohort will be presented. These analyses
have examined the risk of developing cancer among patients
who received kidney transplants, with consideration of
dialysis as a time-dependent risk factor. This presentation
will be followed up with an in-class computer lab session on
February 5. In that session, a
more thorough review of the SAS programs that were used to
perform analyses of the cohorts will be provided, and
students will be asked to perform similar analyses on
provided practice data sets.
Talk Slides
Part I,
Part II
Programs
and Data,
Exercise
|
|
|
 |
January 22,
2008 (HSB 108) |
|
| |
"Effects of Unemployment on Health"
by
Hideki Ariizumi, Wilfrid Laurier University
I investigate the effects of unemployment on health status.
Because unemployment and health are
simultaneously determined, a single-equation regression
method may not be appropriate. To address this issue, a
two-equation model is specified and jointly estimated. The
error terms are decomposed into two parts, a
time-invariant component and a time-variant component.
Both error components are allowed to be correlated between
the two equations. For the time-invariant error component, I
use the nonparametric random effects model. For the
time-variant component, I use the bivariate probit approach.
Furthermore, to help the identification of the causal effect
of unemployment on health, I use the instrumental variable
approach. The main finding is that, for prime-age male labor
market participants, unemployment has a negative and large
impact on self-reported health status, while it has no
effect on the objective health measure.
Talk Slides
|
|
|
 |
January 8, 2008
[3-5pm] (HSB 108) |
|
| |
"Swimming Without a Lifeguard: An Introduction to
Analyzing Complex Survey Data"
by
John Amrhein, SAS
Canada
Most analytical tools, including most SAS/STAT procedures,
assume that your data consist of independent observations of
a simple random sample from an infinite population.
Inferential statistical methods employed by these tools
allow you to make valid inferences about the population from
which that sample was drawn. However, in many surveys, the data
do not represent a simple random sample of independent,
identically distributed observations selected from an
infinite population. Complex designs, from stratified to
multi-phase cluster designs, generate sampled observations
that are not independent, are not identically distributed,
and are not selected from an infinite population. To make
correct inferences, you must account for the complex design
by using the appropriate estimators for attributes and their
variances. It is also beneficial to account for the finite
nature of the population.
This lecture will introduce inferential methods that account
for common survey designs. One or two examples will be shown
using SAS/STAT procedures.
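As a small illustration of what accounting for the design means, the sketch below estimates a population mean from a stratified simple random sample, using stratum weights and a finite population correction in the variance. It is written in Python rather than SAS, and the function name and data are hypothetical.

```python
import math

def stratified_mean(samples, stratum_sizes):
    """Stratified estimate of a population mean and its standard error.

    samples       : list of lists, one list of observations per stratum
    stratum_sizes : population size N_h of each stratum
    """
    total_n = sum(stratum_sizes)
    estimate = 0.0
    variance = 0.0
    for sample, big_n in zip(samples, stratum_sizes):
        n = len(sample)
        mean = sum(sample) / n
        s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
        weight = big_n / total_n
        fpc = 1.0 - n / big_n  # finite population correction
        estimate += weight * mean
        variance += weight ** 2 * fpc * s2 / n
    return estimate, math.sqrt(variance)

# Two hypothetical strata of sizes 100 and 300, three units sampled from each.
design_mean, design_se = stratified_mean([[1, 2, 3], [10, 12, 14]], [100, 300])
```

Ignoring the strata and averaging all six values would give 7.0 rather than the design-based 9.5, because the second stratum makes up three quarters of the population but only half of the sample.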
Talk Slides
SAS codes,
Data
|
|
|
Fall 2007 Seminars (HSB 100) |
|
 |
December 4, 2007 |
|
| |
"Introduction to Receiver Operating Characteristic (ROC)
Analysis in Medical Research--Part II"
by
Gina Lockwood,
Princess Margaret Hospital
ROC methodology, derived from statistical decision theory,
dates back to the early 1950s when it was developed to
summarize data from signal detection experiments. It is used
in medical applications to assess the performance of
diagnostic (or prognostic) tests which must choose which of
two conditions, unknown at the moment of decision, exists
(or will exist). These lectures will introduce the basic
concepts involved in evaluating the statistical properties
of a test, including sensitivity, specificity and the ROC
curve. Several methods for fitting, summarizing and
comparing ROC curves will be examined. Examples of
dichotomous, ordinal and continuous tests taken from
oncology studies will be presented.
Talk Slides
|
|
|
 |
November 27, 2007 |
|
| |
"Introduction to Receiver Operating Characteristic (ROC)
Analysis in Medical Research--Part I"
by
Gina Lockwood,
Princess Margaret Hospital
ROC methodology, derived from statistical decision theory,
dates back to the early 1950s when it was developed to
summarize data from signal detection experiments. It is used
in medical applications to assess the performance of
diagnostic (or prognostic) tests which must choose which of
two conditions, unknown at the moment of decision, exists
(or will exist). These lectures will introduce the basic
concepts involved in evaluating the statistical properties
of a test, including sensitivity, specificity and the ROC
curve. Several methods for fitting, summarizing and
comparing ROC curves will be examined. Examples of
dichotomous, ordinal and continuous tests taken from
oncology studies will be presented.
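The basic quantities can be sketched in a few lines. The example below assumes a continuous test where higher scores indicate disease, computes sensitivity and specificity at one cutoff, and estimates the area under the ROC curve nonparametrically as the proportion of diseased/non-diseased pairs that are correctly ordered (ties count one half). Function names and data are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when 'score >= threshold' is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc_mann_whitney(scores, labels):
    """Nonparametric AUC: P(diseased score > non-diseased score), ties 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
auc = auc_mann_whitney(scores, labels)
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 - specificity traces out the empirical ROC curve.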
Talk Slides
|
|
|
 |
November 20, 2007
[3-5pm] |
|
| |
"Sequential Methods with Applications to Genetic Studies"
by
Laurent Briollais,
Samuel Lunenfeld Research Institute, Mt Sinai Hospital
The modern theory of sequential analysis stems from the work
of A. Wald in the U.S. and G. Barnard in Great Britain, who
participated in industrial advisory groups for war
production in the mid-1940s. Since then, sequential
approaches have been a natural way to proceed in many
experiments, especially in the design of clinical
trials, where interim analyses and the rules for reporting
them have been formally described by the FDA (1988). In
the first part of this lecture, we will introduce some
general concepts about sequential methods. The second part
will describe more specifically some applications to genetic
studies, an emerging and promising field of application for
this approach.
Talk Slides
|
|
|
 |
November 13, 2007 |
|
| |
"BUGS Research Day"
by
The Biostatistics Union of Graduate Students
Steve Fan: Are Variable Selection Methods Based on
Akaike Information Criterion Better?
Gerald Lebovic: Modeling Data with Ordinal Outcomes
Ahmed Hossain: Nonparametric and Parametric
Estimation of Area under Receiver Operating Characteristic
Curves (AUC) from Continuously Distributed Data and
Comparing Two Nonparametric AUCs
Talk Slides:
Steve,
Gerald,
Ahmed
|
|
|
 |
November 6, 2007 |
|
| |
"Generalized Linear Mixed Models for Categorical
Responses--Part II: Poisson Regression"
by
Rahim Moineddin,
University of Toronto & ICES
These two brief lectures will be introductory. The
extension of the generalized linear models to the class of
generalized linear mixed models to include random effects
will be discussed. Modeling binary and count data for
studies with hierarchical structure or repeated measurements
will be covered. Real data sets will be used for
illustration. Questions of interest include testing
for significance of covariates, interpretation of parameter
estimates, and details of SAS procedures NLMIXED and GLIMMIX.
Talk Slides
Assignment &
Dataset
Suggested Reading 1,
Reading 2
|
|
|
 |
October 30, 2007 |
|
| |
"Generalized Linear Mixed Models for Categorical
Responses--Part I: Generalized Linear Mixed Models"
by
Rahim Moineddin,
University of Toronto & ICES
These two brief lectures will be introductory. The
extension of the generalized linear models to the class of
generalized linear mixed models to include random effects
will be discussed. Modeling binary and count data for
studies with hierarchical structure or repeated measurements
will be covered. Real data sets will be used for
illustration. Questions of interest include testing
for significance of covariates, interpretation of parameter
estimates, and details of SAS procedures NLMIXED and GLIMMIX.
Talk Slides
|
|
|
 |
October 23, 2007 |
|
| |
"Spatial Statistics for Environmental Epidemiology"
by Patrick
Brown, Cancer Care Ontario
This lecture will be a brief introduction to some problems
in environmental epidemiology, related to modelling counts
of diseases in different regions such as census tracts or
municipalities. Modelling data of this sort should allow
for incidence numbers to be affected by the population's age
and sex structure, measured covariates such as social
deprivation, and spatially varying random components.
Questions of interest include testing for significance of
covariates and detecting regions with abnormally high risk.
Talk Slides
Assignment
|
|
|
 |
October 16, 2007 |
|
| |
"Moving beyond the disease atlas model for public health
surveillance: The Nova Scotia Breast Screening Program"
by Mohamed Abdolell,
Dalhousie University
The primary features of public health surveillance systems
will be reviewed, along with how such a system is being
established within the context of the Nova Scotia Breast Screening
Program (NSBSP). The NSBSP database is unique in
that it captures the entire patient trajectory through both
the screening and diagnostic systems from the time a woman
participates in the NSBSP for her first mammogram, and it is now
on the cusp of doing so for all mammographically screened
women in the province. The database is used for centralized
booking of women for both screening and diagnostic mammograms
and is maintained in real time. The main objective of the
surveillance system was to implement an automated reporting
system that enables the generation of a fully formatted
NSBSP Annual Report, including various program-specific as
well as nationally based performance indicators, in both
print and web formats in about 1 hour, rather than the
9-12 months currently required to generate the report
manually. Consequently, the report can be generated in
real time and provides the basis for an on-demand
surveillance system. The system is implemented exclusively
using General Public License software including R, LaTeX,
Sweave, Perl, and several other helper scripting languages
on an open source Linux distribution (Ubuntu 7.04). The
advantage of such an implementation is that it can be finely
tuned to exact specifications of the NSBSP and can be easily
modified to accommodate the emerging surveillance needs of
the NSBSP with minimal added cost. A key aspect of
developing this system is to explore the feasibility of
integrating statistical process control and other
statistical methods into a fully automated surveillance
system. Such an open source solution is particularly
valuable in resource-poor jurisdictions, enabling
surveillance at a reasonable cost.
Talk Slides
|
|
|
 |
October 9, 2007 |
|
| |
"How to Predict the Final Outcome of a Clinical Trial"
by
K.K. Gordon Lan, Univ of MDNJ and J&J, New Jersey
In the 1960s and 1970s, almost all clinical trials were
designed as fixed. That is, efficacy of a treatment would be
determined by the final data analysis. Despite the fixed
design, many NIH-sponsored clinical trials were
periodically reviewed by Policy Advisory Boards (they are
called Data Monitoring Committees nowadays). During interim
analyses, clinicians on the Board often asked the question:
"If the current trend continues, what is the chance that we
will have a positive study?" We will discuss how to put this
question into a statistical framework and provide a simple
answer. The chance is called conditional power (CP) or
predictive power (PP). We will discuss the use of CP, along
with group sequential methods, for early termination of a
clinical trial. The concepts of CP and PP can also be applied
to sample size estimation for a new study.
Talk Slides
|
|
|
 |
October 2, 2007 |
|
| |
"Computer Simulation: A Practical Tool in Health
Research--Part II"
by Paul N. Corey,
University of Toronto
Simulation has a long and proud history in biostatistics.
The introduction of digital computers to generate pseudo-random
numbers has enhanced its impact on applied
statistics. I will give a brief personal history of
randomness and review the practical use of computer
simulation in biological and clinical science research
and the kinds of problems it can help solve. The simple
structure of simulation programs in the SAS language will be
discussed and some examples given.
Talk Slides
|
|
|
 |
September 25, 2007 |
|
| |
"Computer Simulation: A Practical Tool in Health
Research--Part I"
by Paul N. Corey,
University of Toronto
Simulation has a long and proud history in biostatistics.
The introduction of digital computers to generate pseudo-random
numbers has enhanced its impact on applied
statistics. I will give a brief personal history of
randomness and review the practical use of computer
simulation in biological and clinical science research
and the kinds of problems it can help solve. The simple
structure of simulation programs in the SAS language will be
discussed and some examples given.
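The general structure of a simulation program (set a seed, generate data, compute a statistic, repeat, summarise) can be sketched as follows. The example is written in Python rather than SAS and is not from the talk; it checks by simulation that a 95% confidence interval for a normal mean with known standard deviation covers the true mean about 95% of the time. All names and settings are hypothetical.

```python
import math
import random

def simulate_ci_coverage(n_reps=2000, n=25, true_mean=0.0, sd=1.0, seed=2007):
    """Estimate the coverage of a 95% z-interval by Monte Carlo simulation."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    half_width = 1.959964 * sd / math.sqrt(n)
    covered = 0
    for _ in range(n_reps):
        # Generate one pseudo-random sample and compute its mean.
        sample_mean = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        # Record whether the interval around the sample mean covers the truth.
        if abs(sample_mean - true_mean) <= half_width:
            covered += 1
    return covered / n_reps

coverage = simulate_ci_coverage()
```

The same loop can be reused for settings where no closed-form answer exists, for example by replacing the normal generator with a skewed distribution to see how coverage degrades.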
Talk Slides
|
|