| |
Abstracts &
Slides
|
 |
April 21, 2009 (HSB 790) |
|
|
|
"Estimation in Generalized Linear Models with Measurement
Error" by Taraneh Abarin, University of Manitoba
Generalized linear models (GLM) are widely used in
biostatistics, epidemiology, and many other areas. However,
real data analyses using GLMs often involve covariates
that are not observed directly or are measured with error.
In such cases, statistical estimation and inference become
very challenging. Most of the proposed approaches rely on
the normality assumption for the unobserved covariates and
measurement error. A computational difficulty with the
likelihood approach is that the likelihood function involves
multiple integrals which do not admit closed forms in
general. In this talk, we first introduce some basic
concepts of measurement error. Then, we
consider the generalized linear models which allow very
general heteroskedastic regression errors. In particular, we
study Second-Order Least Squares estimation combined
with instrumental variables. This approach does not require
the parametric assumptions for the distributions of the
unobserved covariates and of the measurement error, which
are difficult to check in practice. We will close the talk
with a review of some computational issues.
Talk Slides
I,
II,
III
|
|
|
 |
April 7, 2009 |
|
|
|
"Power of Logistic Regression with Measurement Error in
Predictor Variable and Varying Number of Observations per
Subject" by
Olga Melnichouk
and Salomon
Minkin, Ontario Cancer Institute
We present a case study where the objective was to assess
the power of a proposed study of serum biomarkers and risk
of breast cancer. In the study of serum biomarkers, we
intended to examine in a nested case-control study the
association of breast-cancer risk with long-term exposure to
serum levels of sex-hormone binding globulin (SHBG), total
and free serum estradiol and testosterone, androstenedione,
C-peptide, lipids and lipoprotein, and adiponectin. The
biomarkers of interest are subject to varying degrees of
measurement error, comprising temporary within-subject
variation as well as variation in the lab measurements. We
proposed to measure these biomarkers in multiple blood
samples per subject and use their average as a surrogate of
a true long-term exposure. The nested case-control study was
carried out within the cohort of the Canadian Diet and
Breast Cancer Prevention Study, a randomized trial of
intervention with a low-fat high-carbohydrate diet. Blood
samples were collected annually in years preceding the
diagnosis of breast cancer for the case subjects and
throughout the study for the control subjects, and the
number of repeated blood samples per subject varied from
subject to subject. It is known that in the linear and
logistic regression analysis, measurement error in a
continuous exposure variable leads to an attenuated estimate of
the regression coefficient and power loss. To correct for
measurement error when subjects’ averages are based on
a varying number of replicates, we apply the methodology
proposed by Armstrong et al. (1989) and Kim and
Zeleniuch-Jacquotte (1997). Our technique for determining
power relies on simulation that incorporates the specific
characteristics of the proposed study. We demonstrate how
much power is gained as additional blood samples per subject
are measured. We then show how the proposed techniques can
be adapted to other situations.
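The attenuation and the gain from extra blood samples described above can be sketched under the classical additive error model. This model is an assumption of the sketch, not a detail taken from the talk: if the true long-term exposure has variance var_true and each replicate adds independent error with variance var_error, the mean of k replicates has reliability ratio var_true / (var_true + var_error / k). In linear regression the expected slope is multiplied by this ratio; in logistic regression the same factor holds only approximately. The function names and the numbers are hypothetical.

```python
def reliability_ratio(var_true, var_error, k):
    """Reliability ratio of the mean of k replicates (classical additive error)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Averaging k independent replicates divides the error variance by k.
    return var_true / (var_true + var_error / k)


def attenuated_slope(beta, var_true, var_error, k):
    """Expected naive slope in linear regression (approximate for logistic)."""
    return beta * reliability_ratio(var_true, var_error, k)


if __name__ == "__main__":
    # Hypothetical setting: true slope 1.0, equal true and error variances.
    for k in (1, 2, 4, 8):
        print(k, round(attenuated_slope(1.0, 1.0, 1.0, k), 3))
```

With equal variances, one sample halves the slope while four samples recover 80% of it, which is the kind of gain a power simulation would quantify.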
Talk Slides
Recommended Readings
I, II
|
|
|
 |
March 31, 2009 |
|
|
|
"An Introduction to Binary Recursive Partitioning:
Regression Trees" by
Mohamed Abdolell, Dalhousie
University
This presentation will outline the background of the CART
algorithm and discuss some of the extensions from the
original algorithm to the context of survival, Poisson, and
MVN repeated measures outcomes.
The development of
the binary recursive partitioning algorithm will be traced
from the scenario of a single split to that of recursive
splitting that generates full tree models. Extensive use of
real data sets will illustrate the application of the
algorithm.
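The single-split step can be sketched as follows. This is a minimal illustration for one numeric predictor and a continuous outcome, using the usual least-squares criterion; it is not the rpart or longRPart implementation, and the function names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the best split of the form x <= threshold.

    If no split improves on the unsplit node, threshold is None.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse([v for _, v in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


if __name__ == "__main__":
    x = [1, 2, 3, 10, 11, 12]
    y = [5, 6, 5, 20, 21, 19]
    print(best_split(x, y))  # threshold 6.5 separates the two groups
```

Recursive partitioning repeats the same search within each of the two resulting subsets until a stopping rule is met.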
The longRPart R library (available on
CRAN, 2009) will be introduced for the analysis of MVN
repeated measures outcomes; longRPart is an adaptation of
the rpart R library by Atkinson & Therneau. To illustrate
the use of longRPart, a data set containing the outcome of
PBK phoneme scores repeatedly taken on prelingually deaf
children will be used to identify children who would most
benefit from cochlear implantation. Confidence intervals and
p-values associated with the first split in the tree (Abdolell
et al., 2002) are implemented in the library.
Some of
the weaknesses traditionally associated with CART models can
be addressed through the application of ensemble methods.
And so we will very briefly touch on the topic of Random
Forests (Breiman 2001) as an extension of ensemble methods
to the context of CART algorithms.
Talk Slides
Recommended Reading
|
|
|
 |
March 24, 2009 |
|
|
|
"Leprosy as a Genetic Model for Susceptibility to Common
Infectious Diseases" by
Alexandre Alcaïs, Institut
National de la Santé et de la Recherche Médicale U.550,
Paris, France
Leprosy is a human infectious disease that can be
effectively treated with long-term administration of
multi-drug therapy. In 2006, over 250,000 new cases were
reported to the World Health Organization. In the nineteenth
century, disagreement among leprologists regarding the
hereditary or infectious nature of leprosy was resolved with
the identification of the etiological agent,
Mycobacterium leprae. However, epidemiological studies
maintain the importance of host genetics in leprosy
susceptibility. A model-free genome-wide linkage scan in
multi-case families from Vietnam led to the positional
cloning of global genetic risk factors in the
PARK2/PACRG and LTA genes. The process of
identifying the susceptibility variants provided invaluable
insight into the replication of genetic effects,
particularly the importance of considering
population-specific linkage-disequilibrium structure. As
such, these studies serve to improve our understanding of
leprosy pathogenesis by implicating novel biological
pathways while simultaneously providing a genetic model for
common infectious diseases.
Talk Slides
Recommended Readings
I,
II,
III,
IV
|
|
|
 |
March 17, 2009 |
|
|
|
"Automated Time-trending Tool for Improved Quality Assurance
Reliability" by
May Tang,
May Consulting Services Inc.
Managing quality assurance process data has been an issue in
meeting current higher regulatory standards. There is a need
to develop software tools to automate assessment of time
trending. We created a computer program trending interface
between data entry and data analysis software, such as
Microsoft Excel, SAS/JMP, and SAS 9. The
software can be used to automate routine quality assessment
of data by laboratory technicians. The approach simplifies
current needs that require review of analyses by more
technical staff or even by statisticians.
Rationale: Improved quality assurance of
biopharmaceutical processing, using routine staff to assess
trending data quality. Session Objective: Introduce
a new way to automate Quality Assurance processes, improving
compliance monitoring and trending. Audience Take Home
Benefits: Reduce resources used to monitor processing
trends and improve quality assurance and achieve greater
operational efficiency/effectiveness. Also, participants
will be able to see the benefits of an automated
user-friendly computer interface for data trending.
Talk Slides
|
|
|
 |
March 10, 2009 |
|
|
|
"Estimating Unbiased Dose-Response Curves from Repeated
Measures in the Presence of Confounding" by
Erica E.M.
Moodie,
McGill University
In a longitudinal study of dose-response, confounding due to
an observational study design or non-compliance in a
randomized trial compromises the estimation of the true
effect of treatment. Standard regression methods cannot
remove the bias introduced by patient selection of dose;
however, when follow-up times are irregular, marginal
structural models may not provide a practical solution.
Using an approach based on the Generalized Propensity Score
(GPS), it is possible to construct a balancing score that
provides an unbiased estimation procedure for the true
direct effect of dose. We apply the GPS methodology, using
a novel formulation of the treatment density, to a cohort
of HIV and HIV-HCV infected individuals to examine the
dose-response relationship between the use of
anti-retroviral therapy and liver function.
Talk Slides
Recommended Readings
|
|
|
 |
March 3, 2009 |
|
|
|
"Assessing Reliability or Agreement for Continuous
Measurements" by
Qilong Yi, Canadian Blood Services
Reliability or agreement assessment is an important step in
various applications. There exists a large volume of
literature on its use and on underlying methodology; in 2008
alone, more than 1600 such articles can be found on PubMed.
Many statistical methods (indexes) have been proposed to
measure reliability, such as the correlation coefficient,
intraclass correlation coefficient, limits of agreement,
repeatability coefficients, and equivalence tests. This
presentation will review these methods, discuss their
advantages and limitations, and illustrate their use by
analyzing an example dataset.
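As one concrete instance of the indexes listed above, the 95% limits of agreement can be sketched as the mean paired difference plus or minus 1.96 standard deviations of the differences. The function name and the measurements are hypothetical, and the sketch assumes one pair of readings per subject with roughly normal differences.

```python
from statistics import mean, stdev


def limits_of_agreement(a, b, z=1.96):
    """Return (lower, bias, upper) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)   # systematic difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias - z * sd, bias, bias + z * sd


if __name__ == "__main__":
    method_a = [10, 12, 14, 16]  # hypothetical readings from method A
    method_b = [9, 12, 13, 16]   # hypothetical readings from method B
    print(limits_of_agreement(method_a, method_b))
```

Unlike the correlation coefficient, these limits are on the measurement scale, so they can be compared directly with a clinically acceptable difference.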
Talk Slides
Recommended Readings
I,
II
|
|
|
 |
February 24, 2009 |
|
|
|
"Opportunities and Challenges with Observational HIV
Database" by
Janet Raboud, Mount Sinai Hospital
Recent initiatives have established both provincial and
national cohort studies of HIV positive individuals. The
Ontario HIV Cohort Study is a longitudinal cohort study of
HIV positive individuals attending ten primary and tertiary
care centers in Ontario. Individuals complete either a short
(15 min) or long (90 min) questionnaire annually, which is
linked to clinical data extracted from medical charts. CANOC
is a collaboration of cohort studies in Ontario, Quebec and
BC, with current enrollment of approximately 5000
individuals. This seminar shall focus on the unintended
statistical consequences of administrative decisions,
non-random refusal rates, differing clinic visit patterns, and
non-random missing data. Issues such as informative interval
censoring, recurrent events and imputation of missing data
shall be discussed.
Talk Slides
|
|
|
 |
February 10, 2009 |
|
|
|
"Toward the Conceptualization and Measurement of the Spaces
of Daily Life (with A Primer on GIS)"
by
Ron Buliung,
University of Toronto
For several decades social scientists (from geographers to
environmental psychologists) have wrestled with the
conceptualization and measurement of human spatial
experiences in urban environments. Many theoretical
constructs have been introduced; more notable examples
include: home range, activity space, and action space.
Interestingly, both the conceptual and computational aspects
of this work bear striking similarities to the efforts of
spatial ecologists who study the activities of non-human
animals. Much of the “human-work”, however, has been
motivated by the desire to advance thinking about how we
use, experience, and produce space. Recent efforts have also
focused on understanding the complex relationships that
exist between health and planning policy and outcomes, urban
form and design, and daily activities. The aim of this
seminar is to provide some introduction to my recent
thinking about the conceptualization and measurement of the
spaces of daily life. The conceptual part of the session
will place some emphasis on the theoretical treatment of
“space” within the geographical tradition. Attention will
then turn to the practical matter of constructing and
estimating (from space-time activity data) spatial metrics
to support policy-based research focused on questions
pertaining to urban form, design, and spatial behaviour.
Talk Slides
|
|
|
 |
February 3, 2009 |
|
|
|
"Modeling the Cumulative Effects of Time-varying Exposure,
Weighted by Recency, on the Hazard"
by
Marie-Pierre
Sylvestre, Samuel Lunenfeld Research Institute
Many epidemiological studies assess the effects of
time-dependent exposures, where both the exposure status and
its intensity vary over time. One example that abundantly
attracts public attention concerns pharmacoepidemiological
studies of the therapeutic or adverse effects of
medications. In many of these studies, prescribed dose and
duration of drug use vary both over time and across
subjects. The analysis of such studies poses the particular
challenge of modeling the association between complex
time-dependent drug exposure and the outcome, especially given the
uncertainty about the etiological relevance of doses taken
in different time periods. In this talk, I will present a
flexible method for modeling cumulative effects of
time-varying exposures, weighted by recency, represented by
time-dependent covariates in Cox's proportional hazards
model. The function that assigns weights to doses taken in
the past is estimated from the data using cubic regression
splines. I will discuss the results of simulations conducted
to investigate the properties of the proposed approach. I
will also present the results obtained from using the method
to study the association between exposure to benzodiazepines
and fall-related injuries in the elderly.
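The recency-weighted cumulative exposure can be sketched as a weighted sum of past doses, where the weight depends on the time elapsed since each dose was taken. In the talk the weight function is estimated from the data with cubic regression splines; the fixed exponential-decay weight below is only a hypothetical stand-in, and all names are illustrative.

```python
def weighted_cumulative_exposure(doses, weight, t):
    """Sum of weight(t - s) * doses[s] over past times s = 0, ..., t.

    doses[s] is the dose taken at time s; weight(lag) is the relative
    importance of a dose taken `lag` time units ago.
    """
    last = min(t, len(doses) - 1)
    return sum(weight(t - s) * doses[s] for s in range(last + 1))


def exponential_weight(half_life):
    """Hypothetical weight function: importance halves every `half_life` units."""
    def w(lag):
        return 0.5 ** (lag / half_life)
    return w


if __name__ == "__main__":
    doses = [1.0, 0.0, 2.0]  # hypothetical daily doses
    print(weighted_cumulative_exposure(doses, exponential_weight(1.0), 2))
    # A constant weight of 1 recovers the unweighted cumulative dose.
    print(weighted_cumulative_exposure(doses, lambda lag: 1.0, 2))
```

The resulting value at each time t would enter the Cox model as a time-dependent covariate.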
Talk Slides
|
|
|
 |
January 27, 2009 |
|
|
|
"Small
Area Estimation for Spatially Aggregated Data with
Time-varying Boundaries"
by Steve Fan,
University of Toronto
Disease incidence data are often aggregated spatially with
only the postal or census region of the cases being made
available. The regions may vary in size, larger in rural
areas and smaller in urban centres. While estimating the
risk of rare diseases often requires data collected over a
long period of time, the boundaries of these regions may not
be the same throughout the entire collection period. A
conventional cross sectional analysis modelling risk at the
region level and incorporating dependence between
neighbouring regions is not possible when the boundaries are
changing.
We extend the EMS algorithm proposed by Fan
and Stafford (2008) in the application of recurrent interval
censored data to analyze the map data using the longitudinal
census and estimate a spatially smooth risk surface. A
simulation study is conducted to assess its performance. Not
only does this proposed method achieve smaller mean
integrated squared error than the competing nonparametric
MLE, but it also reduces the amount of
computational time significantly. In the special case in
which the region boundaries remain unchanged during the
collection period, our proposed risk estimator reduces to
the kernel risk estimator proposed by Brillinger (1991).
We then apply the proposed method to study the spatial
risk of newly diagnosed lupus cases in the
Greater Toronto Area while using a generalized linear model
to accommodate potential age-gender group and time effects.
This analysis allows us to identify the regions with the
highest risk while adjusting for regional population
composition and time trends.
Talk Slides
|
|
|
 |
January 20, 2009 |
|
|
|
"Introduction
to Structural Equation Models"
by Jerry
Brunner,
University of Toronto
The classical structural equation models extend linear
regression to allow for latent as well as observed
variables, and also to allow a dependent variable in one
equation to be an independent variable in another.
Special cases include univariate and multivariate regression
with and without measurement error, path analysis,
exploratory and confirmatory factor analysis, and certain
forms of longitudinal data analysis.
The first half
of the talk will cover notation and basic principles, with
special attention to model identification. The second half
will illustrate model fitting and hypothesis testing using
SAS proc calis.
Talk Slides,
Programs
|
|
|
 |
January 13, 2009 |
|
|
|
"Likelihood-free Inference for Infectious Disease Models"
by Robert Deardon,
University of Guelph
Likelihood-based inference for epidemic models has always
proved difficult, mainly due to difficulties associated with
calculating or evaluating the likelihood, particularly in
large scale models. Unobserved or partially observed data
often further complicate this process. Here we investigate
the performance of Markov chain Monte Carlo routines based on approximate likelihoods
generated from model simulations as a means of estimating
parameters in epidemic models for both complete and
incomplete data. We illustrate our techniques using examples
such as (i) common cold data from Tristan da Cunha; and (ii) data from an outbreak of Ebola
Haemorrhagic Fever in the Democratic Republic of Congo.
Talk Slides
|
|
|
|
|
|
|
|
Fall 2008 Seminars (HSB 106) |
|
|
|
|
|
|
 |
December 9,
2008 |
|
|
|
"Statistician's Role in Industry & Sanofi Pasteur:
Canada's Vaccine Company" by Aleksandra Kolenc-Saban, Sanofi
Pasteur
This talk will focus on what a statistician needs to know
(in addition to statistics) to be successful in the
industry, and also what a statistician does (roles and
responsibilities) in the vaccine industry. A brief
introduction and overview of Sanofi Pasteur, Canada's
Vaccine Company, will also be given.
Talk Slides
|
|
|
 |
December 2,
2008 |
|
|
|
"Syndromic
Surveillance: Real Time Applications of Spatial Statistics"
by James G. Heller,
James G. Heller Consulting Inc.
The seminar will consist of a brief introduction to the
topic of surveillance in public health and certain
statistical issues in large scale screening and surveillance
that depend on the chosen analytical framework. These
include issues in: temporal disease surveillance, spatio-temporal
disease mapping and monitoring; and syndromic surveillance.
The data requirements for syndromic surveillance are touched
upon and illustrated with examples (ESSENCE, the US DOD
health surveillance system; and 2 retrospective Canadian
examples of early warning surveillance). The statistical
surveillance problem is specified and illustrated with a
commonly used method, the CUSUM method.
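A one-sided upper CUSUM can be sketched as follows: the statistic accumulates the excess of each count over a target plus an allowance k, is floored at zero, and signals when it exceeds a threshold h. The parameter values and the restart after each alarm are assumptions of this sketch, not details taken from the seminar.

```python
def cusum_alarms(counts, target, k, h):
    """Upper CUSUM: S_t = max(0, S_{t-1} + x_t - target - k); alarm when S_t > h.

    Returns (path, alarms): the CUSUM value at each time point and the
    indices at which an alarm was raised.
    """
    s = 0.0
    path, alarms = [], []
    for t, x in enumerate(counts):
        s = max(0.0, s + (x - target - k))
        path.append(s)
        if s > h:
            alarms.append(t)
            s = 0.0  # restart the chart after signalling (a design choice)
    return path, alarms


if __name__ == "__main__":
    daily_counts = [10, 11, 9, 10, 15, 16, 17, 10]  # hypothetical syndrome counts
    path, alarms = cusum_alarms(daily_counts, target=10, k=1, h=8)
    print(path)    # [0.0, 0.0, 0.0, 0.0, 4.0, 9.0, 6.0, 5.0]
    print(alarms)  # [5]
```

A larger k makes the chart less sensitive to small shifts, and a larger h trades slower detection for fewer false alarms.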
Talk Slides
|
|
|
 |
November 25,
2008 |
|
|
|
"Competing
Causes of Death from a Randomized Trial of Extended Adjuvant
Endocrine Therapy for Breast Cancer: NCIC CTG MA.17"
by Judy-Anne
Chapman, NCIC and Queen’s University
Background:
Older women with early-stage breast cancer experience higher
rates of non-breast cancer-related death. We examined
factors associated with cause-specific death in a large
cohort of breast cancer patients treated with extended
adjuvant endocrine therapy.
Methods: In
the MA.17 trial conducted by the National Cancer Institute
of Canada Clinical Trials Group, 5170 breast cancer patients
(median age = 62 years; range = 32-94 years) who were
disease-free after approximately 5 years of adjuvant
tamoxifen treatment were randomly assigned to treatment with
letrozole (2583 women) or placebo (2587 women). The median
follow-up was 3.9 years (range = 0-7.0 years). We
investigated the association of 11 baseline factors with the
competing risks of death from breast cancer, other
malignancies, and other causes. All statistical tests were
two-sided likelihood ratio criterion tests.
Results: During
follow-up, 256 deaths were reported (102 from breast cancer,
50 from other malignancies, 100 from other causes, and four
from an unknown cause). Non-breast cancer deaths accounted
for 60% of the 252 known deaths (72% for those >=70 years
and 48% for those <70 years). Two baseline factors were
differentially associated with type of death: cardiovascular
disease was associated with a statistically significant
increased risk of death from other causes (P = .002) and
osteoporosis was associated with a statistically significant
increased risk of death from other malignancies (P = .05).
An increased risk of breast cancer-specific death was
associated with lymph node involvement (P<.001). Increased
risk of death from all three causes was associated with
older age (P<.001).
Conclusions:
Non-breast cancer-related deaths were more common than
breast cancer-related deaths in this cohort of 5-year cancer
survivors, especially among older women.
Co-authors:
Meng D(1), Shepherd
L(1), Parulekar W(1), Ingle JN(2), Muss HB(3), Palmer M(1),
Yu C(1), Goss PE(4) (1) NCIC Clinical Trials Group,
Kingston, Canada; (2) Mayo Clinic, Rochester, USA;
(3) University of Vermont,
Burlington, USA;
(4) Harvard
University,
Boston, USA
Talk Slides
Recommended Readings
I,
II
|
|
|
 |
November 18,
2008 |
|
|
|
"Combining
Machine Learning and Bayesian Approaches for Inference in
Statistical Genomics"
by Rafal Kustra,
University of Toronto
The ongoing technological revolution in microbiology
has resulted in a deluge of complex, large, and multi-modal
experimental data available for analysis. In this talk I
will present our method for estimating a very large number
(millions) of p-values for a set of parallel hypothesis
tests from a Single Nucleotide Polymorphism (SNP)
microarray experiment in cancer genomics. Currently SNP
arrays can provide measurements to obtain up to a million
genotypes (letters of DNA code at locations where
population-wide variations occur), which can then be used to
investigate association between genetic variations and
cancer risk. Since cancer is a genetically complex disease,
of interest are also potential cancer associations of
genetic interactions among two or more pieces of DNA. The
number of such potential interactions is of course enormous,
and even after all sensible filtering and test reduction
methods have been applied, one can easily face millions of
hypothesis tests and resulting p-values. Often
asymptotic inference is not applicable and non-parametric
inference is applied instead, which presents a challenge in simply
computing p-values. Our method relies on the simple
observation that accurate estimates of large (hence
uninteresting) p-values are not needed. In this talk I
will show how combining a machine learning predictive model
(Random Forest, although many other choices are possible) to
obtain initial predictions of p-values, together with a
Bayesian scheme to selectively update "interesting"
p-values with permutations, can result in enormous reduction
of computational time but with minimal loss of accuracy. I
will demonstrate our method by applying it to data from
phase-one of a large case-control project in Colorectal
Cancer genomics (ARCTIC). In addition to presenting our
method and the results, I will also introduce the SNP data,
the problem of association testing, and time permitting, I
will discuss approaches to the multiple testing problem
which can also benefit from our method.
This is joint
research with Duncan Murdoch, Xiaofei Shi, Celia Greenwood,
and Jagadish Rangrej
Talk Slides
Recommended Readings
|
|
|
 |
November 11,
2008 |
|
|
|
"Three-Level Model for the Analysis of the Visual
Analog Scale for Negative Mood Intensity in Patients with
Borderline Personality Disorder and Recurrent Suicidal
Behavior: An Application of Experience Sampling Methodology"
by Rosane
Nisenbaum,
St Michael's Hospital
Borderline Personality Disorder is a chronic severe
psychiatric illness characterized by impulsivity, mood
swings, instability of interpersonal relationships and
recurrent suicidal behavior. As with many other illnesses
defined by complex syndromes, patients are quite
heterogeneous, especially with respect to within- and
between-patient variability in mood swinging states. In the
current study, we used an Experience Sampling Method design
to capture both types of variability. 82 participants were
given electronic organizers that were programmed to beep at
6 randomized times, every day, for 21 days. At each time,
subjects were asked to complete a Visual Analogue Scale and
rate the intensity of 26 mood states. In this presentation,
we will illustrate how three-level models can be applied to
estimate the variability within patients (Level 1), between
days (Level 2), and between patients (Level 3), and to
determine predictors of change in negative mood intensity
ratings. We will discuss our model-building strategy and
will show how graphs can help in interpreting final results.
Talk Slides
Recommended Readings
I,
II
|
|
|
 |
November 4,
2008 (Rescheduled to January 13, 2009) |
|
|
|
"Likelihood-free Inference for Infectious Disease Models"
by Robert Deardon,
University of Guelph
Likelihood-based inference for epidemic models has always
proved difficult, mainly due to difficulties associated with
calculating or evaluating the likelihood, particularly in
large scale models. Unobserved or partially observed data
often further complicate this process. Here we investigate
the performance of Markov chain Monte Carlo routines based on approximate likelihoods
generated from model simulations as a means of estimating
parameters in epidemic models for both complete and
incomplete data. We illustrate our techniques using examples
such as (i) common cold data from Tristan da Cunha; and (ii) data from an outbreak of Ebola
Haemorrhagic Fever in the Democratic Republic of Congo.
Talk Slides
Recommended Readings
|
|
|
 |
October 28,
2008 |
|
|
|
"A
Multilevel Analysis of Gender, Neighborhood Material
Deprivation, and Number of Weekly Drinks Among Canadian
Adults Aged 31 to 74"
by Rahim
Moineddin,
University of Toronto & ICES
In this study we used multi-level Zero-inflated Poisson
Regression to assess the impact of neighborhood material
deprivation on the drinking behavior of adult men and women
aged 31 to 74 living in 25 of the largest urban areas in
Canada. Using the Canadian
Community Health Survey (CCHS) in conjunction with
tract-level data from the 2001 census, we showed that the positive
association between drinking behavior and neighborhood
material deprivation differs by gender, such that men living
in neighborhoods with higher material deprivation have
heavier drinking patterns relative to women and to men
living in neighborhoods with lower material deprivation
after adjusting for individual-level socio-demographic
status and lifestyle characteristics. Findings showed that
drinking patterns varied significantly across neighborhoods.
The relationship between neighborhood material deprivation
and drinking is highly pronounced among men even after
adjusting for age, marital and educational status, visible
minority, smoking, stress, and community sense of belonging.
The findings suggest that men living in high poverty
neighborhoods in urban Canada were at greater risk of
drinking than men in more affluent neighborhoods.
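The zero-inflated Poisson distribution underlying the model above can be sketched as a mixture: with probability pi the count is a structural zero (for example, a non-drinker), and otherwise it follows a Poisson distribution with mean lam. This is only the single-level distribution; the multilevel model in the study additionally involves neighborhood-level effects, which are not shown here. The names and parameter values are hypothetical.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson with rate lam and zero-inflation pi."""
    if y < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # Zeros arise from the structural-zero group or from the Poisson part.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_mean(lam, pi):
    """Expected count: only the non-structural-zero group contributes."""
    return (1 - pi) * lam


if __name__ == "__main__":
    # Hypothetical values: 30% structural zeros, Poisson mean of 2 drinks.
    print(round(zip_pmf(0, 2.0, 0.3), 4), zip_mean(2.0, 0.3))
```

Compared with an ordinary Poisson model with the same mean, this puts more probability on zero, which is typical of weekly drink counts.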
Talk Slides
|
|
|
 |
October 21,
2008 |
|
|
|
"SSC
Case Studies: What, Why, and How"
by Alison Gibbs,
University of Toronto
For many years, case studies in data analysis have been an
important part of the Statistical Society of Canada (SSC)
annual meeting. The
case studies are a valuable way for graduate students in
statistics to get experience applying statistical
methodology to a real problem and to participate
meaningfully in the conference. We will talk about what the
case studies are, why you may want to participate in them,
what is expected of participants, and how the case studies
can also be a useful resource for teachers of statistics.
Talk Slides
Recommended Readings
|
|
|
 |
October 14,
2008 |
|
|
|
"Epidemiology
101"
by
Dionne Gesink
Law,
University of Toronto
Epidemiologists and Biostatisticians work together closely
to solve public health problems to ensure the health and
wellness of populations. Therefore, being familiar with the
language, concepts, and tools of each other’s field is vital
for successful collaborations. The purpose of this
presentation is to provide a brief overview of the field of
epidemiology including public health success stories,
terminology, and the continuum of epidemiologic studies from
developing study questions, through study design, data
analysis, and interpretation and communication of study
results. If time permits, we will review an outbreak
investigation.
Talk Slides
|
|
|
 |
October 7,
2008 |
|
|
|
"Compound
Poisson Approximation of Palindrome Length Score in
Herpesviruses"
by Ming-Ying Leung,
University of Texas at El Paso
Empirical
studies have shown that around the replication origins of
many herpesviruses, there are unusual clusters of
palindromes in their genome nucleotide sequences. Chew et al.
(2005) introduce the palindrome length scheme (PLS) to
quantify the spatial abundance of palindromes in a
nucleotide sequence. Under an i.i.d. random sequence model,
the PLS score distribution is well approximated by a
compound Poisson distribution. Simulation results further
demonstrate that the approximation remains good for
nucleotide sequences generated by a Markov chain. This
provides us with the statistical criteria to predict
specific regions of the herpesvirus genomes as likely
locations of replication origins.
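The first step, locating palindromes, can be sketched as follows. In DNA, a palindrome is a stretch that equals its own reverse complement, so it has even length and can be found by expanding outward from each position between two bases. The sketch only finds maximal palindromes above a minimum half-length; it does not implement the PLS scoring or the compound Poisson approximation, and the function name and test sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_palindromes(seq, min_half_length):
    """Return (start, length) of maximal reverse-complement palindromes in seq."""
    found = []
    n = len(seq)
    for center in range(1, n):  # center lies between seq[center - 1] and seq[center]
        half = 0
        # Extend while the base on the left pairs with the base on the right.
        while (center - 1 - half >= 0 and center + half < n
               and COMPLEMENT.get(seq[center - 1 - half]) == seq[center + half]):
            half += 1
        if half >= min_half_length:
            found.append((center - half, 2 * half))
    return found


if __name__ == "__main__":
    # GAATTC (positions 2-7) equals its own reverse complement.
    print(find_palindromes("ACGAATTCAG", 3))  # [(2, 6)]
```

Counting or scoring such palindromes in sliding windows along a genome is what makes unusual clusters visible against a random sequence model.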
Talk Slides
Recommended Readings,
I,
II
|
|
|
 |
September
30,
2008 |
|
| |
"Statistics
and Stories: Insight, not Numbers"
This talk is a videotape replay of the presentation given by
Neil Sheldon (UK) on June 24, 2008, for the
Royal
Statistical Society Guy Lecture at the University of
Toronto, an event co-sponsored by the
Department of
Statistics at UofT, the
Statistical
Society of Canada (SSC), and the
Southern-Ontario Regional Association (SORA) of SSC.
(Provided by Neil Sheldon)
To students – and perhaps to some teachers – statistics
seems to be mainly about collecting data, drawing diagrams
and calculating numerical summaries. Actually, statistics is
really about understanding the world around us.
In this talk, Neil will show, through a series of
illustrations, how data can be properly understood only if
we know the stories behind the statistics. Who collected the
figures and why? What assumptions were made and are they
valid? What do we know already that can help us to check the
data for plausibility? What conclusions can we draw and how
sure can we be?
Understanding the world around us is vital to playing a full
part in society - and so is understanding statistics.
Talk Slides
|
|
|
 |
September
23,
2008 |
|
| |
"Genome-Wide
Association of ~2.5M Common Alleles with Long-Term
Complication of Type 1 Diabetes and Related Traits:
Statistical Issues"
by Andrew
Paterson,
The Hospital for Sick Children
Following from the sequencing of the human genome, the
discovery of millions of common DNA sequence variations,
coupled with technologies for measuring millions of them at
relatively low cost has resulted in an avalanche of studies
identifying common genetic variants associated with many
different common complex diseases using case-control
designs. We have used similar approaches in subjects from a
clinical trial of long-term complications of type 1
diabetes, focusing on the analysis of multiple time-to-event
outcomes. I will discuss general design considerations,
approaches to analysis, and preliminary results. In
addition, we have performed genetic analysis of numerous
quantitative risk factors (biomarkers) in the same subjects.
Many outstanding statistical and logistical issues have arisen
and some will be illustrated.
Talk Slides
Recommended Reading
|
|
| |
|
|
|
|