Abstracts & Slides
 

April 1 and 8, 2008 [3-5pm] (HS 108)

 
 


"Data Integration Methods in the Life Sciences"
by Joseph Beyene and Jemila Hamid,
The Hospital for Sick Children

The importance of data integration has been widely recognized in the health sciences as a critical component of evidence-based and well-informed decisions in health-care delivery. Scientists need to be able to access, analyze and interpret a wide range of information in order to answer important clinical questions, understand biological systems, and elucidate the impact of drug interventions on diseases, and this requires data to be integrated. In recent years, there has been exponential growth in the amount of life science data generated by high-throughput experiments (e.g., microarray gene expression data, mass spectrometry protein data, sequence variations), each data type with its own level of complexity and varying quality. One of the research focuses of our group is the development and application of methods for ‘data integration’.

There are two sessions for our presentation.

Part I (April 1, Joseph Beyene):
In this first part, we will provide a conceptual framework of key data integration tasks and methods and will focus on meta-analytic approaches that are widely used for integrating similar data types, primarily in clinical medicine. We will describe effect measures typically used with different outcome variables and discuss fixed- versus random-effects models and their modeling assumptions. Contentious issues such as heterogeneity and publication bias will be discussed briefly.
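As a rough illustration of the fixed- versus random-effects distinction (not taken from the talk), the sketch below pools study-level estimates by inverse-variance weighting and, for the random-effects case, uses the DerSimonian-Laird moment estimate of the between-study variance. The function names and the five log odds ratios are hypothetical.

```python
import math

def fixed_effect(effects, variances):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects estimate, its SE, and tau^2."""
    w = [1.0 / v for v in variances]
    pooled_fe, _ = fixed_effect(effects, variances)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (y - pooled_fe) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Moment estimate of the between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2

# Hypothetical log odds ratios and their variances from five studies.
log_or = [-0.30, -0.10, -0.45, 0.05, -0.25]
var = [0.04, 0.02, 0.09, 0.03, 0.05]

fe_est, fe_se = fixed_effect(log_or, var)
re_est, re_se, tau2 = random_effects(log_or, var)
print(f"fixed:  {fe_est:.3f} (SE {fe_se:.3f})")
print(f"random: {re_est:.3f} (SE {re_se:.3f}), tau^2 = {tau2:.4f}")
```

Because the random-effects weights add the between-study variance to each study's own variance, the random-effects standard error is never smaller than the fixed-effect one.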

Part II (April 8, Jemila Hamid):
In the second part of the talk, we will discuss kernel-based methods for integrating heterogeneous data. We will briefly discuss kernel matrices and their application in cluster analysis (unsupervised learning) and discriminant analysis (supervised learning). We will focus on Fisher Discriminant Analysis (FDA) followed by its non-linear extension, Kernel Fisher Discriminant Analysis (KFDA).

We will then present our ongoing work on combining heterogeneous data using weighted KFDA. We will illustrate our results using the well-known Fisher’s iris data. We will also show some preliminary results using breast cancer microarray and clinical data.

Talk Slide Part I, Part II

 

 

March 25, 2008 [4-5pm] (HS 108)

 
 


"Introduction to Disease Mapping Using Bayesian Models"
by
Virgilio Gómez-Rubio, Imperial College London, UK

Bayesian models have been used very successfully in recent years to study the risk of mortality in many different contexts. In my talk I will introduce different methods of estimating the mortality risk. Starting with the Standardised Mortality Ratio, I will discuss a series of Empirical Bayes (EB) estimators that have been proposed in recent years. EB estimators are based on Bayesian hierarchical models where the hyperparameters are estimated from the data and the posterior distribution of the parameters of interest is derived from there.
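To make the contrast between the raw ratio and a smoothed estimate concrete, here is a minimal sketch (not the speaker's code) assuming a Poisson-Gamma model: observed counts are Poisson with mean theta times the expected count, and theta has a Gamma prior with the given shape and rate. In an Empirical Bayes analysis the shape and rate would be estimated from the data; the values below, like the counts, are hypothetical.

```python
def smr(observed, expected):
    """Standardised Mortality Ratio: observed deaths over expected deaths."""
    return observed / expected

def gamma_poisson_posterior_mean(observed, expected, shape, rate):
    """Posterior mean of the relative risk under a Gamma(shape, rate) prior.

    With observed ~ Poisson(theta * expected), the posterior of theta is
    Gamma(shape + observed, rate + expected).
    """
    return (shape + observed) / (rate + expected)

# Hypothetical areas: (observed deaths, expected deaths).
areas = [(2, 1.0), (15, 12.0), (40, 50.0)]
shape, rate = 5.0, 5.0  # assumed hyperparameters, prior mean = 1

for obs, exp in areas:
    raw = smr(obs, exp)
    smoothed = gamma_poisson_posterior_mean(obs, exp, shape, rate)
    print(f"O={obs:3d} E={exp:5.1f}  SMR={raw:.2f}  smoothed={smoothed:.2f}")
```

Areas with small expected counts are pulled most strongly toward the prior mean, which is the usual motivation for smoothing unstable ratios.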

Full Bayesian approaches of some of these models will be discussed as well. In addition, the model by Besag, York and Mollié will be fully described. This model can account for a spatial structure in the data as well as relevant covariates.

All these methods will be illustrated using real data on male lung cancer mortality in Toronto at the tract level.

Talk Slides, R-Code/Files, Assignment

 

 

March 18, 2008 [3-5pm] (HS 108)

 
 


"Spatial Point Processes"
by Patrick Brown, Cancer Care Ontario

Locations of disease incidence can be thought of as random points in the plane. One important question is whether these points are located independently of each other, or if they tend to cluster together. As cases are more likely in areas of high population, measuring clustering needs to take population density into account. The Inhomogeneous K-function is a tool for assessing clustering in spatial point processes, and this lecture will explain how the K-function is related to mathematical properties of the underlying spatial point process.

Talk Slides

 

 

March 4 and 11, 2008 [3-5pm] (HS 108)

 
 


"Time Dependent Covariates in Parametric Survival Models"
by Sandra Gardner, Sunnybrook Health Sciences Centre

The semi-parametric Cox regression model for survival data can incorporate time varying covariates.  There are some examples in the literature where time varying covariates are incorporated into parametric survival models (for example, Petersen T. Fitting Parametric Survival Models with Time-Dependent Covariates. Applied Statistics-Journal of the Royal Statistical Society Series C, 35 (3): 281-288 1986). 

In Part I, we will compare the Cox model to the parametric survival model under both proportional and non-proportional hazards. Examples using sample or simulated data will be presented along with the corresponding SAS code.

In Part II, we will compare and contrast these models with the Poisson regression model, which can also incorporate time-varying covariates. Other methods found in the literature for analyzing survival data with time-dependent covariates will also be discussed.

Talk Slides Part I, Part II

 

 

February 26, 2008 [3-5pm] (HS 108)

 
 


"All Is Not Well in the House of Statistics: A Competing Approach to the Analysis of Genetic Association"
by Lisa Strug, Hospital for Sick Children

The "multiple testing problem" currently bedevils genetic association studies, especially for genome wide studies where often >500,000 Single Nucleotide Polymorphism tests are conducted across the genome.   Briefly stated, this problem arises when we perform more than one statistical test, which leads to increased probabilities of committing at least one type I error.  The conventional solution to this problem relies on the classical Neyman-Pearson statistical paradigm, since that is the paradigm used to analyze the data for association, and involves adjusting one's error probabilities.  This adjustment is, however, problematic because in the process of doing that, one is also adjusting one's measure of evidence.  Investigators have actually become wary of looking at their data, for fear of having to adjust the strength of the evidence they observed at a given locus on the genome every time they conduct an additional test.
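The size of the problem under the conventional paradigm can be sketched numerically. The snippet below is an illustration only, assuming independent tests that are all truly null; it computes the probability of at least one type I error and the corresponding Bonferroni per-test threshold. The function names are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """P(at least one type I error) across m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the familywise rate below alpha."""
    return alpha / m

for m in (1, 20, 500_000):
    print(m, familywise_error_rate(0.05, m), bonferroni_threshold(0.05, m))
```

With 500,000 tests the per-test threshold falls to 1e-7, which shows how strongly the adjusted threshold depends on the number of tests performed.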

The evidential paradigm uses the likelihood ratio (as opposed to a p-value) as the measure of evidence for association, and provides new, alternatively defined error probabilities (analogous to Type I and Type II error rates), i.e., probabilities of being misled.  We have shown how this paradigm separates or decouples the two concepts of error probabilities and strength of the evidence.  Here we apply the evidential paradigm to genetic association studies and the associated multiple testing problem.  We advocate using the likelihood ratio as the sole measure of the strength of evidence; we then derive the corresponding probabilities of being misled by the data under different multiple-testing scenarios.

We distinguish two situations:  performing multiple tests of a single hypothesis, vs. performing a single test of multiple hypotheses.  For the first situation the probability of being misled remains small regardless of the number of times one tests the single hypothesis, as we show.  For the second situation, we provide a rigorous argument outlining how replication samples themselves (analyzed in conjunction with the original sample) provide appropriate adjustments for testing multiple hypotheses on a data set.

Talk Slides
Recommended Readings I, II, III


 

 

February 12, 2008 [3-5pm] (HS 108)

 
 


"Competing Risks Analysis"
by Melania Pintilie, Princess Margaret Hospital

In time-to-event analysis it is possible to observe more than one type of event. A competing risks situation arises when the observation of the event of interest is hindered by the occurrence of another type of event. In the presence of competing risks the probability of the event of interest cannot be estimated using the usual product-limit (Kaplan-Meier) method. Kalbfleisch and Prentice introduced a non-parametric method to estimate the probability of the event of interest, referred to as the cumulative incidence function. To facilitate the understanding of these two methods, the estimates from the cumulative incidence function will be compared with those obtained from the Kaplan-Meier method, both in a theoretical framework and through examples. There are two types of hazard that can be modeled, each with its own interpretation. The Cox proportional hazards model can be applied to one of the hazards, while the second type is modeled using a partial likelihood introduced by Fine and Gray. Although some theoretical details will be given, this talk will focus on applied issues. Examples will be shown, mostly drawn from cancer research. The methodology can easily be extended to other areas where competing risks are present. In the second part, the use of a dedicated R package for competing risks will be illustrated.
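A small numerical sketch may help show why the two estimators differ. It is written in Python rather than R, is not the package used in the talk, and uses made-up data. Causes are coded 0 for censored, 1 for the event of interest and 2 for the competing event; the naive estimate treats competing events as censored.

```python
def naive_km_and_cif(times, causes, cause_of_interest=1):
    """Return (1 - naive Kaplan-Meier, cumulative incidence) at end of follow-up."""
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    overall_surv = 1.0  # probability of being free of any event so far
    km_surv = 1.0       # naive KM: competing events handled as censoring
    cif = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [c for (u, c) in data if u == t]
        d_interest = sum(1 for c in at_t if c == cause_of_interest)
        d_any = sum(1 for c in at_t if c != 0)
        # CIF increment: event-free just before t, then fail from the cause of interest.
        cif += overall_surv * d_interest / n_at_risk
        km_surv *= 1.0 - d_interest / n_at_risk
        overall_surv *= 1.0 - d_any / n_at_risk
        n_at_risk -= len(at_t)
        i += len(at_t)
    return 1.0 - km_surv, cif

# Hypothetical data: six subjects, no censoring, alternating event types.
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 2, 1, 2]
naive, cif = naive_km_and_cif(times, causes)
print(f"1 - KM = {naive:.4f}, cumulative incidence = {cif:.4f}")
```

In this toy data set exactly half of the subjects experience the event of interest, and the cumulative incidence estimate recovers that proportion, while the naive Kaplan-Meier complement is larger.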

Talk Slides Part I, Part II

 

 

January 29 and February 5, 2008 [3-5pm] (HSB 790)

 
 


"Longitudinal Data Analyses of Cohorts Created Through Record Linkage to Canadian Mortality and Cancer Databases"
by
Paul Villeneuve, Health Canada

This presentation will provide an overview of two recent studies that have made use of Statistics Canada’s capabilities to link administrative data to national mortality and cancer incidence data. A description of the methods used by Statistics Canada to conduct record linkage will be given. Thereafter, I will describe methods of longitudinal data analysis that were applied to evaluate the relationship between long-term exposure to radon and lung cancer in a cohort of Newfoundland fluorspar miners. These methods include the estimation of person-years of follow-up, internal and external cohort analyses, and the evaluation of modifiers of radon-related lung cancer risk, including cigarette smoking. The second study to be discussed is a cohort study of transplant patients identified from CIHI's Canadian Organ Replacement Registry database. This study population will be described, and findings from preliminary analyses of this cohort will be presented. These analyses have examined the risk of developing cancer among patients who received kidney transplants, with consideration of dialysis as a time-dependent risk factor. This presentation will be followed by an in-class computer lab session on February 5, in which a more thorough review of the SAS programs used to analyze the cohorts will be provided and students will be asked to perform similar analyses on provided practice data sets.

Talk Slides Part I, Part II
Programs and Data, Exercise

 

 

January 22, 2008  (HSB 108)

 
 


"Effects of Unemployment on Health"
by
Hideki Ariizumi, Wilfrid Laurier University

I investigate the effects of unemployment on health status. Because unemployment and health are simultaneously determined, a single-equation regression method may not be appropriate. To address this issue, a two-equation model is specified and jointly estimated. The error terms are decomposed into two components, one time-invariant and the other time-varying. Both error components are allowed to be correlated between the two equations. For the time-invariant error component, I use the nonparametric random effects model. For the time-varying component, I use the bivariate probit approach. Furthermore, to help identify the causal effect of unemployment on health, I use the instrumental variable approach. The main finding is that, for prime-age male labor market participants, unemployment has a large negative impact on self-reported health status, while it has no effect on the objective health measure.

Talk Slides

 

 

January 8, 2008 [3-5pm] (HSB 108)

 
 


"Swimming Without a Lifeguard: An Introduction to Analyzing Complex Survey Data"
by
John Amrhein, SAS Canada

Most analytical tools, including most SAS/STAT procedures, assume that your data consist of independent observations of a simple random sample from an infinite population. Inferential statistical methods employed by these tools allow you to make valid inferences about the population from which that sample was drawn. However, in many surveys, data does not represent a simple random sample of independent, identically distributed observations selected from an infinite population. Complex designs, from stratified to multi-phase cluster designs, generate sampled observations that are not independent, are not identically distributed, and are not selected from an infinite population. To make correct inferences, you must account for the complex design by using the appropriate estimators for attributes and their variances. It is also beneficial to account for the finite nature of the population.
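As a simple illustration of a design-based estimator (in Python rather than SAS, and not the speaker's example), the sketch below estimates a population mean from a stratified simple random sample and applies the finite population correction within each stratum. The stratum sizes and sample values are hypothetical.

```python
import math
from statistics import mean, variance

def stratified_mean(strata):
    """Estimate a population mean from a stratified simple random sample.

    strata: list of (N_h, sample_values), where N_h is the stratum population size.
    Returns (estimate, standard error) with a finite population correction.
    """
    total = sum(n_pop for n_pop, _ in strata)
    est = 0.0
    var = 0.0
    for n_pop, ys in strata:
        n_h = len(ys)
        w_h = n_pop / total
        est += w_h * mean(ys)
        # (1 - n_h / N_h) is the finite population correction for this stratum.
        var += w_h ** 2 * (1.0 - n_h / n_pop) * variance(ys) / n_h
    return est, math.sqrt(var)

# Hypothetical design: two strata of sizes 100 and 300.
strata = [(100, [10, 12, 11, 13]), (300, [20, 22, 19, 21, 23])]
est, se = stratified_mean(strata)
print(f"estimated mean = {est:.3f}, SE = {se:.3f}")
```

When a stratum is sampled completely, its correction factor is zero and it contributes nothing to the variance, which an infinite-population formula would miss.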

This lecture will introduce inferential methods that account for common survey designs. One or two examples will be shown using SAS/STAT procedures.

Talk Slides
SAS codes, Data

 

 
     
  Fall 2007 Seminars (HSB 100)  
     

December 4, 2007

 
 


"Introduction to Receiver Operating Characteristic (ROC) Analysis in Medical Research--Part II"
by
Gina Lockwood, Princess Margaret Hospital

ROC methodology, derived from statistical decision theory, dates back to the early 1950s when it was developed to summarize data from signal detection experiments. It is used in medical applications to assess the performance of diagnostic (or prognostic) tests which must choose which of two conditions, unknown at the moment of decision, exists (or will exist).  These lectures will introduce the basic concepts involved in evaluating the statistical properties of a test, including sensitivity, specificity and the ROC curve. Several methods for fitting, summarizing and comparing ROC curves will be examined. Examples of dichotomous, ordinal and continuous tests taken from oncology studies will be presented.

Talk Slides

 

 

November 27, 2007

 
 


"Introduction to Receiver Operating Characteristic (ROC) Analysis in Medical Research--Part I"
by
Gina Lockwood, Princess Margaret Hospital

ROC methodology, derived from statistical decision theory, dates back to the early 1950s when it was developed to summarize data from signal detection experiments. It is used in medical applications to assess the performance of diagnostic (or prognostic) tests which must choose which of two conditions, unknown at the moment of decision, exists (or will exist).  These lectures will introduce the basic concepts involved in evaluating the statistical properties of a test, including sensitivity, specificity and the ROC curve. Several methods for fitting, summarizing and comparing ROC curves will be examined. Examples of dichotomous, ordinal and continuous tests taken from oncology studies will be presented.
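The basic quantities can be sketched in a few lines. This is an illustration with made-up scores, not material from the lectures: sensitivity and specificity are computed at one threshold, and the area under the ROC curve is computed nonparametrically as the proportion of diseased/non-diseased pairs ranked correctly, with ties counted as one half.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call a test positive when score >= threshold; labels are 1 (diseased) or 0."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, labels):
    """Nonparametric AUC: P(diseased score > non-diseased score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical continuous test results for eight patients.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, AUC = {auc(scores, labels):.4f}")
```

Sweeping the threshold across all observed scores and plotting sensitivity against one minus specificity traces out the empirical ROC curve.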

Talk Slides

 

 

November 20, 2007 [3-5pm]

 
 


"Sequential Methods with Applications to Genetic Studies"
by
Laurent Briollais, Samuel Lunenfeld Research Institute, Mt Sinai Hospital

The modern theory of sequential analysis stems from the work of A. Wald in the U.S. and G. Barnard in Great Britain, who participated in industrial advisory groups for war production in the mid 1940s. Since then, sequential approaches have been a natural way to proceed in many experiments, especially in the design of clinical trials, where interim analyses and the rules for reporting them have been formally described by the FDA (1988). In the first part of this lecture, we will introduce some general concepts about sequential methods. The second part will describe more specifically some applications to genetic studies, an emerging and promising field of application for this approach.

Talk Slides

 

 

November 13, 2007

 
 


"BUGS Research Day"
by
The Biostatistics Union of Graduate Students

Steve Fan:  Are Variable Selection Methods Based on Akaike Information Criterion Better?

Gerald Lebovic:  Modeling Data with Ordinal Outcomes

Ahmed Hossain:  Nonparametric and Parametric Estimation of Area under Receiver Operating Characteristic Curves (AUC) from Continuously Distributed Data and Comparing Two Nonparametric AUCs


Talk Slides: Steve, Gerald, Ahmed

 

 

November 6, 2007

 
 


"Generalized Linear Mixed Models for Categorical Responses--Part II: Poisson Regression"
by
Rahim Moineddin, University of Toronto & ICES

These two brief lectures will be introductory. The extension of generalized linear models to the class of generalized linear mixed models, which include random effects, will be discussed. Modeling binary and count data for studies with hierarchical structure or repeated measurements will be covered. Real data sets will be used for illustration. Questions of interest include testing for significance of covariates, interpretation of parameter estimates, and details of SAS procedures NLMIXED and GLIMMIX.

Talk Slides
Assignment & Dataset

Suggested Reading 1, Reading 2

 

 

October 30, 2007

 
 


"Generalized Linear Mixed Models for Categorical Responses--Part I: Generalized Linear Mixed Models"
by
Rahim Moineddin, University of Toronto & ICES

These two brief lectures will be introductory. The extension of generalized linear models to the class of generalized linear mixed models, which include random effects, will be discussed. Modeling binary and count data for studies with hierarchical structure or repeated measurements will be covered. Real data sets will be used for illustration. Questions of interest include testing for significance of covariates, interpretation of parameter estimates, and details of SAS procedures NLMIXED and GLIMMIX.

Talk Slides

 

 

October 23, 2007

 
 


"Spatial Statistics for Environmental Epidemiology"
by Patrick Brown, Cancer Care Ontario


This lecture will be a brief introduction to some problems in environmental epidemiology related to modelling counts of diseases in different regions such as census tracts or municipalities. Models for data of this sort should allow incidence numbers to be affected by the population's age and sex structure, measured covariates such as social deprivation, and spatially varying random components. Questions of interest include testing for significance of covariates and detecting regions with abnormally high risk.

Talk Slides
Assignment

 

 

October 16, 2007

 
 


"Moving beyond the disease atlas model for public health surveillance: The Nova Scotia Breast Screening Program"

by Mohamed Abdolell, Dalhousie University

The primary features of public health surveillance systems will be reviewed, along with how such a system is being established within the context of the Nova Scotia Breast Screening Program (NSBSP). The NSBSP database is unique in that it captures the entire patient trajectory through both the screening and diagnostic systems from the time a woman participates in the NSBSP for her first mammogram, and it is now on the cusp of doing so for all mammographically screened women in the province. The database is used for centralized booking of women for both screening and diagnostic mammograms and is maintained in real time. The main objective of the surveillance system was to implement an automated reporting system that generates a fully formatted NSBSP Annual Report, including various program-specific as well as nationally based performance indicators, in both print and web formats in about one hour, rather than the 9-12 months currently required to generate the report manually. Consequently, the report can be generated in real time and provides the basis for an on-demand surveillance system. The system is implemented exclusively using General Public License software including R, LaTeX, Sweave, Perl, and several other helper scripting languages on an open source Linux distribution (Ubuntu 7.04). The advantage of such an implementation is that it can be finely tuned to the exact specifications of the NSBSP and can be easily modified to accommodate the emerging surveillance needs of the NSBSP with minimal added cost. A key aspect of developing this system is to explore the feasibility of integrating statistical process control and other statistical methods into a fully automated surveillance system. Such an open source solution is particularly valuable in resource-poor jurisdictions, enabling surveillance at a reasonable cost.

Talk Slides


 

 

October 9, 2007

 
 


"How to Predict the Final Outcome of a Clinical Trial"

by K.K. Gordon Lan, Univ of MDNJ and J&J, New Jersey

In the 1960s and 1970s, almost all clinical trials were designed as fixed. That is, efficacy of a treatment would be determined by the final data analysis. Despite the fixed design, many NIH-sponsored clinical trials were periodically reviewed by Policy Advisory Boards (they are called Data Monitoring Committees nowadays). During interim analyses, clinicians on the Board often asked the question: If the current trend continues, what is the chance that we will have a positive study? We will discuss how to put this question into a statistical framework and provide a simple answer. This chance is called conditional power (CP) or predictive power (PP). We will discuss the use of CP, along with group sequential methods, for early termination of a clinical trial. The concepts of CP and PP can also be applied to sample size estimation for a new study.
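One common way to formalize the "current trend" question is the B-value formulation, sketched below under the assumption of a one-sided test with a normally distributed test statistic. The function name, the default alpha and the interim values are illustrative assumptions, not taken from the talk.

```python
from statistics import NormalDist

def conditional_power(z_interim, info_fraction, alpha=0.025, drift=None):
    """Probability of a significant final result given the interim z-statistic.

    info_fraction is the proportion of total information observed, in (0, 1).
    drift is the assumed expected final z-value; if None, the current-trend
    estimate B(t) / t is used.
    """
    norm = NormalDist()
    b = z_interim * info_fraction ** 0.5  # B-value: B(t) = Z(t) * sqrt(t)
    if drift is None:
        drift = b / info_fraction         # trend observed so far continues
    z_crit = norm.inv_cdf(1.0 - alpha)
    remaining = 1.0 - info_fraction
    return 1.0 - norm.cdf((z_crit - b - drift * remaining) / remaining ** 0.5)

# Hypothetical interim look: halfway through the trial with z = 2.0.
cp_trend = conditional_power(2.0, 0.5)
cp_null = conditional_power(2.0, 0.5, drift=0.0)
print(f"CP under current trend = {cp_trend:.3f}, CP under the null = {cp_null:.3f}")
```

Comparing the value under the current trend with the value under the null hypothesis shows how sensitive conditional power is to the assumed treatment effect for the remainder of the trial.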

Talk Slides


 

 

October 2, 2007

 
 


"Computer Simulation: A Practical Tool in Health Research--Part II"
by Paul N. Corey, University of Toronto

Simulation has a long and proud history in biostatistics. The introduction of digital computers to generate pseudo-random numbers has enhanced its impact on applied statistics. I will give a brief personal history of randomness and review the practical use of computer simulation in biological and clinical science research and the kinds of problems it can help solve. The simple structure of simulation programs in the SAS language will be discussed and some examples given.

Talk Slides

 

 

September 25, 2007

 
 


"Computer Simulation: A Practical Tool in Health Research--Part I"
by Paul N. Corey, University of Toronto

Simulation has a long and proud history in biostatistics. The introduction of digital computers to generate pseudo-random numbers has enhanced its impact on applied statistics. I will give a brief personal history of randomness and review the practical use of computer simulation in biological and clinical science research and the kinds of problems it can help solve. The simple structure of simulation programs in the SAS language will be discussed and some examples given.

Talk Slides
 

 
 

 

 

 

 
     
   
 


Last updated September 23, 2008
All contents copyright © 2005, Department of Public Health Sciences, University of Toronto.