Abstracts & Slides
 

April 21, 2009 (HSB 790)

 
  "Estimation in Generalized Linear Models with Measurement Error"
by Taraneh Abarin, University of Manitoba

Generalized linear models (GLMs) are widely used in biostatistics, epidemiology, and many other areas. However, real data analyses using GLMs often involve covariates that are not observed directly or are measured with error. In such cases, statistical estimation and inference become very challenging. Most of the proposed approaches rely on the normality assumption for the unobserved covariates and measurement error. A computational difficulty with the likelihood approach is that the likelihood function involves multiple integrals which do not admit closed forms in general. First, this talk introduces some basic concepts of measurement error. Then, we consider generalized linear models that allow very general heteroskedastic regression errors. In particular, we study Second-Order Least Squares estimation combined with instrumental variables. This approach does not require parametric assumptions for the distributions of the unobserved covariates and of the measurement error, which are difficult to check in practice. We will close the talk with a review of some computational issues.

Talk Slides I, II, III


 

April 7, 2009

 
 
"Power of Logistic Regression with Measurement Error in Predictor Variable and Varying Number of Observations per Subject"
by
Olga Melnichouk and Salomon Minkin, Ontario Cancer Institute

We present a case study where the objective was to assess the power of a proposed study of serum biomarkers and risk of breast cancer. In the study of serum biomarkers, we intended to examine in a nested case-control study the association of breast-cancer risk with long-term exposure to serum levels of sex-hormone binding globulin (SHBG), total and free serum estradiol and testosterone, androstenedione, C-peptide, lipids and lipoprotein, and adiponectin. The biomarkers of interest are subject to varying degrees of measurement error comprised of temporary within-subject variation as well as variation in the lab measurements. We proposed to measure these biomarkers in multiple blood samples per subject and use their average as a surrogate of a true long-term exposure. The nested case-control study was carried out within the cohort of the Canadian Diet and Breast Cancer Prevention Study, a randomized trial of intervention with a low-fat high-carbohydrate diet. Blood samples were collected annually in years preceding the diagnosis of breast cancer for the case subjects and throughout the study for the control subjects, and the number of repeated blood samples per subject varied from subject to subject. It is known that in linear and logistic regression analysis, measurement error in a continuous exposure variable leads to an attenuated estimate of the regression coefficient and to power loss. To correct for measurement error when subjects’ averages are based on a varying number of replicates, we apply the methodology proposed by Armstrong et al. (1989) and Kim and Zeleniuch-Jacquotte (1997). Our technique for determining power relies on simulation that incorporates the specific characteristics of the proposed study. We demonstrate how much power is gained as additional blood samples per subject are measured. We then show how the proposed techniques can be adapted to other situations.
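
The attenuation described above can be illustrated for the linear case. This is a minimal sketch that assumes a classical additive error model, which may differ from the model used in the talk: when a subject's surrogate is the mean of k replicates, the expected regression slope shrinks by the factor sigma_x2 / (sigma_x2 + sigma_u2 / k), where sigma_x2 is the between-subject variance of the true exposure and sigma_u2 is the within-subject (plus laboratory) variance. The function name and the variances below are hypothetical, and for logistic regression the factor is only an approximation.

```python
def attenuation_factor(sigma_x2: float, sigma_u2: float, k: int) -> float:
    """Attenuation (reliability) of a k-replicate mean under classical additive error."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Averaging k replicates divides the error variance by k.
    return sigma_x2 / (sigma_x2 + sigma_u2 / k)


# Hypothetical variances, chosen only for illustration.
for k in (1, 2, 4):
    print(k, round(attenuation_factor(1.0, 1.0, k), 3))
```

Under these assumed variances the factor moves closer to 1 as k grows, which is the mechanism by which extra blood samples per subject recover power.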

Talk Slides
Recommended Readings I, II

 

March 31, 2009

 
 
"An Introduction to Binary Recursive Partitioning: Regression Trees"
by
Mohamed Abdolell, Dalhousie University

This presentation will outline the background of the CART algorithm and discuss some of the extensions of the original algorithm to the context of survival, Poisson, and MVN repeated measures outcomes.

The development of the binary recursive partitioning algorithm will be traced from the scenario of a single split to that of recursive splitting that generates full tree models. Extensive use of real data sets will illustrate the application of the algorithm.
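
The single-split scenario mentioned above can be sketched for a continuous outcome and one numeric predictor. This is an illustrative sketch only, not code from rpart or longRPart: the function name `best_split` is hypothetical, and the criterion (total within-node sum of squared errors) is the usual regression-tree choice.

```python
def best_split(x, y):
    """Return (threshold, sse) for the single split of x that minimises
    the total within-node sum of squared errors of y."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates tied x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))
```

Recursive partitioning applies the same search again within each of the two resulting nodes until a stopping rule is met.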

The longRPart R library (available on CRAN, 2009) will be introduced for the analysis of MVN repeated measures outcomes; longRPart is an adaptation of the rpart R library by Atkinson & Therneau. To illustrate the use of longRPart, a data set containing the outcome of PBK phoneme scores repeatedly taken on prelingually deaf children will be used to identify children who would most benefit from cochlear implantation. Confidence intervals and p-values associated with the first split in the tree (Abdolell et al, 2002) are implemented in the library.

Some of the weaknesses traditionally associated with CART models can be addressed through the application of ensemble methods. And so we will very briefly touch on the topic of Random Forests (Breiman 2001) as an extension of ensemble methods to the context of CART algorithms.


Talk Slides
Recommended Reading

 

March 24, 2009

 
 
"Leprosy as a Genetic Model for Susceptibility to Common Infectious Diseases"
by
Alexandre Alcaïs, Institut National de la Santé et de la Recherche Médicale U.550, Paris, France

Leprosy is a human infectious disease that can be effectively treated with long-term administration of multi-drug therapy. In 2006, over 250,000 new cases were reported to the World Health Organization. In the nineteenth century, disagreement among leprologists regarding the hereditary or infectious nature of leprosy was resolved with the identification of the etiological agent, Mycobacterium leprae. However, epidemiological studies maintain the importance of host genetics in leprosy susceptibility. A model-free genome-wide linkage scan in multi-case families from Vietnam led to the positional cloning of global genetic risk factors in the PARK2/PACRG and LTA genes. The process of identifying the susceptibility variants provided invaluable insight into the replication of genetic effects, particularly the importance of considering population-specific linkage-disequilibrium structure. As such, these studies serve to improve our understanding of leprosy pathogenesis by implicating novel biological pathways while simultaneously providing a genetic model for common infectious diseases.

Talk Slides
Recommended Readings I, II, III, IV

 

March 17, 2009

 

"Automated Time-trending Tool for Improved Quality Assurance Reliability"
by May Tang,
  May Consulting Services Inc.

Managing quality assurance process data has been an issue in meeting current higher regulatory standards. There is a need to develop software tools to automate assessment of time trending. We created a trending interface program that sits between data entry and data analysis software, such as Microsoft Excel, SAS/JMP, and SAS 9. The software can be used to automate routine quality assessment of data by laboratory technicians. The approach simplifies current processes that require review of analyses by more technical staff or even by statisticians.

Rationale: Improved quality assurance of biopharmaceutical processing, using routine staff to assess trending data quality. Session Objective: Introduce a new way to automate Quality Assurance processes, improving compliance monitoring and trending. Audience Take Home Benefits: Reduce resources used to monitor processing trends and improve quality assurance and achieve greater operational efficiency/effectiveness. Also, participants will be able to see the benefits of an automated user-friendly computer interface for data trending.


Talk Slides

March 10, 2009

 
 
"Estimating Unbiased Dose-Response Curves from Repeated Measures in the Presence of Confounding"
by
Erica E.M. Moodie, McGill University

In a longitudinal study of dose-response, confounding due to an observational study design or non-compliance in a randomized trial compromises the estimation of the true effect of treatment. Standard regression methods cannot remove the bias introduced by patient selection of dose; however, when follow-up times are irregular, marginal structural models may not provide a practical solution. Using an approach based on the Generalized Propensity Score (GPS), it is possible to construct a balancing score that provides an unbiased estimation procedure for the true direct effect of dose. We apply the GPS methodology, using a novel formulation of the treatment density, to a cohort of HIV and HIV-HCV infected individuals to examine the dose-response relationship between the use of anti-retroviral therapy and liver function.

Talk Slides
Recommended Readings

 

March 3, 2009

 
 
"Assessing Reliability or Agreement for Continuous Measurements"
by
Qilong Yi, Canadian Blood Services

Reliability or agreement assessment is an important step in various applications. There exists a large volume of literature on its use and on underlying methodology; in 2008 alone, more than 1600 such articles can be found on PubMed. Many statistical methods (indexes) have been proposed to measure reliability, such as the correlation coefficient, intraclass correlation coefficient, limits of agreement, repeatability coefficients, and equivalence tests. This presentation will review these methods, discuss their advantages and limitations, and illustrate their use by analyzing an example dataset.
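
One of the indexes listed above, the limits of agreement, can be sketched as follows. This assumes the common definition (mean of the paired differences plus or minus 1.96 standard deviations of those differences); the function name and the data are hypothetical and are not taken from the talk's example dataset.

```python
from statistics import mean, stdev


def limits_of_agreement(a, b, z=1.96):
    """Return (lower, bias, upper) for paired measurements a and b."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    bias = mean(diffs)   # systematic difference between the two methods
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias - z * sd, bias, bias + z * sd


# Hypothetical paired readings from two measurement methods.
lower, bias, upper = limits_of_agreement([10, 12, 14, 16], [9, 12, 13, 16])
print(lower, bias, upper)
```

Unlike the correlation coefficient, these limits are expressed in the measurement's own units, so they can be compared directly with a clinically acceptable difference.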

Talk Slides
Recommended Readings I, II

 

February 24, 2009

 
"Opportunities and Challenges with Observational HIV Database"
by
Janet Raboud, Mount Sinai Hospital

Recent initiatives have established both provincial and national cohort studies of HIV positive individuals. The Ontario HIV Cohort Study is a longitudinal cohort study of HIV positive individuals attending ten primary and tertiary care centers in Ontario. Individuals complete either a short (15 min) or long (90 min) questionnaire annually, which is linked to clinical data extracted from medical charts. CANOC is a collaboration of cohort studies in Ontario, Quebec and BC, with current enrollment of approximately 5000 individuals. This seminar shall focus on the unintended statistical consequences of administrative decisions, non random refusal rates, differing clinic visit patterns, and non random missing data. Issues such as informative interval censoring, recurrent events and imputation of missing data shall be discussed.

Talk Slides

 

February 10, 2009

 
 
"Toward the Conceptualization and Measurement of the Spaces of Daily Life (with a Primer on GIS)"

by
Ron Buliung, University of Toronto

For several decades social scientists (from geographers to environmental psychologists) have wrestled with the conceptualization and measurement of human spatial experiences in urban environments. Many theoretical constructs have been introduced; more notable examples include: home range, activity space, and action space. Interestingly, both the conceptual and computational aspects of this work bear striking similarities to the efforts of spatial ecologists who study the activities of non-human animals. Much of the “human-work”, however, has been motivated by the desire to advance thinking about how we use, experience, and produce space. Recent efforts have also focused on understanding the complex relationships that exist between health and planning policy and outcomes, urban form and design, and daily activities. The aim of this seminar is to provide some introduction to my recent thinking about the conceptualization and measurement of the spaces of daily life. The conceptual part of the session will place some emphasis on the theoretical treatment of “space” within the geographical tradition. Attention will then turn to the practical matter of constructing and estimating (from space-time activity data) spatial metrics to support policy-based research focused on questions pertaining to urban form, design, and spatial behaviour.

Talk Slides

 

February 3, 2009

 

"Modeling the Cumulative Effects of Time-varying Exposure, Weighted by Recency, on the Hazard"

by
Marie-Pierre Sylvestre, Samuel Lunenfeld Research Institute

Many epidemiological studies assess the effects of time-dependent exposures, where both the exposure status and its intensity vary over time. One example that attracts considerable public attention concerns pharmacoepidemiological studies of the therapeutic or adverse effects of medications. In many of these studies, prescribed dose and duration of drug use vary both over time and across subjects. The analysis of such studies poses the particular challenge of modeling the association between complex time-dependent drug exposure and the outcome, especially given the uncertainty about the etiological relevance of doses taken in different time periods. In this talk, I will present a flexible method for modeling cumulative effects of time-varying exposures, weighted by recency, represented by time-dependent covariates in Cox's proportional hazards model. The function that assigns weights to doses taken in the past is estimated from the data using cubic regression splines. I will discuss the results of simulations conducted to investigate the properties of the proposed approach. I will also present the results obtained from using the method to study the association between exposure to benzodiazepines and fall-related injuries in the elderly.
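
The idea of a recency-weighted cumulative exposure can be sketched as a weighted sum of past doses. In the talk the weight function is estimated from the data with cubic regression splines; the sketch below instead plugs in a fixed, hypothetical exponential-decay weight purely to show how the time-dependent covariate is assembled. All names and numbers are illustrative assumptions.

```python
def weighted_cumulative_exposure(doses, u, weight):
    """Sum of past doses, each weighted by how long ago it was taken.

    doses  : list of (time, dose) pairs
    u      : current time at which the covariate is evaluated
    weight : function of the time elapsed since the dose
    """
    return sum(weight(u - t) * x for t, x in doses if t <= u)


def exponential_decay(elapsed, half_life=7.0):
    # Hypothetical weight: a dose's relevance halves every `half_life` days.
    return 0.5 ** (elapsed / half_life)


history = [(0, 10), (7, 10), (14, 10)]  # hypothetical (day, dose) records
print(weighted_cumulative_exposure(history, 14, exponential_decay))
```

The resulting value would enter the hazard model as a time-dependent covariate, recomputed at each event time.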

Talk Slides

January 27, 2009

 
 
"Small Area Estimation for Spatially Aggregated Data with Time-varying Boundaries"
by Steve Fan,
University of Toronto


Disease incidence data are often aggregated spatially with only the postal or census region of the cases being made available. The regions may vary in size, larger in rural areas and smaller in urban centres. While estimating the risk of rare diseases often requires data collected over a long period of time, the boundaries of these regions may not be the same throughout the entire collection period. A conventional cross sectional analysis modelling risk at the region level and incorporating dependence between neighbouring regions is not possible when the boundaries are changing.

We extend the EMS algorithm, proposed by Fan and Stafford (2008) for recurrent interval-censored data, to analyze map data using the longitudinal census and to estimate a spatially smooth risk surface. A simulation study is conducted to assess its performance. Not only does the proposed method achieve a smaller mean integrated squared error than the competing nonparametric MLE, but it also reduces computation time significantly. In the special case in which the region boundaries remain unchanged during the collection period, our proposed risk estimator reduces to the kernel risk estimator proposed by Brillinger (1991).

We then apply the proposed method to study the spatial risk of newly diagnosed lupus cases in the Greater Toronto Area, using a generalized linear model to accommodate potential age-gender group and time effects. This analysis allows us to identify the regions with the highest risk while adjusting for regional population composition and time trend.


Talk Slides

 

January 20, 2009

 
"Introduction to Structural Equation Models"
by Jerry Brunner,
University of Toronto


The classical structural equation models extend linear regression to allow for latent as well as observed variables, and also to allow a dependent variable in one equation to be an independent variable in another. Special cases include univariate and multivariate regression with and without measurement error, path analysis, exploratory and confirmatory factor analysis, and certain forms of longitudinal data analysis.

The first half of the talk will cover notation and basic principles, with special attention to model identification. The second half will illustrate model fitting and hypothesis testing using SAS proc calis.


Talk Slides, Programs

 

January 13, 2009

 
 
"Likelihood-free Inference for Infectious Disease Models"
by Robert Deardon,
University of Guelph


Likelihood-based inference for epidemic models has always proved difficult, mainly due to difficulties associated with calculating or evaluating the likelihood, particularly in large scale models. Unobserved or partially observed data often further complicate this process. Here we investigate the performance of Markov chain Monte Carlo routines based on approximate likelihoods generated from model simulations as a means of estimating parameters in epidemic models for both complete and incomplete data. We illustrate our techniques using examples such as (i) common cold data from Tristan da Cunha; and (ii) data from an outbreak of Ebola Haemorrhagic Fever in the Democratic Republic of Congo.

Talk Slides

 
     
  Fall 2008 Seminars (HSB 106)  
     

December 9, 2008

 
 
"Statistician's Role in Industry & Sanofi Pasteur: Canada's Vaccine Company"
by Aleksandra Kolenc-Saban, Sanofi Pasteur


This talk will focus on what a statistician needs to know (in addition to statistics) to be successful in industry, and also what a statistician does (roles and responsibilities) in the vaccine industry. A brief introduction and overview of Sanofi Pasteur, Canada's Vaccine Company, will also be given.

Talk Slides

 

December 2, 2008

 
 
"Syndromic Surveillance: Real Time Applications of Spatial Statistics"
by James G. Heller, James G. Heller Consulting Inc.

The seminar will consist of a brief introduction to the topic of surveillance in public health and certain statistical issues in large scale screening and surveillance that depend on the chosen analytical framework. These include issues in: temporal disease surveillance, spatio-temporal disease mapping and monitoring; and syndromic surveillance. The data requirements for syndromic surveillance are touched upon and illustrated with examples (ESSENCE, the US DOD health surveillance system; and 2 retrospective Canadian examples of early warning surveillance). The statistical surveillance problem is specified and illustrated with a commonly used method, the CUSUM method.
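
The CUSUM method named above can be sketched in its one-sided (upper) form for a series of daily counts. This is a textbook-style sketch, not the speaker's implementation: `target` is the expected count, `k` the allowance, and `h` the decision threshold, and all values below are hypothetical.

```python
def cusum_alarms(counts, target, k, h):
    """Return the indices at which the upper CUSUM statistic exceeds h."""
    s = 0.0
    alarms = []
    for t, x in enumerate(counts):
        # Accumulate only the excess over target + k; never drop below zero.
        s = max(0.0, s + (x - target - k))
        if s > h:
            alarms.append(t)
            s = 0.0  # restart after signalling (one common convention)
    return alarms


daily_counts = [5, 4, 6, 5, 9, 10, 11, 5]  # hypothetical syndrome counts
print(cusum_alarms(daily_counts, target=5, k=1, h=5))
```

Smaller values of `h` signal sooner but raise the false-alarm rate, which is the central trade-off in early-warning surveillance.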

Talk Slides


 

November 25, 2008

 
"Competing Causes of Death from a Randomized Trial of Extended Adjuvant Endocrine Therapy for Breast Cancer: NCIC CTG MA.17"
by Judy-Anne Chapman, NCIC and Queen’s University

Background: Older women with early-stage breast cancer experience higher rates of non-breast cancer-related death. We examined factors associated with cause-specific death in a large cohort of breast cancer patients treated with extended adjuvant endocrine therapy.

Methods: In the MA.17 trial conducted by the National Cancer Institute of Canada Clinical Trials Group, 5170 breast cancer patients (median age = 62 years; range = 32-94 years) who were disease-free after approximately 5 years of adjuvant tamoxifen treatment were randomly assigned to treatment with letrozole (2583 women) or placebo (2587 women). The median follow-up was 3.9 years (range = 0-7.0 years). We investigated the association of 11 baseline factors with the competing risks of death from breast cancer, other malignancies, and other causes. All statistical tests were two-sided likelihood ratio criterion tests.

Results: During follow-up, 256 deaths were reported (102 from breast cancer, 50 from other malignancies, 100 from other causes, and four from an unknown cause). Non-breast cancer deaths accounted for 60% of the 252 known deaths (72% for those >=70 years and 48% for those <70 years). Two baseline factors were differentially associated with type of death: cardiovascular disease was associated with a statistically significant increased risk of death from other causes (P = .002) and osteoporosis was associated with a statistically significant increased risk of death from other malignancies (P = .05). An increased risk of breast cancer-specific death was associated with lymph node involvement (P<.001). Increased risk of death from all three causes was associated with older age (P<.001).

Conclusions: Non-breast cancer-related deaths were more common than breast cancer-related deaths in this cohort of 5-year cancer survivors, especially among older women.

Co-authors:

Meng D(1), Shepherd L(1), Parulekar W(1), Ingle JN(2), Muss HB(3), Palmer M(1), Yu C(1), Goss PE(4)
(1) NCIC Clinical Trials Group, Kingston, Canada; (2) Mayo Clinic, Rochester, USA; (3) University of Vermont, Burlington, USA; (4) Harvard University, Boston, USA

Talk Slides
Recommended Readings I, II


 

November 18, 2008

 
 
"Combining Machine Learning and Bayesian Approaches for Inference in Statistical Genomics"
by Rafal Kustra,
University of Toronto

The ongoing technological revolution in microbiology has resulted in a deluge of complex, large, and multi-modal experimental data available for analysis. In this talk I will present our method for estimating a very large number (millions) of p-values for a set of parallel hypothesis tests from a Single Nucleotide Polymorphism (SNP) microarray experiment in cancer genomics. Currently SNP arrays can provide measurements to obtain up to a million genotypes (letters of DNA code at locations where population-wide variations occur), which can then be used to investigate associations between genetic variations and cancer risk. Since cancer is a genetically complex disease, potential cancer associations of genetic interactions among two or more pieces of DNA are also of interest. The number of such potential interactions is of course enormous, and even after all sensible filtering and test reduction methods have been applied, one can easily face millions of hypothesis tests and resulting p-values. Often asymptotic inference is not applicable and non-parametric inference is used instead, which makes even computing the p-values a challenge. Our method relies on a trivial observation: accurate estimates of p-values which are large (hence uninteresting) are not needed. In this talk I will show how combining a machine learning predictive model (Random Forest, although many other choices are possible) to obtain initial predictions of p-values, together with a Bayesian scheme to selectively update "interesting" p-values with permutations, can result in an enormous reduction of computational time with minimal loss of accuracy. I will demonstrate our method by applying it to data from phase one of a large case-control project in colorectal cancer genomics (ARCTIC).
In addition to presenting our method and the results, I will also introduce the SNP data, the problem of association testing, and time permitting, I will discuss approaches to the multiple testing problem which can also benefit from our method.
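
The observation that large p-values need not be estimated accurately can be illustrated with a generic early-stopping permutation test. This sketch is not the Random Forest and Bayesian scheme described in the abstract; it is a simpler, swapped-in illustration in which permutation stops as soon as enough permuted statistics are at least as extreme as the observed one. The function name, the stopping threshold `max_exceed`, and the data are all hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, max_perms=999, max_exceed=20, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Stops early once `max_exceed` permuted statistics are at least as
    extreme as the observed one, since the p-value is then clearly large.
    """
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    def stat(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(pooled)
    exceed = 0
    for done in range(1, max_perms + 1):
        rng.shuffle(pooled)
        if stat(pooled) >= observed - 1e-12:  # tolerance for rounding
            exceed += 1
            if exceed == max_exceed:
                return exceed / done  # early stop: a rough estimate suffices
    # Full run: add-one estimate so the p-value is never exactly zero.
    return (exceed + 1) / (max_perms + 1)


print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
print(permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4]))
```

With this rule, tests that are clearly uninteresting cost only a handful of permutations, while the full permutation budget is spent on the tests that might be significant.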

This is joint research with Duncan Murdoch, Xiaofei Shi, Celia Greenwood, and Jagadish Rangrej.


Talk Slides
Recommended Readings


 

November 11, 2008

 

"Three-Level Model for the Analysis of the Visual Analog Scale for Negative Mood Intensity in Patients with Borderline Personality Disorder and Recurrent Suicidal Behavior: An Application of Experience Sampling Methodology"
by Rosane Nisenbaum,
St Michael's Hospital


Borderline Personality Disorder is a chronic severe psychiatric illness characterized by impulsivity, mood swings, instability of interpersonal relationships and recurrent suicidal behavior. As with many other illnesses defined by complex syndromes, patients are quite heterogeneous, especially with respect to within- and between-patient variability in mood swinging states.  In the current study, we used an Experience Sampling Method design to capture both types of variability. 82 participants were given electronic organizers that were programmed to beep at 6 randomized times, every day, for 21 days. At each time, subjects were asked to complete a Visual Analogue Scale and rate the intensity of 26 mood states.  In this presentation, we will illustrate how three-level models can be applied to estimate the variability within patients (Level 1), between days (Level 2), and between patients (Level 3), and to determine predictors of change in negative mood intensity ratings. We will discuss our model-building strategy and will show how graphs can help in interpreting final results.

Talk Slides
Recommended Readings I, II


November 4, 2008 (Rescheduled to January 13, 2009)

 
 
"Likelihood-free Inference for Infectious Disease Models"
by Robert Deardon,
University of Guelph


Likelihood-based inference for epidemic models has always proved difficult, mainly due to difficulties associated with calculating or evaluating the likelihood, particularly in large scale models. Unobserved or partially observed data often further complicate this process. Here we investigate the performance of Markov chain Monte Carlo routines based on approximate likelihoods generated from model simulations as a means of estimating parameters in epidemic models for both complete and incomplete data. We illustrate our techniques using examples such as (i) common cold data from Tristan da Cunha; and (ii) data from an outbreak of Ebola Haemorrhagic Fever in the Democratic Republic of Congo.

Talk Slides
Recommended Readings


 

October 28, 2008

 
 
"A Multilevel Analysis of Gender, Neighborhood Material Deprivation, and Number of Weekly Drinks Among Canadian Adults Aged 31 to 74"
by Rahim Moineddin,
University of Toronto & ICES

In this study we used multilevel zero-inflated Poisson regression to assess the impact of neighborhood material deprivation on the drinking behavior of adult men and women aged 31 to 74 living in 25 of the largest urban areas in Canada. Using the Canadian Community Health Survey (CCHS) in conjunction with tract-level data from the 2001 census, we showed that the positive association between drinking behavior and neighborhood material deprivation differs by gender: men living in neighborhoods with higher material deprivation have heavier drinking patterns relative to women and to men living in neighborhoods with lower material deprivation, after adjusting for individual-level socio-demographic status and lifestyle characteristics. Findings showed that drinking patterns varied significantly across neighborhoods. The relationship between neighborhood material deprivation and drinking is highly pronounced among men even after adjusting for age, marital and educational status, visible minority, smoking, stress, and community sense of belonging. The findings suggest that men living in high poverty neighborhoods in urban Canada were at greater risk of drinking than men in more affluent neighborhoods.
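
The zero-inflated Poisson distribution underlying the model can be sketched as a mixture: with probability `pi` an individual is a structural zero (for example, a non-drinker), and otherwise the weekly count is Poisson with mean `lam`. This single-level sketch ignores the covariates and the neighborhood random effects of the multilevel model; the function name and parameter values are illustrative assumptions.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson with mean lam and zero-inflation pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the structural-zero and the Poisson component.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Hypothetical parameters: 30% structural zeros, Poisson mean of 2 drinks.
print([round(zip_pmf(k, lam=2.0, pi=0.3), 4) for k in range(4)])
```

In a regression setting, `lam` and `pi` would each be linked to covariates, typically through a log link and a logit link respectively.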

Talk Slides

 
 

October 21, 2008


"SSC Case Studies: What, Why, and How"
by Alison Gibbs,
University of Toronto

For many years, case studies in data analysis have been an important part of the Statistical Society of Canada (SSC) annual meeting. The case studies are a valuable way for graduate students in statistics to get experience applying statistical methodology to a real problem and to participate meaningfully in the conference.  We will talk about what the case studies are, why you may want to participate in them, what is expected of participants, and how the case studies can also be a useful resource for teachers of statistics.

Talk Slides
Recommended Readings


 

October 14, 2008

 
 
"Epidemiology 101"
by Dionne Gesink Law,
University of Toronto

Epidemiologists and Biostatisticians work together closely to solve public health problems to ensure the health and wellness of populations. Therefore, being familiar with the language, concepts, and tools of each other’s field is vital for successful collaborations.  The purpose of this presentation is to provide a brief overview of the field of epidemiology including public health success stories, terminology, and the continuum of epidemiologic studies from developing study questions, through study design, data analysis, and interpretation and communication of study results.  If time permits, we will review an outbreak investigation.

Talk Slides

 
 

October 7, 2008

 
 
"Compound Poisson Approximation of Palindrome Length Score in Herpesviruses"
by Ming-Ying Leung,
University of Texas at El Paso

Empirical studies have shown that around the replication origins of many herpesviruses, there are unusual clusters of palindromes in their genome nucleotide sequences. Chew et al (2005) introduce the palindrome length scheme (PLS) to quantify the spatial abundance of palindromes in a nucleotide sequence. Under an i.i.d. random sequence model, the PLS score distribution is well approximated by a compound Poisson distribution. Simulation results further demonstrate that the approximation remains good for nucleotide sequences generated by a Markov chain. This provides us with the statistical criteria to predict specific regions of the herpesvirus genomes as likely locations of replication origins.
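
Before any scoring scheme can be applied, the palindromes themselves must be located. In DNA a palindrome is conventionally a stretch that equals its own reverse complement, such as GAATTC. The sketch below finds such stretches by expanding outwards from each position; the function name and the `min_half` threshold are illustrative assumptions, and the scoring step of the PLS is not reproduced here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def find_palindromes(seq, min_half=3):
    """Return (start, length) of each maximal reverse-complement palindrome
    whose half-length is at least `min_half`."""
    found = []
    for centre in range(1, len(seq)):
        half = 0
        # Grow outwards while the flanking bases are complementary.
        while (centre - half - 1 >= 0 and centre + half < len(seq)
               and COMPLEMENT.get(seq[centre - half - 1]) == seq[centre + half]):
            half += 1
        if half >= min_half:
            found.append((centre - half, 2 * half))
    return found


print(find_palindromes("ACGGAATTCCTA"))
```

Sliding a window along the genome and summarising the palindromes found in each window gives the kind of spatial abundance profile that the abstract describes.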

Talk Slides
Recommended Readings I, II

 
 

September 30, 2008

 
 


"Statistics and Stories: Insight, not Numbers"
This talk is a videotape replay of the presentation given by Neil Sheldon (UK) on June 24, 2008, for the Royal Statistical Society Guy Lecture at the University of Toronto, an event co-sponsored by the Department of Statistics at UofT, the Statistical Society of Canada (SSC), and the Southern-Ontario Regional Association (SORA) of SSC.

(Provided by Neil Sheldon) To students – and perhaps to some teachers – statistics seems to be mainly about collecting data, drawing diagrams and calculating numerical summaries. Actually, statistics is really about understanding the world around us.

In this talk, Neil will show, through a series of illustrations, how data can be properly understood only if we know the stories behind the statistics. Who collected the figures and why? What assumptions were made and are they valid? What do we know already that can help us to check the data for plausibility? What conclusions can we draw and how sure can we be?

Understanding the world around us is vital to playing a full part in society - and so is understanding statistics.

Talk Slides

 

 

September 23, 2008

 
 


"Genome-Wide Association of ~2.5M Common Alleles with Long-Term Complication of Type 1 Diabetes and Related Traits: Statistical Issues"
by Andrew Paterson,
The Hospital for Sick Children

Following from the sequencing of the human genome, the discovery of millions of common DNA sequence variations, coupled with technologies for measuring millions of them at relatively low cost, has resulted in an avalanche of studies identifying common genetic variants associated with many different common complex diseases using case-control designs. We have used similar approaches in subjects from a clinical trial of long-term complications of type 1 diabetes, focusing on the analysis of multiple time-to-event outcomes. I will discuss general design considerations, approaches to analysis, and preliminary results. In addition, we have performed genetic analysis of numerous quantitative risk factors (biomarkers) in the same subjects. Many outstanding statistical and logistical issues have arisen and some will be illustrated.

Talk Slides
Recommended Reading

 

 
     

 

 
     
   
 


Last updated November 01, 2009
All contents copyright © 2005, Department of Public Health Sciences, University of Toronto.