Abstracts & Slides
 

February 7, 2012

 
 
"Generalized Procrustes Analysis with an Application"

by Siddik Keskan, Yuzuncu Yil University

In this talk, we present a dimension reduction method called Generalized Procrustes Analysis (GPA), which is often used in sensory analysis and shape analysis. When, for example, similar food products are evaluated by multiple experts or judges on a number of attributes for quality or consumer preference, this method can be applied to examine the relationships among products, attributes, and experts. GPA is an exploratory technique that allows multidimensional results to be represented graphically on two-dimensional maps. The GPA algorithm involves three steps: translation, rotation/reflection, and isotropic scaling. After a brief general introduction to the method, we will describe these steps and their corresponding mathematical formulations. An application to agricultural research will be discussed to illustrate the practical use of GPA.
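The three steps named in the abstract can be sketched in a few lines. This is only an illustrative sketch, not the speaker's implementation: it assumes two-dimensional configurations with matched points, scales every configuration to unit size once up front instead of estimating scale factors iteratively, and uses the hypothetical function names `center`, `scale_to_unit`, `best_orthogonal_fit`, and `generalized_procrustes`.

```python
import math


def center(config):
    """Translation step: move the centroid of a 2-D configuration to the origin."""
    n = len(config)
    mean_x = sum(x for x, _ in config) / n
    mean_y = sum(y for _, y in config) / n
    return [(x - mean_x, y - mean_y) for x, y in config]


def scale_to_unit(config):
    """Isotropic scaling step: rescale so the sum of squared coordinates is 1."""
    norm = math.sqrt(sum(x * x + y * y for x, y in config))
    return [(x / norm, y / norm) for x, y in config]


def best_orthogonal_fit(config, target):
    """Rotation/reflection step: best rotation of config (or its mirror image) onto target."""
    best = None
    for flip in (1.0, -1.0):  # -1.0 mirrors the configuration before rotating
        pts = [(x, flip * y) for x, y in config]
        # Closed-form least-squares rotation angle for 2-D point sets.
        num = sum(px * ty - py * tx for (px, py), (tx, ty) in zip(pts, target))
        den = sum(px * tx + py * ty for (px, py), (tx, ty) in zip(pts, target))
        theta = math.atan2(num, den)
        c, s = math.cos(theta), math.sin(theta)
        rotated = [(c * px - s * py, s * px + c * py) for px, py in pts]
        err = sum((rx - tx) ** 2 + (ry - ty) ** 2
                  for (rx, ry), (tx, ty) in zip(rotated, target))
        if best is None or err < best[0]:
            best = (err, rotated)
    return best[1]


def generalized_procrustes(configs, iterations=10):
    """Align all configurations to an iteratively updated consensus (mean) configuration."""
    shapes = [scale_to_unit(center(c)) for c in configs]
    consensus = shapes[0]  # assumption: start from the first configuration
    for _ in range(iterations):
        shapes = [best_orthogonal_fit(s, consensus) for s in shapes]
        k = len(shapes)
        mean = [(sum(s[i][0] for s in shapes) / k, sum(s[i][1] for s in shapes) / k)
                for i in range(len(consensus))]
        consensus = scale_to_unit(mean)
    return shapes, consensus
```

In a sensory-analysis setting each configuration would hold one judge's scores for the same products; configurations with more than two attributes would need a matrix-based rotation step (for example via a singular value decomposition) rather than the closed-form angle used here.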

Talk Slides

 
January 31, 2012  
 
"Heterogeneity Bias in Logistic and Cox Regression"

by Tim Ramsay, Ottawa Hospital Research Institute & University of Ottawa

It is a surprisingly little-known fact that omitting important predictors from either the logistic regression or the Cox proportional hazards model leads to biased effect estimates. Importantly, this is true even when the omitted predictors are unassociated with the exposure of interest, so the phenomenon is completely unrelated to confounding. While some statisticians argue that it is wrong to refer to this phenomenon as bias, it will be argued that for practical purposes that is the right way to think of it. This presentation will describe the phenomenon and discuss its relevance to the analysis and interpretation of data from randomized clinical trials. Results from a systematic review of published RCTs will be presented to suggest that the implications may be important.
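A small numerical illustration may help make the logistic case concrete. The sketch below assumes a binary exposure and a binary omitted predictor that is independent of the exposure (as it would be under randomization); the coefficient values and the function name `marginal_odds_ratio` are hypothetical and chosen only for illustration.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_odds_ratio(beta_exposure, beta_omitted, intercept=0.0, p_z=0.5):
    """Exposure odds ratio after averaging over an omitted binary predictor z.

    The outcome follows logit P(Y=1) = intercept + beta_exposure*x + beta_omitted*z,
    with z independent of the exposure x and P(z=1) = p_z.
    """
    def risk(x):
        return sum(w * expit(intercept + beta_exposure * x + beta_omitted * z)
                   for z, w in ((0, 1.0 - p_z), (1, p_z)))

    r1, r0 = risk(1), risk(0)
    return (r1 / (1.0 - r1)) / (r0 / (1.0 - r0))


# Conditional odds ratio is 2; the omitted predictor is strong but unrelated to exposure.
conditional = 2.0
marginal = marginal_odds_ratio(math.log(conditional), beta_omitted=3.0, intercept=-1.5)
print(f"conditional OR = {conditional:.2f}, marginal OR = {marginal:.2f}")
```

With these illustrative values the odds ratio obtained after dropping the predictor is closer to 1 than the conditional odds ratio, even though no confounding is present.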

Talk Slides

 

January 24, 2012

 
 
"Systematic Random Sampling after Serpentine Ordering to Select Neighbourhoods in Toronto: the Neighbourhood Effects on Health and Well-Being (NEHW) Project"

by Rosane Nisenbaum, St. Michael’s Hospital


The Neighbourhood Effects on Health and Well-Being (NEHW) Project in Toronto was designed to identify neighbourhood, in addition to individual, factors associated with mental health outcomes. Data collection started in 2009 and ended in 2011, and data analysis is currently underway. A three-stage sampling design was used: 50 of 140 neighbourhoods were selected at Stage 1; 2 census tracts within each neighbourhood were selected at Stage 2; and 30 households within each census tract were selected at Stage 3. In this talk, we focus on describing the selection of the neighbourhoods using systematic random sampling after serpentine ordering to provide implicit geographic stratification of neighbourhoods (Leslie Kish 1995, Survey Sampling; Jeff Geuder, US Department of Agriculture, Statistical Reporting Service, SF&SRB Report No. 79, February 1984, “Paper Stratification in SRS Area Sampling Frames”).
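The Stage 1 selection can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's actual procedure: each neighbourhood is assumed to sit at a (row, column) position on a regular grid, the serpentine path simply reverses direction on alternate rows, and the names `serpentine_order` and `systematic_sample` are hypothetical.

```python
import random


def serpentine_order(units):
    """Order (name, row, col) units along a snake path: even rows run left to right,
    odd rows right to left, so neighbours in the list are neighbours on the map."""
    def key(unit):
        _, row, col = unit
        return (row, col if row % 2 == 0 else -col)
    return sorted(units, key=key)


def systematic_sample(ordered_units, n, rng):
    """Systematic random sample of n units: random start, then a fixed (fractional) interval."""
    total = len(ordered_units)
    interval = total / n
    start = rng.uniform(0, interval)
    # min(...) guards against a floating-point index equal to the list length.
    return [ordered_units[min(total - 1, int(start + i * interval))] for i in range(n)]


# Hypothetical layout: 140 neighbourhoods on a 10 x 14 grid, 50 of them selected.
neighbourhoods = [(f"N{r}-{c}", r, c) for r in range(10) for c in range(14)]
ordered = serpentine_order(neighbourhoods)
selected = systematic_sample(ordered, 50, random.Random(2009))
print(len(selected), "neighbourhoods selected")
```

Because the sample is spread evenly along the serpentine path, every part of the map contributes neighbourhoods, which is the sense in which the ordering provides implicit geographic stratification.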

Talk Slides

Recommended Reading

 
     
  Fall 2011 Seminars  
     

December 6, 2011

 
"An Introduction to the Kalman Filter and its Applications"

by Ian Moore, Quatrametrics Inc.


In this lecture, we present a recursive solution to the discrete-data linear filtering problem, which Kalman developed in the early 1960s to address the trajectory estimation problems commonly found in aerospace engineering. This recursive system of linear algebraic equations, popularly known as the Kalman filter, is used to remove noise or error from repeated measurements taken over time. If the error (the part being removed) in an arbitrary set of discrete input measurements is assumed to be Gaussian, the Kalman filter is optimal in the sense of minimizing the mean-squared error between the filtered estimates and the true underlying values. The Kalman filter is used in a wide range of applications in various fields of engineering. It is also used in economics, where it can be applied to the estimation of structural macroeconomic models. Using several examples, we demonstrate its usefulness to health economics and health policy research.
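The recursion is easiest to see in the scalar case. The sketch below assumes a one-dimensional random-walk state observed directly with Gaussian noise; the function name `kalman_filter_1d` and all parameter values are hypothetical, and the general filter replaces these scalars with matrices.

```python
def kalman_filter_1d(measurements, process_var, measurement_var,
                     initial_estimate=0.0, initial_var=1.0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    Returns the filtered estimate after each measurement.
    """
    estimate, var = initial_estimate, initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is assumed to carry over, with uncertainty growing.
        var = var + process_var
        # Update: blend prediction and measurement according to the Kalman gain.
        gain = var / (var + measurement_var)
        estimate = estimate + gain * (z - estimate)
        var = (1.0 - gain) * var
        filtered.append(estimate)
    return filtered


# Repeated noisy readings of a quantity whose true value is 10.
readings = [11.0, 9.0] * 20
estimates = kalman_filter_1d(readings, process_var=0.0, measurement_var=1.0,
                             initial_estimate=0.0, initial_var=1000.0)
print(round(estimates[-1], 3))
```

With `process_var=0.0` the filter reduces to a running, precision-weighted average of the readings; a positive `process_var` lets the estimate track a value that drifts over time.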

Talk Slides

 

November 29, 2011

 
 
"Confidence Interval Estimation for Continuous Outcomes in Cluster Randomization Trials"

by Julia Taleban, Samuel Lunenfeld Research Institute


Cluster randomization trials are experiments in which intact social units (e.g. hospitals, schools, communities, and families), rather than individuals, are randomized to the arms of the trial. The popularity of this design among health researchers can be explained by many reasons, including reduced contamination of treatment effects and convenience. However, the advantages of cluster randomization trials do not come without a price. Due to the dependence of individuals within a cluster, cluster randomization trials suffer from reduced statistical efficiency and often require a complex analysis of study outcomes. The primary purpose of this work is to propose new confidence intervals for effect measures commonly of interest for continuous outcomes arising from cluster randomization trials. Specifically, we construct new confidence intervals for the difference between two normal means, the difference between two lognormal means, and the exceedance probability. The proposed confidence intervals, which use the method of variance estimates recovery (MOVER), do not make certain assumptions that existing procedures make about the data. For instance, symmetry is not forced when the sampling distribution of the parameter is skewed, and the assumption of homoscedasticity is not made. Simulation studies are used to investigate the small sample properties of the MOVER as compared with existing procedures. Unbalanced cluster sizes are simulated, with an average range of 50 to 200 individuals per cluster and 6 to 24 clusters per arm. The effects of various degrees of dependence between individuals within the same cluster are also investigated. When comparing the empirical coverage, tail errors, and median widths of confidence interval procedures, the MOVER has coverage close to the nominal level, relatively balanced tail errors, and narrow widths as compared with existing procedures for the majority of the parameter combinations investigated. Existing data from cluster randomization trials are then used to illustrate each of the methods.
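The core MOVER step, combining separate confidence limits for two parameters into limits for their difference, can be sketched as below. The formula is the commonly stated MOVER form for a difference of two independently estimated parameters; how the talk obtains the individual limits under clustering is not given in the abstract, so the inputs here are hypothetical, as is the function name `mover_difference`.

```python
import math


def mover_difference(est1, lower1, upper1, est2, lower2, upper2):
    """Confidence limits for (parameter 1 - parameter 2) from the two separate intervals.

    The distances between each estimate and its own limits stand in for the
    variance estimates, so asymmetric input intervals give an asymmetric result.
    """
    diff = est1 - est2
    lower = diff - math.sqrt((est1 - lower1) ** 2 + (upper2 - est2) ** 2)
    upper = diff + math.sqrt((upper1 - est1) ** 2 + (est2 - lower2) ** 2)
    return lower, upper


# Hypothetical arm-level estimates with their own (possibly asymmetric) 95% limits.
print(mover_difference(10.0, 9.0, 13.0, 7.0, 6.0, 8.0))
```

When both input intervals are symmetric, the result coincides with the usual interval based on the square root of the summed variances; the two differ only when the input intervals are skewed.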

Talk Slides

 

November 22, 2011

 
 
"The Cumulative Effects of Time-varying Covariates in Survival Analysis: Flexible Modeling and Applications"

by Michal Abrahamowicz, McGill University


Many exposures and risk factors evaluated in prospective studies of survival (or, more generally, of time to a specific event) show important variation over time. We focus on two challenges that need to be addressed to ensure accurate testing and modeling of the potential cumulative effects of such time-varying exposures. (i) Firstly, while the etiological relevance of an exposure may vary considerably depending on how long ago it occurred, little is known about the relative importance of exposures that occurred in different intervals in the past. (ii) Secondly, the form of the dose-response relationship, describing the impact of increasing exposure intensity at a fixed point in time on the hazard, is typically unknown. For example, (i) cardiovascular disease (CVD) risk factors, such as serum cholesterol or blood pressure, change considerably over a person’s lifetime, with different individuals showing different patterns of change, and there is a continuing debate about the relative impact of their recent vs much earlier values on the current CVD risk. On the other hand, (ii) several CVD risk factors have non-linear relationships with the logit or log-hazard of CVD risk, and the form of the dose-response curve varies depending on the risk factor. Whereas there is a vast statistical literature on flexible modeling of dose-response curves, the only published method for flexible modeling of the function assigning relative importance weights to past exposures is limited to logistic regression analyses of case-control studies. We recently proposed a flexible spline-based model to estimate the weighted cumulative exposure (WCE) effects in the framework of a generalized Cox model for censored survival data.
The value of the WCE at time τ is modelled as a function of the past exposure history, described by the time-dependent exposure intensity x(t) for 0 < t < τ: WCE(τ | x(t), t < τ) = ∑ w(τ − t) x(t), where w(τ − t) is modeled using low-dimensional cubic regression splines and assigns importance weights to past exposures as a function of the time elapsed since the exposure (τ − t). The estimated WCE is then included as a time-dependent covariate in the Cox proportional hazards model. To assess the accuracy of the proposed estimates, we rely on simulations and use our versatile ‘permutational algorithm’ to generate event times conditional on the time-varying WCE. No currently available method handles flexible modeling of both (i) the weight function for cumulative effects and (ii) the non-linear dose-response curve. Thus, we have recently extended the above model to WCE(τ | x(t), t < τ) = ∑ w(τ − t) s[x(t)], where s[x(t)] represents a smooth dose-response curve describing the relationship between exposure intensity (dose) at a given point in time and the logarithm of the hazard. To illustrate a real-life application, we use a large long-term cohort study with repeated measures of blood pressure to re-assess their effects on CVD mortality and morbidity.
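The weighted sum itself is simple to compute once a weight function is available. The sketch below is only an illustration of the formula: it assumes exposures recorded at discrete times, and it substitutes simple hypothetical weight and dose-response functions for the spline-based estimates described in the abstract; the function name `weighted_cumulative_exposure` is also hypothetical.

```python
def weighted_cumulative_exposure(exposures, tau, weight, dose_response=lambda x: x):
    """WCE at time tau: sum of weight(tau - t) * dose_response(x(t)) over past times t < tau.

    exposures maps each measurement time t to the exposure intensity x(t).
    The default dose_response is the identity, giving the original (linear) WCE.
    """
    return sum(weight(tau - t) * dose_response(x)
               for t, x in exposures.items() if t < tau)


# Hypothetical history: one exposure reading per year, with weights halving
# for each additional year elapsed since the exposure.
history = {0: 1.0, 1: 1.0, 2: 1.0}
print(weighted_cumulative_exposure(history, tau=3, weight=lambda elapsed: 0.5 ** elapsed))
```

In the model described above, the resulting value would be recomputed at each event time and entered into the Cox model as a time-dependent covariate.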
 
Talk Slides

Recommended Readings I, II

 

November 15, 2011

 
 
"The Case-Crossover Study Design: An Approach for Evaluating Associations
Between Daily Air Pollution Concentrations and Hospitalization for Stroke"

by Paul J. Villeneuve, Health Canada


Over the past decade, ambient air pollution has emerged as a global risk factor for stroke. For the most part, associations between air pollution and stroke have been found in studies that evaluated day-to-day changes in air pollution concentrations rather than long-term exposure. The case-crossover study design is a frequently used approach to characterize the risk of adverse health outcomes in relation to short-term changes in environmental exposures. In this seminar, an overview of the case-crossover design will be presented, and recent findings from ongoing research evaluating the associations between ambient air pollution and stroke in Edmonton will be discussed. Examples of SAS programming techniques used to derive the risk estimates will be reviewed and presented in class. Sample data and programs will also be provided so that participants can practice conducting case-crossover analyses.
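In a case-crossover analysis each case serves as its own control: exposure on the event day is compared with exposure on nearby referent days. The abstract does not say how referent days were chosen, so the sketch below assumes one widely used scheme, time-stratified selection (same weekday, same calendar month), and the function name `referent_days` is hypothetical. It is written in Python rather than the SAS and R used in the seminar materials.

```python
from datetime import date, timedelta


def referent_days(event_day):
    """Time-stratified referents: all other days in the same calendar month
    that fall on the same day of the week as the event day."""
    day = date(event_day.year, event_day.month, 1)
    referents = []
    while day.month == event_day.month:
        if day.weekday() == event_day.weekday() and day != event_day:
            referents.append(day)
        day += timedelta(days=1)
    return referents


# Referent days for a hypothetical stroke admission on 15 November 2011.
for d in referent_days(date(2011, 11, 15)):
    print(d.isoformat())
```

The pollutant concentrations on the event day and its referent days would then typically be compared within each case using conditional logistic regression.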
 
Talk Slides

Source Code (SAS & R) and Data


 

November 8, 2011

 
 
"Disease Mapping with Log Gaussian Cox Processes"

by Lennon Li, University of Toronto


In disease mapping, when target diseases have low prevalence, the study usually covers a long time period to accumulate sufficient cases. However, during this period, numerous irregular changes may occur in the census regions for which population is reported, which complicates inference. A new model was developed for the case in which the exact location of the cases is available, consisting of a continuous random spatial surface and fixed effects for time and the ages of individuals. The process is modeled on a fine grid, approximating the underlying continuous risk surface with a Gaussian Markov random field, and Bayesian inference is performed using integrated nested Laplace approximations. Further, when the exact location of the cases is not known, inference is complicated by the uncertainty in case locations caused by data aggregation over census regions for confidentiality. Conventional modeling relies on census boundaries that are unrelated to the biological process being modeled, and may result in stronger spatial dependence in less populated regions, which dominate the map. A new model was developed consisting of a continuous random spatial surface with aggregated responses and fixed covariate effects at the census region level. The continuous spatial surface was approximated by a Markov random field, which greatly reduced the computational complexity. The process was again modeled on a lattice of fine grid cells, and Bayesian inference was performed using Markov chain Monte Carlo with data augmentation.

Simulation studies were carried out to assess the performance of the proposed model and to compare it with the conventional Besag-York-Mollié model as well as with a model assuming that exact locations are known. Receiver operating characteristic curves and mean integrated squared errors were used as measures of performance. The exact location model is applied to clinical data on the location of residence at the time of diagnosis of new lupus cases in Toronto for the 40 years to 2007, and the aggregated model is applied to clinical data on syphilis cases in North Carolina for the 9 years to 2007, with the aim of finding areas of abnormally high risk. Maps of predicted risk and posterior exceedance probabilities are produced, and the results are interpreted.

Talk Slides

 

November 1, 2011

 

"Construction of Bivariate Distributions via Principal Components"

by Amparo Casanova, University of Toronto


The diagonal expansion of a bivariate distribution (Lancaster, 1958) has been used as a tool to construct bivariate distributions; this method has been generalized using principal dimensions of random variables (Cuadras 2002). Sufficient and necessary conditions are given for uniform, exponential, logistic and Pareto marginals in the one and two-dimensional cases. The corresponding copulas are obtained.
 
Talk Slides

Recommended Readings I, II, III
Additional Readings IV, V


October 25, 2011

 
 
"Determining Optimal Sample Sizes for Multi-stage Randomized Clinical Trials from an Industry Perspective Using Value of Information Methods"

by Maggie Chen, University of Toronto


A model is proposed for the expected total profit that includes consideration of per-patient profit, disease incidence, time horizon, trial duration, market share, and the relationship between the trial results and the probability of regulatory approval. The proposed value of information (VOI) method covers multi-stage adaptive designs, with a solution given for the two-stage design. An example demonstrates that a multi-stage design can achieve significant increases in expected net gain while requiring a smaller expected total sample size and lower cost.
 
Talk Slides

 

October 18, 2011

 
 
"Sample Weights in Bayesian Regression: A Case Study"

by Tim Guimond, Anky Lai, Josh Murray, Kristen O'Brien & Case Studies Team, University of Toronto


This presentation will discuss the process of participating in the SSC case studies as well as the results from an analysis of the income gap between young Canadian men and women from 1996 to 2008. A Bayesian approach was used as a sophisticated attempt at dealing with sampling weights in the prediction of income. We examined an unweighted model, a continuously weighted model and a model subdividing sample weights into classes. The gender gap in income between young Canadians is expanding over time and, controlling for other covariates, men earn significantly more than women. We will discuss the ideal approach to completing an SSC case study while highlighting the dilemmas that are encountered along the way.
 
Talk Slides: Part I & Part II

 

October 11, 2011


"Efficient Analysis of Case-Control Studies with Sample Weights (with Application to U.S. Kidney Cancer Study)"
by Vicky Landsman, St. Michael's Hospital & NIH/NCI


I will talk about analysis of case-control studies from complex sampling designs. For these studies, the sample selection mechanism is usually informative, which requires incorporating sample weights in the analysis to obtain consistent estimates of the population parameters. The conventional weighted estimators, obtained by solving weighted estimating equations, are known to be inefficient when the weights are highly variable as is typical for case-control designs. I will present an alternative semi-parametric weighted estimator, obtained by solving weighted estimating equations using the model-adjusted rather than the original sample weights. The adjustment of the weights helps to reduce their variability and, as a result, improve the efficiency and reduce the bias of the estimators that use the adjusted weights. I will discuss benefits and limitations of the proposed estimator emphasizing efficiency and robustness. I will show some interesting results from the simulation study and the application of the methods to the U.S. Kidney Cancer Case-Control Study which motivated this work.
 
Talk Slides

Recommended Readings I, II


October 4, 2011

 

"Determinants of the Presence and Volume of Brown Fat in Humans"
by Joanne Quan, Lutong Zhou & Case Studies Team, University of Toronto


Brown fat is typically found in hibernating animals, but recent technological advances have detected the presence of brown fat in humans as well. The goal of this project is to identify the factors determining the existence and the volume of brown fat in humans. The data contain 4842 observations from cancer patients, of which approximately 6% have brown fat. Generalized linear regression models and the Box-Cox transformation were used to build a 4-stage model to investigate the relationship between the covariates and the presence and volume of brown fat. The results showed that age, sex, and external temperature were significant predictors; women have higher volumes of brown fat than men, and brown fat volume decreases with increasing age and lower external temperature.

Talk Slides

September 27, 2011

 
 
"Biomedical Applications of Runs and Patterns"
by Wendy Lou, University of Toronto


Sequences of categorical outcomes arise frequently in biomedical research, representing, for example, DNA segments or outcomes of healthcare evaluations. They are typically analyzed by defining on them problem-specific statistics involving runs and patterns. The distributions of such statistics are often unknown except in the simplest cases, and the simplifying assumption of iid outcomes is in many cases not realistic. The method of finite Markov chain imbedding is an effective tool for deriving and studying the distributions of complex statistics defined on categorical sequences. In this talk, a review of the approach will be presented, and its usage will be illustrated via selected practical applications drawn from studies of HIV patient care and DNA tandem repeats, among others.
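The idea behind finite Markov chain imbedding can be illustrated with one of the simplest run statistics: whether a sequence contains a success run of a given length. The sketch below assumes iid binary outcomes for brevity (the method itself extends to Markov-dependent sequences by changing the transition probabilities), and the function name `prob_run_at_least` is hypothetical.

```python
def prob_run_at_least(n, k, p):
    """P(a run of at least k consecutive successes occurs in n iid trials).

    The chain's state is the current success-run length (0..k); state k is
    absorbing, so the probability mass in state k after n steps is the answer.
    """
    dist = [1.0] + [0.0] * k  # start with a run of length 0
    for _ in range(n):
        nxt = [0.0] * (k + 1)
        nxt[k] = dist[k]                       # once the run has occurred, stay absorbed
        for run in range(k):
            nxt[run + 1] += dist[run] * p      # success extends the current run
            nxt[0] += dist[run] * (1.0 - p)    # failure resets the run
        dist = nxt
    return dist[k]


# Probability of seeing two successes in a row somewhere in three fair trials.
print(prob_run_at_least(3, 2, 0.5))
```

More complex patterns are handled the same way: the state space is enlarged to track partial progress towards the pattern, and the distribution of the statistic is read off from the chain after n steps.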

 

 
 
     
 


Last updated September 21, 2010
All contents copyright © 2005, Dalla Lana School of Public Health, University of Toronto.