Challenges in Modern Data Analysis (Amy Braverman, organizer)


Anthony Freeman (Jet Propulsion Laboratory)
Earth Science Data: A Look Ahead


Friday 8:30-9:00, Fountain I

Abstract:

Over the next two decades, Earth science measurements from space will continue to expand in scope as more satellites are deployed by US and international space agencies. To date, much attention has been paid to analyzing and understanding data from one instrument at a time. This has been necessary because a deep understanding of each instrument's performance is needed, first to calibrate the data it produces, and then to understand and validate how the measurements relate to physical phenomena.

In the future, information will be extracted from a combination of instruments on multiple platforms, in situ data, and model outputs. This is beginning to happen in a few of the mature science areas, such as physical oceanography and atmospheric chemistry. The challenge is to do this efficiently, and in a manner that is repeatable across multiple investigations and disciplines.


Tom Torda (University of New South Wales)
Problems in Meta-Analyses: Studies Are Many But Cases Are Few


Friday 9:00-9:30, Fountain I

Abstract:

In these days of the tight health dollar, studies with sufficient power to detect, say, a 5% difference in mortality or morbidity are becoming rare. Some nationally funded studies achieve this, but they are the exception, not the rule. Many studies, especially in the field of drug effects, are so small that their results are of quite doubtful value. One possible answer is meta-analysis (pooled analysis), but there are many obvious difficulties in performing such studies.
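To give a feel for the scale implied by the power remark above, here is a minimal sketch of a textbook sample-size calculation for comparing two proportions, using the normal approximation. The baseline and treated mortality rates (20% and 15%), the two-sided 5% significance level, and the 80% power are assumptions chosen for illustration; the abstract does not specify them, and the helper name `n_per_arm` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect p1 vs p2.

    Uses the standard normal-approximation formula for a two-sided
    test of two independent proportions with equal arm sizes.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)     # sum of the two binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Assumed example: mortality falls from 20% to 15% (5 percentage points).
print(n_per_arm(0.20, 0.15))
```

Under these assumed inputs the requirement comes to roughly 900 patients per arm, which suggests why small single-centre studies rarely reach adequate power on their own.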

Access to data: Not all the data from published studies, or from studies submitted to regulatory bodies, are accessible, and considerations other than scientific ones often govern access. Which studies should be included and which should not is a vexed question. Should inclusion be restricted to double-blind, controlled studies, or should case-control studies also be used? The latter are often easier and less expensive to perform than the former, and placebo-controlled studies can raise ethical problems. Should non-peer-reviewed studies be considered? These are often the basis of regulatory submissions. Publication bias towards positive results is a problem, as is the discovery of unpublished data. The quality of unpublished studies may well be inferior to that of published ones, but could they still merit inclusion? Each possible decision introduces some bias into the final study.

Do the methods of the included studies have to be identical? Requiring this would severely limit the number of available cases even further, since in general it would restrict analyses to studies performed with the same protocol.

Differences in responses to drugs exist among ethnic groups and even within such groups. For example, the French and the Austrians are less sensitive to neuromuscular blocking drugs than Americans, and Hong Kong Chinese are considerably more sensitive to opioids than Australians. Is it reasonable to ignore such differences? Doing so runs the risk of masking answers that differ from group to group, where A may be the best for group 1 and B for group 2.

These are only some of the problems encountered when studies are combined to enlarge the data pool. Heterogeneity of the studies can also introduce statistical problems, such as Simpson's paradox.
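As an illustration of how pooling heterogeneous studies can produce Simpson's paradox, the following sketch compares two treatments, A and B, across two studies. The counts are hypothetical, invented purely for illustration, and the names `studies`, `rate`, and `pooled_rate` are not from the talk.

```python
from fractions import Fraction

# Hypothetical (successes, patients) per study and treatment; not real data.
studies = {
    "study 1": {"A": (90, 100), "B": (800, 1000)},
    "study 2": {"A": (300, 1000), "B": (20, 100)},
}


def rate(successes, patients):
    """Exact success rate as a fraction."""
    return Fraction(successes, patients)


def pooled_rate(treatment):
    """Success rate after naively adding the counts from all studies."""
    successes = sum(arms[treatment][0] for arms in studies.values())
    patients = sum(arms[treatment][1] for arms in studies.values())
    return rate(successes, patients)


# Treatment with the higher success rate within each study.
per_study_winner = {
    name: max(arms, key=lambda t: rate(*arms[t]))
    for name, arms in studies.items()
}
# Treatment with the higher success rate after naive pooling.
pooled_winner = max(("A", "B"), key=pooled_rate)

for name, arms in studies.items():
    for treatment, counts in arms.items():
        print(f"{name}, {treatment}: {float(rate(*counts)):.2f}")
for treatment in ("A", "B"):
    print(f"pooled, {treatment}: {float(pooled_rate(treatment)):.2f}")
print("winner within each study:", per_study_winner)
print("winner after pooling:", pooled_winner)
```

In this made-up data, A does better than B within each study, yet B appears better once the counts are simply added together, because the two treatments were used in very different proportions across studies with very different baseline success rates.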


William Szewczyk (National Security Agency)
What is a Datum?


Friday 9:30-10:00, Fountain I

Abstract:

During a briefing in the 1980s, the then Director of the Agency interrupted the speaker to ask, "What is a datum?" In this talk I will show that this question is not as flip as it first seems. To answer real questions, data from a variety of sources must be combined and analyzed. I will illustrate this using the infamous Project X of the 1960s. One lesson to be learned is that timing is everything.