What can big data do for medicine? At the Meaningful Use of Complex Medical Data symposium, held 10-12 August in Los Angeles, California, researchers and doctors batted around several ideas. Presenters showed how data analysis helped them pick out patterns in flu infection, predict mortality in the intensive care unit (ICU), and recommend treatment plans for attention deficit/hyperactivity disorder. But the study of big medical data is in its infancy compared to the datasets crunched by the likes of Google and Amazon.com. For starters: where do programmers find the data, and how do they make sense of it?

Electronic medical records are the basic data that many researchers are interested in, but they are clunky to use in the eyes of data hounds. “Most of the very best electronic health information systems are still data repositories, which are like libraries made by throwing books into a dumpster,” commented presenter Warren Sandberg of Vanderbilt University in Nashville, Tennessee. This makes it very hard to retrieve meaningful information.

Another problem is that healthcare providers must choose one record system, from one vendor, and then stick with what that vendor has to offer. “Shouldn’t electronic health records look more like iPhones?” asked Kenneth Mandl of Children’s Hospital Boston (Mandl and Kohane, 2009). Smartphones provide a basic architecture, and anyone can write apps that access and work with it in different ways. Mandl is working to develop just such a health record system.

Even with a better record program in place, the nature of medical data makes it challenging to develop the algorithms and databases that MUCMD participants envision, said presenter Benjamin Marlin of the University of Massachusetts in Amherst. Numerical data, such as heart rate, accumulate over a long time course and are often measured at irregular intervals. Sometimes pieces of data are missing, for example when a person did not undergo a particular test. These factors make it difficult to process the data with standard statistics; Marlin suggested binning linear data streams into segments that each cover an equal period of time.
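To make the binning idea concrete, here is a minimal sketch in Python. It is not Marlin's method; the function name, the choice to average within each bin, and the heart-rate readings are all assumptions made for illustration. Empty bins are returned as None so that missing data stay visible rather than being silently filled in.

```python
import math
from statistics import mean


def bin_time_series(samples, bin_width, start, end):
    """Average irregularly timed (time, value) samples into equal-width bins.

    Hypothetical helper for illustration. Bins that receive no samples
    are reported as None to mark the data as missing.
    """
    n_bins = math.ceil((end - start) / bin_width)
    buckets = [[] for _ in range(n_bins)]
    for t, value in samples:
        if start <= t < end:
            # Integer division maps each timestamp to its bin index.
            buckets[int((t - start) // bin_width)].append(value)
    return [mean(b) if b else None for b in buckets]


# Made-up readings: (minutes since admission, beats per minute).
heart_rate = [(0, 80), (7, 84), (20, 90), (55, 100)]

# Four 15-minute bins; the third bin (30 to 45 minutes) has no readings.
binned = bin_time_series(heart_rate, bin_width=15, start=0, end=60)
print(binned)
```

Averaging is only one way to summarize a bin; a median or the last observed value would work just as well in this sketch, depending on the measurement.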

Data-Driven Decisions
Despite the ongoing challenges, MUCMD presenters reported several successes in gleaning meaningful information from medical databases. For example, Mandl used data from emergency room visits to model and predict how many people would come in with the flu, and his predictions were better than those from the Centers for Disease Control and Prevention, he said. The results indicated that three- to four-year-olds are the first to fall ill with the virus (Brownstein et al., 2005). These data influenced the government to recommend that all children between two and five receive flu vaccines.

Data can also predict outcomes on an individual basis, said Peter Szolovits of the Massachusetts Institute of Technology, who discussed the role of data in the intensive care unit (ICU). Simply knowing the likelihood that a person will die could help doctors and administrators make important decisions, he said. For example, a hospital could plan for nurses to have more time to attend to sicker people, and predict how many beds will be open. On the medical side, knowing a person’s risk of mortality could help physicians decide whether risky interventions are worthwhile.

Using data from 7,000 health records, including vital signs, lab reports, treatments, and demographics, Szolovits built a model to help predict mortality in ICU patients. He boiled down all the bits of data that describe a person’s condition into one easily digested score for how a person is doing. The researchers then used another 3,000 records to test their algorithms. They found the model was most accurate on a person’s second and third days in the ICU; the outcomes for longer stays were harder to predict (Hug and Szolovits, 2009).

Not all medical decisions are life-and-death; doctors make many choices where hard data might be useful. But it takes a special kind of clinical trial to test decision strategies. Susan Murphy of the University of Michigan described a clinical trial approach that could help doctors develop treatment policies. These policies are akin to a flow chart for treating a person over time. Every junction in the chart requires reassessment of the treatment, and then a decision about whether to try something different. Using a method called Sequential Multiple Assignment Randomized Trials, or SMART (reviewed in Almirall et al., 2012), Murphy randomized each decision point to come up with treatment recommendations. “The idea is to construct a treatment policy that will tell you how to choose the action,” she explained.

For example, there are two main options to help children manage symptoms of attention deficit/hyperactivity disorder (ADHD): medication and behavior modification therapy. Murphy analyzed 138 children with ADHD. At the start of the school year, each was randomly assigned to receive either the stimulant Ritalin or the behavioral therapy. Every month the researchers asked teachers how the children were performing. Those doing well stayed on their current treatment; the others were randomized again—to either increase the intensity of the current therapy, or add the second option.
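The two-stage randomization described above can be sketched in a few lines of Python. This is an illustration of the general SMART pattern only, not the trial's actual procedure; the function names, the treatment labels, and the fixed random seed are assumptions.

```python
import random

# Labels for the two options named in the article.
TREATMENTS = ("medication", "behavioral therapy")


def first_stage(rng):
    """Randomly assign the initial treatment."""
    return rng.choice(TREATMENTS)


def second_stage(initial, responding, rng):
    """Responders stay on their treatment; the others are randomized again."""
    if responding:
        return f"continue {initial}"
    other = TREATMENTS[1] if initial == TREATMENTS[0] else TREATMENTS[0]
    # Non-responders: intensify the current option or add the second one.
    return rng.choice([f"intensify {initial}", f"add {other}"])


rng = random.Random(0)  # fixed seed (an assumption) so the sketch is repeatable
initial = first_stage(rng)
print(initial, "->", second_stage(initial, responding=False, rng=rng))
```

Because every decision point is randomized, the outcomes along each branch can later be compared to see which sequence of choices works best.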

After analyzing the data, Murphy determined the plan that works best: If a child has been on medication before, start out with medication; otherwise, use behavioral therapy. If the treatment is unsuccessful, doctors must ask whether the child is adhering to the treatment plan. If so, it is best to intensify that therapy. Otherwise, it is better to try the alternate treatment. This last decision point, to add a second therapy, had the strongest evidence behind it. It is important to determine how confident one can be, statistically, about the decision points before implementing a treatment plan, Murphy said.
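Written out as code, the treatment policy reads like the flow chart Murphy described. The following Python sketch simply encodes the rules as reported in this article; the function and parameter names are hypothetical, and it is in no way a clinical tool.

```python
def initial_treatment(prior_medication: bool) -> str:
    """First decision: start with what the child has already used."""
    return "medication" if prior_medication else "behavioral therapy"


def next_step(current: str, responding: bool, adherent: bool) -> str:
    """Later decision, made when the child's progress is reassessed."""
    if responding:
        # Children who are doing well stay on their current treatment.
        return f"continue {current}"
    if adherent:
        # The plan is being followed but is not working: intensify it.
        return f"intensify {current}"
    # The plan is not being followed: bring in the second option.
    other = "behavioral therapy" if current == "medication" else "medication"
    return f"add {other}"


start = initial_treatment(prior_medication=False)
print(start)
print(next_step(start, responding=False, adherent=True))
print(next_step(start, responding=False, adherent=False))
```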

Big Data Get Personal
While many big datasets include thousands of patients, there is another variety: oodles of data, but all on one person. This could be useful not only for that person’s medical care, but also in daily life. For example, hackers at the conference explored a dataset of blood glucose measurements from a boy who wore a continuous monitor over three years. One finding data-waders already gleaned from the database was that the boy was not eating right at school.

Another example of personal megadata: In 2003, Microsoft developed a wearable camera that automatically takes photographs, two or three per minute, to help people with memory loss such as Alzheimer’s recall their activities. Regularly reviewing the photos jogs the memory. Looking at the photos was a better memory aid than keeping a written diary, researchers found (Berry et al., 2009; Browne et al., 2011).

Presenter Mary Czerwinski profiled a newer project at Microsoft Research in Redmond, Washington. She is working on a software/sensor package that analyzes users’ moods to help them better understand why they feel the way they do. The first application she envisions for the system is to help people who are emotional eaters identify the trigger signs of a binge and avoid raiding the fridge. Czerwinski uses sensors, such as a heart rate monitor, built into a brassiere (for now, the project is for women only). Ten women tested the system for a week, and the researchers found the emotions that lead to eating are different for different people. Boredom, stress, or happiness could all precede snacking, so the app will have to be personalized for each eater.

As researchers meet the challenges of medical information processing, they hope that big data could make this kind of customization possible, even commonplace, in daily life, the clinic, and trials research.—Amber Dance.

This is Part 2 of a two-part story. See also Part 1.


News Citations

  1. Meeting Mulls Over Use of Complex Medical Data

Paper Citations

  1. Mandl and Kohane. No small change for the health information economy. N Engl J Med. 2009 Mar 26;360(13):1278-81.
  2. Brownstein et al. Identifying pediatric age groups for influenza vaccination using a real-time regional surveillance system. Am J Epidemiol. 2005 Oct 1;162(7):686-93.
  3. Hug and Szolovits. ICU acuity: real-time models versus daily models. AMIA Annu Symp Proc. 2009;2009:260-4.
  4. Almirall et al. Designing a pilot sequential multiple assignment randomized trial for developing an adaptive treatment strategy. Stat Med. 2012 Jul 30;31(17):1887-902.
  5. Berry et al. The neural basis of effective memory therapy in a patient with limbic encephalitis. J Neurol Neurosurg Psychiatry. 2009 Nov;80(11):1202-5.
  6. Browne et al. SenseCam improves memory for recent events and quality of life in a patient with memory retrieval difficulties. Memory. 2011 Oct;19(7):713-22.

