This paper is from Jeremy's group at Harvard.
The paper discusses comparative-effectiveness research: "using secondary health-care data, including EMRs, longitudinal claims data, and registries" to study the outcomes of medicines in the normal routine of medical practice, without intervening in the delivery of the care that produces those outcomes. (For why this is useful, see Figure 1.)
Lu describes the brute-force method for matching participants in 4 groups on page 35. It would be worthwhile to examine the C code to see how this matching is performed. Their suboptimal matching also falls prey to the 7th son example: matchings are optimal for points in group 1, but not across the other groups.
Upon further searching, I can find neither their R package nor their C code.
Research Question: The biggest hurdle in the Lu group's research seems to be this suboptimal matching. That is, they cannot do an optimal matching across more than two groups. Instead, they do an optimal bipartite matching or an optimal pair-matching between group 1 and each of the other k-1 groups individually, then chain the matches through group 1. Each of these approaches is suboptimal; for example, the latter does not guarantee the best matches between groups 2 and 3. Could we create an optimal matching across multiple groups? To test this, it would be advantageous to get an actual dataset.
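To make the multi-group question concrete, here is a toy sketch (my own made-up 1-D covariate values, not data from the paper) of what a truly optimal 3-group matching would compute: the assignment of triples minimizing the total within-triple distance, found by brute force. This is only feasible for tiny groups, which is exactly why the Lu group falls back on bipartite matchings.

```python
from itertools import permutations

# Hypothetical 1-D covariate values for three groups (toy data, not the paper's).
g1 = [1.0, 5.0, 9.0]
g2 = [1.1, 5.2, 8.8]
g3 = [0.9, 4.9, 9.3]

def triple_cost(a, b, c):
    # Total pairwise distance within one matched triple.
    return abs(a - b) + abs(a - c) + abs(b - c)

def optimal_triples(g1, g2, g3):
    # Brute force over all assignments of g2 and g3 members to g1 members.
    # The search space grows as (n!)^2, so this only works for tiny groups.
    best = None
    for p2 in permutations(range(len(g2))):
        for p3 in permutations(range(len(g3))):
            cost = sum(triple_cost(g1[i], g2[p2[i]], g3[p3[i]])
                       for i in range(len(g1)))
            if best is None or cost < best[0]:
                best = (cost, p2, p3)
    return best

cost, p2, p3 = optimal_triples(g1, g2, g3)
print(cost, p2, p3)  # minimal total within-triple distance and the assignment
```

An exhaustive search like this gives a gold standard to compare any heuristic (bipartite-then-chain, greedy, kd-tree) against on small datasets.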
This is the most comprehensive and easiest-to-follow discussion of propensity scores I have seen yet. Multiple examples are considered.
Starting on page 12, the author goes into a deeper discussion of matching, including nearest-neighbor matching. Page 20 provides the best evidence for matching (and a description of it) that I've seen, comparing both unmatched and matched controls to the "treatment" (drug-use) group.
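As a toy illustration of that matched-versus-unmatched comparison (the propensity scores and outcomes below are made up for the example; this is not Stuart's data or code):

```python
# Toy nearest-neighbor matching on a propensity score.
# Each unit is (propensity score, outcome); all values are hypothetical.
treated  = [(0.80, 10.0), (0.60, 8.0)]
controls = [(0.81, 9.0), (0.59, 7.5), (0.10, 2.0), (0.20, 3.0)]

def nn_match(treated, controls):
    # One-pass nearest-neighbor matching on the score, without replacement.
    available = list(controls)
    matches = []
    for score, outcome in treated:
        best = min(available, key=lambda c: abs(c[0] - score))
        available.remove(best)
        matches.append(((score, outcome), best))
    return matches

matches = nn_match(treated, controls)

# Naive (unmatched) comparison uses all controls, including low-score ones.
unmatched_diff = (sum(o for _, o in treated) / len(treated)
                  - sum(o for _, o in controls) / len(controls))
# Matched comparison uses only the score-similar controls.
matched_diff = sum(t[1] - c[1] for t, c in matches) / len(matches)

print(unmatched_diff, matched_diff)
```

The unmatched difference is inflated by controls with low propensity scores (and low outcomes); matching on the score shrinks the estimate, which is the effect the slide's matched-versus-unmatched comparison demonstrates.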
Most matching settings assume a control group larger than the treated group. Gu and Rosenbaum (1993: 413) note that optimal algorithms and greedy algorithms (which pass through the treated group G1 only once, assigning each unit its best available match) pick roughly the same controls, but the greedy approach may not assign them to the best matches between the two groups.
Research Question: How would our matching algorithm perform under a greedy assignment versus the current "smallest first" nearest-neighbors approach? Again, what would an optimal (smallest total sum of distances) approach look like? What about increasing the size of the control group (assume Gk) to 2n participants and matching to the best n? What about allowing multiple controls/matches per treated unit---how would that affect our outcome? [These questions may need actual data. Perhaps use Stuart's data from this talk, if available, to compare?]
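Gu and Rosenbaum's greedy-versus-optimal distinction is easy to reproduce on a toy example (made-up 1-D values): a one-pass greedy match can pick the same controls as the optimal match yet pair them badly, so its total distance is larger.

```python
from itertools import permutations

# Toy 1-D example where greedy and optimal use the same controls
# but pair them differently. Values are made up for illustration.
treated  = [1.0, 2.0]
controls = [0.0, 1.5]

def greedy_match(treated, controls):
    # Pass through the treated group once, taking the nearest
    # still-available control each time.
    available = list(controls)
    total, pairs = 0.0, []
    for t in treated:
        c = min(available, key=lambda c: abs(c - t))
        available.remove(c)
        total += abs(c - t)
        pairs.append((t, c))
    return total, pairs

def optimal_match(treated, controls):
    # Brute-force minimum total distance over all assignments; in practice
    # this is solved with assignment/network-flow algorithms.
    best = min(permutations(controls),
               key=lambda perm: sum(abs(t - c) for t, c in zip(treated, perm)))
    return (sum(abs(t - c) for t, c in zip(treated, best)),
            list(zip(treated, best)))

print(greedy_match(treated, controls)[0])   # 2.5: 1.0 -> 1.5, then 2.0 -> 0.0
print(optimal_match(treated, controls)[0])  # 1.5: 1.0 -> 0.0, 2.0 -> 1.5
```

Here greedy grabs control 1.5 for treated 1.0 (distance 0.5) and strands treated 2.0 with control 0.0 (distance 2.0), while the optimal assignment achieves 1.5 total, matching Gu and Rosenbaum's observation about assignment quality.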
- Existing packages for matching [twang (McCaffrey), Matching (Sekhon), MatchIt (Ho)]
- Paper References: (Smith 1997, Rubin and Thomas 2000)
- Multilevel settings (see slide 155, p38)
- MatchIt R package for matching (http://gking.harvard.edu/matchit)
- Stuart's website: www.biostat.jhsph.edu/~estuart
Lu presents an example of, and uses for, propensity scores. The full PSM model is on slides 47-48.
Slide 28 describes the need for pair matching, which we expanded to k-way matching for our kd-tree paper.
Research Question: Can we extend the kd-tree algorithm to an optimal matching approach (minimizing the sum of the distances)? According to Lu, that approach provides a better matching than the nearest-neighbor approach (see slides 33-39).
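For reference, the nearest-neighbor side of this question rests on the standard kd-tree query. The sketch below is a generic textbook 2-D kd-tree with nearest-neighbor search, not the implementation from our paper; the sample points are arbitrary.

```python
import math

# Minimal 2-D kd-tree with nearest-neighbor search (generic sketch).

class Node:
    def __init__(self, point, axis, left, right):
        self.point, self.axis = point, axis
        self.left, self.right = left, right

def build(points, depth=0):
    # Split on alternating axes at the median point.
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    # Returns (distance, point) of the closest stored point to query.
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Descend the far side only if the splitting plane could hide
    # a closer point than the current best.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2)))  # nearest stored point to (9, 2) is (8, 1)
```

The optimal-matching question is then whether this one-query-at-a-time structure can be folded into a global assignment that minimizes the sum of distances, rather than each point's distance independently.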
Since I am scouring for ideas and reading many interesting papers, I figured I needed a place to collect these notes.
Why not a Word document? I wanted a place where I can update and add summaries from any of my devices and easily access them from any other. This seemed like the most appropriate way of collecting, sorting, and sharing these ideas.