Recently the recommendations for breast cancer, cervical cancer and prostate cancer screenings were updated. Changes were made to the age at first screening, the target groups for screening, and how often the screenings should happen. I am not bothered by this.
Interestingly, when my statistics professor finished his lecture on discriminant analysis and "priors" this week, he related the models to health screenings. Discriminant analysis involves separating groups or predicting group membership.
Let me explain. I promise it won't be painful. Statistics is MY favorite subject.
Statisticians can use information about people (data) to predict an outcome about them, including what "group" they are likely to be in, or a career that they might do well in. For example, if a statistician knows the scores that 20 people earned on a reading test, a math test and a personality test, he can create a model that predicts success in a high school class. In the model, certain scores matter more than others, so they are weighted. For instance, the math score has more to do with success in the class than the reading score, but both matter. The statistician puts the numbers into the model and predicts which of the 20 people will do well. Maybe his model is 70% accurate; he is wrong only 30% of the time.

Then he adds a "prior." A prior is something we already know about the outcome. In this case, suppose men do better in the class regardless of scores, so he adds gender to the model. Now his success rate is 80%. Perhaps he finds another prior and his model becomes 95% accurate. That is wonderful! Just think if he were predicting which job applicant would make the most money for a company. The employer could use his model and find the best employee to hire 95% of the time!
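To make that concrete, here is a tiny sketch in Python. The weights, the cutoff, and the gender bump are all numbers I made up for illustration; this is not my professor's actual model, just the shape of one.

```python
# Toy "discriminant"-style model: a weighted score predicts success in a class.
# Every weight, score, and cutoff below is invented purely for illustration.

def predict_success(reading, math, personality, is_male=False):
    # Math is weighted more heavily than reading in this made-up model.
    score = 0.2 * reading + 0.5 * math + 0.1 * personality
    # The "prior": something we already know about the outcome,
    # here the (hypothetical) fact that men tend to do better regardless of scores.
    if is_male:
        score += 5
    return score >= 60  # invented cutoff for "will do well"

# One made-up student: decent reading, strong math, average personality score.
print(predict_success(reading=70, math=80, personality=60, is_male=True))  # True
```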
My professor said that this type of modeling is very useful UNLESS the thing you are trying to predict does not happen very often. Maybe the person whose combined scores indicate a good employee is very hard to find. The model predicts correctly, so most of the time it is just saying that no one in the group will do well. Remember, in our scenario the model is wrong 5% of the time. It may miss someone who would do well, or choose someone who really doesn't do well. That happens five out of 100 times. But maybe it takes 1,000 tries to find ANYONE.
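A quick way to see the problem, using numbers I am inventing: suppose only 1 person in 1,000 would actually do well. Even a very accurate model then spends almost all of its time (correctly) saying "no."

```python
# Hypothetical: only 1 in 1,000 people screened would actually do well.
success_rate = 0.001

# On average, how many people must be run through the model
# before we even encounter one genuine success?
expected_tries = 1 / success_rate
print(expected_tries)  # about 1,000; most runs just confirm "not this one"
```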
If it doesn't cost anything to put all the information in a computer and run it every week or so, it's not that big a deal. We might not find a successful person for a while, but it is not a matter of life or death!
Something else to consider:
When something is rare, the chance of a false positive is greater than the chance of a true positive: the model says someone will do well when they will not. And tightening the model to avoid false positives can increase the number of false negatives, too: the model says a person will not do well when they really would have.
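Here is a rough back-of-the-envelope calculation, again with numbers I am making up rather than real screening statistics, showing why false positives swamp true positives when the condition is rare:

```python
# Hypothetical screening of 10,000 people for a rare condition.
# Prevalence, sensitivity, and specificity are invented for illustration only.
population = 10_000
prevalence = 0.001      # 1 in 1,000 actually has the condition
sensitivity = 0.95      # chance the test catches a true case
specificity = 0.95      # chance the test clears a true non-case

true_cases = population * prevalence                 # 10 people
true_positives = true_cases * sensitivity            # ~9.5 caught
false_negatives = true_cases * (1 - sensitivity)     # ~0.5 missed
non_cases = population - true_cases                  # 9,990 people
false_positives = non_cases * (1 - specificity)      # ~499.5 false alarms

print(f"true positives:  {true_positives:.1f}")
print(f"false positives: {false_positives:.1f}")
print(f"false negatives: {false_negatives:.1f}")
# Even with a test that is right 95% of the time, false alarms
# outnumber true finds by roughly 50 to 1 in this made-up scenario.
```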
If it costs hundreds of dollars to run ONE person through the model and you only find a match on the 10,000th try... the model may be accurate and useful, but it just doesn't make sense to use it all the time.
Believe it or not, breast, cervical and prostate cancers are rare in the sense that matters here: in any given round of screening, only a small fraction of the people tested actually have the cancer. Testing every person, every year, is not cost effective because we don't find many cases, and we find more false cases than true ones. False cases lead to unnecessary follow-up tests and lots of stress.
Of course, the cost of missing a true cancer can be a matter of life or death. That is the false negative case.
We have to consider how often that really happens. In 10,000 tests, how many times do we say that someone does not have cancer when they really do?
I do not know the answer. I believe that, over time, we have learned that we are finding more "not real" cases than true ones, precisely because these cancers are so rare.
I think that having a screening every 2-3 years, instead of every year, is better than not having the screenings at all. It's an OK compromise for me.
BTW - thinking of weights and cancer: we know that certain things add to our risk of cancer, e.g., poor diet, being overweight, lack of exercise, chemical exposure, cigarette smoking... guess which one has the greatest "weight" in the prediction model - SMOKING.
and NO, I did not keep that simple. Sorry.