MedTech Analytics faces a data complexity crisis
You're the Chief Data Scientist at MedTech Analytics, a company providing predictive models to 500+ hospitals nationwide. Your team just received data from a new electronic health record system with 2,000 features per patient.
The CEO storms in: "Our model is failing! It's overfitting terribly: 99% accuracy on training data but only 52% on new patients. Three major hospital chains are threatening to cancel their $30 million contracts!"
What's the main problem with having too many features?
Adjust the L1 penalty (λ) and watch features get eliminated:
The L1 penalty drives weak coefficients to exactly zero, so the model performs automatic feature selection: the larger λ is, the more features are eliminated!
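To see the effect outside the interactive view, here is a minimal sketch of L1-regularized regression solved by cyclic coordinate descent. Everything in it is an assumption made for illustration: the helper names `soft_threshold` and `lasso_coordinate_descent`, the synthetic data (10 features instead of 2,000, with two strong signals, one weak signal, and seven pure-noise features), and the three λ values tried. It is not MedTech's actual pipeline.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 one coefficient at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, and w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual, leaving out
            # feature j's own current contribution.
            rho = sum(
                X[i][j] * (residual[i] + w[j] * X[i][j]) for i in range(n)
            ) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * X[i][j]
                w[j] = new_wj
    return w


# Hypothetical synthetic data: two strong features, one weak, seven noise.
rng = random.Random(0)
n, p = 200, 10
true_w = [3.0, -2.0, 0.05] + [0.0] * (p - 3)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [
    sum(wj * xj for wj, xj in zip(true_w, row)) + rng.gauss(0, 0.5)
    for row in X
]

results = {}
for lam in (0.0, 0.5, 5.0):
    w = lasso_coordinate_descent(X, y, lam)
    results[lam] = w
    kept = [j for j, wj in enumerate(w) if wj != 0.0]
    print(f"lambda={lam}: {len(kept)} features kept -> {kept}")
```

The key line is the call to `soft_threshold`: whenever a feature's correlation with the residual is no larger than λ in magnitude, its coefficient is set to exactly zero, not merely shrunk. That is why raising λ removes features outright, and why a large enough λ removes all of them.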