The skewed distribution of health expenditures, with a large number of zero observations, poses difficulties. A recent article in the Annual Review of Public Health explains the details and, in my opinion, the right approach.
We compare estimation and interpretation of the effect of a change in insurance policy on health care expenditures using OLS and a two-part model. The two-part model is based on a statistical decomposition of the density of the outcome into a process that generates zeros and a process that generates positive values. A logit or probit model typically estimates the parameters that determine the threshold between zero and nonzero values of the outcome. In general, alternative specifications of the binary choice model (the first part) yield nearly identical results. However, the choice of model for the distribution of the outcome conditional on it being positive (the second part) is critically important. Different models can yield quite different results. We use a generalized linear model to estimate the parameters that determine positive values. Generalized linear models accommodate skewness in natural ways, give the researcher considerable modeling flexibility, and fit health care expenditures extremely well.
Two-part models combined with a GLM are the standard approach to keep in mind. The book Health Econometrics Using Stata is the key reference.
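To make the decomposition concrete, here is a minimal sketch in Python of how the two parts combine into one expected expenditure. It assumes a logit first part and a log-link GLM second part, which is one common choice rather than necessarily the article's exact specification. The function names and the coefficient values are hypothetical, chosen only for illustration and not taken from the article or the book.

```python
import math


def expit(z):
    """Logistic function, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))


def two_part_mean(x, alpha, beta):
    """Expected expenditure E[y | x] = Pr(y > 0 | x) * E[y | y > 0, x].

    Part 1 (assumed logit): Pr(y > 0 | x) = expit(x . alpha)
    Part 2 (assumed log-link GLM): E[y | y > 0, x] = exp(x . beta)
    """
    index_1 = sum(xi * ai for xi, ai in zip(x, alpha))
    index_2 = sum(xi * bi for xi, bi in zip(x, beta))
    return expit(index_1) * math.exp(index_2)


def policy_effect(alpha, beta):
    """Change in expected expenditure when the insurance dummy goes 0 -> 1.

    Covariate vector is [intercept, insured]; both parts of the model
    contribute to the difference.
    """
    uninsured = two_part_mean([1.0, 0.0], alpha, beta)
    insured = two_part_mean([1.0, 1.0], alpha, beta)
    return insured - uninsured


# Hypothetical coefficients for [intercept, insured]; NOT estimates
# from the article.
ALPHA = [0.5, 0.8]  # part 1: probability of any spending
BETA = [7.0, 0.3]   # part 2: log of mean spending, given spending > 0

if __name__ == "__main__":
    print("E[y | uninsured]:", two_part_mean([1.0, 0.0], ALPHA, BETA))
    print("E[y | insured]:  ", two_part_mean([1.0, 1.0], ALPHA, BETA))
    print("Policy effect:   ", policy_effect(ALPHA, BETA))
```

The sketch shows that a policy change moves expected spending through both parts at once: it shifts the probability of any spending and the level of spending among those who spend. In practice the coefficients would be estimated from data, for example with the Stata routines covered in the book.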