This article on "The Top 4 Observability Risks" makes some excellent points about throwing away data.
Over the years I've learned that an important part of discarding or sampling data points is to work through:
1/ What do we gain by doing this? (Faster data processing, ease of analysis, something else.)
2/ What do we lose by doing this? (Interesting points, a well-rounded analysis, ...)
3/ How will we make note of items 1 and 2 when we publish the results? (Or make decisions, or feed the results into downstream processes, or ...)
And when the data we're discarding is an outlier, there's a fourth question:
4/ What's the source of this outlier? (Does it point to a flaw in our data collection? A special case that we must analyze separately? Something else?)
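One way to make item 3 concrete is to have the filtering step hand back a record of what it threw away, so the note travels with the results. Here's a minimal sketch in Python. The names (`DiscardReport`, `discard_outliers`) and the choice of an interquartile-range rule are my own assumptions for illustration; they don't come from the article, and your own rule for what counts as an outlier may well differ.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class DiscardReport:
    """Hypothetical record of what a filtering step kept and threw away."""
    rule: str
    kept: list
    discarded: list

    def summary(self) -> str:
        total = len(self.kept) + len(self.discarded)
        return f"{len(self.discarded)} of {total} points discarded ({self.rule})"


def discard_outliers(values, k=1.5):
    """Split values with an IQR fence (an assumed rule, not the only option).

    Points outside [Q1 - k*IQR, Q3 + k*IQR] are set aside, not deleted,
    so they can still be inspected for question 4.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    kept = [v for v in values if low <= v <= high]
    discarded = [v for v in values if not (low <= v <= high)]
    rule = f"outside [{low:.2f}, {high:.2f}], IQR x {k}"
    return DiscardReport(rule=rule, kept=kept, discarded=discarded)


# Made-up latency readings, purely for illustration.
readings = [10, 11, 12, 13, 14, 15, 16, 17, 500]
report = discard_outliers(readings)
print(report.summary())   # the note to publish alongside the results
print(report.discarded)   # the points to investigate separately
```

The useful part is less the statistics than the return value: the discarded points and the rule that removed them stay attached to the analysis instead of vanishing silently.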