It is common practice to strive to obtain as much training data as possible. However, large datasets increase data-management and model-training costs.
Not all training examples are equally important. Given a large training dataset, what is the smallest subset you can sample that still reaches a given performance threshold?
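The question above can be sketched with a minimal baseline: train on random subsets of increasing size and report the smallest one that clears the threshold. Everything here is illustrative, not the method the text describes — the synthetic two-class data, the toy nearest-centroid "model", the 95% threshold, and the size grid are all assumptions made for the sketch.

```python
import random

random.seed(0)

def make_data(n):
    # Synthetic two-class data: class 0 centered at (0, 0), class 1 at (3, 3).
    data = []
    for _ in range(n):
        label = random.randint(0, 1)
        center = 3.0 * label
        point = (random.gauss(center, 1.0), random.gauss(center, 1.0))
        data.append((point, label))
    return data

def train_centroids(train):
    # "Training" is just computing per-class means (nearest-centroid classifier).
    sums = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    counts = {0: 0, 1: 0}
    for (x, y), label in train:
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    return {c: (sums[c][0] / counts[c], sums[c][1] / counts[c])
            for c in sums if counts[c] > 0}

def accuracy(centroids, test):
    correct = 0
    for (x, y), label in test:
        pred = min(centroids,
                   key=lambda c: (x - centroids[c][0]) ** 2
                               + (y - centroids[c][1]) ** 2)
        correct += (pred == label)
    return correct / len(test)

train_pool = make_data(1000)
test_set = make_data(500)
threshold = 0.95  # assumed performance target

# Scan increasing subset sizes; keep the smallest that meets the threshold.
smallest, acc = None, 0.0
for size in (5, 10, 25, 50, 100, 250, 500, 1000):
    subset = random.sample(train_pool, size)
    acc = accuracy(train_centroids(subset), test_set)
    if acc >= threshold:
        smallest = size
        break

print(f"smallest subset meeting {threshold:.0%}: {smallest} examples ({acc:.2%})")
```

Random sampling is only the simplest baseline; the point of subset-selection methods is to beat this curve by choosing *which* examples to keep, not just how many.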