This dataset, originally released by Dehouck et al. (2009), contains 2648 mutations in 132 proteins. Here we release an alternative cross-validation split in which similar proteins (as computed by BLASTP) are clustered together and placed in the same cross-validation subset. In this way, any possible bias arising from similarity between training and testing sequences is removed. We suggest using this cross-validation split if you plan to evaluate your method on the S2648 dataset.
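The idea behind a clustered split can be illustrated with a minimal Python sketch. It is not the procedure used to build the released split: the function name `clustered_folds`, the greedy balancing strategy, and the toy protein identifiers and cluster labels are all assumptions made for illustration. The sketch only shows the key property, namely that every mutation from a given cluster of similar proteins ends up in the same subset.

```python
from collections import defaultdict


def clustered_folds(mutations, cluster_of, n_folds=5):
    """Assign mutations to folds so that a cluster never spans two folds.

    mutations  : list of (protein_id, mutation) tuples
    cluster_of : dict mapping protein_id -> cluster label
                 (e.g. obtained from a BLASTP-based clustering)
    """
    # Group mutations by the cluster of their protein.
    by_cluster = defaultdict(list)
    for protein_id, mutation in mutations:
        by_cluster[cluster_of[protein_id]].append((protein_id, mutation))

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing (an assumption of this sketch): place the largest
    # clusters first, each into the fold that is currently smallest.
    ordered = sorted(by_cluster, key=lambda c: (-len(by_cluster[c]), str(c)))
    for cluster_id in ordered:
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_cluster[cluster_id])
    return folds


# Toy data: identifiers, mutations and cluster labels are made up.
mutations = [
    ("protA", "A10G"), ("protA", "L22V"), ("protB", "K5R"),
    ("protC", "D7N"), ("protD", "E3Q"), ("protD", "F9Y"),
]
cluster_of = {"protA": 0, "protB": 0, "protC": 1, "protD": 2}

folds = clustered_folds(mutations, cluster_of, n_folds=2)
for i, fold in enumerate(folds):
    print(i, sorted({protein for protein, _ in fold}))
```

When evaluating on S2648, use the released split itself rather than regenerating one this way, so that results remain comparable across methods.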
References