The training set for NetDiseaseSNP is made publicly available below, with the exception of data from HGMD Professional, which we are not permitted to share for licensing reasons. We appreciate that this limits others' ability to obtain the full data set; however, the additional curated information in HGMD Professional warrants its use in this work, and academic pricing for the product is known to be reasonable should other computational groups wish to obtain the data. The data that we do make available originate from UniProt.

Each line of the variant data file consists of tab-separated fields in the following order: 'Accession of sequence', 'native amino acid', 'position', 'variant amino acid', 'target value: Neutral=0/Disease=1', 'Origin of data':

Variant data
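
As a minimal sketch of how the six-field format described above could be read, the Python snippet below parses tab-separated lines into typed records. The names `Variant` and `read_variants` are hypothetical, the sample lines are invented for illustration and are not taken from the real file, and the text used in the 'Origin of data' field is an assumption.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class Variant(NamedTuple):
    """One line of the variant data file (hypothetical record type)."""
    accession: str   # accession of sequence
    native: str      # native amino acid
    position: int    # position in the sequence
    variant: str     # variant amino acid
    target: int      # 0 = neutral, 1 = disease
    origin: str      # origin of data


def read_variants(handle: Iterable[str]) -> Iterator[Variant]:
    """Yield a Variant for each non-blank line of tab-separated fields."""
    for line_number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            raise ValueError(
                f"line {line_number}: expected 6 fields, got {len(fields)}"
            )
        accession, native, position, variant, target, origin = (
            field.strip() for field in fields
        )
        if target not in ("0", "1"):
            raise ValueError(f"line {line_number}: target must be 0 or 1")
        yield Variant(accession, native, int(position), variant,
                      int(target), origin)


# Invented sample data, for illustration only.
sample = "X00001\tA\t12\tV\t1\tUniProt\nX00001\tG\t3\tS\t0\tUniProt\n"
variants = list(read_variants(io.StringIO(sample)))
disease_count = sum(v.target for v in variants)
```

To read the downloaded file instead of the sample string, pass an open file handle, for example `read_variants(open(path))`, where `path` is wherever the file was saved.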

The FASTA sequences corresponding to the variant data can be found via the link below:

Sequence data
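
A quick consistency check between the two files might look like the sketch below, which loads FASTA records and tests whether the native amino acid of a variant matches the residue at the stated position. It rests on two assumptions that this page does not state: that positions are 1-based, and that the first whitespace-delimited token of each FASTA header is the accession used in the variant file. The function names `read_fasta` and `native_matches` and the sample sequence are hypothetical.

```python
from typing import Dict, Iterable


def read_fasta(handle: Iterable[str]) -> Dict[str, str]:
    """Map the first token of each FASTA header to its full sequence."""
    sequences: Dict[str, str] = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def native_matches(sequences: Dict[str, str], accession: str,
                   position: int, native: str) -> bool:
    """Return True if `native` is the residue at `position` (assumed 1-based)."""
    sequence = sequences.get(accession)
    if sequence is None or not 1 <= position <= len(sequence):
        return False
    return sequence[position - 1] == native


# Invented sample record, for illustration only.
fasta_text = ">X00001 hypothetical protein\nMKTAY\nIAKQR\n"
sequences = read_fasta(fasta_text.splitlines())
ok = native_matches(sequences, "X00001", 3, "T")
```

If many variants fail this check on the real files, the positions are probably not 1-based or the header convention differs from the one assumed here.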


