Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We collected dermatoscopic images from different populations, acquired and stored by different modalities. The final dataset consists of 10015 dermatoscopic images, which can serve as a training set for academic machine learning purposes. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions: actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec), basal cell carcinoma (bcc), benign keratosis-like lesions (solar lentigines / seborrheic keratoses and lichen planus-like keratoses, bkl), dermatofibroma (df), melanoma (mel), melanocytic nevi (nv), and vascular lesions (angiomas, angiokeratomas, pyogenic granulomas and hemorrhage, vasc).
# Prerequisites for Use
1. Download the [training data](https://isic-challenge-data.s3.amazonaws.com/2018/ISIC2018_Task3_Training_Input.zip).
2. Download the [training ground truth](https://isic-challenge-data.s3.amazonaws.com/2018/ISIC2018_Task3_Training_GroundTruth.zip).
3. Download the [test data images](https://isic-challenge-data.s3.amazonaws.com/2018/ISIC2018_Task3_Validation_Input.zip) (the ISIC 2018 Task 3 validation input).
4. Download the [test data ground truth](https://isic-challenge-data.s3.amazonaws.com/2018/ISIC2018_Task3_Validation_GroundTruth.zip) (the ISIC 2018 Task 3 validation ground truth).
5. Unzip the files into the data folder in this repository.
6. Install the required packages listed in requirements.txt: `pip install -r requirements.txt`
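
The download and unzip steps above can also be scripted. The following is a minimal sketch, not part of this repository: the function names are hypothetical, the target folder `data` is assumed to be relative to the repository root, and only the Python standard library is used. The archive URLs are the ones linked in the list above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Base URL and archive names taken from the download links above.
BASE_URL = "https://isic-challenge-data.s3.amazonaws.com/2018/"
ARCHIVES = [
    "ISIC2018_Task3_Training_Input.zip",
    "ISIC2018_Task3_Training_GroundTruth.zip",
    "ISIC2018_Task3_Validation_Input.zip",
    "ISIC2018_Task3_Validation_GroundTruth.zip",
]


def archive_urls(base_url=BASE_URL, names=ARCHIVES):
    """Return the full download URL for each archive (hypothetical helper)."""
    return [base_url + name for name in names]


def extract_archive(zip_path, data_dir):
    """Extract one zip archive into data_dir and return the member names."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(data_dir)
        return zf.namelist()


def download_and_extract(data_dir="data"):
    """Download every archive into data_dir (assumed location) and unzip it."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    for url in archive_urls():
        zip_path = data_dir / url.rsplit("/", 1)[-1]
        # Skip the download if the archive is already present.
        if not zip_path.exists():
            urllib.request.urlretrieve(url, zip_path)
        extract_archive(zip_path, data_dir)
```

Calling `download_and_extract("data")` from the repository root would fetch and unpack all four archives. The image archives are large, so the function skips any archive that has already been downloaded.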
All of the data must be placed in the following directory:
`hvm-image-clf/data`
First, download the three data files used by the package. The first two can be downloaded by clicking the links below.
[Training Data](https://isic-challenge-data.s3.amazonaws.com/2018/ISIC2018_Task3_Training_Input.zip) - Unzip the archive and place the folder named "ISIC2018_TASK3_Training_Input" in the data directory. This folder contains all of the training images.
[Ground Truth](https://isic-challenge-data.s3.amazonaws.com/2018/ISIC2018_Task3_Training_GroundTruth.zip) - Unzip the archive and place the file named "ISIC2018_Task3_Training_GroundTruth.csv" in the data directory. This file contains the ground truth labels for the training images.
The third dataset can be downloaded by following this link:
[Metadata](https://dataverse.harvard.edu/file.xhtml?fileId=4338392&version=3.0). This opens a page on Harvard Dataverse. On that page there is a dropdown box called "Access File". Select the option called "Comma Separated Values (Original File Format)" to download the dataset.
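
Before running anything, it can help to confirm that the data directory has the expected layout. The sketch below is an illustration only, not part of the package: `missing_entries` is a hypothetical helper, and the default names are the folder and file named in the instructions above. The metadata file name is not stated here, so it must be passed in explicitly by the caller.

```python
from pathlib import Path

# Names taken from the instructions above; the metadata CSV name is not
# given in this README, so it is deliberately left out of the defaults.
EXPECTED = [
    "ISIC2018_TASK3_Training_Input",
    "ISIC2018_Task3_Training_GroundTruth.csv",
]


def missing_entries(data_dir, expected=EXPECTED):
    """Return the expected files or folders that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected if not (data_dir / name).exists()]
```

For example, `missing_entries("hvm-image-clf/data")` returns an empty list when both the training image folder and the ground truth CSV are in place. To include the metadata file, pass a custom `expected` list containing the name it was saved under.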
Project dependencies can be installed using
`pip install -r requirements.txt`
# Running the Notebook
If all of the prerequisites are set up correctly, the notebook file can be run in an IPython (Jupyter) notebook environment, such as the one distributed with Anaconda.