README

Data hosted and ReadMe file provided by AZSecure Data and the University of Arizona Artificial Intelligence Lab. Citation information below.

MalwareTrainingSets.zip file contains:
trainingSets (189 folders, 6,215 files)
scripts
fromMongoToARFF.py
README.md
malwaretrainingsets-readme.txt

DESCRIPTION

This dataset was created in response to the overall lack of good malware datasets for supervised machine learning. The dataset comprises labeled malware examples: each example corresponds to an instance of a specific malware, and its label is the malware name. A full description of the dataset is available at https://marcoramilli.blogspot.it/2016/12/malware-training-sets-machine-learning.html

File types: JSON, ARFF (script provided for JSON --> ARFF conversion)

Date range of data: Collection ended December 2016

Collection method: Both static and dynamic analysis of malware samples were used to generate the dataset. Malware samples were executed in several sandboxes, and the MIST approach was used to encode the extracted features. Each example is a separate JSON file, with the features and their values listed under properties. Each feature is represented as a concatenation of a category (such as 'signature') and an action (such as 'copies self'). The values are encoded as hashes (to help speed up training), and each hash represents an observation of the feature in the example.

Topics covered or keywords used: Malware, MIST, static analysis, dynamic analysis, JSON, ARFF, supervised machine learning

HOW TO CITE THIS DATASET

Author(s): Marco Ramilli
Title: Malware Training Sets
Publisher: University of Arizona Artificial Intelligence Lab, AZSecure Data
Location: Copy and paste the location where you retrieve this file from within http://www.azsecure-data.org/
Publication date: May 2018

IEEE formatted citation:
M. Ramilli, Malware Training Sets, University of Arizona Artificial Intelligence Lab, AZSecure Data.
Available http://www.azsecure-data.org/ [2018]
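
As a rough illustration of the JSON layout described under DESCRIPTION, the following Python sketch reads example files and counts how many hash observations each feature has. It is not part of the dataset's own scripts. Several details are assumptions made for illustration only: that the top-level key is spelled "properties", that each feature maps to a list of hashes (a single value is counted as one observation), that example files end in .json, and that the malware name can be taken from the name of the folder containing the file. The function names load_features and iter_examples are hypothetical.

```python
import json
from pathlib import Path


def load_features(json_path):
    """Return {feature_name: number_of_hash_observations} for one example file."""
    with open(json_path, encoding="utf-8") as fh:
        example = json.load(fh)

    # Assumption: features live under a top-level "properties" object.
    properties = example.get("properties", {})

    counts = {}
    for feature, hashes in properties.items():
        # Assumption: a feature's value is a list of hashes; a lone value
        # is treated as a single observation.
        if isinstance(hashes, (list, tuple)):
            counts[feature] = len(hashes)
        else:
            counts[feature] = 1
    return counts


def iter_examples(root):
    """Yield (label, features) pairs for every .json file under root.

    Assumption: the label (malware name) is the name of the parent folder.
    """
    for json_path in sorted(Path(root).rglob("*.json")):
        yield json_path.parent.name, load_features(json_path)
```

Calling list(iter_examples("trainingSets")) on the unpacked archive would then give one (label, features) pair per file, provided the assumptions above match the actual files. Check one JSON file by hand before relying on this.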