README

Australian Defence Force Academy - Intrusion Detection Datasets

Data hosted and ReadMe file provided by the University of Arizona Artificial Intelligence Lab. Citation information below.

ADFA-IDS_2017.zip (18.9 GB unzipped, 1 GB zipped) contains the files listed below.
NOTE: n and X are used as repeating variables in file names to indicate a series of files.

ADFA-IDS_DATASETS (Network and Linux host IDS datasets)
    ADFA-LD
        Attack_Data_Master (folders numbered n = 1 through 10)
            Adduser-n
                UAD-Adduser-n-X.txt (91 files)
            Hydra_FTP_n
                UAD-Hydra-FTP-n-X.txt (162 files)
            Hydra_SSH_n
                UAD-Hydra-SSH-n-X.txt (176 files)
            Java_Meterpreter_n
                UAD-Java-Meterpreter-n-X.txt (124 files)
            Meterpreter_n
                UAD-Meterpreter-n-X.txt (75 files)
            Web_Shell_n
                UAD-WSn-X.txt (118 files)
        Training_Data_Master
            UTD-n (n = 0001 through 0833)
        Validation_Data_Master
            UTD-n (n = 0001 through 4372)
    netflow_ids-label (X = 1 through 5; 25 files)
        weekX_fri.netflow_ids
        weekX_mon.netflow_ids
        weekX_thu.netflow_ids
        weekX_tue.netflow_ids
        weekX_wed.netflow_ids
    NGIDS-DS
        host logs
            n.csv (n = 1 through 99)
        feature_descr.csv
        ground_truth.csv
        NGIDS.pcap
ADFA-WD-SAA_Master (Windows based IDS dataset ADFA-WD)
    Full_Process_Traces
        S1
            S1-n (n = 1 through 10)
                S1-n-Full_X.GHC (366 files)
        S2
            S2-n (n = 1 through 10)
                S2-n-Full_X.GHC (212 files)
        S3
            S3-n (n = 1 through 10)
                S3-n-Full_X.GHC (141 files)
        S4
            S4-n (n = 1 through 10)
                S4-n-Full_X.GHC (143 files)
    Limted_Horizon_Process_Traces
        S1
            S1-n (n = 1 through 10)
                S1-n-Full_X.GHC (18,238 files)
        S2
            S2-n (n = 1 through 10)
                S2-n-Full_X.GHC (15,228 files)
        S3
            S3-n (n = 1 through 10)
                S3-n-Full_X.GHC (6,858 files)
        S4
            S4-n (n = 1 through 10)
                S4-n-Full_X.GHC (9,075 files)
    Raw_Data
        S1
            S1-n.XML (n = 1 through 10)
        S2
            S2-n.XML (n = 1 through 10)
        S3
            S3-n.XML (n = 1 through 10)
        S4
            S4-n.XML (n = 1 through 10)
ADFA-IDS-DatabaseLicense.pdf
How_to_use_ADFA-IDS_DATASETS.pdf
readme-ADFA.txt

DESCRIPTION

The contents of this dataset are an update, released March 27th, 2017, to the original ADFA-IDS
dataset. ADFA IDS is an intrusion detection system dataset made publicly available in 2013. It is intended to be representative of modern attack structure and methodology and to replace the older KDD and UNM datasets. ADFA IDS includes independent datasets for Linux and Windows environments.

ADFA-LD (Linux dataset) was generated on an Ubuntu Linux 11.04 host OS with Apache 2.2.17 running PHP 5.3.5. FTP, SSH, MySQL 14.14, and TikiWiki were also started. The following table shows the payloads and vectors used to attack the Ubuntu host and generate the dataset.

PAYLOAD/EFFECT               VECTOR
Password bruteforce          FTP by Hydra
Password bruteforce          SSH by Hydra
Add new superuser            Client side poisoned executable
Java based meterpreter       TikiWiki vulnerability exploit
Linux meterpreter payload    Client side poisoned executable
C100 Webshell                PHP remote file inclusion vulnerability

See G. Creech, "Developing a high-accuracy cross platform Host-Based Intrusion Detection System capable of reliably detecting zero-day attacks", 2014, Section 3.5.2 for detailed information on the methodology, collection, and organization of this dataset.

ADFA-WD (Windows dataset) was generated on a Windows XP Service Pack 2 host OS with the default XP firewall enabled for all attacks, file sharing enabled, a network printer configured, and both wireless and Ethernet networking. Norton AV 2013 was used to scan certain payloads. An FTP server, a web server and management tool, and a streaming audio digital radio package were activated. A target ratio of 1:10:1 (normal data : validation data : attack data) was used to guide collection and structuring activities.

Vectors: TCP ports, web-based vectors, browser attacks, and malware attachments.
Effects: bind shell, reverse shell, exploitation payload, remote operation, staging, system manipulation, privilege escalation, data exfiltration, and back-door insertion.

See G.
Creech, "Developing a high-accuracy cross platform Host-Based Intrusion Detection System capable of reliably detecting zero-day attacks", 2014, Chapter 4 for detailed information on the methodology, collection, and organization of this dataset.

READ THE FOLLOWING FILE FOR MORE INFORMATION ON USING THESE DATA: How_to_use_ADFA-IDS_DATASETS.pdf

HOW TO CITE THIS DATASET

Author(s): Gideon Creech and Jiankun Hu
Title: ADFA IDS Dataset
Publisher: AZSecure-data.org
Location: [AZSecure-data has not yet implemented Digital Object Identifiers or Persistent URLs; please copy and paste the location from which you retrieved this file within http://www.azsecure-data.org/]
Publication date: March 2017

IEEE formatted citation:
G. Creech and J. Hu. ADFA IDS Dataset, University of Arizona Artificial Intelligence Lab, AZSecure-data, Director Hsinchun Chen. Available http://www.azsecure-data.org/ [November 2016]

ALSO CITE the following related publications:
[1] W. Haider, J. Hu, S. Slay, B. P. Turnbull, and Y. Xie, "Generating Realistic Intrusion Detection System Dataset based on Fuzzy Qualitative Modeling," Journal of Network and Computer Applications (JNCA), 2017, DOI: 10.1016/j.jnca.2017.03.018.
[2] G. Creech and J. Hu, "A semantic approach to host-based intrusion detection systems using contiguous and discontiguous system call patterns," IEEE Transactions on Computers, vol. 63, no. 4, April 2014, pp. 807-819.
[3] G. Creech and J. Hu, "Generation of a new IDS test dataset: Time to retire the KDD collection," 2013 IEEE Wireless Communications and Networking Conference (WCNC 2013), pp. 4487-4492.
[4] M. Xie and J. Hu, "Evaluating host-based anomaly detection systems: A preliminary analysis of ADFA-LD," Proc. of the 6th International Congress on Image and Signal Processing (CISP 2013), pp. 1711-1716.
[5] M. Xie, J. Hu, X. Yu, and E. Chang, "Evaluating host-based anomaly detection systems: Applications of the frequency-based algorithms to ADFA-LD," LNCS, vol. 8792, 2014, pp. 542-549.
[6] W. Haider, J. Hu, and M. Xie, "Towards reliable data feature retrieval and decision engine in host-based anomaly detection systems," Proc. of the 2015 10th IEEE Conference on Industrial Electronics and Applications (ICIEA 2015), pp. 513-517.
[7] W. Haider, G. Creech, Y. Xie, and J. Hu, "Windows Based Data Sets for Evaluation of Robustness of Host Based Intrusion Detection Systems (IDS) to Zero-Day and Stealth Attacks," Future Internet, vol. 8, no. 3, 2016, Art. no. 29.
[8] W. Haider, J. Hu, X. Yu, and Y. Xie, "Integer data zero-watermark assisted system calls abstraction and normalization for host based anomaly detection systems," Proc. of the 2nd IEEE International Conference on Cyber Security and Cloud Computing, 2016, pp. 349-355.
[9] Q. A. Tran, F. Jiang, and J. Hu, "A real-time NetFlow-based intrusion detection system with improved BBNN and high-frequency field programmable gate arrays," Proc. of the 11th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2012), 2012, pp. 201-208.

Original data host and associated information: https://research.unsw.edu.au/sites/all/files/facultyadmin/adfa-ids-database_license-homepage.pdf
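The three ADFA-LD folders in the file listing (Training_Data_Master, Validation_Data_Master, Attack_Data_Master) can be read into labelled system-call sequences along the lines of the minimal Python sketch below. It assumes each trace file contains whitespace-separated integer system call numbers, that the folders sit under one local ADFA-LD directory, and that validation traces are benign; the names SPLITS, read_trace and load_adfa_ld are hypothetical helpers, not part of the dataset.

```python
from pathlib import Path

# Folder names from the ADFA-LD listing, each paired with a class label.
# Treating Validation_Data_Master as normal (benign) traces is an assumption here.
SPLITS = {
    "Training_Data_Master": "normal",
    "Validation_Data_Master": "normal",
    "Attack_Data_Master": "attack",
}


def read_trace(path):
    """Parse one trace file, assumed to contain whitespace-separated system call numbers."""
    return [int(token) for token in Path(path).read_text().split()]


def load_adfa_ld(root):
    """Return {folder: [(label, file_name, syscalls), ...]} for each folder in SPLITS.

    Missing folders yield an empty list; rglob also descends into the
    per-attack subfolders inside Attack_Data_Master.
    """
    root = Path(root)
    data = {}
    for folder, label in SPLITS.items():
        folder_path = root / folder
        files = sorted(p for p in folder_path.rglob("*") if p.is_file()) if folder_path.is_dir() else []
        data[folder] = [(label, p.name, read_trace(p)) for p in files]
    return data


if __name__ == "__main__":
    # Hypothetical local path to the unzipped ADFA-LD folder.
    for folder, traces in load_adfa_ld("ADFA-IDS_DATASETS/ADFA-LD").items():
        print(folder, len(traces))
```

Walking files recursively rather than hard-coding the attack subfolder names keeps the sketch independent of whether those folders are named Adduser-n or Adduser_n on disk.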
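Each ADFA-LD attack trace can be tagged with its payload and vector from the DESCRIPTION section by parsing its file name. The Python sketch below assumes the UAD-<class>-n-X.txt and UAD-WSn-X.txt patterns shown in the listing hold exactly (real file names may differ slightly); ATTACK_CLASSES, FILE_COUNTS and attack_class are hypothetical names, and the per-class counts are copied from the listing.

```python
import re

# File-name class -> (payload/effect, vector), as listed in the DESCRIPTION section.
ATTACK_CLASSES = {
    "Hydra-FTP": ("Password bruteforce", "FTP by Hydra"),
    "Hydra-SSH": ("Password bruteforce", "SSH by Hydra"),
    "Adduser": ("Add new superuser", "Client side poisoned executable"),
    "Java-Meterpreter": ("Java based meterpreter", "TikiWiki vulnerability exploit"),
    "Meterpreter": ("Linux meterpreter payload", "Client side poisoned executable"),
    "WS": ("C100 Webshell", "PHP remote file inclusion vulnerability"),
}

# Per-class file counts copied from the ADFA-LD listing.
FILE_COUNTS = {"Adduser": 91, "Hydra-FTP": 162, "Hydra-SSH": 176,
               "Java-Meterpreter": 124, "Meterpreter": 75, "WS": 118}


def attack_class(file_name):
    """Return the class for a UAD-<class>-n-X.txt or UAD-WSn-X.txt name, else None."""
    for prefix in ATTACK_CLASSES:
        # Web shell files put the run number n directly after "WS"; the others use "-n".
        sep = "" if prefix == "WS" else "-"
        # fullmatch anchors the whole name, so "Meterpreter" cannot match a Java-Meterpreter file.
        if re.fullmatch(rf"UAD-{re.escape(prefix)}{sep}\d+-\d+\.txt", file_name):
            return prefix
    return None
```

For example, attack_class("UAD-WS1-7.txt") returns "WS", and summing FILE_COUNTS gives 746 attack traces in total.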
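According to the file listing, netflow_ids-label holds 25 files: one weekX_<day>.netflow_ids file for each weekday (mon through fri) of weeks X = 1 through 5. A quick completeness check on a downloaded copy might look like the Python sketch below; it assumes all 25 files sit directly in one local folder and follow the listed naming exactly, and the helper names are hypothetical.

```python
from itertools import product
from pathlib import Path

# Day suffixes used in the netflow_ids-label file names.
DAYS = ["mon", "tue", "wed", "thu", "fri"]


def expected_netflow_files(weeks=range(1, 6)):
    """Build weekX_<day>.netflow_ids names, week by week (25 names by default)."""
    return [f"week{w}_{d}.netflow_ids" for w, d in product(weeks, DAYS)]


def missing_netflow_files(directory):
    """List expected file names absent from a local netflow_ids-label folder."""
    folder = Path(directory)
    present = {p.name for p in folder.iterdir()} if folder.is_dir() else set()
    return [name for name in expected_netflow_files() if name not in present]
```

An empty list from missing_netflow_files("netflow_ids-label") would indicate that all 25 expected files are present under those assumptions.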