Datasets

S. Aşık and U. Yayan. (2023). ChArIoT-Dataset. Accessed: May 7, 2023. [Online]. Available: https://tinyurl.com/chariot-dataset

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
+-------------+
| Data Folder |
+-------------+
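The data in this folder can be used without any preprocessing.
It represents the final output of the dataset creation stages.
 
For each mutation type, datasets were created with different maximum token lengths: 25, 50, 75, 100, 125, and 150 tokens, plus an unlimited version. The tables below give the number of bug-fix pairs (samples) in each dataset.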
5.1. Formatted
Max tokens       025     050     075     100     125     150  Unlimited
Update          1078   15211   45391   64732   72872   76284      81169
Delete           548   14837   53618   85713  101099  107817     115525
Insert           313    7044   22543   33625   38423   40483      42951

5.2. Unformatted
Max tokens       025     050     075     100     125     150  Unlimited
Update          2862   27166   56379   69961  752543   77641      81169
Delete          2016   29851   72981   96805  106450  110577     115525
Insert          1070   13150   29136   36897   39984   41240      42951
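One way to read the counts above is as the number of pairs that survive a maximum-length filter. The Python sketch below shows such a filter. It is not the authors' procedure and rests on assumptions: the abstracted buggy and fixed files hold one method per line (line i of one file pairs with line i of the other), tokens are whitespace-separated, and a pair is kept only when both sides fit within the limit. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Token limits of the table columns; None stands for "Unlimited".
TOKEN_LIMITS: Tuple[Optional[int], ...] = (25, 50, 75, 100, 125, 150, None)


def filter_by_token_limit(
    buggy_lines: Sequence[str], fixed_lines: Sequence[str], limit: Optional[int]
) -> List[Tuple[str, str]]:
    """Keep the bug-fix pairs whose buggy and fixed sides both have at most
    `limit` whitespace-separated tokens (assumed tokenization)."""
    if len(buggy_lines) != len(fixed_lines):
        raise ValueError("buggy and fixed files must have the same number of lines")
    kept = []
    for buggy, fixed in zip(buggy_lines, fixed_lines):
        longest = max(len(buggy.split()), len(fixed.split()))
        if limit is None or longest <= limit:
            kept.append((buggy, fixed))
    return kept


def subset_sizes(buggy_lines: Sequence[str], fixed_lines: Sequence[str]) -> Dict[str, int]:
    """Count the surviving pairs per limit, labelled like the table columns."""
    return {
        ("Unlimited" if limit is None else f"{limit:03d}"): len(
            filter_by_token_limit(buggy_lines, fixed_lines, limit)
        )
        for limit in TOKEN_LIMITS
    }
```

Under these assumptions the counts can only grow as the limit rises, as they do in both tables, and the Unlimited column equals the total number of pairs.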
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
+----------------+
| Dataset Folder |
+----------------+
This folder contains the compressed files of the dataset.
It includes the files produced at every step of the dataset creation process.
 
1. gitDiffs - The first stage of the dataset.
This folder contains the committed source code files downloaded from GitHub repositories.
There are approximately 588k committed source code files; the committed source code includes both the buggy and the fixed versions.
The file naming is as follows: gitDiff_588016.txt
 
2. BugFixPairs - The second stage of the dataset.
This folder contains the buggy and fixed source code files at the method level.
There are approximately 2.3M method-level source code files.
2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: BuggyMethod_2277408.txt
2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: FixedMethod_2277408.txt
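Since a buggy file and its fixed counterpart carry the same numeric index in their names, the two can be matched by that index. The Python sketch below assumes that a shared index identifies a pair, as the naming scheme above suggests; the regular expression and the helper names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple, Union

PathLike = Union[str, Path]

# Matches names such as "BuggyMethod_2277408.txt", "Update_FixedMethod_371378.txt"
# or "Update_AbstractedBuggyMethod_81169.txt"; group 3 is the numeric index.
NAME_RE = re.compile(
    r"^(?:(Update|Delete|Insert)_)?(?:Abstracted)?(BuggyMethod|FixedMethod)_(\d+)\.txt$"
)


def index_of(path: PathLike) -> int:
    """Return the numeric index encoded in a buggy/fixed method file name."""
    match = NAME_RE.match(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(3))


def pair_by_index(
    buggy_paths: Iterable[PathLike], fixed_paths: Iterable[PathLike]
) -> List[Tuple[int, PathLike, PathLike]]:
    """Pair buggy and fixed files that share an index, in numeric order."""
    buggy = {index_of(p): p for p in buggy_paths}
    fixed = {index_of(p): p for p in fixed_paths}
    return [(i, buggy[i], fixed[i]) for i in sorted(buggy.keys() & fixed.keys())]


# Example with a hypothetical local layout after extracting the archive:
# root = Path("BugFixPairs")
# pairs = pair_by_index((root / "BuggyMethods").glob("*.txt"),
#                       (root / "FixedMethods").glob("*.txt"))
```

Sorting by the parsed integer rather than by file name keeps BuggyMethod_2.txt ahead of BuggyMethod_10.txt.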
 
3. Actions - The third stage of the dataset.
The method-level source code files have been classified by mutation type.
There are three mutation types: Update, Delete, and Insert.
This folder contains the method-level source code files classified by mutation type, together with the action nodes used in the classification.
3.1. Update - Method-level mutations caused by an update.
There are approximately 371k method-level source code files with the update mutation type.
3.1.1. EditActions - Folder containing the action nodes for the update mutation type.
The file naming is as follows: Update_EditActions_371378.txt
3.1.2. BugFixPairs - Folder containing the buggy and fixed source code files at the method level.
3.1.2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: Update_BuggyMethod_371378.txt
3.1.2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: Update_FixedMethod_371378.txt
3.2. Delete - Method-level mutations caused by a delete.
There are approximately 454k method-level source code files with the delete mutation type.
3.2.1. EditActions - Folder containing the action nodes for the delete mutation type.
The file naming is as follows: Delete_EditActions_454084.txt
3.2.2. BugFixPairs - Folder containing the buggy and fixed source code files at the method level.
3.2.2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: Delete_BuggyMethod_454084.txt
3.2.2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: Delete_FixedMethod_454084.txt
3.3. Insert - Method-level mutations caused by an insert.
There are approximately 243k method-level source code files with the insert mutation type.
3.3.1. EditActions - Folder containing the action nodes for the insert mutation type.
The file naming is as follows: Insert_EditActions_243580.txt
3.3.2. BugFixPairs - Folder containing the buggy and fixed source code files at the method level.
3.3.2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: Insert_BuggyMethod_243580.txt
3.3.2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: Insert_FixedMethod_243580.txt
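The Actions stage appears to keep three files per sample under each mutation type: the edit actions, the buggy method, and the fixed method, all sharing a prefix and an index. A small Python sketch for locating them follows. It assumes the archive extracts to the folder tree listed above; the root folder name and the helper are hypothetical.

```python
from pathlib import Path
from typing import Dict

MUTATION_TYPES = ("Update", "Delete", "Insert")


def action_sample_paths(root: str, mutation: str, index: int) -> Dict[str, Path]:
    """Build the paths of the edit-action, buggy and fixed files of one sample,
    assuming the layout Actions/<type>/EditActions and
    Actions/<type>/BugFixPairs/{BuggyMethods,FixedMethods}."""
    if mutation not in MUTATION_TYPES:
        raise ValueError(f"unknown mutation type: {mutation!r}")
    base = Path(root) / "Actions" / mutation
    pairs = base / "BugFixPairs"
    return {
        "edit_actions": base / "EditActions" / f"{mutation}_EditActions_{index}.txt",
        "buggy": pairs / "BuggyMethods" / f"{mutation}_BuggyMethod_{index}.txt",
        "fixed": pairs / "FixedMethods" / f"{mutation}_FixedMethod_{index}.txt",
    }
```

For example, action_sample_paths("Dataset", "Delete", 454084)["fixed"] points to Dataset/Actions/Delete/BugFixPairs/FixedMethods/Delete_FixedMethod_454084.txt under that assumed layout.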
 
4. Abstractions - The last stage of the dataset.
This folder contains the abstracted buggy and fixed source code at the method level, classified by mutation type.
4.1. Update - Method-level mutations caused by an update.
4.1.1. AbstractedBugFixPairs - Folder containing the abstracted buggy and fixed source code files at the method level.
4.1.1.1. AbstractedBuggyMethods - Folder containing the abstracted buggy source code files at the method level.
The file naming is as follows: Update_AbstractedBuggyMethod_81169.txt
4.1.1.2. AbstractedFixedMethods - Folder containing the abstracted fixed source code files at the method level.
The file naming is as follows: Update_AbstractedFixedMethod_81169.txt
4.1.2. BugFixPairs - Folder containing the buggy and fixed source code files at the method level.
4.1.2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: Update_BuggyMethod_81169.txt
4.1.2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: Update_FixedMethod_81169.txt
The files below contain the abstracted buggy and fixed source code for the Update mutation type. The source code in these files is written inline.
'Formatted' means that three special tokens, INDENT, DEDENT, and NEWLINE, were added to the source code. When the formatted data is used for prediction, the predicted code can be syntactically reverted, i.e., converted back into structured source code.
'Unformatted' data is used to predict the source code inline, without these tokens; the predicted source code cannot be syntactically reverted.
Update_Formatted_Dataset_AbstractedBuggyMethods.txt (81169)
Update_Formatted_Dataset_AbstractedFixedMethods.txt (81169)
Update_Unformatted_Dataset_AbstractedBuggyMethods.txt (81169)
Update_Unformatted_Dataset_AbstractedFixedMethods.txt (81169)
4.2. Delete - Method-level mutations caused by a delete.
4.2.1. AbstractedBugFixPairs - Folder containing the abstracted buggy and fixed source code files at the method level.
4.2.1.1. AbstractedBuggyMethods - Folder containing the abstracted buggy source code files at the method level.
The file naming is as follows: Delete_AbstractedBuggyMethod_115525.txt
4.2.1.2. AbstractedFixedMethods - Folder containing the abstracted fixed source code files at the method level.
The file naming is as follows: Delete_AbstractedFixedMethod_115525.txt
4.2.2. BugFixPairs - Folder containing the buggy and fixed source code files at the method level.
4.2.2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: Delete_BuggyMethod_115525.txt
4.2.2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: Delete_FixedMethod_115525.txt
The files below contain the abstracted buggy and fixed source code for the Delete mutation type. The source code in these files is written inline.
'Formatted' means that three special tokens, INDENT, DEDENT, and NEWLINE, were added to the source code. When the formatted data is used for prediction, the predicted code can be syntactically reverted, i.e., converted back into structured source code.
'Unformatted' data is used to predict the source code inline, without these tokens; the predicted source code cannot be syntactically reverted.
Delete_Formatted_Dataset_AbstractedBuggyMethods.txt (115525)
Delete_Formatted_Dataset_AbstractedFixedMethods.txt (115525)
Delete_Unformatted_Dataset_AbstractedBuggyMethods.txt (115525)
Delete_Unformatted_Dataset_AbstractedFixedMethods.txt (115525)
4.3. Insert - Method-level mutations caused by an insert.
4.3.1. AbstractedBugFixPairs - Folder containing the abstracted buggy and fixed source code files at the method level.
4.3.1.1. AbstractedBuggyMethods - Folder containing the abstracted buggy source code files at the method level.
The file naming is as follows: Insert_AbstractedBuggyMethod_42951.txt
4.3.1.2. AbstractedFixedMethods - Folder containing the abstracted fixed source code files at the method level.
The file naming is as follows: Insert_AbstractedFixedMethod_42951.txt
4.3.2. BugFixPairs - Folder containing the buggy and fixed source code files at the method level.
4.3.2.1. BuggyMethods - Folder containing the buggy source code files at the method level.
The file naming is as follows: Insert_BuggyMethod_42951.txt
4.3.2.2. FixedMethods - Folder containing the fixed source code files at the method level.
The file naming is as follows: Insert_FixedMethod_42951.txt
The files below contain the abstracted buggy and fixed source code for the Insert mutation type. The source code in these files is written inline.
'Formatted' means that three special tokens, INDENT, DEDENT, and NEWLINE, were added to the source code. When the formatted data is used for prediction, the predicted code can be syntactically reverted, i.e., converted back into structured source code.
'Unformatted' data is used to predict the source code inline, without these tokens; the predicted source code cannot be syntactically reverted.
Insert_Formatted_Dataset_AbstractedBuggyMethods.txt (42951)
Insert_Formatted_Dataset_AbstractedFixedMethods.txt (42951)
Insert_Unformatted_Dataset_AbstractedBuggyMethods.txt (42951)
Insert_Unformatted_Dataset_AbstractedFixedMethods.txt (42951)
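The difference between the 'Formatted' and 'Unformatted' files can be illustrated with Python's standard tokenize module. This is a hedged sketch, not the authors' conversion code: it assumes the methods are Python source (which the INDENT, DEDENT, and NEWLINE token names suggest but the description does not state) and that tokens are separated by single spaces; the helper names are hypothetical.

```python
import io
import tokenize

# The three special tokens named in the dataset description.
SPECIAL = {tokenize.INDENT: "INDENT", tokenize.DEDENT: "DEDENT", tokenize.NEWLINE: "NEWLINE"}
# Tokens that carry no syntax and are dropped (blank lines, comments, end marker).
SKIPPED = {tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}


def to_inline(source: str, formatted: bool = True) -> str:
    """Write a Python method on one line; with formatted=True the block
    structure is kept as INDENT/DEDENT/NEWLINE tokens."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SPECIAL:
            if formatted:
                out.append(SPECIAL[tok.type])
        elif tok.type not in SKIPPED:
            out.append(tok.string)
    return " ".join(out)


def revert(formatted_line: str, indent: str = "    ") -> str:
    """Rebuild indented source from a formatted inline string (assumes no
    token contains whitespace, e.g. string literals already abstracted)."""
    lines, current, level = [], [], 0
    for token in formatted_line.split():
        if token == "INDENT":
            level += 1
        elif token == "DEDENT":
            level -= 1
        elif token == "NEWLINE":
            lines.append(indent * level + " ".join(current))
            current = []
        else:
            current.append(token)
    if current:  # tokens left over without a closing NEWLINE
        lines.append(indent * level + " ".join(current))
    return "\n".join(lines) + "\n"
```

Reverting only needs the statement boundaries and the indentation level. Without the NEWLINE and INDENT/DEDENT markers, an unformatted line such as `if x : return 1 return 0` no longer says where one statement ends or which block it belongs to, which is why only the formatted variant can be turned back into valid code.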
 
 
The following file contains the most frequently used idioms in the dataset.
During abstraction of the source code, these idioms prevent the most frequently used identifiers and constant literals from being replaced with special tokens.
Idioms.txt (414)
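A minimal Python sketch of abstraction with an idiom list, in the spirit of the description above, follows. It is not the authors' abstraction tool. Assumptions: the source is Python, and the placeholder names ID_n, NUM_n, and STR_n are hypothetical, since the dataset's own abstract tokens are not described here. Identifiers and literals are replaced by numbered placeholders unless they are keywords or idioms, and structural tokens are dropped, giving an unformatted abstraction.

```python
import io
import keyword
import tokenize
from typing import Dict, Set, Tuple

# Hypothetical placeholder prefixes; the dataset's real token names may differ.
PREFIX = {tokenize.NAME: "ID", tokenize.NUMBER: "NUM", tokenize.STRING: "STR"}
# Structural tokens are dropped here, which yields an unformatted abstraction.
DROPPED = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
           tokenize.COMMENT, tokenize.ENDMARKER}


def abstract(source: str, idioms: Set[str]) -> Tuple[str, Dict[str, str]]:
    """Replace identifiers and literals with numbered placeholders, keeping
    keywords and idioms verbatim; return the text and the mapping used."""
    mapping: Dict[str, str] = {}
    counters = {prefix: 0 for prefix in PREFIX.values()}
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type in DROPPED:
            continue
        if tok.type in PREFIX and not keyword.iskeyword(text) and text not in idioms:
            if text not in mapping:  # the same value always gets the same placeholder
                prefix = PREFIX[tok.type]
                counters[prefix] += 1
                mapping[text] = f"{prefix}_{counters[prefix]}"
            out.append(mapping[text])
        else:
            out.append(text)
    return " ".join(out), mapping


def concretize(abstracted: str, mapping: Dict[str, str]) -> str:
    """Undo the abstraction by mapping placeholders back to the original text."""
    reverse = {placeholder: text for text, placeholder in mapping.items()}
    return " ".join(reverse.get(token, token) for token in abstracted.split())
```

Keeping the mapping is what makes a prediction over placeholders usable again, since concretize maps it back to concrete identifiers. Loading the real list might look like `set(Path("Idioms.txt").read_text(encoding="utf-8").split())`, assuming whitespace-separated entries.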
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
+---------------+
| Sample Folder |
+---------------+
This folder contains samples of the compressed files in the 'Dataset Folder'.
It is provided so that the file contents can be inspected without downloading the compressed files.
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

- Faulty Image Dataset 

https://arxiv.org/abs/2108.13803

https://drive.google.com/drive/folders/1pIqEnIIKFN13Z4m38zEB-XYQp_qb0T6S?usp=sharing

The faulty image dataset is an image library containing samples of faulty images recorded by a robot camera as a result of the various faults it may encounter. Camera malfunctions can cause serious problems when the camera records faulty images and the system continues to operate on them. CamFITool is designed to simulate such failures, collect the resulting faulty image outputs of the system, and thereby provide faulty data for developing artificial intelligence systems that detect these problems. With CamFITool, a dataset of 5000 normal and 5000 fault-injected images (see Table 1) was created from the SRVT system. Users can also use CamFITool to create similar datasets with their own configurations and in other camera-based perception environments.

RGB Camera Fault Types (together accounting for the 5000 fault-injected images):

Dilation: The dataset contains 476 images injected with this fault.

Erosion: The dataset contains 637 images injected with this fault.

Open: The dataset contains 640 images injected with this fault.

Close: The dataset contains 841 images injected with this fault.

Gradient: The dataset contains 632 images injected with this fault.

Motion-blur: The dataset contains 687 images injected with this fault.

Partialloss: The dataset contains 1087 images injected with this fault.
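Most of the fault names above correspond to standard image operations. The pure-Python sketch below shows what Dilation, Erosion, Open, Close, Gradient, and Motion-blur do to a grayscale image. It is a rough illustration only, not CamFITool's implementation, whose kernels and parameters are not given here: it assumes a 3x3 structuring element and a horizontal blur. Partialloss is left out because its exact definition is not given in this description. In practice these operations are usually done with OpenCV (cv2.dilate, cv2.erode, and cv2.morphologyEx with MORPH_OPEN, MORPH_CLOSE, or MORPH_GRADIENT).

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def _window(img: Image, y: int, x: int) -> List[int]:
    """Pixels of the 3x3 neighbourhood around (y, x), clipped at the borders."""
    return [
        img[j][i]
        for j in range(max(0, y - 1), min(len(img), y + 2))
        for i in range(max(0, x - 1), min(len(img[0]), x + 2))
    ]


def dilate(img: Image) -> Image:
    # Dilation: each pixel takes the maximum of its neighbourhood.
    return [[max(_window(img, y, x)) for x in range(len(img[0]))] for y in range(len(img))]


def erode(img: Image) -> Image:
    # Erosion: each pixel takes the minimum of its neighbourhood.
    return [[min(_window(img, y, x)) for x in range(len(img[0]))] for y in range(len(img))]


def opening(img: Image) -> Image:
    # Opening: erosion followed by dilation (removes small bright spots).
    return dilate(erode(img))


def closing(img: Image) -> Image:
    # Closing: dilation followed by erosion (fills small dark gaps).
    return erode(dilate(img))


def gradient(img: Image) -> Image:
    # Morphological gradient: dilation minus erosion (highlights edges).
    d, e = dilate(img), erode(img)
    return [[dv - ev for dv, ev in zip(drow, erow)] for drow, erow in zip(d, e)]


def motion_blur(img: Image, length: int = 5) -> Image:
    # Horizontal motion blur: mean of `length` neighbouring pixels in the row,
    # clipped at the borders and rounded down to an integer pixel value.
    k = length // 2
    out = []
    for row in img:
        blurred = []
        for x in range(len(row)):
            segment = row[max(0, x - k): x + k + 1]
            blurred.append(sum(segment) // len(segment))
        out.append(blurred)
    return out
```

On a single bright pixel, opening removes it entirely while dilation spreads it over its neighbours, which is the kind of visible distortion these fault types introduce.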