Abstract
File validation is a technique to recognize and validate file formats in an arbitrary stream of data. Because it analyses the data stream directly, file validation can be used to recover deleted files without relying on file system metadata. Furthermore, this technique can be used to recognize valid combinations of file fragments. Recovery and reconstruction of fragmented files is very challenging; however, file validation offers a potential path to success.
In this thesis we investigate how file format specifications can guide file format validation. We propose a method to determine whether file format validation is feasible and how this can be achieved using existing validation techniques.
To address this, we approached the problem from a file format perspective, because file format validation relies on properties of a file format. We analyzed popular file formats of commonly used file types and generalized the concepts that recur across their specifications. This analysis yielded a set of commonly used file format concepts.
Existing file validation techniques rely on properties of a file format. We found that these properties can be translated into the file format concepts identified in our research, which establishes a relation between file format concepts and file validation techniques.
A file validator is required to recognize and validate files. We identified a list of validation principles necessary to support these requirements. A validation principle can be implemented using specific validation techniques; this dependency provides the link between the file validator requirements and the file format concepts.
This resulted in a method to determine the feasibility of file format validation. The method consists of analyzing a file format specification to identify the file format concepts it uses. Based on the identified file format concepts, the corresponding file validation techniques are determined.
To verify the proposed method, we applied it to a complex file format. We selected the PST file format as a suitable candidate because related work found that PST files are frequently fragmented on a system. The PST file format is used by Outlook to store e-mails and calendar items.
Applying the method leads to the conclusion that file format validation is feasible for the PST file format, because the file format contains sufficient file format concepts. We implemented a PST file validator using the validation techniques suggested by the method. The implemented PST validator was able to recognize file fragments and can be used to reconstruct file fragments into the original file.
Our proposed method can determine the feasibility of file format validation and identifies which validation techniques can be used in the implementation of a file validator. This means that a file format specification can provide guidance on the implementation of a file validator. We consider the proposed method a starting point, since it might not be complete. However, the method allows new validation techniques and file format concepts to be added as they are identified.
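As a rough illustration of what validating a file format in an arbitrary stream of data can involve, the sketch below scans a byte stream for a format signature and then checks a structural property that follows it. It is not the PST validator from the thesis: PNG is swapped in as an example format because its signature and chunk layout are simple, and the function names `find_candidates` and `validate_first_chunk` are hypothetical.

```python
import struct
import zlib

# Fixed 8-byte signature defined by the PNG specification.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_candidates(stream: bytes, signature: bytes = PNG_SIGNATURE):
    """Yield every offset in the stream where the signature occurs."""
    pos = stream.find(signature)
    while pos != -1:
        yield pos
        pos = stream.find(signature, pos + 1)


def validate_first_chunk(stream: bytes, offset: int) -> bool:
    """Check that a well-formed IHDR chunk follows the signature at offset.

    A PNG chunk is: 4-byte big-endian length, 4-byte type, data,
    and a 4-byte CRC-32 computed over the type and data fields.
    """
    start = offset + len(PNG_SIGNATURE)
    header = stream[start:start + 8]
    if len(header) < 8:
        return False
    length, chunk_type = struct.unpack(">I4s", header)
    # The first chunk must be IHDR, and its data is always 13 bytes long.
    if chunk_type != b"IHDR" or length != 13:
        return False
    data = stream[start + 8:start + 8 + length]
    crc_bytes = stream[start + 8 + length:start + 12 + length]
    if len(data) < length or len(crc_bytes) < 4:
        return False
    (stored_crc,) = struct.unpack(">I", crc_bytes)
    return zlib.crc32(chunk_type + data) == stored_crc
```

A signature match alone only marks a candidate; the length and checksum checks are what reject false positives, and the same kind of check could in principle help decide whether a fragment plausibly continues a partially recovered file.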
Date of Award | 13 Jul 2021 |
---|---|
Original language | English |
Supervisor | Hugo Jonker (Examiner), Bastiaan Heeren (Co-assessor) & Vincent van der Meer (External assessor) |
Master's Degree
- Master Software Engineering