What types of data are you dealing with? We will roughly classify them into the following five categories. Naturally, this is not a comprehensive classification, but it helps to clarify the options and approaches we have to keep in mind.
- Homogeneous data arrays containing elements of the same type
- Multimedia – audio, video and graphics files
- Temporary data for internal use (logs of various types, caches)
- Streams of calculated data of various types (e.g. a recorded video stream or the results of massive computations)
- Documents (simple or compound)
The ways of storing such data are as follows.
- Files in a file system
- Structured storages
- Archives (as a specific form of structured storage)
- Remote (distributed, cloud) storages
Let us now discuss which storage mechanism is best suited to each of the types of data mentioned above.
Homogeneous data arrays
Homogeneous data arrays contain elements of the same type. Examples include a simple table, temperature data over time, or last year's stock values.
- For homogeneous data arrays, regular files offer no convenient and fast way to search. You have to create, maintain and constantly update special index files. Modifying the data structure is almost impossible. Metainformation is limited. There is no built-in run-time compression or encryption of data.
- Relational databases are well suited to homogeneous data. They comprise sets of predefined records with a rigid internal format. Their main advantages are the ability to locate data quickly by a specified criterion and transactional support for data integrity. Their significant shortcoming is that they do not work well for large data of variable length (BLOB fields are usually stored separately from the rest of the record). Moreover, keeping data in a relational database requires: a) the use of a specific DBMS, which severely limits the portability of both the data and the application itself; b) pre-planning of the database structure, including relationships between tables and an indexing policy; c) research into peak loads, which is needed for efficient database design and may itself be a serious overhead.
- Structured storages are somewhat analogous to a file system: a storage is a set of enveloped named streams (files). Such a storage can be kept at any location, for example in a single file on a disk, in a database record, or even in RAM. The main advantage of this approach is that it allows data to be added to or deleted from an existing storage efficiently and handles data of various sizes (from small to huge) effectively. Storages are separate units (files) and can therefore be easily relocated, copied, duplicated and backed up. There is no need to track all the files generated by an application. Moreover, journaling makes it possible to restore the content completely or partially, which protects against data loss from accidents or failures. The disadvantage may be relatively slow search inside huge data arrays.
- ZIP archives, as a specific form of structured storage, can be used for storing homogeneous data arrays, but only when most access is read-only. The standardized nature of the ZIP format makes it easy to use, especially in cross-platform applications, but the format is not suitable for data that will be modified after packing, because adding and deleting data are time-consuming operations.
- Remote and distributed storages are the next level of storage, in which the actual data location and data access are handled by a dedicated layer that encapsulates the access mechanics. In such storages the data may actually be kept in databases or distributed among different file systems, but the actual storage organization does not matter to the end user. The user sees only a set of objects accessed through an API or, alternatively, through file system calls. Cloud storage is a good example. These types of storage are intended for large software systems. Their advantages include unified data access, with no need to think about how the data are actually stored. Their disadvantages are that they cannot be efficiently managed and controlled, and that backup or migration of data is complicated.
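To make the relational option more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It stores a small homogeneous array (temperature readings over time), inserts the rows inside a single transaction, and then locates data by a criterion through an index. The table name readings, its columns, the helper function names and the sample values are all assumptions made for this illustration; a real application would choose its own schema and DBMS.

```python
import sqlite3

def build_readings_db(rows):
    """Create an in-memory table of (timestamp, celsius) rows with an index on the timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (ts TEXT NOT NULL, celsius REAL NOT NULL)")
    # The index is what lets the database locate rows by timestamp without a full scan.
    conn.execute("CREATE INDEX idx_readings_ts ON readings (ts)")
    # "with conn" makes the inserts one transaction: they are committed together,
    # or rolled back together if any insert raises an error.
    with conn:
        conn.executemany("INSERT INTO readings (ts, celsius) VALUES (?, ?)", rows)
    return conn

def readings_between(conn, start, end):
    """Return readings with start <= ts < end, ordered by timestamp."""
    cur = conn.execute(
        "SELECT ts, celsius FROM readings WHERE ts >= ? AND ts < ? ORDER BY ts",
        (start, end),
    )
    return cur.fetchall()

# Hypothetical sample data; ISO-style timestamps sort correctly as plain text.
sample = [
    ("2024-01-01T00:00", -2.5),
    ("2024-01-01T12:00", 1.0),
    ("2024-01-02T00:00", -4.0),
    ("2024-01-02T12:00", 0.5),
]
conn = build_readings_db(sample)
print(readings_between(conn, "2024-01-02", "2024-01-03"))
```

The sketch also hints at the costs listed above: the schema and the index have to be planned before any data is stored, and the code is tied to one particular database engine.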
Audio, video and graphics files
Storing one or several multimedia files is simple. Complexities arise when you need to maintain a large number of files and want to search across the multimedia collection.
- Only a small number of simple multimedia files can be stored as regular files. Even for an average home collection, simple file-based multimedia storage becomes unmanageable very quickly. This is mostly due to the size of these files, the inability to handle annotations, tags or metadata, and the low speed of copying or relocation.
- Relational databases are a dubious way of storing audio, video or similar types of data. An RDBMS is not well suited to keeping large BLOBs, especially large video files. Also, each type of data requires its own table (because a different set of metadata needs to be stored for each). On the other hand, an RDBMS can be handy because it offers powerful search capabilities, which suits read-only collections very well.
- Structured storages work well for storing multimedia files when the storage supports metadata and fast search through it. If such search is not supported, structured storage becomes just a variant of the file system.
- Remote and distributed storages are among the best solutions for storing video, music or similar data. The storage represents a single unit where all elements of a multimedia project or video game can be safely stored. There is no risk of losing a single but important file. Searches are fast and efficient if the storage supports tags and metadata.
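Several of the options above depend on whether the storage supports tags and metadata search. As a rough illustration of what such support involves, the following sketch keeps an in-memory inverted index that maps each tag to the files carrying it. The TagIndex class, its methods and the file names are hypothetical and serve only as an example; a real storage would persist the index and keep it in sync with the stored files.

```python
from collections import defaultdict

class TagIndex:
    """Toy inverted index: tag -> set of media file names (all names are hypothetical)."""

    def __init__(self):
        self._by_tag = defaultdict(set)

    def add(self, filename, tags):
        """Register a file under each of its tags (tags are case-insensitive)."""
        for tag in tags:
            self._by_tag[tag.lower()].add(filename)

    def find(self, *tags):
        """Return the sorted names of files that carry every requested tag."""
        if not tags:
            return []
        # .get avoids creating empty entries for unknown tags.
        sets = [self._by_tag.get(tag.lower(), set()) for tag in tags]
        return sorted(set.intersection(*sets))

index = TagIndex()
index.add("concert.mp4", ["video", "live", "2019"])
index.add("demo.flac", ["audio", "live"])
index.add("cover.png", ["image"])

print(index.find("live"))           # ['concert.mp4', 'demo.flac']
print(index.find("live", "video"))  # ['concert.mp4']
```

A lookup touches only the sets for the requested tags rather than every file, which is the property that plain file-based storage lacks.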