Introduction to File Processing Systems
A file processing system is a traditional way of storing and managing data in individual, separate files, often called "flat files" (such as .txt, .csv, or custom binary files). Each application program manipulates these files directly and defines its own data structures and routines for reading, writing, and updating them.
While simple for small, single-user tasks, this approach quickly breaks down as data volume and application complexity grow.
Key Disadvantages of File Processing Systems
Here are the major drawbacks, with examples and comparisons to how a Database Management System (DBMS) handles them.
1. Data Redundancy and Inconsistency
- Description: The same piece of information may be duplicated in multiple files. For example, a university might store a student's name and address in a Registrar.dat file and also in a Library.dat file.
- Consequences:
  - Wasted Storage: Storing the same data multiple times consumes unnecessary disk space.
  - Data Inconsistency: If a student moves and updates their address, the change might be made in the Registrar.dat file but not in the Library.dat file. The system now contains conflicting information, leading to a loss of data integrity.
- How a DBMS Solves This: A database schema designed with normalization stores each fact in one place, and other parts of the database reference it instead of copying it. A student's address would be stored only once, in a Students table.
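The inconsistency scenario above can be sketched in a few lines of Python. This is only an illustration: the file contents, the column names, and the helper functions `update_address` and `address_in` are assumptions made for the example, and in-memory strings stand in for the Registrar.dat and Library.dat files.

```python
import csv
import io

# Hypothetical contents of two flat files; each keeps its own copy of the address.
FIELDS = ["student_id", "name", "address"]
registrar_dat = "student_id,name,address\n1,Ada Park,12 Elm St\n"
library_dat = "student_id,name,address\n1,Ada Park,12 Elm St\n"


def update_address(file_text, student_id, new_address):
    """Return new file contents with the student's address replaced."""
    rows = list(csv.DictReader(io.StringIO(file_text)))
    for row in rows:
        if row["student_id"] == str(student_id):
            row["address"] = new_address
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def address_in(file_text, student_id):
    """Look up a student's address in one file's contents."""
    for row in csv.DictReader(io.StringIO(file_text)):
        if row["student_id"] == str(student_id):
            return row["address"]
    return None


# The registrar's program updates its own file; nothing updates the library's copy.
registrar_dat = update_address(registrar_dat, 1, "99 Oak Ave")

registrar_address = address_in(registrar_dat, 1)
library_address = address_in(library_dat, 1)
print(registrar_address, "|", library_address)
print("conflict:", registrar_address != library_address)
```

Nothing in the file layer notices that the two copies now disagree; only a program written specifically to compare them would.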
2. Difficulty in Accessing Data
- Description: To retrieve a specific piece of data, a custom program must be written. If a new type of query is needed, a new program has to be developed. There is no standard, easy-to-use language for ad-hoc data retrieval.
- Consequences: Getting answers to simple questions can be a slow and expensive process requiring a programmer's time. For example, a request like "Find all students from California with a GPA above 3.5" would require a new program to be written to parse the student file.
- How a DBMS Solves This: A DBMS provides a powerful query language (like SQL) that allows users to ask complex questions of the data without writing a full application. The above request becomes a simple one-line SQL query.
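As a rough illustration of the ad-hoc query above, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The Students table layout (columns `name`, `state`, `gpa`) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample data, invented for this example.
conn.execute(
    "CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT, state TEXT, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students (name, state, gpa) VALUES (?, ?, ?)",
    [("Ada", "CA", 3.9), ("Ben", "CA", 3.2), ("Cy", "NY", 3.8)],
)

# "Find all students from California with a GPA above 3.5" as a single query.
honor_ca = conn.execute(
    "SELECT name FROM Students WHERE state = 'CA' AND gpa > 3.5"
).fetchall()
print(honor_ca)
```

Changing the question (a different state, a different threshold, an extra condition) means editing the query text, not writing a new file parser.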
3. Data Isolation
- Description: Data is scattered across various files, and the files may be in different formats. It is difficult to write new applications that retrieve data from multiple, unrelated files.
- Consequences: It is a complex programming task to combine data for a holistic view. For instance, creating a report that links a student's course registrations with their library fines and financial aid status would be a major undertaking.
- How a DBMS Solves This: A DBMS provides a unified, logical view of the data. Data from different tables (like Students, Courses, and LibraryFines) can be easily joined and retrieved in a single query.
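A minimal sketch of such a combined report, again using `sqlite3` in memory. The column layouts are assumptions: here Courses is simplified to hold one registration per row, and each sample student has one course and at most one fine so the join does not multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schemas for the three tables named in the text.
conn.executescript("""
    CREATE TABLE Students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Courses (student_id INTEGER REFERENCES Students(id), title TEXT NOT NULL);
    CREATE TABLE LibraryFines (student_id INTEGER REFERENCES Students(id), amount REAL NOT NULL);

    INSERT INTO Students VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO Courses VALUES (1, 'Databases'), (2, 'Algebra');
    INSERT INTO LibraryFines VALUES (1, 2.5);
""")

# One query links registrations and fines; LEFT JOIN keeps students with no fine.
report = conn.execute("""
    SELECT s.name, c.title, f.amount
    FROM Students AS s
    JOIN Courses AS c ON c.student_id = s.id
    LEFT JOIN LibraryFines AS f ON f.student_id = s.id
    ORDER BY s.name
""").fetchall()

for row in report:
    print(row)
```

With flat files in different formats, the same report would need a parser per file plus hand-written matching logic on the student identifier.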
4. Integrity Problems
- Description: Data integrity refers to the accuracy and consistency of data. In a file system, integrity rules (like "Student GPA must be between 0.0 and 4.0" or "Account balance cannot be negative") must be enforced by the application programs.
- Consequences: This logic is duplicated across every program that accesses the data. It's easy to forget to add these checks in a new program, or they might be implemented inconsistently, leading to invalid data entering the files.
- How a DBMS Solves This: A DBMS allows integrity constraints to be defined on the data itself, for example CHECK (GPA >= 0.0 AND GPA <= 4.0). The DBMS then ensures that these rules are never violated, regardless of which application is accessing the data.
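The GPA rule above can be tried out with SQLite, which enforces CHECK constraints. This is a small sketch; the table layout and sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The rule lives in the table definition, not in each application program.
conn.execute(
    "CREATE TABLE Students ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "gpa REAL CHECK (gpa >= 0.0 AND gpa <= 4.0))"
)

conn.execute("INSERT INTO Students (name, gpa) VALUES (?, ?)", ("Ada", 3.9))

try:
    # Any program that tries to store an out-of-range GPA is refused.
    conn.execute("INSERT INTO Students (name, gpa) VALUES (?, ?)", ("Ben", 4.7))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print("rejected:", rejected, "rows stored:", row_count)
```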
5. Atomicity Issues
- Description: Many operations are "all-or-nothing" events, also known as atomic transactions. For example, a fund transfer involves two steps: debiting one account and crediting another.
- Consequences: In a file system, if the system crashes after the debit operation but before the credit operation, the money is lost, and the data is left in an inconsistent state. There is no built-in mechanism to handle this.
- How a DBMS Solves This: A DBMS provides transaction management. It guarantees the atomicity of transactions. If any part of a transaction fails, the entire transaction is rolled back, and the database is restored to its original, consistent state.
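The fund-transfer case can be sketched with `sqlite3`, whose connection object works as a context manager that commits on success and rolls back when an exception escapes. The Accounts table, the `transfer` function, and the `crash_midway` flag are hypothetical names for this example, and the "crash" is simulated with an exception, not a real power failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()


def transfer(conn, src, dst, amount, crash_midway=False):
    """Debit src and credit dst as one all-or-nothing unit."""
    with conn:  # commits if the block finishes, rolls back if it raises
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        if crash_midway:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute(
            "UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )


def balances(conn):
    return [row[0] for row in conn.execute("SELECT balance FROM Accounts ORDER BY id")]


try:
    transfer(conn, 1, 2, 30.0, crash_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the debit was undone

transfer(conn, 1, 2, 30.0)
after_success = balances(conn)   # both steps applied together

print(after_failure, after_success)
```

A program writing to two flat files has no equivalent of the rollback step unless the programmer builds one by hand.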
6. Concurrent Access Anomalies
- Description: Most modern systems must let multiple users access and update data simultaneously. Without coordination, their operations can interleave in ways that corrupt the data.
- Consequences: For example, if two travel agents try to book the last seat on a flight at the same time, both may read that one seat is available, and one agent's update then overwrites the other's, so the seat is double-booked (the "lost update" problem). File systems typically offer only coarse locking mechanisms (e.g., locking an entire file), which forces unrelated updates to wait on each other.
- How a DBMS Solves This: A DBMS has sophisticated concurrency control mechanisms (like locking and timestamping) that manage simultaneous access, ensuring that the database remains consistent even with many users making updates at the same time.
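The last-seat scenario can be sketched as follows. The first part replays the problematic interleaving by hand on a plain dictionary standing in for a flat-file record; the second part shows one common database-side pattern, a single conditional UPDATE, using `sqlite3`. The Flights table and the `book_seat` function are hypothetical, and the two bookings run one after another here, so this illustrates the idea without demonstrating real parallel access.

```python
import sqlite3

# Part 1: unsynchronized read-modify-write, interleaved by hand.
seats_file = {"seats_left": 1}            # stands in for a record in a flat file

agent_a_read = seats_file["seats_left"]   # agent A reads 1
agent_b_read = seats_file["seats_left"]   # agent B also reads 1, before A writes

bookings_flat = 0
if agent_a_read > 0:
    seats_file["seats_left"] = agent_a_read - 1
    bookings_flat += 1
if agent_b_read > 0:                      # stale value still looks available
    seats_file["seats_left"] = agent_b_read - 1
    bookings_flat += 1
# Two bookings were accepted although only one seat existed.

# Part 2: the check and the decrement happen in one statement inside the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Flights (id INTEGER PRIMARY KEY, seats_left INTEGER NOT NULL)")
conn.execute("INSERT INTO Flights (id, seats_left) VALUES (1, 1)")
conn.commit()


def book_seat(conn, flight_id):
    """Return True only if a seat was actually taken."""
    with conn:
        cur = conn.execute(
            "UPDATE Flights SET seats_left = seats_left - 1 "
            "WHERE id = ? AND seats_left > 0",
            (flight_id,),
        )
        return cur.rowcount == 1


bookings_db = sum(book_seat(conn, 1) for _ in range(2))
seats_left_db = conn.execute("SELECT seats_left FROM Flights WHERE id = 1").fetchone()[0]
print(bookings_flat, bookings_db, seats_left_db)
```

Because the availability test is part of the UPDATE itself, the second attempt changes no rows and is reported as a failed booking, so there is no window between reading and writing.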
7. Security Problems
- Description: Enforcing security policies is difficult. For example, how do you allow a user to see a student's name and major but not their GPA?
- Consequences: In a file system, access control usually applies to the entire file: a user either has permission to read the whole file or does not. The file system itself offers no way to restrict access to individual records or fields.
- How a DBMS Solves This: A DBMS provides a robust security and authorization subsystem. It allows for the creation of different user roles and the granting of specific privileges (e.g., SELECT, INSERT, UPDATE) on specific tables or even specific columns.
Summary Table
| Disadvantage | Description in File Processing System | How a DBMS Solves It |
| :--- | :--- | :--- |
| Data Redundancy | Same data is stored in multiple files, wasting space. | Normalization and a centralized data repository. |
| Data Inconsistency | Data becomes conflicting across different files after updates. | Centralized data ensures updates are reflected everywhere. |
| Difficult Access | Requires custom programs for every new data retrieval task. | Provides a standard Query Language (SQL) for easy access. |
| Data Isolation | Data is scattered in separate files with different formats. | Provides a unified, logical view of all data. |
| Integrity Problems | Business rules must be coded into every application. | Enforces Integrity Constraints at the database level. |
| Atomicity Issues | Incomplete operations (e.g., due to a crash) leave data corrupt. | Guarantees Atomic Transactions (all-or-nothing). |
| Concurrency Anomalies | Simultaneous updates can corrupt data or be lost. | Sophisticated Concurrency Control mechanisms. |
| Security Problems | Poor, coarse-grained access control (often at the file level). | Granular Security and Authorization controls. |
In conclusion, while file processing systems are simple, they are fundamentally unsuited for applications that require data consistency, security, reliability, and efficient access for multiple users or complex queries. The Database Management System was created specifically to address these shortcomings.