Before I give my own interpretation, please take a look at the following links:
- http://www.itmweb.com/essay550.htm: Edward V. Berard collected a host of definitions from various books and articles. In my reading, many of the definitions equate information hiding and encapsulation: "Data hiding is sometimes called encapsulation", "Encapsulation (also information hiding) ...", "[E]ncapsulation -- also known as information hiding ...", although Berard tries to convince us that there are significant differences. Still, I am confused.
- http://electrotek.wordpress.com/2009/04/29/encapsulation-and-information-hiding/ by Viktoras Agejevas has other citations; and ends with "These quotes clearly show that encapsulation and information hiding are almost synonymous." However, this is only a short discussion of the topic.
- http://nat.truemesh.com/archives/000498.html by Nat Pryce argues to distinguish encapsulation and information hiding because "I find it much easier to make good decisions when I am clear about when I am doing encapsulation and when I am doing information hiding." However, to me the text sounds just like an argument to prefer the term "information hiding."
- http://discuss.joelonsoftware.com/default.asp?design.4.145438.37 by Dave Jarvis also argues that the two are orthogonal. However, in the many comments, the concepts get muddled more and more, including in Dave Jarvis's own comments -- although I must say he tries hard to work out a good usage of the terms.
Information hiding was introduced in a paper by Dave Parnas in 1972 -- long before object orientation became mainstream. AFAIK, he used it to introduce a new way of modularizing an algorithm. The predominant method of the day was functional decomposition, in which an algorithm is taken apart into sub-algorithms.
Parnas argued that a "risk-driven approach" was better: Identify parts of the algorithm that rely on the same design decision, and package each such part into a separate "module." This module now need not expose the consequences of that decision, because all those consequences are now internal to the module. In other words, the module hides all information resulting from that decision. The huge advantage is that changes to that decision will not be seen by other parts of the system -- thus avoiding any ripple effect when the decision changes.
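To make this concrete, here is a minimal Python sketch. It is my own illustration, not code from Parnas's paper, and the names `ListLineStore`, `PackedLineStore` and `longest_line` are hypothetical. The design decision being hidden is "how are lines stored in memory?" Both classes hide it behind the same two operations, so the client function does not notice when the decision changes.

```python
class ListLineStore:
    """One possible decision: keep every line as a separate list element."""

    def __init__(self):
        self._lines = []  # hidden: callers never touch this list

    def add(self, line: str) -> int:
        self._lines.append(line)
        return len(self._lines) - 1  # handle for later retrieval

    def get(self, handle: int) -> str:
        return self._lines[handle]


class PackedLineStore:
    """A different decision: pack all lines into one string plus offsets."""

    def __init__(self):
        self._buffer = ""   # hidden: one contiguous string
        self._offsets = []  # hidden: (start, end) pairs into the buffer

    def add(self, line: str) -> int:
        start = len(self._buffer)
        self._buffer += line
        self._offsets.append((start, start + len(line)))
        return len(self._offsets) - 1

    def get(self, handle: int) -> str:
        start, end = self._offsets[handle]
        return self._buffer[start:end]


def longest_line(store, handles):
    """Client code: depends only on get(), never on the storage layout."""
    return max((store.get(h) for h in handles), key=len)
```

Swapping `ListLineStore` for `PackedLineStore` changes nothing in `longest_line`, which is exactly the "no ripple effect" property described above. Whether the storage layout is worth hiding at all is, of course, a judgment call for the concrete project.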
Thus, information hiding is a process concept that includes reasoning about project conditions and decisions. There is, in general, no "right" or "wrong" information hiding. Inherently, information hiding requires judgment about decision risks: If a decision will almost certainly not change, there's no need to hide that information. Examples could be the choice of your operating system or RDBMS; or "invariant" features that are very unlikely or even impossible to change, e.g. the list of human sexes or natural laws (... although I fear we would agree that even such "invariants" may change in certain scenarios ...).
If information is hidden, it needs to be hidden somewhere: i.e., behind some sort of "walls." We could imagine a "software landscape" that has walls in some places and none in others (imagine flying over English meadows -- sometimes with walls or hedges between them, sometimes not). However, it is easier to view the landscape as a set of "modules," with walls around each module. Now, it is important to realize that there are many different sorts of modules -- even in our current "OO age":
- functions
- In OO languages:
  - classes (and their variants, like "structs")
  - class hierarchies
  - groups of nested classes
- threads and tasks
- thread groups / task groups
- co-routines in languages with co-routine support
- packages/namespaces/"modules" (e.g. in MODULA)
- assemblies (in Windows and .NET)
- configuration files and file sets
- DSLs and generators
- aspects (in aspect-oriented programming languages)
- schemas (in relational databases)
- arbitrarily defined groups of classes, methods, threads, ...: E.g. all whose names match a certain pattern
In a nutshell: Information Hiding is a process encompassing ...
- ... deciding on things that might change / that are risky / that are not under your control: Changes to these should be hidden from the rest of the design of your system.
- ... situation-specific judgment about the volatility of decisions -- therefore, there is no "universally right or wrong" information to hide.
- ... identifying the "walls" towards other parts of the design behind which information can be hidden -- in practice, the "modules" inside which certain pieces of information are to be hidden.
- Information hiding focuses on which information is hidden (away) so that changes to decisions about that information do not influence the rest of the system.
- Abstraction, on the other hand, focuses on the information that is not hidden, i.e. that is exposed so that the rest of the system can rely on it.
From the standpoint of information hiding, in an ideal world, all information would be hidden. Obviously, this would make it impossible to build the required system, because no part could use any other part. Therefore, we must also focus on a useful design of the exposed (non-hidden) parts of each module. A good abstraction requires all those -ibilities, e.g.
- usability
- testability
- "understandability"
- completeness with respect to proofs or arguments about the users of the abstraction
- separate accessors from (implementation) data;
- expose only read-only data;
- expose only copies of internal data;
- separate APIs ("interfaces") of algorithms and algorithm groups from implementation details;
- use thread-local/thread-static variables to hide information inside a thread;
- Combining algorithms and data that are significantly coupled in "cohesive classes"
- Combining such elements in "inheritance hierarchies"
- For data exposed from some module, only expose restricted knowledge -- mostly only interfaces (a concept that emerged around the same time)
- Use access modifiers (private, public, internal, package-private) to define the hiding boundaries.
- Law of Demeter
- Information hiding and abstraction are processes that make design decisions based on "risk" information; the terms also denote the results of these processes.
- Encapsulation is the set of concrete techniques used to establish information hiding and abstraction.
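
Three of the encapsulation techniques listed above (accessors separate from data, exposing only read-only data, exposing only copies) can be sketched in a few lines of Python. The `Inventory` class and its method names are hypothetical, chosen only for illustration. Note also that Python does not enforce `private`; the leading underscore is just a convention, so this is a sketch of the idea rather than a hard boundary as in Java or C#.

```python
from types import MappingProxyType


class Inventory:
    """Hypothetical example class; the dict below is the hidden decision."""

    def __init__(self):
        self._stock = {}  # implementation data, private by convention only

    def add(self, item: str, quantity: int) -> None:
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self._stock[item] = self._stock.get(item, 0) + quantity

    def quantity_of(self, item: str) -> int:
        """Accessor separate from the data: callers never see the dict."""
        return self._stock.get(item, 0)

    @property
    def stock_view(self):
        """Read-only exposure: a live view that rejects item assignment."""
        return MappingProxyType(self._stock)

    def snapshot(self) -> dict:
        """Copy exposure: changes to the result do not affect the inventory."""
        return dict(self._stock)
```

The three variants expose different amounts of information: `quantity_of` reveals nothing about the representation, `stock_view` reveals that there is a mapping but keeps it unmodifiable from outside, and `snapshot` hands out a full but detached copy. Which one is appropriate again depends on how likely the underlying decision is to change.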