Encapsulation and Information Hiding

Kathi Fisler

So far, we’ve focused on how to create classes that are amenable to future extensions. Today, we look at making code robust, both by enabling future modifications (as well as extensions) and by protecting against malicious access and unintentional programming errors.

1 Code Critique

For this lecture, start with this starter file for a banking service. Critique it: what problems do you see in this code with regard to future modifications or information protection?

1. Any class that has access to a customer object has the ability to access or change that customer’s password. In the BankingService class, for example, the login method directly accesses the password to check whether it is valid; that method could just as easily (maliciously!) change the password. The contents of the password should never get out of the Customer class. The real problem here is that login should be a method on Customer, which has the data that the method needs.
2. A similar concern applies to the balance field in withdraw, but withdraw illustrates another problem. Imagine that the bank adds more details to accounts (such as overdraft protection or a withdrawal fee). The BankingService class would have to keep changing as the notion of Accounts changes, which makes no sense. The BankingService class simply wants a way to execute a withdrawal without concern for the detailed structure of Account objects. The withdraw method needs to be a method on Account, not BankingService.
3. The BankingService class has written all of its code over accounts and customers against a fixed data structure (the LinkedList). The dependency is clear in the code for the methods (getBalance, withdraw, and login): each includes a for-loop over the list in its implementation.
4. The dummy return value of 0 in getBalance and withdraw is awful, because it does not distinguish between a valid answer (an account balance of 0) and an error condition. Picking a dummy value to satisfy the type system is never a good idea. This program needs a better way of handling errors.

Underlying the first three of these concerns is a goal called encapsulation. Intuitively, encapsulation is about bundling data and code together in order to (1) reduce dependencies of one part of a system on structural details of another, and (2) control manipulation of and access to data. This lecture is about recognizing where encapsulation is needed and learning how to introduce it into your program. The next lecture will address error handling (item 4).
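
The starter file is not reproduced in these notes. As a rough illustration of the kind of code being critiqued, here is a hypothetical reconstruction: the class and field names follow the notes, but the constructors and method bodies are assumptions inferred from the critique, not the actual starter file.

```java
import java.util.LinkedList;

// Hypothetical reconstruction for illustration; NOT the actual starter file.
class Customer {
  String name;
  int password;

  Customer(String name, int password) {
    this.name = name;
    this.password = password;
  }
}

class Account {
  int number;
  Customer owner;
  double balance;

  Account(int number, Customer owner, double balance) {
    this.number = number;
    this.owner = owner;
    this.balance = balance;
  }
}

class BankingService {
  LinkedList<Account> accounts = new LinkedList<Account>();
  LinkedList<Customer> customers = new LinkedList<Customer>();

  // loops over a LinkedList, reads the balance field directly,
  // and returns a dummy 0 when no account matches
  double getBalance(int forAcctNum) {
    for (Account acct : accounts) {
      if (acct.number == forAcctNum)
        return acct.balance;
    }
    return 0;
  }

  // reads the password field directly; nothing stops this method
  // (or any other class) from overwriting the field instead
  String login(String custname, int withPwd) {
    for (Customer cust : customers) {
      if (cust.name.equals(custname)) {
        if (cust.password == withPwd)
          return "Welcome";
        else
          return "Try Again";
      }
    }
    return "Oops -- don't know this customer";
  }
}
```

In this sketch, an empty account numbered 1 and a nonexistent account numbered 999 both produce a balance of 0, so a caller cannot tell the two situations apart.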

2 Encapsulating Knowledge

Problems 1 and 2 are fundamentally failures to keep data and methods on them in the same class. Here, encapsulation is about regulating access to data (for purposes of reading, modifying, or even knowing about the existence of some data). These problems illustrate why we want to encapsulate knowledge in programs.

We will fix these problems in two stages: first, we move each method into its proper class (and rewrite the BankingService to use the new methods); second, we protect the data within these classes from unauthorized access.

2.1 Putting Methods in Their Proper Place

Let’s move the withdraw and getBalance methods into the Account class:

  class Account {
    int number;
    Customer owner;
    double balance;

    // returns the balance in this account
    double getBalance() {
      return this.balance;
    }

    // deducts given amount from account and returns total deduction
    // if account details change later, BankingService needs no edits
    double withdraw(double amt) {
      this.balance = this.balance - amt;
      return amt;
    }
  }

Methods like getBalance, which simply return the value of fields, are called getters. Many OO books suggest adding getters (and a corresponding setter method to change the value) on all fields. This guideline is too extreme though—we’ll return to it at the end of the lecture.

The getBalance and withdraw methods in the BankingService class change as follows to use the new methods. Note that neither one now directly accesses the field containing the data in Account.

  double getBalance(int forAcctNum) {
    for (Account acct:accounts) {
      if (acct.number == forAcctNum)
        return acct.getBalance();
    }
    return 0;
  }

  double withdraw(int forAcctNum, double amt) {
    for (Account acct:accounts) {
      if (acct.number == forAcctNum) {
        return acct.withdraw(amt);
      }
    }
    return 0;
  }

One advantage to having the separate withdraw method in the Account class is that if the data in an account changes, we can change the withdrawal computation without affecting other classes. For example, if the bank introduced withdrawal fees, then the amount deducted from an account would be the amount requested plus the fee. This new code structure lets the BankingService simply ask to perform the withdrawal, leaving the specifics to the Account class.
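
As a minimal sketch of the withdrawal-fee scenario, the version of Account below adds a flat fee. The fee field, its value, and the constructors are assumptions made for illustration; the withdraw method in BankingService is the same as the one shown above.

```java
import java.util.LinkedList;

class Account {
  int number;
  double balance;
  double fee; // hypothetical flat fee charged on each withdrawal

  Account(int number, double balance, double fee) {
    this.number = number;
    this.balance = balance;
    this.fee = fee;
  }

  // returns the balance in this account
  double getBalance() {
    return this.balance;
  }

  // deducts the requested amount plus the fee and returns the total deduction
  double withdraw(double amt) {
    double total = amt + this.fee;
    this.balance = this.balance - total;
    return total;
  }
}

class BankingService {
  LinkedList<Account> accounts = new LinkedList<Account>();

  // unchanged by the introduction of the fee
  double withdraw(int forAcctNum, double amt) {
    for (Account acct : accounts) {
      if (acct.number == forAcctNum) {
        return acct.withdraw(amt);
      }
    }
    return 0;
  }
}
```

Only Account knows about the fee: a request to withdraw 100 from an account with a fee of 2.5 deducts 102.5, and BankingService did not need to be edited to make that happen.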

Next, let’s move login into the Customer class. The result is similar.

  class Customer {
    String name;
    int password;
    LinkedList<Account> accounts;

    // check whether the given password matches the one for this user
    // in a real system, this method would return some object with
    // info about the customer, not just a string
    String tryLogin(int withPwd) {
      if (this.password == withPwd)
        return "Welcome";
      else
        return "Try Again";
    }
  }

  class BankingService {
    ...
    String login(String custname, int withPwd) {
      for (Customer cust:customers) {
        // compare strings with equals; == only checks object identity
        if (cust.name.equals(custname)) {
          return cust.tryLogin(withPwd);
        }
      }
      return "Oops -- don't know this customer";
    }
  }

In similar spirit, you should replace the direct references to acct.number and cust.name in the BankingService class with calls to getters for those fields.
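
One possible shape for that change is sketched below. The getter names getNumber and getName and the constructors are assumptions made for illustration; the classes are trimmed to the fields the example needs.

```java
import java.util.LinkedList;

class Account {
  int number;
  double balance;

  Account(int number, double balance) {
    this.number = number;
    this.balance = balance;
  }

  int getNumber() { return this.number; }
  double getBalance() { return this.balance; }
}

class Customer {
  String name;
  int password;

  Customer(String name, int password) {
    this.name = name;
    this.password = password;
  }

  String getName() { return this.name; }

  // check whether the given password matches the one for this user
  String tryLogin(int withPwd) {
    if (this.password == withPwd)
      return "Welcome";
    else
      return "Try Again";
  }
}

class BankingService {
  LinkedList<Account> accounts = new LinkedList<Account>();
  LinkedList<Customer> customers = new LinkedList<Customer>();

  // no field of Account is named here, only its methods
  double getBalance(int forAcctNum) {
    for (Account acct : accounts) {
      if (acct.getNumber() == forAcctNum)
        return acct.getBalance();
    }
    return 0;
  }

  // no field of Customer is named here, only its methods
  String login(String custname, int withPwd) {
    for (Customer cust : customers) {
      if (cust.getName().equals(custname)) {
        return cust.tryLogin(withPwd);
      }
    }
    return "Oops -- don't know this customer";
  }
}
```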

2.2 Access Modifiers in Java

Even though we have edited the BankingService to not directly access a customer’s password or the balance in an account, nothing we have done prevents the BankingService (or a future extension of it) from doing so. To make this program more robust, we want to protect the data in the Customer and Account classes from direct access or modification from outside classes. Other classes may be able to access or modify these through getters, setters, or other methods, but at least then the programmer providing those methods has some control over how the access occurs. The question, then, is how to prevent direct access to the fields of a class using an "object.field" expression.

Java provides several access modifiers that programmers can put on classes, fields, and methods to control which other classes may use them. The modifiers we will consider in this course are:

private means the item is only accessible by name inside the class. If you make a field private, for example, then if you wanted an object from another class to access the field, you would need to provide a method (like a getter) that enables the access.
public means every other class or object can access this item.
protected means that the current class and all of its subclasses (and their subclasses) can access this item. In Java, other classes in the same package can access it as well.

There are some additional ones that you can use when organizing Java code into larger units called packages; you’ll get to those if you take Software Engineering.

For our banking application, we want to make all of the fields in all of the classes private. This is a good general rule of thumb, unless you have a good reason to do otherwise. In addition, you should mark methods meant to be used by other classes as public. Concretely, the Customer and Account classes now look like:

  class Customer {
    private String name;
    private int password;
    private LinkedList<Account> accounts;

    public String tryLogin(int withPwd) {
      ...
    }
  }

  class Account {
    private int number;
    private Customer owner;
    private double balance;

    public double getBalance() {
      ...
    }

    public double withdraw(double amt) {
      ...
    }
  }

Access modifiers are checked at compile time. Try accessing cust.password in the login method in BankingService with the access modifiers in place to see the error message that you would get.
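
A small sketch of that experiment follows. The AccessDemo class and the Customer constructor are hypothetical additions for illustration. The offending line is left as a comment so that the rest of the example compiles; uncommenting it makes the compiler reject the file because password is private to Customer.

```java
class Customer {
  private String name;
  private int password;

  public Customer(String name, int password) {
    this.name = name;
    this.password = password;
  }

  public String tryLogin(int withPwd) {
    if (this.password == withPwd)
      return "Welcome";
    else
      return "Try Again";
  }
}

// hypothetical outside class standing in for BankingService
class AccessDemo {
  static String attempt(Customer cust, int withPwd) {
    // int leaked = cust.password;   // compile-time error: password is private
    return cust.tryLogin(withPwd);    // allowed: tryLogin is public
  }
}
```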

Now that we’ve seen access modifiers, we can explain why Java requires that methods that implement interfaces are marked public. The whole idea of an interface is that it is a guaranteed collection of methods on an object. The concept would be meaningless if the methods required by an interface were not public. The fact that you get a compiler error without the public, though, suggests that public is not the default modifier. That is correct. The default modifier is "public within the package" (where package is the concept you will see in SoftEng for bundling classes into larger units). This is more restrictive than pure public, so the public annotation is required on all methods that implement parts of interfaces.
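
To make this concrete, here is a minimal sketch using a hypothetical interface named IBalanceSource (it is not part of the banking code in these notes). Methods declared in an interface are implicitly public, so the implementing method has to be public too.

```java
// IBalanceSource is a hypothetical interface invented for this sketch
interface IBalanceSource {
  double getBalance(); // implicitly public
}

class Account implements IBalanceSource {
  private double balance;

  public Account(double balance) {
    this.balance = balance;
  }

  // Removing "public" here is a compile-time error: the method would then
  // have the default (package-level) access, which is more restrictive than
  // the access the interface promises.
  public double getBalance() {
    return this.balance;
  }
}
```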

2.2.1 Guidelines on Access Modifiers

Good programming practice recommends the following guidelines:

Put access modifiers on every field and method (including constructors) in a class.
Make all fields private unless another guideline applies.
Provide public getters to give access to field values that you want accessible. This lets another class read, but not write, the field value.
Any method that is visible through an interface must be public.
Any method that another class must use should be public.
Any field/method whose visibility can be limited to subclasses should be marked protected.
Make constructors in abstract classes protected, so subclasses can invoke them.
Make constructors that can’t be used by other classes private.

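Note that a subclass cannot make a method that it overrides less visible than the method is in the superclass. So if a class declares a method as public, you cannot extend the class and make the overriding method private in the extended class. The reasons for this have to do with the inconsistency of access information given that Java can view an object as either a member of its class or any superclass of its class. (Fields work differently: a subclass field with the same name is a separate field that hides the superclass one, so this restriction does not apply to fields.)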
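
The guidelines above can be sketched in one small example. The names AbsAccount, CheckingAccount, deduct, and AccountNumbers are hypothetical and chosen only for illustration; they are not part of the banking code in these notes.

```java
abstract class AbsAccount {
  private double balance; // fields are private

  // protected constructor: subclasses invoke it through super(...)
  protected AbsAccount(double startBalance) {
    this.balance = startBalance;
  }

  // public getter: other classes can read, but not write, the balance
  public double getBalance() {
    return this.balance;
  }

  // helper intended for subclasses, so marked protected
  protected void deduct(double amt) {
    this.balance = this.balance - amt;
  }

  // other classes must be able to call this, so it is public
  public abstract double withdraw(double amt);
}

class CheckingAccount extends AbsAccount {
  public CheckingAccount(double startBalance) {
    super(startBalance);
  }

  public double withdraw(double amt) {
    this.deduct(amt);
    return amt;
  }
}

// a class that other code should never instantiate
class AccountNumbers {
  private static int next = 1;

  // private constructor: no other class can write "new AccountNumbers()"
  private AccountNumbers() { }

  // hands out a fresh account number on each call
  public static int fresh() {
    int result = next;
    next = next + 1;
    return result;
  }
}
```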

3 Encapsulating Representation

Now we return to the third problem we cited in our critique of the original code: the BankingService class fixes the assumption that accounts and customers should be stored as linked lists. When we looked at data structures, we talked about using interfaces to allow programmers to switch from one data structure to another without breaking code. Here is a canonical example of code that does NOT enable this. If the bank grows to lots of customers and wants to switch to using an AVL tree, for example, it cannot do so easily because the methods in BankingService have been written specifically for LinkedLists (due to the use of the for-loops). Code designed for long-term evolution and maintenance (in other words, most code in a production environment) should NOT do this.

To fix this, we need to rewrite the code to remove both the for-loops and the specific references to LinkedList. But how? This involves several steps, described over the next several subsections.

3.1 Replace Fixed Data Structures with Interfaces

In general, here is how to factor a fixed data structure out of existing code:

Find all variables whose type you want to generalize.
Introduce interfaces for the types of these variables (some variables may be able to share the same types).
For each place in the current code that relies on the current type of the variable, ask yourself what that code is trying to compute (i.e., figure out a purpose statement for it). Invent a method name for that computation, add it to the interface, and replace the existing code with a call to the new method.

To make this clearer, let’s apply these steps to our BankingService program.

Which variables do we want to generalize? Each of accounts and customers.
Choose interface names for the variables: Each of these is representing a set, so IAccountSet and ICustSet are reasonable choices.
  interface IAccountSet {}
  interface ICustSet {}

  class BankingService {
    private IAccountSet accounts;
    private ICustSet customers;

    ...
  }
For each place in the current code that relies on the current type of the variable, ask yourself what that code is trying to compute: Let’s take the original getBalance code as an example.
  double getBalance(int forAcctNum) {
    for (Account acct:accounts) {
      if (acct.number == forAcctNum)
        return acct.getBalance();
    }
    return 0;
  }
The for-loop here locates the account with the given number, then gets the balance from that account. The general purpose of the for-loop, then, is to find an account by its number. This suggests the following method on IAccountSet:
  interface IAccountSet {
    // returns the account whose number matches the given number
    Account findByNumber(int givenNum);
  }
Replace current code on a specific data type with calls to methods in the new, general, interface: Now, we rewrite getBalance to use findByNumber.
  double getBalance(int forAcctNum) {
    Account acct = accounts.findByNumber(forAcctNum);
    return acct.getBalance();
  }
Note that we have not yet addressed what happens if there is no account with the given number in the list. We will return to that in the next lecture.

Follow similar steps to generalize the withdraw and login methods. We leave these as an exercise so you can practice.

3.2 Create Concrete Classes that Implement the New Interfaces

Now that we have rewritten BankingService to use IAccountSet and ICustSet, we need classes that implement these interfaces. Our original code provides an initial implementation using LinkedList.

  class AcctSetList implements IAccountSet {
    private LinkedList<Account> accounts = new LinkedList<Account>();

    public Account findByNumber(int givenNum) {
      for (Account acct:accounts) {
        if (acct.getNumber() == givenNum)
          return acct;
      }
      return null;  //not good -- will fix in next lecture
    }
  }

With the generalized findByNumber method, it isn’t clear what to use as the return value if no account has the given number: different methods that call this search method might need different default answers. For now, we will use the very wrong approach of returning null, just so we can get the code to compile. We will discuss how to do this properly in the next lecture.

3.3 Initialize Data with Objects of the New Concrete Class

We have generalized BankingService and made new classes for the data structures we need. One step remains: we have to tell the BankingService to use our concrete classes. Where should this happen?

It should not happen within BankingService itself. The whole point of encapsulation is that BankingService shouldn’t know which specific data structures it is using. The only other way to get a specific object into a BankingService object is through the constructor. This is the answer: when you create a BankingService object, pass it objects of the specific data structure that you want to use.

  class Examples {
    BankingService B = new BankingService(new AcctSetList(),
                                          new CustSetList());
    ...
  }

This illustrates how we create different banking services with different data structures for accounts and customers. If we had an AVL-tree based implementation of IAccountSet as a class named AcctSetAVL, we could create a different banking service using:

  BankingService B = new BankingService(new AcctSetAVL(),
                                        new CustSetList());

Since BankingService only uses methods in the IAccountSet and ICustSet interfaces, we can freely choose a data structure without editing the code within the BankingService class (which was our goal).
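
Putting the pieces of this section together, a minimal end-to-end sketch might look as follows. Several parts are assumptions made for illustration: customers are omitted for brevity, the addAccount method is hypothetical (the notes leave population methods out), and AcctSetTree uses java.util.TreeMap as a stand-in for an AVL-based set, since TreeMap is a balanced (red-black) tree from the standard library rather than an AVL tree.

```java
import java.util.LinkedList;
import java.util.TreeMap;

class Account {
  private int number;
  private double balance;

  public Account(int number, double balance) {
    this.number = number;
    this.balance = balance;
  }

  public int getNumber() { return this.number; }
  public double getBalance() { return this.balance; }
}

interface IAccountSet {
  // returns the account whose number matches the given number
  Account findByNumber(int givenNum);
  // hypothetical method for populating the set
  void addAccount(Account acct);
}

class AcctSetList implements IAccountSet {
  private LinkedList<Account> accounts = new LinkedList<Account>();

  public void addAccount(Account acct) {
    accounts.add(acct);
  }

  public Account findByNumber(int givenNum) {
    for (Account acct : accounts) {
      if (acct.getNumber() == givenNum)
        return acct;
    }
    return null; // placeholder, as in the notes
  }
}

// stand-in for a tree-based set such as AcctSetAVL
class AcctSetTree implements IAccountSet {
  private TreeMap<Integer, Account> accounts = new TreeMap<Integer, Account>();

  public void addAccount(Account acct) {
    accounts.put(acct.getNumber(), acct);
  }

  public Account findByNumber(int givenNum) {
    return accounts.get(givenNum); // null when the number is absent
  }
}

class BankingService {
  private IAccountSet accounts;

  // the caller decides which concrete data structure to use
  public BankingService(IAccountSet accounts) {
    this.accounts = accounts;
  }

  public double getBalance(int forAcctNum) {
    Account acct = accounts.findByNumber(forAcctNum);
    return acct.getBalance();
  }
}
```

Both new BankingService(listSet) and new BankingService(treeSet) run the same getBalance code; the only line that names a concrete data structure is the one that constructs the service.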

3.4 Anticipated Questions

This banking service has no customers or accounts. How do we populate those?
In a full banking service program, the IAccountSet and ICustSet interfaces would also need methods for adding new elements (similar to those in the normal ISet interface). These notes do not include these in order to stay focused on the topic at hand.
Why did we create IAccountSet and ICustSet instead of just reusing the ISet interface from earlier in the term?
The ISet interface gives some of the methods we want (like addElt), but not all of them. It would not, for example, provide methods like findByNumber that are specific to sets of accounts. We should, however, be able to extend a good existing ISet implementation.
If I have to specify the data structure to use in the constructor, how would I switch to a new data structure after my banking service had been running for a while?
These notes have not addressed this question. To do this, you would first need a method to convert your existing data from one representation to the other. Then, you would either create a new BankingService with the new data, or use some method provided in the BankingService to update the data structure.
Something to think about: Providing a simple setter method would let you change the data structure. What’s wrong with this solution? What would a better solution look like?

4 Summary

Compare the original banking code to the revised version. The new BankingService is much cleaner and more maintainable. It allows the information about accounts and customers to change with less impact on the banking service methods. The banking service no longer relies on any particular data structure for accounts and customers. We achieved both of these goals by isolating data and methods in classes, and using interfaces to separate general data from implementation details.

Key take-aways from these lectures:

Encapsulation is about putting data and the methods that operate on that data together. OO classes provide a natural mechanism for doing this.
Encapsulation matters because it lets you change data and how it is used without editing existing code. This lecture has shown two examples of this:
- We might add information (such as a withdrawal fee) and want to change methods (such as withdraw) to use that information.
- We might want to write code that can be customized to different specific data structures (such as linked lists versus AVL trees).
Java provides access modifiers that let you control which other classes can access your methods and fields. You should put explicit access modifiers on all of your fields and methods.
Encapsulation is NOT just about protecting your data. It is about your class hierarchy (more generally, the architecture of your program). The Java access modifiers are used in conjunction with encapsulation (and indeed help reinforce encapsulation), but encapsulation is a much broader topic.

Encapsulation is an important issue no matter what language you are programming in. Different languages provide different support for encapsulation. Java’s support comes in the form of classes and interfaces (supported by access modifiers). Other languages have other mechanisms. When designing a new product in any language, it is important to ask what information you want to protect and what decisions you want to be able to change later, then understand how the language can help you achieve those goals.

4.1 Two Myths About Encapsulation

Those of you with prior Java experience may have heard two general guidelines or slogans that aren’t quite accurate:

MYTH: Always make a getter and setter for every private field. No. Sometimes, we have fields that we don’t want anyone to have access to – they are for internal use only. Creating getters/setters for such fields contradicts the design goal of those fields. Would you publish a getter for the password field of Customer, for example? No – that data should only be handled within the Customer class.
This rule can also violate representation encapsulation. Imagine that ICustSet provided a method getCustomers that returned the customers field. That would defeat the whole purpose of hiding the representation, because a programmer could get the actual data structure and write code against it (like a for-loop against the LinkedList). This is a great example of where the getter is exactly the wrong thing to provide.
MYTH: Encapsulation equals Information Hiding. These concepts are related, but not equal. One can encapsulate data and methods, but still not hide information (if you expose everything through getters, for example). Encapsulation is part of a solution for hiding information, but information hiding needs careful design beyond just putting all data and methods together in a class.
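
The getCustomers problem described under the first myth can be sketched as follows. This is a hypothetical illustration: the addCustomer and size methods and the trimmed-down Customer class are assumptions, and the getter is included only to show what goes wrong.

```java
import java.util.LinkedList;

class Customer {
  private String name;

  public Customer(String name) {
    this.name = name;
  }

  public String getName() { return this.name; }
}

class CustSetList {
  private LinkedList<Customer> customers = new LinkedList<Customer>();

  public void addCustomer(Customer cust) {
    customers.add(cust);
  }

  public int size() {
    return customers.size();
  }

  // The getter the notes warn against: it hands out the actual list,
  // so callers can depend on LinkedList and even modify the contents.
  public LinkedList<Customer> getCustomers() {
    return customers;
  }
}
```

With this getter in place, any class can call getCustomers().clear() and empty the bank's customer set, even though the customers field itself is private.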
