# Merge sort for the AP CS A Exam

## `sort` method derivation from method headers

I present merge sort to students much differently than I present selection sort and insertion sort. Instead of tracing the algorithm first, I ask students to derive the merge sort algorithm from the method headers.

The method headers for a recursive implementation of merge sort are below. Preconditions and postconditions are included.

``````// postcondition: arr ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{

}

// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
// This method is recursive.
}

// precondition: arr[low] ... arr[mid] is sorted &&
//               arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
// Assume, for now, that this method already works.
}
``````

### Step 2 recursive `sort` method base case

`sort(int[], int, int)` is recursive. When writing recursive methods, we start with a base case (or possibly base cases). The method is intended to sort a range of values in `arr`. If there is only 1 value (`low == high`), we return (stop) since a single value is already sorted. If there are 0 values (`low > high`), we return for the same reason.

``````// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
// base case
if(low >= high)
return;
}
``````

A `return` statement in a `void` method ends the method. It is also possible to write the opposite of the condition (`low < high`) and write the rest of the method as the body of the conditional statement.

### Step 3 rest of recursive `sort` method

When writing recursive methods, we assume that the method we are currently writing already works, as long as we call it with something that approaches the base case. We assume that `sort(int[], int, int)` works as long as we call it with fewer elements.

We assume that `merge` works because we were told to.

The postcondition of `merge` is identical to the postcondition of `sort(int[], int, int)`. If we could satisfy the precondition of `merge`, we could call it and accomplish the task of sorting `arr[low] ... arr[high]`.

The preconditon of merge is `arr[low] ... arr[mid]` is sorted and `arr[mid + 1] ... arr[high]` is sorted. We have a method that sorts parts of an array (the method we are currently writing).

``````// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
// base case
if(low >= high)
return;

int mid = (low + high) / 2;

sort(arr, low, mid);
sort(arr, mid + 1, high);
merge(arr, low, mid, high);
}
``````

The recursive calls to `sort` approach the base case because each call has fewer elements than the original range. (The base case is 0 or 1 elements.) The call to `merge` works because its precondition is met. The postcondition for `merge` is the same as the postcondition for `sort(int[], int, int)`, so we’re done.

### The public `sort` method

`sort(int[], int, int)` is a private recursive helper method. Recursive helper methods are used when different parameters are desired than those taken by the public method. The public method `sort(int[])` calls the private method `sort(int[], int, int)`.

``````// postcondition: arr ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
sort(arr, 0, arr.length - 1);
}
``````

## The `merge` method

For the derivation above, we assume the `merge` method works as specified by its precondition. This assumption allows us to focus on the interesting `sort` method.

Below, we trace the `merge` method for a single step.

### `merge` method simple trace

#### Step 1

``````arr -> [36 71 78 79 86 11 24 35 75 77]
0  1  2  3  4  5  6  7  8  9

low: 0
mid: 4
high: 9
``````

The precondition has been met. `arr ... arr` is sorted and `arr ... arr` is sorted.

For this trace, `low` and `high` are the lowest and highest indexes of `arr`, respectively. The approach is exactly the same when working with a smaller range of elements. We would just ignore all elements outside `arr[low] ... arr[high]`.

#### Step 2

A simple implementation of `merge` copies both sorted halves into temporary arrays. We maintain 3 indexes.

``````arr ->        [36 71 78 79 86 11 24 35 75 77]
^
arrIndex

firstHalf ->  [36 71 78 79 86]
^
firstIndex

secondHalf -> [11 24 35 75 77]
^
secondIndex
``````

#### Step 3

We find the smaller of `firstHalf[firstIndex]` and `secondHalf[secondIndex]`. We copy the smaller element to `arr[arrIndex]`, overwriting the existing value. (We favor `firstHalf[firstIndex]` if `firstHalf[firstIndex] == secondHalf[secondIndex]`.)

We increment `arrIndex` and the index of the array from which a value was selected. In this case, we increment `arrIndex` and `secondIndex`.

``````arr ->        [11 71 78 79 86 11 24 35 75 77]
^
arrIndex

firstHalf ->  [36 71 78 79 86]
^
firstIndex

secondHalf -> [11 24 35 75 77]
^
secondIndex
``````

#### Step 4

We repeat the process of selecting the smaller of `firstHalf[firstIndex]` and `secondHalf[secondIndex]` until we run out of elements in `firstHalf` or `secondHalf`.

``````arr ->        [11 24 35 36 71 75 77 35 75 77]
^
arrIndex

firstHalf ->  [36 71 78 79 86]
^
firstIndex

secondHalf -> [11 24 35 75 77]
^
secondIndex
``````

#### Step 5

We copy the remaining elements from the half that still has elements into `arr`. In this case, we copy the remaining elements from `firstHalf`. There are no further comparisons. The remaining elements are simply copied into `arr` starting at `arrIndex`.

``````arr ->        [11 24 35 36 71 75 77 78 79 86]
^
arrIndex

firstHalf ->  [36 71 78 79 86]
^
firstIndex

secondHalf -> [11 24 35 75 77]
^
secondIndex
``````

### `merge` method simple implementation

The implementation below uses the easily understood approach of copying both halves into temporary arrays. This is the approach shown in the trace above.

In the trace above, `low` and `high` were the lowest and highest indexes of `arr`, respectively. The implementation below does not assume this. It works with all valid values of `low` and `high`.

``````// precondition: arr[low] ... arr[mid] is sorted &&
//               arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
int[] firstHalf = new int[mid - low + 1];
for(int i = 0; i < firstHalf.length; i++)
firstHalf[i] = arr[low + i];

int[] secondHalf = new int[high - mid];
for(int i = 0; i < secondHalf.length; i++)
secondHalf[i] = arr[mid + 1 + i];

int arrIndex = low;
int firstIndex = 0;
int secondIndex = 0;

while(firstIndex < firstHalf.length && secondIndex < secondHalf.length)
{
if(firstHalf[firstIndex] <= secondHalf[secondIndex])
{
arr[arrIndex] = firstHalf[firstIndex];
firstIndex++;
}
else
{
arr[arrIndex] = secondHalf[secondIndex];
secondIndex++;
}

arrIndex++;
}

while(firstIndex < firstHalf.length)
{
arr[arrIndex] = firstHalf[firstIndex];
firstIndex++;
arrIndex++;
}

while(secondIndex < secondHalf.length)
{
arr[arrIndex] = secondHalf[secondIndex];
secondIndex++;
arrIndex++;
}
}
``````

It is also possible to write `merge` with a single `while` loop. The conditional statements would be modified to check if the indexes are valid.

## Merge sort recursive implementation

In the implementation below, both `sort` methods are identical to those presented earlier. The `merge` method has been improved to copy only half of `arr[low] ... arr[high]` into a temporary array.

When I have time, I’ll post a trace of the improved `merge` method. Briefly, it copies only the first half into a temporary array, because we won’t run into where we are copying from in the second half anyway. This also offers the advantage that if we run out of elements in the first half, the ones in the second half are already where we want them.

``````// postcondition: arr ... arr[arr.length - 1] is sorted
public static void sort(int[] arr)
{
sort(arr, 0, arr.length - 1);
}

// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void sort(int[] arr, int low, int high)
{
if(low >= high)
return;

int mid = (low + high) / 2;

sort(arr, low, mid);
sort(arr, mid + 1, high);
merge(arr, low, mid, high);
}

// precondition: arr[low] ... arr[mid] is sorted &&
//               arr[mid + 1] ... arr[high] is sorted
//
// postcondition: arr[low] ... arr[high] is sorted &&
//                elements outside of this range are unchanged
private static void merge(int[] arr, int low, int mid, int high)
{
int[] b = new int[mid + 1 - low];
for(int i = 0; i < b.length; i++)
b[i] = arr[low + i];

int aLowerIndex = low;
int bIndex = 0;
int aHigherIndex = mid + 1;

while (aLowerIndex < aHigherIndex && aHigherIndex <= high)
{
if (arr[aHigherIndex] < b[bIndex])
{
arr[aLowerIndex] = arr[aHigherIndex];
aHigherIndex++;
}
else
{
arr[aLowerIndex] = b[bIndex];
bIndex++;
}

aLowerIndex++;
}

while (aLowerIndex < aHigherIndex)
{
arr[aLowerIndex] = b[bIndex];
aLowerIndex++;
bIndex++;
}
}
``````