Heap Sort

Wikipedia has a very good explanation of heap sort and why its time complexity is O(n log n):

“The heapsort algorithm involves preparing the list by first turning it into a max heap. The algorithm then repeatedly swaps the first value of the list with the last value, decreasing the range of values considered in the heap operation by one, and sifting the new first value into its position in the heap. This repeats until the range of considered values is one value in length.

The steps are:

  • Call the buildMaxHeap() function on the list. Also referred to as heapify(), this builds a heap from a list in O(n) operations.
  • Swap the first element of the list with the final element. Decrease the considered range of the list by one.
  • Call the siftDown() function on the list to sift the new first element to its appropriate index in the heap.
  • Go to step (2) unless the considered range of the list is one element.

The buildMaxHeap() operation is run once, and is O(n) in performance. The siftDown() function is O(log(n)), and is called n times. Therefore, the performance of this algorithm is O(n + n * log(n)) which evaluates to O(n log n).”


Some key points to understand this algorithm and its implementation:

  • the heap data structure here is conceptual: it exists only in how we interpret the array, while in reality we are just re-positioning elements at different indices. This is also why heap sort's space complexity is O(1): it doesn't require any extra memory.
  • how do we construct this virtual heap data structure? Some keys to understanding it:
    • regard the array as a level-order traversal of a complete binary tree (draw out the picture, and you'll have a much better understanding)
    • how do we construct a max heap? We always want a parent's value to be greater than both of its children's values, so whenever we find a pair that violates this rule, we swap the two and recursively sift down.
    • how do we find a node's left and right children?
      • if you draw out the tree, you'll figure out that with 1-based indexing, the node at index i has its left child at 2*i and its right child at 2*i+1, if the two children exist; with the 0-based indexing of a Java array, the children are at 2*i+1 and 2*i+2 instead, which is what the code below uses (see the worked heapify example right after this list).


  • Heap sort is NOT stable: the relative order of equal elements may change, since heap sort repeatedly picks the largest element and puts it at the end of the list; two equal keys can end up in different subtrees of the heap and be extracted in either order.
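
To make the virtual heap concrete: with the 0-based child formulas used in the code below, heapify() turns the Wikipedia input [6, 5, 3, 1, 8, 7, 2, 4] into the max heap [8, 6, 7, 4, 5, 3, 2, 1]. Drawn as a tree, 8 sits at the root and every parent is greater than or equal to both of its children (8 >= 6, 7; 6 >= 4, 5; 7 >= 3, 2; 4 >= 1).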

I used the Wikipedia example input 6, 5, 3, 1, 8, 7, 2, 4 to test my code:

public class _20160710_HeapSortAgain {

    private static int N; // index of the last element still inside the heap

    public static void sort(int[] nums) {
        heapify(nums); // build the max heap before extracting elements
        for (int i = N; i > 0; i--) { // i doesn't need to reach zero, because we don't need to swap the zero-indexed number with itself
            swap(nums, i, 0); // swap the root of the heap (the current maximum) with the number at index i, the largest index in the UN-sorted part
            N -= 1; // don't forget to decrement N by 1, because we only need to worry about one number fewer each time
            maxheap(nums, 0); // then restore the heap property starting from index zero
        }
    }

    private static void heapify(int[] nums) {
        N = nums.length - 1;
        for (int i = N / 2; i >= 0; i--) { // here i must reach zero because we need to run maxheap() on the root as well
            maxheap(nums, i);
        }
    }

    private static void maxheap(int[] nums, int i) {
        int leftChildIndex = 2 * i + 1;  // 0-based indexing: children of i are at 2*i+1 and 2*i+2
        int rightChildIndex = 2 * i + 2;
        int max = i;
        if (leftChildIndex <= N && nums[leftChildIndex] > nums[i]) {
            max = leftChildIndex;
        }
        if (rightChildIndex <= N && nums[rightChildIndex] > nums[max]) {
            max = rightChildIndex;
        }
        if (i != max) {
            swap(nums, i, max);
            maxheap(nums, max); // keep sifting the value down
        }
    }

    private static void swap(int[] nums, int i, int j) {
        int temp = nums[i];
        nums[i] = nums[j];
        nums[j] = temp;
    }

    public static void main(String... strings) {
        int[] nums = new int[]{6, 5, 3, 1, 8, 7, 2, 4};
        // int[] nums = new int[]{1,2,3,4,5,6};
        // int[] nums = new int[]{6,5,4,3,2,1};
        print("BEFORE sorting, nums are: ", nums);
        sort(nums);
        print("AFTER sorting, nums are: ", nums);
    }

    private static void print(String msg, int[] nums) {
        System.out.print(msg);
        for (int i : nums) {
            System.out.print(i + ", ");
        }
        System.out.println();
    }
}
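
With the Wikipedia input, this prints BEFORE sorting, nums are: 6, 5, 3, 1, 8, 7, 2, 4, followed by AFTER sorting, nums are: 1, 2, 3, 4, 5, 6, 7, 8, (the trailing comma comes from the simple print loop).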



HashMap

  • The idea behind a Map is to be able to find an object faster than a linear search.
    • Using hashed keys to locate objects is a two-step process. Internally the Map stores objects as an array of arrays.
    • The index for the first array is the hash code of the key. This locates the second array which is searched linearly by using equals() to determine if the object is found.
  • Open addressing, or closed hashing, is a method of collision resolution in hash tables. With this method a hash collision is resolved by probing, or searching through alternate locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.
  • Well known probe sequences include:
  • Linear probing (https://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture16/sld015.htm)
    • in which the interval between probes is fixed — often at 1.
  • Quadratic probing 
    • in which the interval between probes increases linearly (hence, the indices are described by a quadratic function).
  • Double hashing (https://courses.cs.washington.edu/courses/cse326/00wi/handouts/lecture16/sld025.htm)
    • in which the interval between probes is fixed for each record but is computed by another hash function.
  • The main tradeoffs between these methods are that linear probing has the best cache performance but is most sensitive to clustering, while double hashing has poor cache performance but exhibits virtually no clustering; quadratic probing falls in between on both counts. Double hashing can also require more computation than other forms of probing. (A minimal linear-probing sketch follows this list.)
  • A critical influence on performance of an open addressing hash table is the load factor; that is, the proportion of the slots in the array that are used. As the load factor increases towards 100%, the number of probes that may be required to find or insert a given key rises dramatically. Once the table becomes full, probing algorithms may even fail to terminate. Even with good hash functions, load factors are normally limited to 80%. A poor hash function can exhibit poor performance even at very low load factors by generating significant clustering. What causes hash functions to cluster is not well understood, and it is easy to unintentionally write a hash function which causes severe clustering.
  • HashMap uses a LinkedList to store the entries in each bucket, but note: HashMap stores both the key and the value in each LinkedList node, in the form of a Map.Entry object. Otherwise, given a key, you wouldn't know which value in the linked list should be returned.
    • after finding the bucket location, we call the key's equals() method to identify the correct node in the LinkedList and return the associated value object for that key in Java's HashMap (see the lookup sketch after this list).
  • Why doesn't Hashtable allow null keys/values while HashMap does?
    • HashMap was introduced later than Hashtable, to fix some of its limitations;
    • Hashtable calls hashCode() on each key to compute the hash, so if the key were null the call would throw a NullPointerException;
    • but in some cases we do want to store null keys/values; for example, a null value is useful to distinguish a key that you know exists but has no associated value from a key that doesn't exist.
  • Why String, Integer and other wrapper classes are considered good candidates for keys in HashMap?
    • because they’re immutable.
    • HashMap uses the key's hash code to do the hashing; if the key object is mutated after insertion, its hash code can change and it becomes impossible to get the object back out of the HashMap
  • The contract between equals() and hashCode() is:
    • 1) If two objects are equal, then they must have the same hash code.
    • 2) If two objects have the same hash code, they may or may not be equal. (Both rules are illustrated by the Point sketch after this list.)
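
To make the two-step lookup above concrete, here is a minimal sketch of a chained map in the spirit of these notes. It is an illustration rather than java.util.HashMap's actual source: SimpleChainedMap, Entry and the bucket math are names made up for this example.

import java.util.LinkedList;

public class SimpleChainedMap<K, V> {
    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    public SimpleChainedMap(int capacity) {
        buckets = new LinkedList[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // step 1: the key's hash code picks the bucket
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    public void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key)];
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // key already present: overwrite
        }
        bucket.add(new Entry<>(key, value)); // both key AND value go into the node
    }

    // step 2: linear search within the bucket, using equals() on the keys
    public V get(K key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    private static class Entry<K, V> {
        final K key; // without the key in the node, get() couldn't tell entries apart
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }
}

The real HashMap also caches each entry's hash and resizes when the load factor is exceeded, but the two-step idea is the same.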
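
And here is a minimal open-addressing sketch using linear probing, the first of the probe sequences above (interval fixed at 1). LinearProbingSet is an invented name; it is a toy fixed-size set of ints with no deletion support:

public class LinearProbingSet {
    private final Integer[] slots = new Integer[16]; // null marks an unused slot

    public void insert(int key) {
        int i = Math.floorMod(key, slots.length);
        // probe sequence: i, i+1, i+2, ... until we find the key or an empty slot
        while (slots[i] != null && slots[i] != key) {
            i = (i + 1) % slots.length; // fixed interval of 1 = linear probing
        }
        // note: if the table were 100% full, this loop would never terminate,
        // which is why open addressing keeps the load factor well below 1
        slots[i] = key;
    }

    public boolean contains(int key) {
        int i = Math.floorMod(key, slots.length);
        while (slots[i] != null) {
            if (slots[i] == key) return true; // target record found
            i = (i + 1) % slots.length;
        }
        return false; // reached an unused slot: the key is not in the table
    }
}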
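
A small illustration of the equals()/hashCode() contract, using a Point class written just for this note: hashCode() is derived from exactly the fields that equals() compares, so rule (1) holds; two distinct points may still collide, which rule (2) explicitly allows.

import java.util.Objects;

public final class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        // derived from the same fields equals() compares, so rule (1) holds;
        // distinct points may still collide, which is allowed by rule (2)
        return Objects.hash(x, y);
    }
}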

Merge Sort

How to analyze the time complexity for Merge Sort?

  • This video helps a lot: https://www.youtube.com/watch?v=0nlPxaC2lTw
  • Prereq: how to understand O(n), O(logn) etc.
    • say the input size is n; then for a loop that starts at 0 and iterates until n or n-1, we say the time complexity of such a loop is O(n)
      • this is also why we say the time complexity of binary search is O(logn)
        • i.e. in binary search, we only need about logn probes before we find the target, because each step cuts the input size in half; e.g. with an input of size 16, we need at most 4 halvings before we find the target, since log2(16) = 4
      • this is also why we say the time complexity of insertion sort is O(n^2)
        • i.e. insertion sort has two nested loops, and each loop iterates up to n times; in total it does on the order of n*n work, i.e. O(n^2) time complexity
  • Then based on the code below, merge sort time complexity could be analyzed as below:
    • we use T(n) to denote the total running time for merge sort to work on an array of size n
    • then sort(arr, l, m) takes T(n/2) time, since it works on half of the original input
    • similarly, sort(arr, m+1, r) also takes T(n/2) time
    • but merge(arr, l, m, r) takes c*n time (c being a constant), since we have to go through the entire array
    • so T(n) can be written as T(n) = 2*T(n/2) + c*n (function 1)
    • apparently this is a recurrence; expanding it, we get:
      • T(n/2) = 2*T(n/4) + c*(n/2)
      • T(n/4) = 2*T(n/8) + c*(n/4)
      • T(n/8) = 2*T(n/16) + c*(n/8)
      • …..
      • substituting these back into function 1 and simplifying (each level doubles the number of subproblems but halves their size, so every level still contributes c*n in total), we get: T(n) = 2^k * T(n/2^k) + k*c*n (function 2)
      • we want to reduce this to T(1), because that is what we know: constant time.
      • for that, n/2^k must equal 1, i.e. n = 2^k; taking log (base 2) of both sides gives logn = log(2^k) = k (function 3)
      • substituting function 3 into function 2 gives T(n) = 2^(logn) * T(1) + logn*c*n = n*T(1) + c*n*logn; using Big O notation (lower-order terms and constants can be dropped), this is O(n*logn), i.e. the well-known O(nlogn) time complexity of merge sort.
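
As a quick sanity check of function 2, take a hypothetical n = 8 with c = 1 and T(1) = 1 and unroll the recurrence by hand:

T(2) = 2*T(1) + 1*2 = 4
T(4) = 2*T(2) + 1*4 = 12
T(8) = 2*T(4) + 1*8 = 32

The closed form agrees: n*T(1) + c*n*logn = 8*1 + 1*8*3 = 32, since log2(8) = 3.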

Merge Sort is a stable sorting algorithm: the order of equal keys in the original input is respected.

Merge Sort is NOT an in-place algorithm: the amount of extra space it needs is proportional to the size of the input array. This is different from sorting algorithms like insertion sort or bubble sort, which require only a temp variable to do swaps. In merge sort, we divide the array into two subarrays and make full copies of both of them, so the extra memory required is proportional to the size of the input; we say the space complexity of merge sort is O(n).

Merge sort works like this:

It partitions the original array down to its most atomic elements (single items), then compares and merges them back together, so the code looks like the following:

public class MergeSort {

    public static void main(String... args) {
        int[] a = new int[]{4, 3, 9, 8, 7};

        MergeSort test = new MergeSort();
        test.sort(a, 0, a.length - 1);
    }

    public void sort(int[] arr, int l, int r) {
        if (l < r) {
            int m = (l + r) / 2;
            sort(arr, l, m);     // sort the left half
            sort(arr, m + 1, r); // sort the right half
            merge(arr, l, m, r); // merge the two sorted halves
        }
    }

    private void merge(int[] arr, int l, int m, int r) {
        // find sizes of the two subarrays that are to be merged
        int size1 = m - l + 1;
        int size2 = r - m;

        // copy the two subarrays into two temp arrays
        int[] tempL = new int[size1];
        int[] tempR = new int[size2];
        for (int i = 0; i < size1; i++) {
            tempL[i] = arr[l + i];
        }
        for (int i = 0; i < size2; i++) {
            tempR[i] = arr[m + 1 + i];
        }

        // now we merge the two subarrays

        // initial indices of the two subarrays
        int i = 0, j = 0;

        // initial index of the merged subarray
        int k = l;

        while (i < size1 && j < size2) {
            if (tempL[i] <= tempR[j]) { // <= keeps equal keys in order, which is what makes merge sort stable
                arr[k++] = tempL[i++];
            } else {
                arr[k++] = tempR[j++];
            }
        }

        // copy the remaining elements into arr, if any
        while (i < size1) {
            arr[k++] = tempL[i++];
        }
        while (j < size2) {
            arr[k++] = tempR[j++];
        }
    }
}
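
Running the main() above on {4, 3, 9, 8, 7} leaves the array as {3, 4, 7, 8, 9}; adding a print after test.sort(...) confirms it.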

A sample walkthrough of the code (with the merge() function omitted); recursion is so cool/powerful!


Insertion Sort

The idea of insertion sort is to keep inserting the remaining elements one by one into the previously sorted part until no remaining elements.

Based on the description on wikipedia, I implemented insertion sort completely by myself and ran several test cases:

public int[] insertionSort(int[] nums) {
    for (int i = 1; i < nums.length; i++) {
        int j = i - 1;
        if (nums[i] > nums[j]) continue; // already in place relative to the sorted part
        int k = i; // k tracks the current position of the element being inserted
        while (j >= 0 && nums[j] > nums[k]) {
            swap(nums, j, k); // shift the element one position to the left
            k = j;
            j--;
        }
    }
    return nums;
}

private void swap(int[] nums, int j, int i) {
    int temp = nums[i];
    nums[i] = nums[j];
    nums[j] = temp;
}
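
For example, insertionSort(new int[]{4, 3, 9, 8, 7}) returns {3, 4, 7, 8, 9}.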


I came up with a new way of writing insertion sort today (6/27/2016), completely by myself, although others might have written this way many times already, still feeling excited about it:

public int[] insertionSortAgain(int[] nums) {
    for (int i = 1; i < nums.length; i++) {
        for (int j = i - 1; j >= 0; j--) {
            if (nums[j + 1] < nums[j]) { // strict < avoids swapping equal elements, keeping the sort stable
                int temp = nums[j + 1];
                nums[j + 1] = nums[j];
                nums[j] = temp;
            }
        }
    }
    return nums;
}

The key is to use j+1 and j in the inner loop, instead of comparing against nums[i]: i stays fixed while the element being inserted keeps moving left with every swap, so nums[j+1] always refers to its current position; if you compared against nums[i], you would lose track of it after the first swap, and things won't work!

HTTP vs. HTTPS

Marked this one as one of my favorite questions: http://stackoverflow.com/questions/8375134/difference-between-http-and-https


What are benefits of using HTTPS over HTTP?
HTTPS means that you tunnel the HTTP protocol over TLS/SSL, which encrypts the HTTP payload. So the benefit is that HTTP requests and responses are transmitted securely over the wire, e.g. your Internet Service Provider does not know what you're doing.
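
As a small client-side illustration (a minimal sketch, assuming the classic HttpsURLConnection API; https://example.com is just a placeholder host), the TLS layer is visible right on the connection object:

import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class HttpsDemo {
    public static void main(String... args) throws Exception {
        URL url = new URL("https://example.com/");
        // same HTTP verbs and headers as plain HTTP, but tunneled over TLS
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        System.out.println("Status: " + conn.getResponseCode());
        // the cipher suite negotiated during the TLS handshake
        System.out.println("Cipher suite: " + conn.getCipherSuite());
        conn.disconnect();
    }
}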

How to use HTTPS?
Enable it at your endpoint, in general a web server in front of your application server. Most web servers (e.g. IIS, Apache) support this by configuration. Depending on your confidentiality requirements this may not be enough.
Can we use HTTPS for only the login and then HTTP onwards?
Technically this is possible, but it introduces some security risks. Example: after a secured login you transmit session IDs identifying the user. If you transmit those session IDs insecurely (no SSL), session hijacking becomes a risk (a 'man-in-the-middle' attack).

Is more processing time required for HTTPS than for HTTP?
Yes, key negotiation (the TLS handshake) requires a lot of CPU capacity.