Randomly Draw k unique integers out of an array of N unique integers

Given an array of n unique integers (1 to n), write an algorithm to draw a random combination of k distinct numbers (n >= k). (This problem comes from Core Java Vol I as an example.)

Unknown: A way to draw k distinct integers out of  n distinct integers.

Data: An array of integers 1 to n.

Constraint: k numbers must be distinct and randomly picked.

A straightforward solution would be:

1. Randomly pick one number out of the an array
2. If this number is not picked before, add it to the result. Return the result if we have k numbers.
3. Otherwise, back to step 1.

Q: So what is the time complexity of this solution?

A: If we are unlucky, in the worst case, O(k^2) and if k close to n, O(n^2).

Q: How so?

A: At some point, we will have problem selecting a number that’s not in the result set.  The first number is easy, just once. The second, if unlucky, twice. The third, if unlucky, 3 times since the first 2 times picked something in the result set…so up to k numbers, it can take 1+2+3+…+k picks which is approximately O(k^2). If k is close to n, then we have a O(n^2) algorithm. check the link at the bottom for the code.

Q: Alright, can we make it faster? Say let’s make it O(n) time and you cannot use additional space except for the result set.

A: Hmm, the bottleneck of the previous solution is that every time we pick a number, we have to check if it exists in the result set. If it does, we have to go back and pick again. If we can skip this step it will be faster.

Q: What do you mean by skipping this step?

A: I mean that every time we pick a number, it is guaranteed not picked before.

Q: How do we do that?

A: Hmm. we need to keep track of what has not been picked instead. Since we cannot use additional space, I assume that we have to do something on the original array. I can replace the picked number with some special value like n+1, but this sounds useless since if I happened to pick this number, I would have to choose again, exactly like before. I don’t know…

Q: OK, in what situation can we safely draw an unpicked number?

A: If the array only contains unpicked numbers, we can do that safely. But again, I don’t think we can recreate a new array to exclude the picked one in every pass. That’s O(n^2) again.

Q: True. So why can’t we draw the numbers safely now? What’s the matter?

A: Because there are picked values between unpicked ones.

Q: Good. You mentioned about excluding them. Is there a way to do that without creating a new array?

A: I suppose I can re-arrange the array? For example, if I picked the number at index i, I can move the numbers from i+1 to n-1 forward. But then I should pick a random index between 0 to n-1 (exclusive). Wait, this is still O(n^2)…

Q: Do we have to move all the elements after index i? Can we reduce this O(n) move to a O(1) move?

A: O(1)? So I should move only 1 element instead. But which one…

Q: Let’s use an example: 1,2,3,4,5 Say we picked 3. In your case, we change the array to 1,2,4,5,5 and then we pick from index 0 to 3 next time. We do the move because we want to make sure next time we are choosing from 1,2,4,5. So is there another way to do it?

A: Yes! I can move the last element to that position to achieve the same effect! So every time after the pick, I move the last element within the current range to the picked position then reduce the range by 1.

Q: That’s right 🙂

Gas station

There are N gas stations along a circular route, where the amount of gas at station i is gas[i].

You have a car with an unlimited gas tank and it costs cost[i] of gas to travel from station i to its next station (i+1). You begin the journey with an empty tank at one of the gas stations.

Return the starting gas station’s index if you can travel around the circuit once, otherwise return -1.

Note:
The solution is guaranteed to be unique.

(Java Code on Github at the bottom of the post. )

Q: What is the unknown?

A: The index of the gas station from which we can circle around the route.

Q: What are the data?

A: There are N gas stations. At each station i (i = 0, 1, 2, 3, …, N-1) the amount of gas is gas[i]. The tank is unlimited. To get from station i to i+1, it needs cost[i] amount of gas. We start from an empty gas tank.

Q: Are there any constraints?

A: Yes, the solution is unique and if none satisfies the requirements, return -1.

Q: What do you think is the key point of this problem?

A: That whether we can circle around from a station with index i.

Q: And how would you like to do that?

A: Hmm, we start at index i with L amount of gas in the tank (initially L is zero). If L + gas[i] >= cost[i], that means we can move from index i to i + 1. We repeat this process until we meet the following cases:

1. At station k % N (k % N < i), L + gas[k%N] < cost[k%N].  This means that we cannot circle around the route from station i.
2. k % N == i. In this case we have circled around the route from station i and we should return i.

Based on this, we can think of a straightforward algorithm. At each station, check if we can circle around the route as stated above.

Q: OK. What is the time complexity of this algorithm?

A: If we are extremely unlucky, we may have to almost circle around the route every time we perform a station check. That takes O(N) time. Since we have to do it for all N stations, I would say this algorithm takes O(N^2) time.

Q: Can we do better than this?

A: Hmm, I don’t think we can improve the station check since we do have to go around all the stations in the circle. But maybe we don’t have to do a station check for each station in the circle. There might be a way to skip some of them based on the result of the previous station checks.

Q: That’s a good thought. Can you think of an example to prove your idea?

A: OK, suppose I’m at station i and I can reach to station j at most where i <= j. That means the left amount of gas L at station j plus gas[j] is less than cost[j]. But then what?

Q: OK, normally we would just go check station i+1 (suppose i + 1 <= j), right? Now can you deduct how far we can get starting from i+1 based on the result of station i?

A:  I’m not sure…

Q: When we are at station i + 1 with a fresh start, the amount of gas left in the tank L is zero. We also know that we can go from i to i + 1. So from i to i+1, the amount of gas left in the tank L at station i + 1 must be no less than zero (otherwise we cannot go from i to i + 1). In this case if we can at most get to station j from i, how far do you think we can get to from station i+1?

A: Eh…at most j as well. Ah I get it. If we can go from station i to station j but not circling around the route we should skip all stations from i + 1 to j and only start the next station check at station j + 1.

Q: You got it. So what about the running time of this algorithm?

A: Hmm, in this algorithm each station is checked at most once. So I would say it is an O(N) algorithm.

Q: OK. Go implement this algorithm.

Code on github:

Valid Parentheses

Valid Parentheses
Given a string containing just the characters ‘(‘, ‘)’, ‘{‘, ‘}’, ‘[‘ and ‘]’, determine if the input string is valid.
The brackets must close in the correct order, “()” and “()[]{}” are all valid but “(]” and “([)]” are not.

(Java Code on Github at the bottom of the post. )

Q: What is the unknown?
A: determine if the input string is valid.
Q: What does it mean by valid?
A: Parentheses must be in the correct order:
1. Each (, [, { must correspond to one ), ], }.
2. ), ], } cannot exist without a preceding (, [, {.
3. The parentheses must not intertwine like ([)], however {([])} is allowed.
Q: So these are also the constraints then. What about the data, input and output?
A: The input data is a string and the output is a boolean.
Q: Special cases we should pay attention to?
A: Null string, empty string, string with odd length, string with characters other than (), [] and {}. The algorithm should return false in these cases.

Q: OK. Have you seen it before? Or something similar?
A: Yes I’ve seen something similar. In that problem, using a Stack is helpful.
Q: And why is that?
A: Well, the validation actually happens when we meet one of the three characters ), ] and }. At that point, we need to check if there is a preceding (, [, or {. A stack helps us to keep track of what we have seen previously and easily get back to it.
Q: Fair enough. So how would you use the stack here? How does it operate?
A: Hmm, whenever we meet a (, [ or {, we push it into the stack. In other cases, we first check if the stack is empty. If yes, then return false. If no, then check the top character of the stack matches in pair with the current character. If yes, we pop the stack. Otherwise, return false.
Q: OK, then when do we return true?
A: When we finish scanning the string, if the stack is empty then we return true. We must check this otherwise it will fail on test case like “((“.
Q: Good. What is the complexity of this algorithm?
A: The time and space complexities are both O(n) where n is the length of the string since we only scan the string once but we need the stack to save the characters.

Code on github:

Update:
Q: can we do better?
A: well we’ve already achieved O(n) time complexity. I’m not sure if we can make it even faster. After all, we do have to check all characters.
Q: Then what about space? Is the stack really necessary?
A: Hmm, you are saying that we should use something else to keep track of previous characters?
Q: Yep, the characters are in the string, right? I can access them easily using index.
A: OK, I see where this is going. So instead of using a stack, we should consider using indices to keep track of previous left characters?
Q: Yeah.
A: Hmm, in that case, we need the index of the latest left character and its previous character and this shall apply to all seen left characters and we will need extra space to either create a wrapper class to hold the index and this will cost O(n) space too. So it’s not better.
Q: OK, fair enough. But it’s always good to think about possible improvement.
A: Of course.

Two Sum

Given an array of integers, find two numbers such that they add up to a specific target number.

The function twoSum should return indices of the two numbers such that they add up to the target, where index1 must be less than index2. Please note that your returned answers (both index1 and index2) are not zero-based.

You may assume that each input would have exactly one solution.

Input: numbers={2, 7, 11, 15}, target=9
Output: index1=1, index2=2

(Java Code on Github at the bottom of the post. )

Q: What is the unknown?
A: the indices of the two numbers in the array that add up to a target value.
Q: What are the data?
A: An array of integers and a target value.
Q: What are the constraints? Ambiguities?
A: The index is not zero-based. The example gives us a sorted array but it may not be sorted in other cases. There may be duplicates in the array.
Q: Have you seen this problem before?
A: Yes, but I want to redo this problem like a clean slate rather than memorize the solution to it.
Q: How do we approach it?
A: One possible solution would be for each element in the array, subtract it from the target value and check if the remained value is also in the array. But that would take O(n^2) time (and O(1) space).
Q: So can we do better? If we can, what is the bottleneck?
A: The bottleneck is that we have to scan through the whole array to see if a value exist. If we can speed up this lookup process, it would help to speed up the whole algorithm. Current lookup is O(n) and possible upgrades are O(logn) and O(1). Now what algorithm or data structure can achieve O(logn) and O(1) lookup? I would say BST and Hashtable. Since both data structure uses O(n) space, we will consider hashtable since it is faster in lookup.
Q: OK, so how does hashtable lookup fit in here?
A: Well, I think we need to scan through the array first and count the frequency of each value. Then in the second pass, we calculate the remained value by subtract the current value from the target and check if the remained value exists in the hashtable. If not, we move on to the next one. If yes, then we have found one (here be careful about duplicates and values like 4-2=2).
Q: Yes, we can do that, but the problem asks for the index of the two numbers, not the numbers themselves.
A: Ah yes, we can use another hashtable to keep track of the indices of each value or we can scan through the array to find the index of these two values. The first way require extra O(n) space while the second way requires an extra pass of O(n) time.

Q: OK, one question: is it possible that an number overflow happens in your algorithm? If yes, how do you solve it?
A: Suppose that our target value is Integer.MIN_VALUE + 1 and our array is [3, -1, Integer.MIN_VALUE + 2]. In this case, if we subtract 3 from our target value, we will get out of Integer range. Similarly, if our target value is Integer.MAX_VALUE – 3 and our array is [-4, 1, Integer.MAX_VALUE – 4], we will get the same type of problem when we subtract -4 from our target value. So it looks like that we need to validate the range before subtracting.
Q: Yes, you got this part right. Now do you have any idea of how to do it?
A: Well, the target value is in range. Basically we need to check if we subtract the current value from the target, would it be outside of [Integer.MIN_VALUE, Integer.MAX_VALUE]? To get lower than Integer.MIN_VALUE, we need to subtract a positive number from the target. To get higher than Integer.MAX_VALUE, we need to subtract a negative number from the target. So if the current value is positive, we check if the MIN plus current is larger than the target. If yes, then it’s going to be out of bound if we do the subtraction. Similarly, if the current value is negative, we check if the MAX plus the current value is smaller than the target. If yes, then it’s going to be out of bound if we do the subtraction. Two examples here:
Say the range is [-10, 10]. target value is 3, current value is -8. 10-8 = 2 is smaller than 3, so this is going to be out of bound.
if the current value is 9, -10 + 9 = -1 is smaller than 3, so subtracting 9 from 3 is OK. We can even drop a graph to illustrate this idea.

Q: GOOD, you got it. Make sure you implement the code. Any other way to solve this problem?
A: Well, yeah we could sort the numbers first but that messes up the index and we have to keep track of the previous index somehow. So that’s not worth it in this problem. If we are only looking at the numbers, this approach may save us some spaces.

Code on github:

Reverse Nodes in k-Group

Given a linked list, reverse the nodes of a linked list k at a time and return its modified list.

If the number of nodes is not a multiple of k then left-out nodes in the end should remain as it is.

You may not alter the values in the nodes, only nodes itself may be changed.

Only constant memory is allowed.

For example,
Given this linked list: 1->2->3->4->5

For k = 2, you should return: 2->1->4->3->5

For k = 3, you should return: 3->2->1->4->5

(Java Code on Github at the bottom of the post. )

My thoughts and solution:

Q: What is the unknown?

A: To reverse nodes in a linked list k at a time and return the modified list.

Q: What are the data?

A: A linked list and an integer k.

Q: What are the constraints?

A: From the example above, we should probably assume this is a singly linked list. We cannot change the value in the node (Actually this is a very good idea in modifying linked list). Constant memory only. If at some point, the number of nodes is not the multiple of k, these nodes should remain the same order.

Q: Have you seen it before or at least something similar?

A: Yes I think I have. A similar problem is to reverse a singly linked list in place. I solved it before and it only took constant memory. I think we can apply the solution of that problem to each K nodes for reversing purpose.

Q: Good. For the whole problem, what kind of strategy should we use? Suppose we have reversed K nodes in the list, what about the rest of list?

A:  We do the same for the rest and connect with the k nodes reversed previously.

Q: Can you describe it in more details?

A: OK, for the rest of the linked list we reverse the nodes in k-Group and connect with previously reversed k nodes. Ah I see where this is heading. We are looking at a recursion here.

Q: Yes we are. Now think about the base case of this recursion.

A: That would be the number of remaining nodes is not a multiple of k and we should just connect them with the previously reversed part.

Q: What about the recursive case?

A: Hmm, we should reverse the current k nodes at hand. reverse the next k nodes and connect with them. Then return the current reversed part so the previously reversed part can connect with it.

Q: OK not super clear but I guess you have the idea.  In the base case, how do you determine if the number of remaining nodes is a multiple of k?

A: Starting from where we are, go k steps further. If at any point, we hit null, then we know there are enough nodes for reversal.

Q: Good. The take-away message here is that recursion could be a very handy trick for solving linked list related problems. So keep it in your toolbox.

Code on github:

Nuts and bolts

Nuts and bolts. A disorganized carpenter has a mixed pile of N nuts and N bolts. The goal is to find the corresponding pairs of nuts and bolts. Each nut fits exactly one bolt and each bolt fits exactly one nut. By fitting a nut and a bolt together, the carpenter can see which one is bigger (but the carpenter cannot compare two nuts or two bolts directly). Design an algorithm for the problem that uses NlogN compares (probabilistically).

My thoughts and solution:

Well, according to the question, the nut and bolt has a one-to-one relation. So the first thing come into my mind is to sort the nuts and bolts into ascending order and then we can put them together to a pair one by one.

However, we cannot compare two nuts or bolts directly. So we have to think about indirect ways. Say we have two nuts N1 and N2. Well, since the nut and bolt has a one-to=one relation, there must exist two bolts B1 and B2 which exactly fit with N1 and N2. So if we try to fit both nuts with B1 (or B2), we can tell which nut is bigger. This is also suggested by the question that By fitting a nut and a bolt together, the carpenter can see which one is bigger”. The same works for comparing B1 and B2 indirectly.

OK, the question did mention NlogN compares probabilistically. Well that reminds me of QuickSort. So how does it apply here? QuickSort requires partitioning and recursion. OK, let’s look at partitioning.

To partition the nuts, we need to randomly select one nut, then compare it with the rest. But we cannot, at least not directly. So the alternative is to randomly select one bolt “B”, then compare each nut with it. In the end, we will have two small piles of nuts (one smaller than the bolt “S” and the other larger “L”) and the right one fits in the bolt “N”. Done!

Then we can do recursion, right? for both piles of nuts, randomly choose a bolt to partition…But wait! Whatever bolt we pick for “S” should come from the set of bolts that correspond to the nuts in “S” and so is the case for “L”. Otherwise, we cannot guarantee the correctness of the QuickSort algorithm.

So before we do the recursion, we have to find out the corresponding piles of bolts for “S” and “L”. How? Well, we do know the nut “N” that fits into “B” perfectly. We can do a partition of the bolts based on “N”. Then we will get our desired piles of bolts for “S” and “L”.

Now we can go on to the recursion.

About the comparison, it is O(NlogN). But I’m afraid we are using 2NlogN since we compared 2N in partitioning  the bolts and nuts. If you have some good ideas, please leave some comments.

Sqrt(x)

Implement int sqrt(int x).

Compute and return the square root of x.

(Java Code on Github at the bottom of the post. )

My thoughts and solutions:

OK, how hard can this be, right? just find the first integer n such that n * n <= x and (n+1) * (n + 1) > x. Well mathematically this is correct, but if we transfer this idea to a program directly, we may run into the following two problems:

• Time Limit Exceeded (well the OJ has a time limit for this question. O(N) is not acceptable)
• Integer Overflow! It is possible that n*n is greater than the limit of Java integer (from -2^31 to 2^31 – 1).

I was stuck. Then someone on the discussion board mentioned: Think about math back in elementary school. n * n <= x is equivalent to what? Stupid me, n <= x / n. So if we avoid using plus and multiplication in comparison, we can avoid the integer overflow problem using subtraction and division. And I think this idea can extend to other primitive types as well. For something like (a+b)/2 we could re-write as a/2 + b/2.

As for the time limit problem, we can consider two improvements:

• The naive algorithm would go from 1 to x. This is not necessary. We can simply check from 0 to x/2. Why? Because when x > 4, (x/2)^2 is always larger then x.
• Is it necessary that we must go for a linear search? It’s a sorted sequence from 1 to x/2 and we are trying to find a number with a certain property. And this reminds me of binary search.

With these three improvements,  we can solve this question efficiently.

Code on github:

MergeSort with practical improvement

Well MergeSort is a non-trivial sorting algorithm with time complexity O(NlogN) and space complexity O(N). It has very simple concept with both top-down and bottom-up approaches.

There are some practical improvements according to the book Algorithms, 4th Edition from Princeton University.

• Overhead for sorting small arrays: for array with size less than a CUTOFF value (it is set to 7), use insertion sort instead.
• Before merging: if the largest value of the left sorted half is less than the smallest value of the right sorted half, there is no need for merging.
• can’t remember…

The improved code is on github:

The bottom-up version:

Binary Tree Postorder Traversal Non-recursive Version

Given a binary tree, return the postorder traversal of its nodes’ values.

For example:
Given binary tree {1,#,2,3},

   1
\
2
/
3

return [3,2,1].

Note: Recursive solution is trivial, could you do it iteratively?

(Java Code on Github at the bottom of the post. )

My thoughts and solutions:

Well the recursive solution is indeed trivial but quite elegant. However, sometimes we may hit a stack overflow exception and that’s why we need the iterative version. To convert a recursive solution into an iterative one, we have to implement our own stack.

So what should we put in our own stack? Think about the recursive version. For example, for each node that does not satisfy the condition of the base case, the function immediately calls on itself with the current node’s left child (if it exists), leaving its right child (if it exists) and its current value unprocessed. So the items going to the stack are nodes that are not fully processed: either its left or right child has not been visited yet.

So the stack is for tree nodes, but we also need to know whether the children of each node have been visited. The original node class does not support this. One simple solution is to wrap the TreeNode class in another class and I call it NodeInfo (it is a private class inside the PostorderTraversal class) and we create a stack of NodeInfo:

/**
* Definition for binary tree
* public class TreeNode {
*     int val;
*     TreeNode left;
*     TreeNode right;
*     TreeNode(int x) { val = x; }
* }
*/
private class NodeInfo{
TreeNode node;
boolean isLeftVisited;
boolean isRightVisited;

public NodeInfo(TreeNode node){
this.node = node;
isLeftVisited = false;
isRightVisited = false;
if(node.left == null) isLeftVisited = true;
if(node.right == null) isRightVisited = true;
}
}


OK, how does this work? Remember that for post order traversal, it has the following order:

• Visit left subtree
• Visit right subtree
• Visit current node

So we can push our root wrapped in NodeInfo object to the stack. Then, as long as the stack is not empty:

• we peek at the top node.
• if the node has left child and it is not visited, set isLeftVisited to true and visitSubtree(left)
• else if the node has right child and it is not visited, set isRightVisited to true and visitSubtree(right)
• else if both children are visited, pop the top node and add its value to an ArrayList solution.

How do we visit the subtree?

while the node is not a leaf:

• push itself to the stack
• if it has unvisited  left child, set isLeftVisited to true and point itself to its left child
• else if it has unvisited  right child, set isRightVisited to true and point itself to its right child

When the loop ends, the node should point to a leaf node so just add its value to the ArrayList solution.

I had trouble at first worrying about visiting only one side of a node in the loop, like we are missing them. But then I realized that all nodes with unvisited children are push to the stack along the way to the leaves so no worries.

Code on github:

Set Matrix Zeros

Given a m x n matrix, if an element is 0, set its entire row and column to 0. Do it in place.

Follow up:Did you use extra space?
A straight forward solution using O(mn) space is probably a bad idea.
A simple improvement uses O(m + n) space, but still not the best solution.
Could you devise a constant space solution?

(Java Code on Github at the bottom of the post. )

My thoughts and solutions:

Well, this is a rather easy question, without the last follow ups. I came up with O(m+n) space solution at the beginning and then struggled with the constant space solution.

The O(m+n) solution:

Get a boolean array of size m (called “rows”) and another of size n (called “cols”), then scan through the matrix. Say matrix[i][j] == 0, then we set rows[i] = true and cols[j] = true, which indicates that row i and column j should be set to zeros.

For the constant space solution, I had two thoughts. They failed, but I think it’s still worth writing down here:

• Maybe we can do this change in place, with recursion (getting too complicated).
• Bit operations. We use two integers rowCheck and colCheck, one for the rows and the other for the columns, initialized to 0. Think about their bit representation. rowCheck should have m bits while colCheck should have n bits. Say row 3 should be set to zeros, then the third bit in rowCheck should be set to 1. The program works fine for almost all test cases except for the last three. I haven’t figure out the reason yet. But it’s just good to know that bit operation is there for us to consider if constant space is required.

So, I peek at the discussion forum and found a great solution:

• For matrix cells that are neither on the first row nor on the first column, an zero in the cell matrix[i][j] means the same as having matrix[i][0] = 0 and matrix[0][j] = 0. So we can simply store the locations of zeros in the first row and first column.
• But what about the cells on the first row and column? If we do the first step above, we are overriding the values and we won’t know if there are any zeros in the first row and column originally. Therefore, we have to check if there are zeros in them first.
• Once the first two steps are done, we can use the info on the first row and column to set zeros in matrix[1:m][1:n]. Do not set cells on first row and column!
• Final step, if there are zeros originally on the first row and column, set zeros in corresponding cells.

Code on github: