Root-causing the random failures in the integration tests with ElasticSearch

In our recent development we were creating an integration test framework and some tests for manipulating data in the ElasticSearch cluster. Strangely the tests could succeed or fail randomly, even though we never made any changes to the code on the business logic at that time.

Continue reading

How to use JavaConfig Bean in Spring XML

Our current project is at the first stage to wire all the components together and do a simple integration test. When I took on this task, I found that all beans were defined in XML. Given the number of beans I have to create, it would be tedious to write them all in XML. Personally I prefer using JavaConfig to the XML files as the navigation is easier for me in JavaConfig. But I don’t want to change the XML configurations into JavaConfig all at once. Can I define JavaConfig Beans and use them in the XML?

A bit of search revealed a simple way. Now assume that we have a provider class as follows:

package com.example.xyz;

@Configuration
public class ResourceProvider{
    @Bean
    public SQSWrapper sqsWrapper() {
        return new SQSWrapper();
    }
}

Assume that we have an application.xml file and we want to use the SQSWrapper Bean in a bean definition in the file:

<bean id="SQSConsumer" class="com.example.xyz.SQSConsumerImpl">
    <constructor-arg ref="THE_ID_OF_THE_SQSWRAPPER_BEAN">
</bean>

To do that we need to add two extra lines to the file and then we specify the id of the SQSWrapper bean by using the method name sqsWrapper. The complete xml file looks like this:

<context:annotation-config/>

<!-- The following line brings in the beans defined in the ResourceProvider -->
<bean class="com.example.xyz.ResourceProvider" />

<bean id="SQSConsumer" class="com.example.xyz.SQSConsumerImpl">
    <constructor-arg ref="sqsWrapper">
</bean>

The first line “annotation-config” is crucial as noted in this stackoverflow answer: “while annotation-config is switched on, the container will recognize the @Configuration annotation and process the @Bean methods declared in JavaConfig properly”.

Now that saved me from creating more xml files!

Randomly Draw k unique integers out of an array of N unique integers

Given an array of n unique integers (1 to n), write an algorithm to draw a random combination of k distinct numbers (n >= k). (This problem comes from Core Java Vol I as an example.)

Unknown: A way to draw k distinct integers out of  n distinct integers.

Data: An array of integers 1 to n.

Constraint: k numbers must be distinct and randomly picked.

A straightforward solution would be:

  1. Randomly pick one number out of the an array
  2. If this number is not picked before, add it to the result. Return the result if we have k numbers.
  3. Otherwise, back to step 1.

Q: So what is the time complexity of this solution?

A: If we are unlucky, in the worst case, O(k^2) and if k close to n, O(n^2).

Q: How so?

A: At some point, we will have problem selecting a number that’s not in the result set.  The first number is easy, just once. The second, if unlucky, twice. The third, if unlucky, 3 times since the first 2 times picked something in the result set…so up to k numbers, it can take 1+2+3+…+k picks which is approximately O(k^2). If k is close to n, then we have a O(n^2) algorithm. check the link at the bottom for the code.

Q: Alright, can we make it faster? Say let’s make it O(n) time and you cannot use additional space except for the result set.

A: Hmm, the bottleneck of the previous solution is that every time we pick a number, we have to check if it exists in the result set. If it does, we have to go back and pick again. If we can skip this step it will be faster.

Q: What do you mean by skipping this step?

A: I mean that every time we pick a number, it is guaranteed not picked before.

Q: How do we do that?

A: Hmm. we need to keep track of what has not been picked instead. Since we cannot use additional space, I assume that we have to do something on the original array. I can replace the picked number with some special value like n+1, but this sounds useless since if I happened to pick this number, I would have to choose again, exactly like before. I don’t know…

Q: OK, in what situation can we safely draw an unpicked number?

A: If the array only contains unpicked numbers, we can do that safely. But again, I don’t think we can recreate a new array to exclude the picked one in every pass. That’s O(n^2) again.

Q: True. So why can’t we draw the numbers safely now? What’s the matter?

A: Because there are picked values between unpicked ones.

Q: Good. You mentioned about excluding them. Is there a way to do that without creating a new array?

A: I suppose I can re-arrange the array? For example, if I picked the number at index i, I can move the numbers from i+1 to n-1 forward. But then I should pick a random index between 0 to n-1 (exclusive). Wait, this is still O(n^2)…

Q: Do we have to move all the elements after index i? Can we reduce this O(n) move to a O(1) move?

A: O(1)? So I should move only 1 element instead. But which one…

Q: Let’s use an example: 1,2,3,4,5 Say we picked 3. In your case, we change the array to 1,2,4,5,5 and then we pick from index 0 to 3 next time. We do the move because we want to make sure next time we are choosing from 1,2,4,5. So is there another way to do it?

A: Yes! I can move the last element to that position to achieve the same effect! So every time after the pick, I move the last element within the current range to the picked position then reduce the range by 1.

Q: That’s right 🙂

Link to the code: https://gist.github.com/jingz8804/53955bbaf817a6c2e179

 

Some thoughts on JUnit Test with Mockito Part II

So in the previous post we talked about some resources and concepts we should understand before writing JUnit tests with Mockito.

Well, based on my recent experience, they may not be enough, especially when our code has many dependencies or collaborators and we have to mock all of them to do the test. Things can become complicated very soon. If you find yourself in this situation, stop writing the test and refactor your code first, for example, by using the strategy pattern. In this way, you will likely have less collaborators to mock, but of course, you need to write tests for code delegated to other classes. There may be some extra work, but cleaner and easier.

Aside from this, my recent struggle also made me think about how to write unit test in general. I found another post about unit testing by Martin Fowler and read the first part of a book called “Effective Unit Testing”. There are some paragraphs that I find very helpful:

1) “We write tests assuming everything other than that unit under test is working correctly.” — Martin Fowler

Here “other than that unit under testing” usually means collaborators and we want to isolate them from the code under testing. By isolating them, we mean using test doubles to replace the real collaborators.

2) “A test should test just one thing and test it well while communicating its intent clearly.” We must ask ourselves:

What is the desired behavior for the code under testing? Think about how each step of your method should behave with certain inputs. The desired behavior of the dependencies is something we should configure the mocks to expect.

A general process could be:

Q: What do we want to test?

A: We want to test if method A is working properly.

Q: What do you mean by working properly?

A: By properly I mean that with a certain type of input INPUT, A should first call a dependency with INPUT and return a RESPONSE. Then A does some calculation by adding a number to some value stored in RESPONSE and return the sum. I expect that the sum is XXX.

Q: OK, so here we have a dependency. How do we deal with it?

A: We can isolate it by using a mock object to stub the RESPONSE when called with INPUT.

Q: Are there any requirements on the RESPONSE object?

A: Yes, since we are using some value in the RESPONSE, we should stub that value in the RESPONSE object. Otherwise, we might hit a null pointer exception situation.

Anyway, the key point is that we shouldn’t have too many dependencies for mocking in one unit test. Refactor the code first. Then work slowly through this process. Take your time.

Some thoughts on JUnit Test with Mockito

In the past a few days I’ve worked on writing unit tests for a service. It was a completely different experience. The unit tests I wrote before were very simple ones with several assertions and that’s it. However, since the service I’m writing involves calling other services, we have to mock their behaviors before testing our own part. Otherwise, if the dependent services are not working, or even not implemented, we won’t be able to do any test on our business logic at all.

It’s an easy idea to grasp, but it took me a while to understand the code. Before jumping into Mockito, there’s an article one should read: Mocks Aren’t Stubs by Martin Fowler. It explains classical and mockist testing in some details, a bit long but worth reading.

After reading that we should get an idea about mockist testing:

Set up -> Set Expectations -> Exercise -> Verification.

The second step is the part where we can tell the mocked dependent services how to behavior under certain conditions.

In the verification, we can still do the usual assertion but we also may need to verify that a dependent service is called as expected. Why do we need that? Say that you expect a method to return -1 with argument A. In an ideal successful case, we can just use assertEqual for testing. However, if the malfunction of a dependent service also causes the return value to be -1, then using assert statement alone will be insufficient. We need to make sure that a service is called and/or has the right return value.

After I got this idea, I moved on to Mockito framework, which is different from the ones mentioned in the article. But the idea is similar. There are two (actually three) articles I found online that helps you ease into using Mockito:

Occasionally we may have the need to mock static method. I had a really hard time with it when I was copying and pasting others’ test code. DON’T DO THAT! We are using PowerMock and I don’t know why I didn’t search for it early. Just RTFM, it’s actually very easy to use.

For my own reference:

To solve a problem, we need to locate the root cause first. We should set up some assumptions and verify them one by one.

Notes on Java Daemon Thread

I’m going to work on Daemon thread in my new job, but I have no idea what it is. This post summarizes some of the key points from a stackoverflow post.


 

First, let’s look at daemon threads in Unix. Simply put, they are threads running in the background that answer requests for services. You can check more of it on Wikipedia.

There are two types of Java thread:

  • Normal/User thread: Generally all threads created by programmer are user thread (unless you specify it to be daemon or the parent thread spawning the new thread is a daemon thread). The main thread is by default a non daemon thread.
  • Daemon thread: it is similar (I don’t know if I can say that. Correct me if I’m wrong please). Daemon threads are like a service providers for other threads or objects running in the same process as the daemon thread (In other words, they may serve the user threads). They are typically used for background supporting tasks.

Points to Note about Java Daemon Threads:

  • (needs verification) It has very low priority and only executes when no other threads of the same program is running
  • When there are no more user threads (meaning that only daemon threads are running in a program), the JVM will ends the program and exit. This is reasonable. If there are no one to serve any more, why keep the servants? (This is my own thoughts) 
  • When the JVM halts, all daemon threads are abandoned. The “finally blocks“ are not executed and stacks are not unwound (not sure what this means).
  • Daemon threads usually have an infinite loop in its run() method that waits for the service request or performs the tasks of the thread.
  • We can set a thread to be daemon through the setDaemon() method but we can only do that before the start of the thread.
  • We can check if a thread is a user thread or daemon thread using isDaemon() thread.

Examples of Java Daemon Threads:

  • Garbage collection. It runs in the background, claiming resources from unwanted objects.
  • A good Java code example from that post, reposted on gist

Things to check…

  • Non-daemon threads (default) can even live longer than the main thread.

Java Multithreading Notes From Lecture Two

This is part of the notes from an online course (Java Multithreading) I’m taking on Udemy. Nothing complicated.


In theory it is possible that on some system a Java thread may ignore changes to its own data from other threads. If the changes are not made inside its own thread, it may have no effect. We can call it caching variable in thread.

To prevent such thing, we can add the keyword volatile to the variable that may be changed by other threads and guarantee that changes can be seen.

An example on gist.

 

Java Multithreading Notes From Lecture One

This is part of the notes from an online course (Java Multithreading) I’m taking on Udemy. Nothing complicated.


 

There are normally three ways to create threads (Examples on gist):

  • Create a class that extends the Thread class
  • Create a class that implements the Runnable interface
  • Create a Thread anonymously

Whichever we choose to use, we must override or implement the public void run method.

Multithreading:

All Java programs have a main thread, but we can create and invoke other threads from the main thread.

To do that, we need to call the start() method of each thread we want to invoke from main thread. It will look for the run() method and run that in its own special thread, not in the main thread (refer to the App.java in the gist).

The start() method will return immediately so the main thread will continue its execution of the next line of code.

However, if we accidentally call the run() method of those threads, then the method run() will be executed in the main thread, not in its own special thread! So be careful.

 

Gas station

There are N gas stations along a circular route, where the amount of gas at station i is gas[i].

You have a car with an unlimited gas tank and it costs cost[i] of gas to travel from station i to its next station (i+1). You begin the journey with an empty tank at one of the gas stations.

Return the starting gas station’s index if you can travel around the circuit once, otherwise return -1.

Note:
The solution is guaranteed to be unique.

(Java Code on Github at the bottom of the post. )

Q: What is the unknown?

A: The index of the gas station from which we can circle around the route.

Q: What are the data?

A: There are N gas stations. At each station i (i = 0, 1, 2, 3, …, N-1) the amount of gas is gas[i]. The tank is unlimited. To get from station i to i+1, it needs cost[i] amount of gas. We start from an empty gas tank.

Q: Are there any constraints?

A: Yes, the solution is unique and if none satisfies the requirements, return -1.

Q: What do you think is the key point of this problem?

A: That whether we can circle around from a station with index i.

Q: And how would you like to do that?

A: Hmm, we start at index i with L amount of gas in the tank (initially L is zero). If L + gas[i] >= cost[i], that means we can move from index i to i + 1. We repeat this process until we meet the following cases:

  1. At station k % N (k % N < i), L + gas[k%N] < cost[k%N].  This means that we cannot circle around the route from station i.
  2. k % N == i. In this case we have circled around the route from station i and we should return i.

Based on this, we can think of a straightforward algorithm. At each station, check if we can circle around the route as stated above.

Q: OK. What is the time complexity of this algorithm?

A: If we are extremely unlucky, we may have to almost circle around the route every time we perform a station check. That takes O(N) time. Since we have to do it for all N stations, I would say this algorithm takes O(N^2) time.

Q: Can we do better than this?

A: Hmm, I don’t think we can improve the station check since we do have to go around all the stations in the circle. But maybe we don’t have to do a station check for each station in the circle. There might be a way to skip some of them based on the result of the previous station checks.

Q: That’s a good thought. Can you think of an example to prove your idea?

A: OK, suppose I’m at station i and I can reach to station j at most where i <= j. That means the left amount of gas L at station j plus gas[j] is less than cost[j]. But then what?

Q: OK, normally we would just go check station i+1 (suppose i + 1 <= j), right? Now can you deduct how far we can get starting from i+1 based on the result of station i?

A:  I’m not sure…

Q: When we are at station i + 1 with a fresh start, the amount of gas left in the tank L is zero. We also know that we can go from i to i + 1. So from i to i+1, the amount of gas left in the tank L at station i + 1 must be no less than zero (otherwise we cannot go from i to i + 1). In this case if we can at most get to station j from i, how far do you think we can get to from station i+1?

A: Eh…at most j as well. Ah I get it. If we can go from station i to station j but not circling around the route we should skip all stations from i + 1 to j and only start the next station check at station j + 1.

Q: You got it. So what about the running time of this algorithm?

A: Hmm, in this algorithm each station is checked at most once. So I would say it is an O(N) algorithm.

Q: OK. Go implement this algorithm.

Code on github:

 

Valid Parentheses

Valid Parentheses
Given a string containing just the characters ‘(‘, ‘)’, ‘{‘, ‘}’, ‘[‘ and ‘]’, determine if the input string is valid.
The brackets must close in the correct order, “()” and “()[]{}” are all valid but “(]” and “([)]” are not.

(Java Code on Github at the bottom of the post. )

Q: What is the unknown?
A: determine if the input string is valid.
Q: What does it mean by valid?
A: Parentheses must be in the correct order:
1. Each (, [, { must correspond to one ), ], }.
2. ), ], } cannot exist without a preceding (, [, {.
3. The parentheses must not intertwine like ([)], however {([])} is allowed.
Q: So these are also the constraints then. What about the data, input and output?
A: The input data is a string and the output is a boolean.
Q: Special cases we should pay attention to?
A: Null string, empty string, string with odd length, string with characters other than (), [] and {}. The algorithm should return false in these cases.

Q: OK. Have you seen it before? Or something similar?
A: Yes I’ve seen something similar. In that problem, using a Stack is helpful.
Q: And why is that?
A: Well, the validation actually happens when we meet one of the three characters ), ] and }. At that point, we need to check if there is a preceding (, [, or {. A stack helps us to keep track of what we have seen previously and easily get back to it.
Q: Fair enough. So how would you use the stack here? How does it operate?
A: Hmm, whenever we meet a (, [ or {, we push it into the stack. In other cases, we first check if the stack is empty. If yes, then return false. If no, then check the top character of the stack matches in pair with the current character. If yes, we pop the stack. Otherwise, return false.
Q: OK, then when do we return true?
A: When we finish scanning the string, if the stack is empty then we return true. We must check this otherwise it will fail on test case like “((“.
Q: Good. What is the complexity of this algorithm?
A: The time and space complexities are both O(n) where n is the length of the string since we only scan the string once but we need the stack to save the characters.

Code on github:

Update:
Q: can we do better?
A: well we’ve already achieved O(n) time complexity. I’m not sure if we can make it even faster. After all, we do have to check all characters.
Q: Then what about space? Is the stack really necessary?
A: Hmm, you are saying that we should use something else to keep track of previous characters?
Q: Yep, the characters are in the string, right? I can access them easily using index.
A: OK, I see where this is going. So instead of using a stack, we should consider using indices to keep track of previous left characters?
Q: Yeah.
A: Hmm, in that case, we need the index of the latest left character and its previous character and this shall apply to all seen left characters and we will need extra space to either create a wrapper class to hold the index and this will cost O(n) space too. So it’s not better.
Q: OK, fair enough. But it’s always good to think about possible improvement.
A: Of course.