January 31, 2014

Java 8 Streams API - Laziness and Performance Optimization

Greetings!

We had a quick overview of the Java 8 Streams API in the last post. We looked at the power and simplicity of the Java 8 Streams API, briefly covered the intermediate and terminal operations on streams, and saw different ways of building streams (e.g. from collections or numerical ranges). Continuing the same discussion, in this post we will move ahead with streams and look at the most important property of Java 8 Streams: laziness.
If you are new to the concept of Java 8 streams, please go back and read Understanding Java 8 Streams API.




Laziness Improves Performance (?):

This is really a tricky question. If laziness is used in the right manner, the answer is 'yes'. Suppose you are on an online shopping site and you search for a particular type of product. Most websites will show a few of the matching products immediately, with a 'loading more' message at the bottom. Eventually, all of the search results are loaded in parts. The intent is to keep the user interested by showing some of the results right away. While the user browses the loaded products, the rest are being loaded, because the site delays loading the complete product list. If the site instead loaded all of the products eagerly, the response time would increase and the user might get distracted by something else.

When you are dealing with large data sets or infinite streams, laziness is a real boon. When data is processed, we cannot be sure how the processed data will be used. Eager processing always works through the entire data set at the cost of performance, yet the client might end up using only a small portion of it or, depending on some condition, might not need it at all. Lazy processing is based on the 'process only on demand' strategy.
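To make the 'process only on demand' idea concrete, here is a minimal sketch using an infinite stream. The class name InfiniteStreamDemo, the method firstSquares, and the processed counter are made up for this illustration; only Stream.iterate, map, limit, and collect come from the Streams API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original post.
class InfiniteStreamDemo {
    // Counts how many elements actually went through map.
    static final AtomicInteger processed = new AtomicInteger();

    static List<Integer> firstSquares(int howMany) {
        processed.set(0);
        return Stream.iterate(1, n -> n + 1)   // infinite source: 1, 2, 3, ...
                .map(n -> {
                    processed.incrementAndGet();
                    return n * n;
                })
                .limit(howMany)                 // only this many elements are demanded
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstSquares(5));  // [1, 4, 9, 16, 25]
        System.out.println(processed.get());  // 5
    }
}
```

An eager implementation could never finish here, because the source has no end. The lazy pipeline squares only the five elements that the terminal operation asks for.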





Laziness and Java 8 Streams:


The current era is all about big data, parallel processing, and real-time responses. A large number of systems are being redesigned to cope with steadily growing data volumes and high expectations for performance and scalability. It is no wonder that the processing model of the Java Collections API is being strengthened to meet these expectations. The Java 8 Streams API is built on the 'process only on demand' strategy and hence supports laziness.

In the Java 8 Streams API, the intermediate operations are lazy, and the internal processing model is optimized to handle large amounts of data with high performance. Let's see it in action with an example.



// students is a List<Student>; attach a lazy map operation to its stream
Stream<String> streamOfNames = students.stream()
                .map(student -> {
                    System.out.println("In Map - " + student.getName());
                    return student.getName();
                });
// Add some delay before the terminal operation
// (Thread.sleep throws InterruptedException, so the enclosing method must declare or handle it)
for (int i = 1; i <= 5; i++) {
    Thread.sleep(1000);
    System.out.println(i + " sec");
}
// Call a terminal operation on the stream; only now does map run
streamOfNames.collect(Collectors.toList());

Output:
1 sec
2 sec
3 sec
4 sec
5 sec
In Map - Tom
In Map - Chris
In Map - Dave

Here a map operation is attached to a stream, then we wait for 5 seconds, and only then is the collect operation (a terminal operation) called. The delay is there purely to demonstrate the laziness. The output clearly shows that the map function ran only after collect was called. Think of stream operations that are set up in one place and possibly never used anywhere in the program. Java 8 Streams do not process those operations until a terminal operation actually asks for the result.
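The same point can be checked without a delay by counting how often the map function runs. The following is a small sketch; the class LazyMapDemo and the method countMapCalls are hypothetical names chosen for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the original post.
class LazyMapDemo {
    // Returns how many times the map function ran.
    // runTerminalOperation decides whether collect() is ever called.
    static int countMapCalls(boolean runTerminalOperation) {
        AtomicInteger mapCalls = new AtomicInteger();
        List<String> names = Arrays.asList("Tom", "Chris", "Dave");

        // Building the pipeline only records the map step; nothing runs yet.
        Stream<String> upperCased = names.stream()
                .map(name -> {
                    mapCalls.incrementAndGet();
                    return name.toUpperCase();
                });

        if (runTerminalOperation) {
            // The terminal operation pulls the elements through map.
            upperCased.collect(Collectors.toList());
        }
        return mapCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(countMapCalls(false)); // 0
        System.out.println(countMapCalls(true));  // 3
    }
}
```

When the stream is built but never consumed, the map function is not invoked at all.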



Performance Optimization:


As discussed above, the internal processing model of streams is designed to optimize the processing flow. In a processing flow we usually build a pipeline of several intermediate operations with a terminal operation at the end. Because of the way the processing model is designed, the intermediate operations can be fused together and executed in a single pass over the data.



// The output below assumes Student.toString() prints the student's id
List<String> names = students.stream()
                .filter(s -> { System.out.println("filter - " + s); return s.getAge() > 20; })
                .map(s -> { System.out.println("map - " + s); return s.getName(); })
                .limit(3)
                .collect(Collectors.toList());

Output:
filter - 8
map - 8
filter - 9
map - 9
filter - 10
filter - 11
map - 11

The example above demonstrates this behavior with two intermediate operations, filter and map, followed by limit. The output shows that neither filter nor map is executed independently over the entire stream. First, id 8 passed the filter and immediately moved on to map. The same happened for id 9, while id 10 did not pass the filter test and so never reached map. Once id 11 became the third element to pass, limit(3) was satisfied and processing stopped. Each element that passes the filter is immediately available to map, no matter how many elements are still waiting in the stream.
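For readers who want to run this themselves, here is a self-contained sketch of the same pipeline. The Student class, the sample ids, names, and ages, and the log list are all assumptions made for this illustration; the original Student class is not shown in the post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original post.
class SinglePassDemo {
    // Minimal stand-in for the Student type, assumed for this sketch only.
    static final class Student {
        final int id;
        final String name;
        final int age;

        Student(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Records the order in which filter and map touch each element.
    static final List<String> log = new ArrayList<>();

    static List<String> firstThreeNamesOver20() {
        log.clear();
        List<Student> students = Arrays.asList(
                new Student(8, "Tom", 21),
                new Student(9, "Chris", 25),
                new Student(10, "Dave", 18),
                new Student(11, "Anna", 30),
                new Student(12, "Ravi", 22));

        return students.stream()
                .filter(s -> { log.add("filter - " + s.id); return s.age > 20; })
                .map(s -> { log.add("map - " + s.id); return s.name; })
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstThreeNamesOver20()); // [Tom, Chris, Anna]
        // Each element goes through filter and then map before the next one starts.
        log.forEach(System.out::println);
    }
}
```

With this sample data the log interleaves filter and map per element, and the student with id 12 is never examined because limit(3) is already satisfied.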



Short Circuit Methods:


The Java 8 Streams API optimizes stream processing with the help of short-circuiting operations. Short-circuit methods end the stream processing as soon as their conditions are satisfied. In plain words, once its condition is met, a short-circuit operation stops the operations that come before it in the pipeline from processing any further elements. Some intermediate as well as some terminal operations have this behavior.

To see it working, try the example below, which starts from a list of String names. The first stream operation is a (not very meaningful) map, which returns each name in upper case. The second operation is a filter, which keeps only the names starting with "B". If we simply call collect on this stream somewhere down the line, it is no surprise that map and filter process every name in the list (and that is exactly what happens).



// List of names
List<String> names = Arrays.asList("barry", " andy", "ben", "chris", "bill");

// map and filter are chained, and the resulting stream is stored
Stream<String> namesStream = names.stream()
                .map(n -> { System.out.println("In map - " + n); return n.toUpperCase(); })
                .filter(upperName -> { System.out.println("In filter - " + upperName);
                       return upperName.startsWith("B"); });

But if we instead put a limit operation before the collect, the output changes dramatically.



// Somewhere down the line
// we just want two names from the stream
namesStream.limit(2).collect(Collectors.toList());

Output:
In map - barry
In filter - BARRY
In map -  andy
In filter -  ANDY
In map - ben
In filter - BEN

We can clearly see that limit (even though it is called later, from some other place, and is the last intermediate operation in the pipeline) influences the map and filter operations. The whole pipeline says: we want the first two names that start with the letter "B". As soon as the pipeline has found those two names, map and filter do not process the rest of the list at all.

This can turn out to be a significant performance gain. If our list contains a few thousand names and we just want the first couple of names matching a certain filter condition, processing of the remaining elements is simply skipped once we have the elements we asked for.

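Operations like anyMatch, allMatch, noneMatch, findFirst, findAny, and limit are the short-circuit methods in the Streams API. (Early preview builds also had a substream method, which did not make it into the final Java 8 release; skip and limit cover its use.)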
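The short-circuiting terminal operations can be observed in the same way as limit. The sketch below counts how many elements each operation examines before it stops; the class ShortCircuitDemo, its method names, and the use of peek as a counter are choices made for this illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of the original post.
class ShortCircuitDemo {
    static final List<String> NAMES =
            Arrays.asList("barry", "andy", "ben", "chris", "bill");

    // anyMatch stops at the first element that satisfies the predicate.
    static int examinedByAnyMatch() {
        AtomicInteger examined = new AtomicInteger();
        NAMES.stream()
                .peek(n -> examined.incrementAndGet())
                .anyMatch(n -> n.startsWith("c"));
        return examined.get();
    }

    // allMatch stops at the first element that fails the predicate.
    static int examinedByAllMatch() {
        AtomicInteger examined = new AtomicInteger();
        NAMES.stream()
                .peek(n -> examined.incrementAndGet())
                .allMatch(n -> n.startsWith("b"));
        return examined.get();
    }

    // findFirst stops as soon as one element has passed the filter.
    static int examinedByFindFirst() {
        AtomicInteger examined = new AtomicInteger();
        NAMES.stream()
                .peek(n -> examined.incrementAndGet())
                .filter(n -> n.startsWith("b"))
                .findFirst();
        return examined.get();
    }

    public static void main(String[] args) {
        System.out.println(examinedByAnyMatch());  // 4 ("chris" is the 4th name)
        System.out.println(examinedByAllMatch());  // 2 ("andy" fails the test)
        System.out.println(examinedByFindFirst()); // 1 ("barry" matches at once)
    }
}
```

In each case the remaining names are never pulled from the source once the answer is known.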


Friends! It's time for a break now. We'll leave the discussion here and will soon be back with a few more interesting things about the Java 8 Streams API. Take care.

