January 26, 2014

Understanding Java 8 Streams API

Over the past few releases, Java has placed growing emphasis on concurrency. Java 8 goes a step further with the Streams API, which lets us think in terms of parallelism. Thanks to steady progress in hardware, multicore CPUs have become the norm. To take advantage of them, Java 7 introduced the Fork/Join framework. The Java 8 Streams API supports many parallel operations for processing data while abstracting away the low-level multithreading logic, so the developer can concentrate on the data and the operations to be performed on it.

As most of us know, parallel processing is about dividing a large task into smaller subtasks (forking), processing the subtasks in parallel, and then combining the results to get the final output (joining). The Java 8 Streams API provides a similar mechanism for working with Java Collections. The idea is to convert a Collection into a Stream, process the elements (in parallel if desired), and then gather the resulting elements into a Collection.
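
As a rough sketch of this convert, process, and gather flow, the snippet below squares a list of numbers on a parallel stream. The class name ParallelSquares and the method squareAll are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelSquares {

    // Converts the list to a parallel stream, squares each element,
    // and gathers the results back into a List.
    static List<Integer> squareAll(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The source list is ordered, so the collected list keeps that order
        // even though the elements may be squared on different threads.
        System.out.println(squareAll(Arrays.asList(1, 2, 3, 4, 5))); // prints [1, 4, 9, 16, 25]
    }
}
```

Note that the code never creates or joins a thread itself; the splitting and combining happen inside the stream.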


Collections vs Streams:

Collections are in-memory data structures that hold their elements. Each element is computed before it becomes part of the collection. Streams, on the other hand, are conceptually fixed data structures (you cannot add or remove elements) whose elements are computed on demand.

Java 8 Streams can be seen as lazily constructed Collections, where values are computed only when the user asks for them. Collections behave in exactly the opposite way: they are sets of eagerly computed values, whether or not the user ever asks for a particular value.
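
One way to see the difference is an infinite stream, which no eagerly computed collection could hold. The sketch below is illustrative only; LazyEvens and firstEvens are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvens {

    // Stream.iterate describes the infinite sequence 0, 2, 4, ...
    // Only the elements requested through limit(count) are ever computed.
    static List<Integer> firstEvens(int count) {
        return Stream.iterate(0, n -> n + 2)
                .limit(count)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(5)); // prints [0, 2, 4, 6, 8]
    }
}
```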





Deeper Look at Streams:

The Stream interface is defined in the java.util.stream package. Starting with Java 8, the Java collections have methods that return a Stream. This is possible because of another cool Java 8 feature, default methods. A stream can be defined as a sequence of elements from a source that supports aggregate operations.

The source here is a Collection, an I/O operation, or an array that provides data to the Stream. A Stream preserves the order of the data as it is in the source, provided the source has a defined order.
Just like functional programming languages, Streams support aggregate operations. Common aggregate operations are filter, map, reduce, find, match, and sort. These operations can be executed sequentially or in parallel.
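
The aggregate operations listed above can be sketched as follows. The class AggregateOps, its sample data, and its method names are assumptions made for this example; the stream calls themselves (filter, map, reduce, sorted, findFirst, anyMatch) are part of the Stream interface.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class AggregateOps {

    static final List<Integer> NUMBERS = Arrays.asList(5, 3, 8, 1, 4);

    // filter + map + reduce: sum of the squares of the even numbers
    static int sumOfEvenSquares() {
        return NUMBERS.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    // sort: natural ordering, collected into a new list
    static List<Integer> sorted() {
        return NUMBERS.stream().sorted().collect(Collectors.toList());
    }

    // find: first element greater than the given threshold, if any
    static Optional<Integer> firstGreaterThan(int threshold) {
        return NUMBERS.stream().filter(n -> n > threshold).findFirst();
    }

    // match: true if at least one element is negative
    static boolean anyNegative() {
        return NUMBERS.stream().anyMatch(n -> n < 0);
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares());   // prints 80
        System.out.println(sorted());             // prints [1, 3, 4, 5, 8]
        System.out.println(firstGreaterThan(4));  // prints Optional[5]
        System.out.println(anyNegative());        // prints false
    }
}
```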

Streams also support pipelining and internal iteration. Most stream operations return a Stream themselves, which lets us chain operations together. This is called pipelining. A pipeline of operations looks similar to a SQL query.

In Java, we traditionally use for loops or iterators to iterate through collections. This kind of iteration is called external iteration, and it is clearly visible in the code. Java 8 Stream operations such as forEach, map, and filter iterate through the elements internally. The calling code is unaware of the iteration logic running in the background. This kind of iteration is called internal iteration.

List<String> names = new ArrayList<>();
for (Student student : students) {
    if(student.getName().startsWith("A")){
        names.add(student.getName());
    }
}
There is nothing special about this code. It is a traditional example of external iteration. Now have a look at the code below. It does exactly the same thing, but there is no visible iteration logic, which is why it is called internal iteration.

List<String> names = students.stream().map(Student::getName).filter(name -> name.startsWith("A"))
                                .collect(Collectors.toList());

Operations on Streams:

A variety of operations are defined in the Stream interface. Have a look at the example below. Here we iterate through a list of students and select the names of the first 10 students whose names start with "A".


List<String> names = students.stream()
                            .map(Student::getName)
                            .filter(name->name.startsWith("A"))
                            .limit(10)
                            .collect(Collectors.toList());

The above code uses several operations: map, filter, limit, and collect. We can categorize these operations into intermediate operations and terminal operations.

Intermediate operations return streams and hence can be connected together to form a pipeline of operations. In the above example, map, filter, and limit are intermediate operations.

Terminal operations, as the name suggests, sit at the end of such a pipeline, and their task is to close the stream in some meaningful way. They produce the result of the pipeline, which can be a list, an integer, or nothing at all. If we want to print the names of the students whose names start with "A", forEach will be our terminal operation: it prints every name from the filtered stream and returns nothing.
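
The sketch below puts three terminal operations side by side: collect returns a list, count returns a number, and forEach returns nothing. The nested Student class is a minimal stand-in for the Student type used in this article, and TerminalOps, sampleStudents, and the sample names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TerminalOps {

    // Minimal stand-in for the Student type used in the article.
    static class Student {
        private final String name;
        Student(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Hypothetical sample data.
    static List<Student> sampleStudents() {
        return Arrays.asList(new Student("Alice"), new Student("Bob"),
                new Student("Adam"), new Student("Carol"));
    }

    // collect: the terminal operation returns a List
    static List<String> namesStartingWithA(List<Student> students) {
        return students.stream()
                .map(Student::getName)
                .filter(name -> name.startsWith("A"))
                .collect(Collectors.toList());
    }

    // count: the terminal operation returns a number
    static long countStartingWithA(List<Student> students) {
        return students.stream()
                .map(Student::getName)
                .filter(name -> name.startsWith("A"))
                .count();
    }

    // forEach: the terminal operation returns nothing, it only prints
    static void printStartingWithA(List<Student> students) {
        students.stream()
                .map(Student::getName)
                .filter(name -> name.startsWith("A"))
                .forEach(System.out::println);
    }

    public static void main(String[] args) {
        List<Student> students = sampleStudents();
        System.out.println(namesStartingWithA(students)); // prints [Alice, Adam]
        System.out.println(countStartingWithA(students)); // prints 2
        printStartingWithA(students);                     // prints Alice, then Adam
    }
}
```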


The most interesting thing about intermediate operations is that they are lazy. They are not executed until a terminal operation is invoked. This is very important when processing large data streams. Processing only on demand can improve performance considerably, because no work is done for elements that are never needed. Laziness also allows the intermediate operations to be executed in a single pass over the data. If the idea of a single pass is not yet clear, we will go into more detail about Java 8 Streams in subsequent discussions.
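
To make the laziness visible, the following sketch uses peek to record when each step runs. LazyPipeline, trace, and stepsBeforeTerminal are hypothetical names, and the behavior described is that of a sequential stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipeline {

    // Builds a pipeline without a terminal operation and reports how many
    // elements were processed: none, because nothing has pulled them through.
    static int stepsBeforeTerminal(List<String> names) {
        List<String> log = new ArrayList<>();
        Stream<String> pending = names.stream()
                .peek(log::add)
                .filter(name -> name.startsWith("A"));
        return log.size();
    }

    // Runs a full pipeline and records each step in the given log, so the
    // order in which elements flow through the operations becomes visible.
    static List<String> trace(List<String> names, List<String> log) {
        return names.stream()
                .peek(name -> log.add("filter " + name))
                .filter(name -> name.startsWith("A"))
                .peek(name -> log.add("map " + name))
                .map(String::toUpperCase)
                .limit(2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Amy", "Bob", "Alan", "Anna", "Carl");
        System.out.println(stepsBeforeTerminal(names)); // prints 0

        List<String> log = new ArrayList<>();
        System.out.println(trace(names, log)); // prints [AMY, ALAN]
        log.forEach(System.out::println);
    }
}
```

In the printed log, each name passes through the filter and map steps before the next name is touched, which is the single pass mentioned above. Once limit(2) has its two elements, the remaining names are not visited.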


Numerical Ranges:

We often need to perform operations on numerical ranges. To help in such scenarios, the Java 8 Streams API provides three useful interfaces: IntStream, DoubleStream, and LongStream.

IntStream.rangeClosed(1, 10).forEach(num -> System.out.print(num));
// ->12345678910

IntStream.range(1, 10).forEach(num -> System.out.print(num));
// ->123456789

IntStream and LongStream support the range and rangeClosed methods; DoubleStream does not. The range method excludes its upper bound, while rangeClosed includes it.
Both methods return a stream of numbers and hence can serve as the source of a pipeline.
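
As a small illustration of a range feeding further stream operations, the sketch below sums the even numbers in a range and computes a factorial. RangeDemo and its method names are made up for this example.

```java
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class RangeDemo {

    // rangeClosed includes the upper bound; filter and sum complete the pipeline.
    static int sumOfEvens(int from, int toInclusive) {
        return IntStream.rangeClosed(from, toInclusive)
                .filter(n -> n % 2 == 0)
                .sum();
    }

    // LongStream offers the same range methods for long values.
    static long factorial(int n) {
        return LongStream.rangeClosed(1, n)
                .reduce(1L, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvens(1, 10)); // prints 30
        System.out.println(factorial(5));      // prints 120
    }
}
```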


Building Streams:

By now, we have had a quick overview of what a Java 8 Stream is and how useful it is. We have seen that Java Collections can generate streams of the data they contain, and we have also seen how to get streams of numerical ranges. But creating Streams is not limited to this; there are many other ways to generate them.

Using the 'of' method, we can create a stream of hardcoded values. Suppose we want a stream of hardcoded Strings: we just pass all of the Strings to the 'of' method.

Suppose we want to create a stream of all the elements in an array. We can do so by calling the stream method on Arrays. Arrays is a traditional utility class that now supports stream methods.

//Creating Stream of hardcoded Strings and printing each String
Stream.of("This", "is", "Java8", "Stream").forEach(System.out::println);

//Creating a stream from an array
String[] stringArray = new String[]{"Streams", "can", "be", "created", "from", "arrays"};
Arrays.stream(stringArray).forEach(System.out::println);
        
//Creating a BufferedReader for a file; try-with-resources closes it when done
try (BufferedReader reader = Files.newBufferedReader(Paths.get("File.txt"), StandardCharsets.UTF_8)) {
    //BufferedReader's lines method returns a stream of all lines
    reader.lines().forEach(System.out::println);
}

The recently added NIO API, as well as the traditional IO API, has been updated to support streams. This provides a very useful abstraction for directly creating a stream of the lines being read from a file.
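
On the NIO side, Files.lines returns a stream of lines directly from a path. The sketch below is one possible use of it; the class FileLines, the method nonBlankLines, and the temporary file are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileLines {

    // Reads the file lazily, line by line, and keeps only the non-blank lines.
    // The stream holds an open file, so try-with-resources closes it afterwards.
    static List<String> nonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a real input file.
        Path file = Files.createTempFile("stream-demo", ".txt");
        Files.write(file, Arrays.asList("first", "", "second"), StandardCharsets.UTF_8);
        System.out.println(nonBlankLines(file)); // prints [first, second]
        Files.delete(file);
    }
}
```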

Java 8 Streams are completely new to Java, and the concept is too large to cover completely in a single post. That doesn't mean our discussion of streams ends here. So far we have seen what Java 8 Streams are, how the existing APIs have been updated to support them, a brief overview of various stream methods, and how to build streams.

We will keep this discussion alive in subsequent posts. We have yet to look deeper inside streams, at the various intermediate stream operations, and at ways of collecting computed data with the help of various terminal operations. Thanks!


