Streams

Introduction

Collections, lists or sets are part of the everyday work of a Java programmer. There is no software that is implemented without one of these data structures. The classic way to deal with these data structures are loops.

Unfortunately, loops have the disadvantage that they make the code a little confusing. The programmer has to analyze the source code line by line to understand what has been implemented.

The Streams API was introduced with Java 8. Streams allow the programmer to write source code that is readable and thus also more maintainable.

A stream is a sequence of objects on which certain methods can be executed. We will get to know which method this is exactly below.

Characteristics of Streams

Streams are not related to InputStreams, OutputStreams, etc.
Streams are NOT data structures but are wrappers around Collection that carry values from a source through a pipeline of operations.
Streams are more powerful, faster and more memory efficient than Lists
Streams are designed for lambdas
Streams can easily be output as arrays or lists
Streams employ lazy evaluation
Streams are parallelizable
Streams can be “on-the-fly”

Creating Streams

a stream can be created by using one of the following ways:

From individual values
- Stream.of(val1, val2, …)
From array
- Stream.of(someArray)
- Arrays.stream(someArray)
From List (and other Collections)
- someList.stream()
- someOtherCollection.stream()

ParallelStreams

In addition to the classic streams, Java 8 also offers the parallel streams. As the name suggests, ParallelStreams are streams that are processed in parallel. This has the great advantage that you don't have to worry about multithreading yourself. The Stream API does this for us.

The following source code shows the use of Streams and ParallelStreams.

List list = Arrays.asList("Customer 1", "Customer 2", "Customer 3", "Customer 4", "Customer 5");
list.stream().forEach(s -> System.out.println("Stream: " + s + " - " + Thread.currentThread()));
list.parallelStream().forEach(s -> System.out.println("ParallelStream: " + s + " - " + Thread.currentThread()));

In the following comes another example to show how to use ParallelStreams for more performance.

private static void timingTest(Stream<Employee> testStream) {
  long startTime = System.nanoTime();
  testStream.forEach(e -> doSlowOp());
  long endTime = System.nanoTime();
  System.out.printf(" %.3f seconds.%n",
  deltaSeconds(startTime, endTime));
}
private static double deltaSeconds(long startTime,  long endTime) {
  return((endTime - startTime) / 1000000000);
}
void doSlowOp() {
  try {
    TimeUnit.SECONDS.sleep(1);
  } catch (InterruptedException ie) {
    // Nothing to do here.  
  }
}
void main() {
  System.out.print("Serial version [11 entries]:");
  timingTest(googlers());
  int numProcessorsOrCores =
  Runtime.getRuntime().availableProcessors();
  System.out.printf("Parallel version on %s-core machine:",
  numProcessorsOrCores);
  timingTest(googlers().parallel() );
}

The results of the doSlowOp functionality for the Serial version (11 entries) is about 11.000 seconds; and for the parallel version on 4-core machine is about 3.000 seconds.

Common Functional Interfaces

`Predicate<T>`

Represents a predicate (boolean-valued function) of one argument
Functional method is boolean Test(T t)
- Evaluates this Predicate on the given input argument (T t)
- Returns true if the input argument matches the predicate, otherwise false

`Supplier<T>`

Represents a supplier of results
Functional method is T get()
- Returns a result of type T

`Function<T,R>`

Represents a function that accepts one argument and produces a result
Functional method is R apply(T t)
- Applies this function to the given argument (T t)
- Returns the function result

`Consumer<T>`

Represents an operation that accepts a single input and returns no result
Functional method is void accept(T t)
- Performs this operation on the given argument (T t)

`UnaryOperator<T>`

Represents an operation on a single operands that produces a result of the same type as its operand
Functional method is R Function.apply(T t)
- Applies this function to the given argument (T t)
- Returns the function result

`BiFunction<T,U,R>`

Represents an operation that accepts two arguments and produces a result
Functional method is R apply(T t, U u)
- Applies this function to the given arguments (T t, U u)
- Returns the function result

`BinaryOperator<T>`

Extends BiFunction
Represents an operation upon two operands of the same type, producing a result of the same type as the operands
Functional method is R BiFunction.apply(T t, U u)
- Applies this function to the given arguments (T t, U u) where R,T and U are of the same type
- Returns the function result

`Comparator<T>`

Compares its two arguments for order.
Functional method is int compareTo(T o1, T o2)
- Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

Stream Pipeline

A Stream is processed through a pipeline of operations and starts with a source data structure.

Intermediate methods are performed on the Stream elements. These methods produce Streams and are not processed until the terminal method is called.

The Stream is considered consumed when a terminal operation is invoked. No other operation can be performed on the Stream elements afterwards.

A Stream pipeline contains some short-circuit methods (which could be intermediate or terminal methods) that cause the earlier intermediate methods to be processed only until the short-circuit method can be evaluated.

The most important common methods of streams are:

Intermediate Methods
- map, filter, distinct, sorted, peek, limit, parallel
Terminal Methods
- forEach, toArray, reduce, collect, min, max, count, anyMatch, allMatch, noneMatch, findFirst, findAny, iterator
Short-circuit Methods
- anyMatch, allMatch, noneMatch, findFirst, findAny,limit

`Optional<T> Class`

A container which may or may not contain a non-null value. Its common methods are: * isPresent() – returns true if value is present * Get() – returns value if present * orElse(T other) – returns value if present, or other * ifPresent(Consumer) – runs the lambda if value is present

Common Stream API Methods

`Void forEach(Consumer)`

It is an easy way to loop over Stream elements. You supply a lambda for forEach and that lambda is called on each element of the Stream. Related peek method does the exact same thing, but returns the original Stream.

// Give all employees a 10%     raise
Employees.forEach(e -> e.setSalary(e.getSalary() * 11/10))

The same code using the loop:

List<Employee> employees = getEmployees();
for(Employee e: employees) {
    e.setSalary(e.getSalary() * 11/10);
}

`Stream<T> map(Function)`

map produces a new Stream that is the result of applying a Function to each element of original Stream For example:

Ids.map(EmployeeUtils::findEmployeeById)
// Create a new Stream of Employee ids

`Stream<T> filter(Predicate)`

filter produces a new Stream that contains only the elements of the original Stream that pass a given test For example:

employees.filter(e -> e.getSalary() > 100000)
// Produce a Stream of Employees with a     high salary

`Optional<T> findFirst()`

findFirst returns an Optional for the first entry in the Stream. For example:

employees.filter(…).findFirst().orElse(Consultant)
// Get the first Employee entry that passes the filter

`Object[] toArray(Supplier)`

toArray reads the Stream of elements into a an array. For example:

Employee[] empArray = employees.toArray(Employee[]::new);
// Create an array of Employees out of the Stream of Employees

`List<T> collect(Collectors.toList())`

collect reads the Stream of elements into a List or any other collection. For example:

List<Employee> empList =
employees.collect(Collectors.toList());
// Create a List of Employees out of the Stream of Employees

`collect partitioningBy`

You provide a Predicate. It builds a Map where true maps to a List of entries that passed the Predicate, and false maps to a List that failed the Predicate. For example:

Map<Boolean,List<Employee>> richTable = 
    googlers().collect(partitioningBy(e -> e.getSalary() > 1000000));

`collect groupingBy`

You provide a Function. It builds a Map where each output value of the Function maps to a List of entries that gave that value. For example:

Map<Department,List<Employee>> deptTable =
    employeeStream().collect(groupingBy(Employee::getDepartment));

`T reduce(T identity, BinaryOperator)`

You start with a seed (identity) value, then combine this value with the first Entry in the Stream, combine the second entry of the Stream, etc. For example:

Nums.stream().reduce(1, (n1,n2) -> n1*n2)
// Calculate the product of numbers

IntStream (Stream on primative int) has build-in sum(), Min and Max methods.

`Stream<T> limit(long maxSize)`

Limit(n) returns a stream of the first n elements. For example:

someLongStream.limit(10)
// returns first 10 elements

`Stream<T> skip(long n)`

skip(n) returns a stream starting with element n. For example:

twentyElementStream.skip(5)
// returns last 15 elements

`Stream<T> sorted(Comparator)`

Returns a stream consisting of the elements of this stream, sorted according to the provided Comparator For example:

empStream.map(…).filter(…).limit(…).sorted((e1, e2) -> e1.getSalary() - e2.getSalary())
// Employees sorted by salary

`Optional<T> min(Comparator)`

Returns the minimum element in this Stream according to the Comparator For example:

Employee alphabeticallyFirst =
ids.stream().map(EmployeeSamples::findGoogler)
.min((e1, e2) -> e1.getLastName().
compareTo(e2.getLastName())).get();
// Get Googler with earliest lastName

`Optional<T> max(Comparator)`

Returns the minimum element in this Stream according to the Comparator. For example:

Employee richest =
ids.stream().map(EmployeeSamples::findGoogler)
.max((e1, e2) -> e1.getSalary() - e2.getSalary()).get();
// Get Richest Employee

`Stream<T> distinct()`

Returns a stream consisting of the distinct elements of this stream. For example:

List<Integer> ids2 =
Arrays.asList(9, 10, 9, 10, 9, 10);
List<Employee> emps4 =
ids2.stream().map(EmployeeSamples::findGoogler)
.distinct().collect(toList());
// Get a list of distinct Employees

`long count()`

Returns the count of elements in the Stream. For example:

employeeStream.filter(somePredicate).count()
// How many Employees match the criteria?

`Boolean anyMatch(Predicate)`

anyMatch returns true if Stream passes, false otherwise. It processes elements in the Stream one element at a time until it finds a match according to the Predicate and returns true if it found a match.

`Boolean allMatch(Predicate)`

allMatch returns true if Stream passes, false otherwise. It processes elements in the Stream one element at a time until it fails a match according to the Predicate and returns false if an element failed the Predicate.

`Boolean noneMatch(Predicate)`

noneMatch returns true if Stream passes, false otherwise. It processes elements in the Stream one element at a time until it finds a match according to the Predicate and returns false if an element matches the Predicate.

For example:

employeeStream.anyMatch(e -> e.getSalary() > 500000)
// Is there a rich Employee among all Employees?

Filter-Map-Reduce has some analogies to Google's MapReduce algorithm, but is not the same (See also Wikipedia). Filter-Map-Reduce describes a pattern in which a quantity of data is processed in a sequence of specific steps: * Filter: filtering of desired elements from a set of elements. The element types remain the same, but the number of elements is reduced. Example: Stream.filter (Predicate). * Map: transformation of the elements. The number of elements does not change, but the element type often changes. For example, calculations, extractions or conversions can be carried out. Example: Stream.map (Function). * Reduce: reduction to an end result. Examples: Stream.forEach (Consumer), Stream.collect (Collector) and Stream.reduce (BinaryOperator).

Examples for Filter-Map-Reduce (display the result with System.out.println(...)):

Initialization

class book {
   String title; String author; int year; double price;
   public book (string title, string author, int year, double price) {
      this.titel = title; this.autor = author;
      this.year = year; this.price = price;
   }
}
List <Book> myBookList = Arrays.asList (
      new book ("Fortran", "Ferdinand", 1957, 57.99),
      new book ("Java in 3 Tage", "Anton", 2005, 11.99),
      new book ("Java in 4 Tage", "Berta", 2005, 22.99),
      new book ("Filter-Map-Reduce with Lambdas", "Caesar", 2014, 33.99));

Filter certain book objects, e.g. all from 2004,
folder or extract the desired information, e.g. the author name,
Reduce to a result string, e.g. Comma-separated names.

String s = myBookList.stream ().filter (b -> b.year> = 2004)
      .map (b -> b.autor).reduce ("", (s1, s2) -> (s1.isEmpty ())? s2: s1 + "," + s2);

Grouping, e.g. by year, as well as counting per group.

Map <Integer, Long> m = myBookList.stream ().filter (b -> b.year> = 2004)
      .collect (Collectors.groupingBy (b -> Integer.valueOf (b.jahr),Collectors.counting ()));

Reduce with the aggregate function, e.g. on the average.

OptionalDouble d = myBookList.stream ()
      .filter (b -> b.year> = 2000).mapToDouble (b -> b.price).average ();

Stream generation with Files.lines (), mapping to number of characters per line and reduction to a total value.

long number of characters = Files.lines (Paths.get ("MyTextFile.txt"),StandardCharsets.UTF_8)
      .mapToInt (String :: length).sum ();

Stream generation with Files.lines(), generation of "sub-streams" per line with Pattern.splitAsStream() and merging of the streams again with Stream.flatMap(), as well as reduction to the number of elements with Stream.count() .

long numberWords = Files.lines (Paths.get ("MyTextFile.txt"), StandardCharsets.UTF_8)
      .flatMap (Pattern.compile ("[^\\p{L}]") :: splitAsStream).count ();

Exercise

Exercise:
- Write a method that returns the average of a list of integers.
Exercise:
- Write a method that converts all strings in a list to their upper case.
Exercise:
- Given a list of strings, write a method that returns a list of all strings that start with the letter 'a' (lower case) and have exactly 3 letters.
Exercise:
- Write a method that returns a comma-separated string based on a given list of integers. Each element should be preceded by the letter 'e' if the number is even, and preceded by the letter 'o' if the number is odd. For example, if the input list is (3,44), the output should be 'o3,e44'.

Solution 1:

public Double average(List<Integer> list) {
return list.stream()
  .mapToInt(i -> i)
  .average()
  .getAsDouble();
}

Solution 2:

public List<String> upperCase(List<String> list) {
return list.stream()
  .map(String::toUpperCase)
  .collect(Collectors.toList());
}

Solution 3:

public List<String> search(List<String> list) {
return list.stream()
  .filter(s -> s.startsWith("a"))
  .filter(s -> s.length() == 3)
  .collect(Collectors.toList());
}

Solution 4:

public String getString(List<Integer> list) {
return list.stream()
  .map(i -> i % 2 == 0 ? "e" + i : "o" + i)
  .collect(joining(","));
}