Java 8 Grouping with Collectors | groupingBy method tutorial with examples
Introduction
This Java 8 Grouping with Collectors tutorial explains how to use the predefined Collector returned by the
groupingBy() method of the java.util.stream.Collectors class, with examples.
The tutorial begins by explaining how grouping of stream elements works using a Grouping Collector. The concept of grouping is illustrated with a diagram. Next, the three overloaded groupingBy() methods in the Collectors class are explained using their method definitions, Java code examples showing the 3 methods in action, and explanations of the code examples. Lastly, a brief overview of the concurrent versions of the three groupingBy() methods is provided.
(Note - This tutorial assumes that its readers are familiar with the basics of Java 8 Collectors.)
Understanding the concept of 'grouping' using Collectors
Given a stream of objects, there are scenarios where these objects need to be grouped based on a certain distinguishing characteristic they possess. This concept of grouping is the same as the 'group by' clause in SQL, which takes an attribute, or a calculated value derived from attribute(s), to divide the retrieved records into distinct groups. Generally, in the imperative style of programming, such grouping of records (objects in OOP) involves iterating over each object, checking which group the object being examined falls in, and then adding that object to its correct group. The group itself is held together using a Collection instance. Java 8's new functional features allow us to do the same grouping of objects in a declarative way, which is typical of the functional rather than the imperative style of programming, using Java 8's new Grouping Collector.
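To make the contrast concrete, here is a minimal sketch that groups a few strings by their length twice: once imperatively, with an explicit loop that finds or creates each element's group, and once declaratively with Collectors.groupingBy(). The class name, method names and sample words are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ImperativeVsDeclarativeGrouping {

    // Imperative style: examine each object, find (or create) its group, then add it
    static Map<Integer, List<String>> groupImperatively(List<String> words) {
        Map<Integer, List<String>> groups = new HashMap<>();
        for (String word : words) {
            Integer key = word.length();          // the distinguishing characteristic
            List<String> group = groups.get(key);
            if (group == null) {                  // first object seen for this group
                group = new ArrayList<>();
                groups.put(key, group);
            }
            group.add(word);
        }
        return groups;
    }

    // Declarative style: only state what to group by; the grouping collector does the rest
    static Map<Integer, List<String>> groupDeclaratively(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("HR", "LEGAL", "SALES", "IT", "FINANCE");
        System.out.println(groupImperatively(words));
        System.out.println(groupDeclaratively(words));
    }
}
```

Both methods build a map with the keys 2, 5 and 7; the declarative version simply hands the bookkeeping of the loop over to the collector.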
Grouping collectors use a classification function, which is an instance of the Function<T,R> functional interface that, for every object of type T in a stream, returns a classifier object of type R. The distinct values of R, finite in number, are the 'group names' or 'group keys'. As the grouping collector processes the stream of objects it is collecting from, it creates a collection of stream objects for each of the 'group keys'. That is, for every value of R there is a collection of objects, all of which return that value of R when subjected to the classification function.
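Since the classifier is an ordinary Function<T,R>, it can be written and applied on its own. The sketch below assumes a hypothetical first-letter classifier (T is String, R is Character) and collects the distinct keys it produces for a few sample names; those distinct keys are exactly the groups a grouping collector would create.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class ClassificationFunctionDemo {

    // Hypothetical classifier: maps each element (T = String) to a key (R = Character)
    static final Function<String, Character> FIRST_LETTER = s -> s.charAt(0);

    // Applies the classifier to every element and keeps the distinct keys ("group keys")
    static Set<Character> groupKeys(List<String> elements) {
        Set<Character> keys = new LinkedHashSet<>();   // LinkedHashSet keeps first-seen order
        for (String element : elements) {
            keys.add(FIRST_LETTER.apply(element));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Tom", "Harry", "Tina", "Hugo", "Nancy");
        System.out.println(groupKeys(names));   // prints [T, H, N]
    }
}
```

Five input elements yield only three group keys, so a grouping collector applying this classifier would build three collections.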
All these R-values and the corresponding Collections of stream objects are stored by the grouping collector in a Map<R, Collection<T>>, i.e. each 'key,value' entry in the map consists of 'R, Collection<T>'.
The process of grouping, from the application of the classification function on the stream elements to the creation of the Map containing the grouped elements, is shown in the diagram below.
In the above diagram, the elements of Stream<T> are grouped using a classification function returning 4 values of R: r1, r2, r3 and r4. The grouped elements are stored in a Map<R,Collection<T>>, with the 4 values of R being used as 4 keys pointing to 4 corresponding collections stored in the Map. These Collection instances hold the individual grouped elements, which is the required output from the grouping collector.
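The same process can be seen with a classifier that, like the one in the diagram, can only ever return 4 distinct values. In this sketch (hypothetical class and method names) the integers 1 to 10 are classified by their remainder modulo 4, so the resulting Map has exactly 4 keys, 0 to 3, each pointing to the collection of integers that produced it.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FourGroupsDemo {

    // Classifier i -> i % 4 can only return 0, 1, 2 or 3: four keys, as in the diagram
    static Map<Integer, List<Integer>> groupByRemainder(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .boxed()                                    // Stream<Integer> plays the role of Stream<T>
                .collect(Collectors.groupingBy(i -> i % 4));
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> groups = groupByRemainder(10);
        // One map entry per key, each holding the collection of elements with that key
        groups.forEach((key, members) -> System.out.println(key + " -> " + members));
    }
}
```

For example, key 1 maps to [1, 5, 9] and key 0 maps to [4, 8].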
Having understood the concept of grouping with collectors, let us now see how to implement grouping collectors in code using the 3 overloaded groupingBy() method variants provided in the Collectors class, starting with the simplest variant, which creates a List of the grouped elements.
Variant #1 of Collectors.groupingBy() method - stores grouped elements in a List
The simplest of the Collectors.groupingBy() method variants is defined with the following signature -
```java
public static <T, K> Collector<T, ?, Map<K, List<T>>> groupingBy(Function<? super T, ? extends K> classifier)
```
Where,
- input is classifier, which is an instance of the Function functional interface that converts from type T to type K.
- output is a Collector with finisher (return type) as a Map with entries having 'key,value' pairs as 'K, List<T>'.
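To see how the type parameters of this signature are filled in, the sketch below stores the collector returned by variant #1 in a variable with its full type. The example is hypothetical: it groups strings (T = String) by their length (K = Integer), so the collector's result type is Map<Integer, List<String>>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class Variant1TypesDemo {

    // classifier: Function<? super T, ? extends K> with T = String and K = Integer
    static final Function<String, Integer> BY_LENGTH = String::length;

    // Variant #1 returns Collector<T, ?, Map<K, List<T>>>,
    // which here is Collector<String, ?, Map<Integer, List<String>>>
    static final Collector<String, ?, Map<Integer, List<String>>> GROUP_BY_LENGTH =
            Collectors.groupingBy(BY_LENGTH);

    static Map<Integer, List<String>> group(List<String> words) {
        return words.stream().collect(GROUP_BY_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(group(Arrays.asList("a", "bb", "cc", "ddd")));
    }
}
```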
The simplest variant of the groupingBy() method applies the classifier Function<T,K> to each individual element of type T collected from Stream<T>. It then groups the elements into individual lists based on the value of K they return on application of the classifier function, and stores them in a Map<K,List<T>>, using the process explained in the previous section on how a grouping collector operates (K in the signature plays the role that R played there).
Variant #1 of grouping collector - Java Example
Let's say we have a stream of Employee objects, belonging to a company, who need to be grouped by their departments, with their Department present as an attribute in the Employee object. As the end result of applying the grouping collector, we want a Map with departments as keys and the List of employees in each department as the corresponding values. Diagrammatically, such an implementation would be represented as shown below -
In the above diagram, employees are grouped into 4 departments - HR, OPERATIONS, LEGAL and MARKETING. Let us now see the Java code for implementing the above 'Department - Employees' use case, followed by its explanation.
Java 8 code example for Variant #1 of Collectors.groupingBy()
```java
package com.javabrahman.java8.collector;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingWithCollectors {
  static List<Employee> employeeList = Arrays.asList(
      new Employee("Tom Jones", 45, 12000.00, Department.MARKETING),
      new Employee("Harry Major", 26, 20000.00, Department.LEGAL),
      new Employee("Ethan Hardy", 65, 30000.00, Department.LEGAL),
      new Employee("Nancy Smith", 22, 15000.00, Department.MARKETING),
      new Employee("Catherine Jones", 21, 18000.00, Department.HR),
      new Employee("James Elliot", 58, 24000.00, Department.OPERATIONS),
      new Employee("Frank Anthony", 55, 32000.00, Department.MARKETING),
      new Employee("Michael Reeves", 40, 45000.00, Department.OPERATIONS));

  public static void main(String args[]) {
    // Group employees by department; each group is collected into a List
    Map<Department, List<Employee>> employeeMap
        = employeeList.stream().collect(Collectors.groupingBy(Employee::getDepartment));
    System.out.println("Employees grouped by department");
    employeeMap.forEach((Department key, List<Employee> empList) -> System.out.println(key + " -> " + empList));
  }
}
```
```java
//Employee.java - POJO Class
package com.javabrahman.java8.collector;

public class Employee {
  private String name;
  private Integer age;
  private Double salary;
  private Department department;

  public Employee(String name, Integer age, Double salary, Department department) {
    this.name = name;
    this.age = age;
    this.salary = salary;
    this.department = department;
  }

  // Getter used as the classification function (Employee::getDepartment)
  public Department getDepartment() {
    return department;
  }

  // Remaining setters/getters for name, age, salary, department go here

  @Override
  public String toString() {
    return "Employee Name:" + this.name;
  }

  //Standard equals and hashcode implementations go here
}
```
```java
//Enum Department.java
package com.javabrahman.java8.collector;

public enum Department {
  HR, OPERATIONS, LEGAL, MARKETING
}
```
OUTPUT of the above code
```
Employees grouped by department
HR -> [Employee Name:Catherine Jones]
LEGAL -> [Employee Name:Harry Major, Employee Name:Ethan Hardy]
OPERATIONS -> [Employee Name:James Elliot, Employee Name:Michael Reeves]
MARKETING -> [Employee Name:Tom Jones, Employee Name:Nancy Smith, Employee Name:Frank Anthony]
```
Explanation of the code
- Employee is the POJO class in the above example, of which we create a Stream. It has four attributes - name, age, department and salary.
- Department is an Enum with the following values - HR, OPERATIONS, LEGAL, MARKETING.
- employeeList is a static list of 8 Employees.
- In the main() method of the GroupingWithCollectors class, we create a Stream of Employees using the stream() method of the List interface.
- On the stream of Employees we call the collect() method, passing the predefined Collector returned by the Collectors.groupingBy() method as the parameter.
- The classification function passed to the groupingBy() method is the method reference to the Employee.getDepartment() method, specified as "Employee::getDepartment".
- Lastly, the Map of employees grouped by department is printed using the Map.forEach() method. The output is as expected - the map contains 'key,value' entries of the form 'Department, List<Employee>', with one entry per Department, holding that Department as key and the List of its Employees as value. Since this variant makes no guarantee about the type or ordering of the resulting Map, the order in which the departments are printed may vary.
Variant #2 of Collectors.groupingBy()- uses a user specified Collector to collect grouped elements
Whereas the 1st variant always stores the elements of a group in a List, the 2nd variant of the grouping collector provides the flexibility to specify how the grouped elements are collected, using a second parameter which is itself a Collector. So, instead of just storing the groups in the resultant Map as Lists, we can store them in, say, Sets, or find the maximum value in each group and store that rather than all the elements of the group, or apply any other collector operation that is applicable to the stream elements.
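A minimal sketch of the three downstream options mentioned above, using hypothetical sample data: integers are grouped into "EVEN" and "ODD" buckets, and each group is then collected with Collectors.toSet(), reduced to its maximum with Collectors.maxBy(), or reduced to a count with Collectors.counting(). Note that maxBy() wraps each maximum in an Optional.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

public class DownstreamCollectorsDemo {

    static final List<Integer> NUMBERS = Arrays.asList(3, 8, 5, 8, 12, 7, 9);

    // Hypothetical classifier used by all three examples
    static String parity(Integer n) {
        return n % 2 == 0 ? "EVEN" : "ODD";
    }

    // Groups stored as Sets: the duplicate 8 is kept only once
    static Map<String, Set<Integer>> asSets() {
        return NUMBERS.stream()
                .collect(Collectors.groupingBy(DownstreamCollectorsDemo::parity, Collectors.toSet()));
    }

    // Only the maximum of each group is kept (wrapped in an Optional by maxBy)
    static Map<String, Optional<Integer>> maxPerGroup() {
        return NUMBERS.stream()
                .collect(Collectors.groupingBy(DownstreamCollectorsDemo::parity,
                        Collectors.maxBy(Comparator.<Integer>naturalOrder())));
    }

    // Only the number of elements in each group is kept
    static Map<String, Long> countPerGroup() {
        return NUMBERS.stream()
                .collect(Collectors.groupingBy(DownstreamCollectorsDemo::parity, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(asSets());
        System.out.println(maxPerGroup());
        System.out.println(countPerGroup());
    }
}
```

The classifier is identical in all three methods; only the downstream collector changes what is stored against each key.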
The 2nd variant of grouping collector is defined with the following signature -
```java
public static <T, K, A, D> Collector<T, ?, Map<K, D>> groupingBy(Function<? super T, ? extends K> classifier,
                                                               Collector<? super T, A, D> downstream)
```
Where,
- 1st input parameter is classifier, which is an instance of the Function functional interface that converts from type T to type K.
- 2nd input parameter is the downstream collector, which collects the grouped elements into type D, where D is the result type produced by the downstream collector's finisher.
- output is a Collector with finisher (return type) as a Map with entries having 'key,value' pairs as 'K, D'.
How variant#1 and variant#2 of grouping collector are closely related
In the Collectors class' code, the first variant of the grouping Collector, which accepts just the classification function as input, does not itself build the Collector that processes the Stream elements. Instead, it internally delegates to the second variant with the call groupingBy(classifier, toList()). So the first variant of the grouping collector is just a convenient way of invoking the second variant with the downstream collector 'hardcoded' as toList(), i.e. a collector into a List.
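The delegation can be observed from the outside: in the sketch below (hypothetical class, method and sample names), variant #1 and variant #2 with Collectors.toList() as the downstream collector produce equal maps for the same input. This only compares results; it does not inspect the JDK source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Variant1ViaVariant2Demo {

    static final List<String> NAMES = Arrays.asList("Tom", "Harry", "Tina", "Hugo", "Nancy");

    // Hypothetical classifier: first letter of the name
    static String initial(String name) {
        return name.substring(0, 1);
    }

    // Variant #1: only the classifier
    static Map<String, List<String>> variant1() {
        return NAMES.stream().collect(Collectors.groupingBy(Variant1ViaVariant2Demo::initial));
    }

    // Variant #2 with toList() passed explicitly as the downstream collector
    static Map<String, List<String>> variant2WithToList() {
        return NAMES.stream()
                .collect(Collectors.groupingBy(Variant1ViaVariant2Demo::initial, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(variant1().equals(variant2WithToList()));   // prints true
    }
}
```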
Let us now see the 2nd variant of grouping collector in action with a Java code example.
Variant #2 of grouping collector - Java Example
This example for variant #2 uses the same use case of employees being grouped by their department, but this time, instead of storing the grouped elements in a List, we will store them in a Set in the resultant Map.
(Note - The Employee class and the employeeList object with its values remain the same as in the previous code example, and hence are not shown below for brevity.)
Java 8 code example for VARIANT #2 of Collectors.groupingBy()
```java
// Inside GroupingWithCollectors; additionally needs: import java.util.Set;
public static void main(String args[]){
  Map<Department,Set<Employee>> employeeMap
    = employeeList.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment, Collectors.toSet()));
  System.out.println("Employees grouped by department");
  employeeMap.forEach((Department key, Set<Employee> empSet) -> System.out.println(key +" -> "+empSet));
}
```
OUTPUT of the above code
```
Employees grouped by department
HR -> [Employee Name:Catherine Jones]
LEGAL -> [Employee Name:Harry Major, Employee Name:Ethan Hardy]
OPERATIONS -> [Employee Name:James Elliot, Employee Name:Michael Reeves]
MARKETING -> [Employee Name:Tom Jones, Employee Name:Nancy Smith, Employee Name:Frank Anthony]
```
Explanation of the code
The code above is nearly the same as the code for the 1st variant of the grouping collector. The main difference is that the Collectors.groupingBy() method is now passed a second parameter, Collectors.toSet(), which tells the grouping collector to collect the grouped values in individual Sets.
The output with employees grouped in Sets looks the same as the 1st variant's output, because a Set's string representation also encloses its elements in square brackets, '[]', just like a List's does. (Unlike a List, however, the Set returned by Collectors.toSet() does not guarantee any particular order of its elements.) But if you look closely at the code, you will find that the employeeMap.forEach() call now specifies Set<Employee> as the type of the value, rather than the List<Employee> used in the 1st variant.
Variant #3 of Collectors.groupingBy()- with user specified Supplier function for Map creation and Collector to collect grouped elements
The 1st variant always collects the elements of a group in a List, and the 2nd variant provides the flexibility to specify how the grouped elements are collected. The 3rd variant additionally lets you specify how the Map which holds the result is created. So, using the 3rd variant of the grouping Collector, you can specify whether the resultant Map containing the grouped values is a HashMap, a TreeMap, or some other user-specified type of Map.
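As a sketch of what the Map factory enables, the example below groups a few hypothetical city names by their initial letter into two different Map types: a TreeMap created with Comparator.reverseOrder(), so keys come out in reverse alphabetical order, and a LinkedHashMap, which keeps the keys in the order their groups were first encountered in this sequential stream. The class, field and method names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class MapFactoryDemo {

    static final List<String> CITIES = Arrays.asList("Berlin", "Paris", "Austin", "Pune", "Boston");

    // Hypothetical classifier: first letter of the city name
    static String initial(String city) {
        return city.substring(0, 1);
    }

    // Map factory producing a TreeMap that sorts its keys in reverse alphabetical order
    static final Supplier<TreeMap<String, List<String>>> REVERSE_TREE_MAP =
            () -> new TreeMap<>(Comparator.reverseOrder());

    static TreeMap<String, List<String>> reverseSorted() {
        return CITIES.stream()
                .collect(Collectors.groupingBy(MapFactoryDemo::initial, REVERSE_TREE_MAP, Collectors.toList()));
    }

    // LinkedHashMap::new keeps keys in the order their groups were first encountered
    static LinkedHashMap<String, List<String>> inEncounterOrder() {
        return CITIES.stream()
                .collect(Collectors.groupingBy(MapFactoryDemo::initial, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(reverseSorted());      // prints {P=[Paris, Pune], B=[Berlin, Boston], A=[Austin]}
        System.out.println(inEncounterOrder());   // prints {B=[Berlin, Boston], P=[Paris, Pune], A=[Austin]}
    }
}
```

Because the factory is a plain Supplier, any Map implementation (or a configured instance such as a TreeMap with a custom comparator) can be plugged in without changing the classifier or the downstream collector.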
The 3rd variant of grouping collector is defined with the following signature -
```java
public static <T, K, D, A, M extends Map<K, D>> Collector<T, ?, M> groupingBy(Function<? super T, ? extends K> classifier,
        Supplier<M> mapFactory, Collector<? super T, A, D> downstream)
```
Where,
- 1st input parameter is classifier, which is an instance of the Function functional interface that converts from type T to type K.
- 2nd input parameter is mapFactory, a Supplier<M> which acts as a factory supplying Maps of type M.
- 3rd input parameter is the downstream collector, which collects the grouped elements into type D, where D is the result type produced by the downstream collector's finisher.
- output is a Collector with finisher (return type) as M, the Map type supplied by mapFactory, with entries having 'key,value' pairs as 'K, D'.
How variant#2 and variant#3 of grouping collector are closely related
In the Collectors class' code, the second variant of the grouping Collector, which accepts the classification function along with the downstream collector as input, does not itself build the collector that processes the stream elements. Instead, it internally delegates to the third variant with the call groupingBy(classifier, HashMap::new, downstream). So the second variant of the grouping collector is just a convenient way of invoking the third variant with the Map factory Supplier 'hardcoded' as HashMap::new.
Going back a bit, we said something similar about the first and second groupingBy() variants as well. Thus, we actually have a transitive relationship between the three variants. Variant #1 calls variant #2 with the downstream collector hardcoded, and variant #2 calls variant #3 with the Map Supplier factory hardcoded. Inferring transitively, we can say that variant #1 actually calls variant #3 with both the downstream collector and the Map Supplier factory hardcoded.
Fortunately, the delegation between variants ends at variant #3, which actually contains the entire collector logic for a grouping collector.
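Following the chain to its end, variant #1 should behave like variant #3 called with HashMap::new and Collectors.toList(). The hypothetical sketch below checks exactly that on sample data by comparing the two resulting maps; it tests observable results only, not the JDK's internal call sequence.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class Variant1ViaVariant3Demo {

    static final List<String> NAMES = Arrays.asList("Tom", "Harry", "Tina", "Hugo", "Nancy");

    // Hypothetical classifier: first letter of the name
    static String initial(String name) {
        return name.substring(0, 1);
    }

    // Variant #1: both the downstream collector and the Map factory are left to the defaults
    static Map<String, List<String>> variant1() {
        return NAMES.stream().collect(Collectors.groupingBy(Variant1ViaVariant3Demo::initial));
    }

    // Variant #3 with the two "hardcoded" defaults spelled out explicitly
    static HashMap<String, List<String>> variant3WithDefaults() {
        return NAMES.stream()
                .collect(Collectors.groupingBy(Variant1ViaVariant3Demo::initial, HashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(variant1().equals(variant3WithDefaults()));   // prints true
    }
}
```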
Let us now see a Java code example showing how to use the 3rd variant of grouping collector.
Variant #3 of grouping collector - Java Example
This example for variant #3 uses the same use case of employees being grouped by their department. However, this time we will store the grouped elements in a Set and tell the grouping collector to store the groups in a TreeMap instance instead of the default HashMap instance that is internally hardcoded in variant #2.
(Note - The Employee class and the employeeList object with its values remain the same as in the previous code example, and hence are not shown below for brevity.)
Java 8 code example for VARIANT #3 of Collectors.groupingBy()
```java
// Inside GroupingWithCollectors; additionally needs: import java.util.Set; import java.util.TreeMap;
public static void main(String args[]){
  Map<Department,Set<Employee>> employeeMap
    = employeeList.stream()
        .collect(Collectors.groupingBy(Employee::getDepartment, TreeMap::new, Collectors.toSet()));
  System.out.println("Employees grouped by department");
  employeeMap.forEach((Department key, Set<Employee> empSet) -> System.out.println(key +" -> "+empSet));
}
```
OUTPUT of the above code
```
Employees grouped by department
HR -> [Employee Name:Catherine Jones]
OPERATIONS -> [Employee Name:James Elliot, Employee Name:Michael Reeves]
LEGAL -> [Employee Name:Harry Major, Employee Name:Ethan Hardy]
MARKETING -> [Employee Name:Tom Jones, Employee Name:Nancy Smith, Employee Name:Frank Anthony]
```
Explanation of the code
The code above is nearly the same as the code for the 2nd variant of the grouping collector. The main difference is that the Collectors.groupingBy() method is now passed an additional parameter, the Map factory TreeMap::new (placed between the classifier and the downstream collector), which tells the grouping collector to store the groups in an instance of a TreeMap.
The output, with employees grouped in Sets corresponding to their departments, is similar to what we saw in the Java examples for the 1st and 2nd variants. However, this time the departments, which are the keys of the result Map, are arranged in the order in which they are declared in the Department enum (HR, OPERATIONS, LEGAL, MARKETING), which was not guaranteed in the previous outputs. This ordering is due to the TreeMap, which sorts its entries based on the natural ordering of its keys; for enum constants, the natural ordering is their declaration order, not alphabetical order.
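A TreeMap orders enum keys by their natural ordering, which for enums is the declaration order of the constants. The self-contained sketch below uses a hypothetical enum Dept, declared in the same order as Department, and groups a few constants into a TreeMap created with TreeMap::new and, for contrast, into a TreeMap whose comparator orders the constants by name().

```java
import java.util.Comparator;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EnumKeyOrderDemo {

    // Hypothetical enum, constants declared in the same order as Department
    enum Dept { HR, OPERATIONS, LEGAL, MARKETING }

    // Orders enum constants alphabetically by name() instead of by declaration order
    static final Comparator<Dept> BY_NAME = Comparator.comparing(Dept::name);
    static final Supplier<TreeMap<Dept, Long>> ALPHABETICAL_TREE_MAP = () -> new TreeMap<>(BY_NAME);

    static Stream<Dept> sample() {
        return Stream.of(Dept.MARKETING, Dept.LEGAL, Dept.HR, Dept.OPERATIONS, Dept.LEGAL);
    }

    // TreeMap::new uses the keys' natural ordering, which for an enum is declaration order
    static TreeMap<Dept, Long> naturalOrder() {
        return sample().collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // A TreeMap built with BY_NAME sorts the same keys alphabetically
    static TreeMap<Dept, Long> alphabeticalOrder() {
        return sample().collect(Collectors.groupingBy(Function.identity(), ALPHABETICAL_TREE_MAP, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(naturalOrder());        // prints {HR=1, OPERATIONS=1, LEGAL=2, MARKETING=1}
        System.out.println(alphabeticalOrder());   // prints {HR=1, LEGAL=2, MARKETING=1, OPERATIONS=1}
    }
}
```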
Concurrent versions of grouping collector
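We saw three groupingBy() method variants above. They can be used with parallel streams too, but each thread then builds its own intermediate Map, and these Maps are merged afterwards, which can be expensive. If you want the grouping itself to be performed concurrently in a multi-threaded execution environment, you can use the three overloaded methods in the java.util.stream.Collectors class named groupingByConcurrent(). These three concurrent methods accept the same kinds of input parameters as their non-concurrent counterparts, but they return a Collector whose result is a ConcurrentMap (a ConcurrentHashMap by default), and the map factory passed to the third variant must supply a ConcurrentMap. Apart from this, and from being intended for concurrent contexts such as parallel streams, their usage is the same as described above. Note that these collectors are unordered, so the order of the elements within each group is not guaranteed.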
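A minimal sketch of the concurrent variant, with hypothetical names and data: Collectors.groupingByConcurrent() is applied to a parallel stream, and the result is typed as a ConcurrentMap. Because the collector is unordered, neither the key order nor the order of names inside each group is guaranteed, so code using the result should not rely on either.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class ConcurrentGroupingDemo {

    // Hypothetical classifier: first letter of the name
    static String initial(String name) {
        return name.substring(0, 1);
    }

    // groupingByConcurrent() yields a ConcurrentMap; on a parallel stream the worker threads
    // accumulate into that one shared map instead of building and merging per-thread maps
    static ConcurrentMap<String, List<String>> group(List<String> names) {
        return names.parallelStream()
                .collect(Collectors.groupingByConcurrent(ConcurrentGroupingDemo::initial));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ann", "Bob", "Amy", "Ben", "Cal");
        // Neither key order nor the order of names within a group is guaranteed here
        System.out.println(group(names));
    }
}
```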
Conclusion