Spring Mongo aggregation average per hour

3 min read 05-10-2024

Calculating Average Values per Hour with Spring Data MongoDB: A Practical Guide

Analyzing data on an hourly basis often provides valuable insights. If you're working with a large dataset in MongoDB and need to calculate average values for each hour, Spring Data MongoDB's aggregation framework offers a powerful solution.

Let's explore how to achieve this with a practical example.

Scenario: Tracking Website Traffic

Imagine you have a website with a MongoDB collection storing user visit data. Each document represents a single visit and contains fields like timestamp (representing the visit time), duration (the visit duration in seconds), and pagesViewed (the number of pages viewed).

You want to generate a report that shows the average duration and pagesViewed for each hour of the day (0 to 23), pooled across all visits in a specific date range.

Example Code (Illustrative)

Here's a basic example of how you might attempt this using Spring Data MongoDB:

import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.aggregation.DateOperators;
import org.springframework.data.mongodb.core.query.Criteria;

import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;
import java.util.List;

public class HourlyTrafficReport {

    private final MongoTemplate mongoTemplate;

    public HourlyTrafficReport(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    public List<HourlyTrafficData> generateReport(LocalDate startDate, LocalDate endDate) {
        // Convert LocalDate to Date for MongoDB. The upper bound is the start of the
        // day after endDate, so visits made on endDate itself are included.
        Date start = Date.from(startDate.atStartOfDay(ZoneId.systemDefault()).toInstant());
        Date end = Date.from(endDate.plusDays(1).atStartOfDay(ZoneId.systemDefault()).toInstant());

        // Define the aggregation pipeline
        Aggregation aggregation = Aggregation.newAggregation(
                // Filter by date range (start inclusive, end exclusive)
                Aggregation.match(Criteria.where("timestamp").gte(start).lt(end)),
                // Keep the relevant fields and extract the hour (0-23) from timestamp.
                // Note: $hour evaluates in UTC unless a timezone is supplied.
                Aggregation.project("timestamp", "duration", "pagesViewed")
                        .and(DateOperators.dateOf("timestamp").hour()).as("hour"),
                // Group by hour and average both metrics
                Aggregation.group("hour")
                        .avg("duration").as("avgDuration")
                        .avg("pagesViewed").as("avgPagesViewed")
        );

        // Execute the aggregation against the "visits" collection and return the results
        AggregationResults<HourlyTrafficData> results =
                mongoTemplate.aggregate(aggregation, "visits", HourlyTrafficData.class);
        return results.getMappedResults();
    }

    // POJO to store the hourly traffic data
    public static class HourlyTrafficData {
        // The group key is returned as _id; @Id maps it onto this field
        @Id
        private int hour;
        private double avgDuration;
        private double avgPagesViewed;

        // Getters and setters
    }
}

This code utilizes Spring Data MongoDB's aggregation framework to perform the following steps:

Filter: Selects documents within the specified date range.
Project: Extracts the relevant fields (timestamp, duration, pagesViewed) and projects an additional field called hour representing the hour of the day.
Group: Groups documents based on the extracted hour field and calculates the average duration and pagesViewed within each group.
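To make the semantics of these three steps concrete, here is a minimal in-memory analogue written with plain Java streams. It is not the Spring Data API and does not touch MongoDB; the Visit class and the averageDurationByHour method are hypothetical names introduced only for this sketch. It shows that visits from different days that fall in the same hour of the day end up in the same group.

```java
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class HourlyAverageSketch {

    // Hypothetical in-memory stand-in for one document of the "visits" collection.
    public static final class Visit {
        final LocalDateTime timestamp;
        final double duration;
        final int pagesViewed;

        public Visit(LocalDateTime timestamp, double duration, int pagesViewed) {
            this.timestamp = timestamp;
            this.duration = duration;
            this.pagesViewed = pagesViewed;
        }
    }

    // Mirrors filter -> project(hour) -> group(avg) on a plain list.
    public static Map<Integer, Double> averageDurationByHour(List<Visit> visits,
                                                             LocalDateTime start,
                                                             LocalDateTime endExclusive) {
        return visits.stream()
                // "Filter": keep visits inside [start, endExclusive)
                .filter(v -> !v.timestamp.isBefore(start) && v.timestamp.isBefore(endExclusive))
                // "Project" + "Group": key by hour of day, average the duration per key
                .collect(Collectors.groupingBy(
                        v -> v.timestamp.getHour(),
                        TreeMap::new,
                        Collectors.averagingDouble(v -> v.duration)));
    }

    public static void main(String[] args) {
        List<Visit> visits = List.of(
                new Visit(LocalDateTime.of(2024, 10, 1, 9, 5), 100, 2),
                new Visit(LocalDateTime.of(2024, 10, 2, 9, 40), 200, 4),
                new Visit(LocalDateTime.of(2024, 10, 1, 14, 0), 60, 1),
                new Visit(LocalDateTime.of(2024, 10, 5, 9, 0), 1000, 9)); // outside the range
        Map<Integer, Double> result = averageDurationByHour(
                visits,
                LocalDateTime.of(2024, 10, 1, 0, 0),
                LocalDateTime.of(2024, 10, 3, 0, 0));
        System.out.println(result); // {9=150.0, 14=60.0}
    }
}
```

The two 9 o'clock visits come from different days but share one bucket, which is what grouping on the hour of the day implies. If you need one bucket per calendar hour instead, the group key would also have to include the date.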

Key Insights and Considerations

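Date/Time Handling: A LocalDate carries no time of day or zone, so it is converted to a java.util.Date at the start of the day in a chosen time zone; MongoDB stores and compares this as a BSON date. Keep in mind that MongoDB's $hour operator evaluates in UTC by default, so the reported hours can be shifted relative to your local zone.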
Aggregation Pipeline: The Aggregation object defines the sequence of operations to perform on the data. Each operation builds upon the previous one.
Projection: The ProjectionOperation allows you to select specific fields and add calculated fields. Here, the hour is extracted from timestamp with MongoDB's $hour operator (exposed in Spring Data MongoDB through DateOperators), which yields a value from 0 to 23.
Grouping: The GroupOperation aggregates documents based on a specific field. We group by the calculated hour field and compute the average values of duration and pagesViewed.
Output: The AggregationResults object contains the aggregated data, which we map to an HourlyTrafficData POJO for easy access and use.
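One detail of the date handling worth checking is the upper bound of the range: the start of the end date excludes almost every visit made on that day. A common remedy is a half-open range that runs from the start of the first day up to, but not including, the start of the day after the last one. The sketch below uses only java.time; the class and method names are hypothetical, and the zone is passed explicitly rather than taken from the system default so the result is reproducible.

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.util.Date;

public class DateRangeSketch {

    // Inclusive lower bound: the first instant of startDate in the given zone.
    public static Date rangeStart(LocalDate startDate, ZoneId zone) {
        return Date.from(startDate.atStartOfDay(zone).toInstant());
    }

    // Exclusive upper bound: the first instant of the day after endDate,
    // so every visit made on endDate itself still falls inside the range.
    public static Date rangeEndExclusive(LocalDate endDate, ZoneId zone) {
        return Date.from(endDate.plusDays(1).atStartOfDay(zone).toInstant());
    }

    public static void main(String[] args) {
        ZoneId utc = ZoneId.of("UTC");
        Date start = rangeStart(LocalDate.of(2024, 10, 1), utc);
        Date end = rangeEndExclusive(LocalDate.of(2024, 10, 2), utc);

        // Two full calendar days are covered
        long hours = (end.getTime() - start.getTime()) / 3_600_000L;
        System.out.println(hours); // 48
    }
}
```

In a query, these bounds pair with a greater-than-or-equal comparison on the start and a strict less-than comparison on the end.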

Practical Enhancements

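Timestamp Precision: Depending on your needs, you might want finer buckets. MongoDB's $minute and $second operators extract those parts of the timestamp, and they can be added to the group key alongside the hour.
Sorting: A $group stage does not guarantee the order of its output, so the aggregation can be extended with a SortOperation (created with Aggregation.sort) to order the results by hour or other criteria.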
Customizations: The aggregation framework allows for flexible grouping and data analysis. You can customize it to calculate other relevant statistics like minimum, maximum, or total values.
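As a rough illustration of what sorted, multi-statistic output looks like, the following in-memory sketch computes the minimum, maximum, and total duration per hour and returns them ordered by hour. It uses plain Java collectors rather than the aggregation framework, and the Sample class and statsByHour method are hypothetical names for this example only. In a real pipeline the corresponding building blocks would be the min, max, and sum accumulators on GroupOperation plus a sort stage.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class HourlyStatsSketch {

    // Hypothetical (hour, duration) pair standing in for a projected visit document.
    public static final class Sample {
        final int hour;
        final double duration;

        public Sample(int hour, double duration) {
            this.hour = hour;
            this.duration = duration;
        }
    }

    // Groups by hour and collects min, max, sum, count, and average in one pass.
    // The TreeMap keeps the groups sorted by hour, playing the role of a sort stage.
    public static Map<Integer, DoubleSummaryStatistics> statsByHour(List<Sample> samples) {
        return samples.stream().collect(Collectors.groupingBy(
                s -> s.hour,
                TreeMap::new,
                Collectors.summarizingDouble(s -> s.duration)));
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample(14, 60),
                new Sample(9, 100),
                new Sample(9, 200));

        // Prints one line per hour, in ascending hour order
        statsByHour(samples).forEach((hour, stats) ->
                System.out.println(hour + ": min=" + stats.getMin()
                        + " max=" + stats.getMax()
                        + " total=" + stats.getSum()));
    }
}
```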

Conclusion

This article walked through calculating average values per hour with Spring Data MongoDB's aggregation framework: filter by date range, extract the hour from the timestamp, then group and average.

Remember to tailor the field names, collection name, and time zone handling to your own data, and extend the pipeline with sorting or additional statistics as your analysis requires.