Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Regression in Count Query with Indexed Field #3191

Open
davidvandertuijn opened this issue Nov 5, 2024 · 6 comments
Open

Performance Regression in Count Query with Indexed Field #3191

davidvandertuijn opened this issue Nov 5, 2024 · 6 comments

Comments

@davidvandertuijn
Copy link

Description:

I am experiencing a significant performance regression when counting documents in the reefer_measurements collection using the reefer_id index.

Index created on reefer_id:

db.reefer_measurements.createIndex({ reefer_id: 1 });

When executing the following query to count documents:

$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();

I observed the following performance times:

Version Time
Version 4.8.0 3,869.62ms
Version 5.0.0 22,550.24ms

Environment:

  • Laravel-mongodb Version: 5.0.0
  • PHP Version: 8.2
  • Laravel Version: Laravel: 11.29.0
  • MongoDB extension version => 1.20.0
  • Collection: reefer_measurements
  • Number of documents: 4 million

Steps to reproduce

  1. Create a collection reefer_measurements with 4 million documents;
  2. Create an index on reefer_id;
  3. Execute the count query as demonstrated above;
  4. Compare the performance between versions 4.8.0 and 5.0.0.

Expected behaviour

The count operation should be optimized and utilize the index effectively, resulting in a much lower execution time.

Actual behaviour

The execution time has increased dramatically from version 4.8.0 to 5.0.0, leading to degraded performance in applications relying on this count query.

I would appreciate any guidance on potential causes for this performance regression and suggestions for optimization.

@GromNaN
Copy link
Member

GromNaN commented Nov 5, 2024

Could you enable logs and report the query that is executed on both package versions?

DB::enableQueryLog();

// The query

dd(DB::getQueryLog());

@davidvandertuijn
Copy link
Author

@GromNaN

DB::connection('mongodb')->enableQueryLog();
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
dd(DB::connection('mongodb')->getQueryLog());

Version 4.8.0

[
  "query" => "reefer_measurements.countDocuments({"reefer_id":303872},[])"
  "bindings" => []
  "time" => 1514.84
]

Version 5.0.0


[
    "query" => "{ "aggregate" : "measurements", "pipeline" : [ { "$match" : { "reefer_id" : { "$numberInt" : "303872" } } }, { "$group" : { "_id" : { "$numberInt" : "1" }, "n" : { "$sum" : { "$numberInt" : "1" } } } } ], "cursor" : {  } }"
    "bindings" => []
    "time" => 22677580
]

@GromNaN
Copy link
Member

GromNaN commented Nov 5, 2024

Thanks. Both commands are identical (in 4.8, you see the high level API, and in 5.0 you see the actual server command). So the performance difference is due to something in the code.

It there the same time difference when you disable query logs?

@davidvandertuijn
Copy link
Author

@GromNaN
I used minimal code with this profiler to see the time difference.
Using getQueryLog shows approximately the same result, but there’s still a significant difference between 4.8.0 and 5.0.0.

$start = microtime(true);
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
$stop = microtime(true);
$time = ($stop - $start) * 1000;
echo number_format($time, 2).'ms';

@GromNaN
Copy link
Member

GromNaN commented Nov 5, 2024

I can't reproduce the time difference with a collection of the same size.

@davidvandertuijn
Copy link
Author

The MongoDB on the server is still running on version v4.2.25. We’ll be upgrading incrementally, and I’ll keep you updated on the results once we reach the latest release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants