It made performance jump: it ran 2-3 times faster. It kind of makes sense that limiting the data should make things faster, since this is a big collection, but I didn't find any documentation about it. Does anyone know why this change matters, explain what exactly is happening, and point me to some more information about it? Thanks in advance.
Yes, limiting the quantity of data usually speeds things up.
But in your case the $project is the same, so the amount of data read (from disk/cache) and the amount of data output are the same. It should not have a big influence.
What could make a big difference are:
How did you time your operations? Do your numbers come from the explain plan, or from the wall clock, including the full round trip of sending the query and processing the result? If the latter, you might be measuring network delays (if client and server run on different machines) or resource contention (if both run on the same machine). See the sketch after this list for one way to separate the two.
Did you run your tests multiple times? If not, it could be that the documents and indexes were not in memory for the first, slow test and were already in cache for the second, fast test.
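For example, here is a minimal sketch of separating server-side time from round-trip time; the collection, fields, and pipeline are placeholders, not the ones from your question:

```js
// Placeholder pipeline; substitute your own collection and fields.
const pipeline = [
  { $match: { ownerIds: "someUserId" } },
  { $project: { ownerIds: 1, title: 1 } }
];

// Server-side timing: executionStats reports time spent inside mongod,
// excluding network round trips and client-side processing.
printjson(db.items.explain("executionStats").aggregate(pipeline));

// Wall-clock timing: includes network latency and cursor iteration.
const start = Date.now();
db.items.aggregate(pipeline).toArray();
print(`Round trip: ${Date.now() - start} ms`);
```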
In principle, $match-ing unmodified documents (before $project) directly from a collection is faster because indexes can be used. You do have an index on ownerIds, don't you? So, in principle, your second, faster test should be slower, since you $project before you $match. I italicized *should* because I think the query optimizer detects that your $match only uses fields from the original documents and performs the same in both cases.
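As a rough sketch of the two orderings (the collection name, field names, and projection are assumptions based on your question), with an index on ownerIds the first form can use that index for the $match:

```js
const userId = "someUserId";

// Assumed index on the field used by $match.
db.items.createIndex({ ownerIds: 1 });

// $match first: the ownerIds index can be used to select documents.
db.items.aggregate([
  { $match: { ownerIds: { $in: [userId] } } },
  { $project: { ownerIds: 1, title: 1 } }
]);

// $project first: in principle a collection scan, but the optimizer can
// move the $match ahead of the $project when it only references original,
// unmodified fields, which would make both pipelines perform the same.
db.items.aggregate([
  { $project: { ownerIds: 1, title: 1 } },
  { $match: { ownerIds: { $in: [userId] } } }
]);
```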
Since your $match is using $in with a single value ($userId), you could use the following syntax:
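Something like this (sketched with the ownerIds field from above; an equality match on a single value selects the same documents as $in with a one-element array):

```js
db.items.aggregate([
  // { ownerIds: userId } matches the same documents as
  // { ownerIds: { $in: [userId] } } when userId is a single value.
  { $match: { ownerIds: userId } },
  { $project: { ownerIds: 1, title: 1 } }
]);
```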