It made performance jump: it ran 2-3 times faster. It kind of makes sense that limiting the data should make things faster, since this is a big collection, but I didn't find any documentation about it. Does anyone know why this change matters, explain what exactly is happening, and point me to some more information about it? Thanks in advance.
Yes, limiting the quantity of data usually speeds things up.
But in your case the $project is the same, so the amount of data read (from disk/cache) and the amount of data output are the same. It should not have a big influence.
What could make a big difference are:
How did you time your operations? Do your numbers come from the explain plan, or from the wall clock, including the full round trip of sending the query and processing the result? If the latter, you might be measuring network delays (if client and server run on different machines) or resource contention (if both run on the same machine). See the sketch after this list for one way to separate the two.
Did you run your tests multiple times? If not, it could be that the documents and indexes were not in memory for the first, slow test and were already in cache for the second, fast test.
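For example, here is a minimal sketch of separating server-side time from round-trip time; the collection, fields, and pipeline are placeholders, not the ones from your question:

```js
// Placeholder pipeline; substitute your own collection and fields.
const pipeline = [
  { $match: { ownerIds: "someUserId" } },
  { $project: { ownerIds: 1, title: 1 } }
];

// Server-side timing: executionStats reports time spent inside mongod,
// excluding network round trips and client-side processing.
printjson(db.items.explain("executionStats").aggregate(pipeline));

// Wall-clock timing: includes network latency and cursor iteration.
const start = Date.now();
db.items.aggregate(pipeline).toArray();
print(`Round trip: ${Date.now() - start} ms`);
```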
In principle, $match-ing unmodified documents (before $project) directly from a collection is faster because indexes can be used. You do have an index on ownerIds, don't you? So, in principle, your second, faster test should be slower, since you $project before you $match. I italicized *should* because I think the query optimizer detects that your $match only uses fields from the original documents and performs the same in both cases.
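As a rough sketch of the two orderings (the collection name, field names, and projection are assumptions based on your question), with an index on ownerIds the first form can use that index for the $match:

```js
const userId = "someUserId";

// Assumed index on the field used by $match.
db.items.createIndex({ ownerIds: 1 });

// $match first: the ownerIds index can be used to select documents.
db.items.aggregate([
  { $match: { ownerIds: { $in: [userId] } } },
  { $project: { ownerIds: 1, title: 1 } }
]);

// $project first: in principle a collection scan, but the optimizer can
// move the $match ahead of the $project when it only references original,
// unmodified fields, which would make both pipelines perform the same.
db.items.aggregate([
  { $project: { ownerIds: 1, title: 1 } },
  { $match: { ownerIds: { $in: [userId] } } }
]);
```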
Since your $match is using $in with a single value ($userId), you could use the following syntax:
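Something like this (sketched with the ownerIds field from above; an equality match on a single value selects the same documents as $in with a one-element array):

```js
db.items.aggregate([
  // { ownerIds: userId } matches the same documents as
  // { ownerIds: { $in: [userId] } } when userId is a single value.
  { $match: { ownerIds: userId } },
  { $project: { ownerIds: 1, title: 1 } }
]);
```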