Npgsql.NpgsqlException (0x80004005): Exception while reading from stream
---> System.TimeoutException: Timeout during reading attempt
at Npgsql.Internal.NpgsqlReadBuffer.<Ensure>g__EnsureLong|54_0(NpgsqlReadBuffer buffer, Int32 count, Boolean async, Boolean readingNotifications)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult()
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior)
at Npgsql.NpgsqlBatch.ExecuteReader(CommandBehavior behavior)
at Npgsql.Internal.NpgsqlConnector.QueryDatabaseState(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlMultiHostDataSource.TryGetIdleOrNew(NpgsqlConnection conn, TimeSpan timeoutPerHost, Boolean async, TargetSessionAttributes preferredType, Func`3 stateValidator, Int32 poolIndex, IList`1 exceptions, CancellationToken cancellationToken)
The stack trace looks pretty similar to this issue, so I decided to post it here. Some information about our system:
We were using:
.NET 8
with Docker (mcr.microsoft.com/dotnet/aspnet:8.0.1-bookworm-slim@sha256:3ff67792728179308c4bf06799d8b45d155c60fddc347acf69465a496d9a20b8)
Npgsql 8.0.1
Postgres 13
We use the "Multiple Hosts" feature with 3 Postgres nodes, one of which is the primary while the others are replicas, so we set Target Session Attributes=primary.
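For reference, the setup described above can be sketched roughly as follows (host names and credentials are placeholders, assuming Npgsql 7.x/8.x):

```csharp
using Npgsql;

// Sketch of the multi-host setup described above; host names and
// credentials are placeholders.
var builder = new NpgsqlDataSourceBuilder(
    "Host=pg-node1,pg-node2,pg-node3;" +
    "Username=app;Password=secret;Database=appdb;" +
    "Target Session Attributes=primary");

// BuildMultiHost returns an NpgsqlMultiHostDataSource, which probes the
// listed hosts and hands out connections matching the requested session
// attributes (here: only the current primary).
await using var dataSource = builder.BuildMultiHost();

await using var conn = await dataSource.OpenConnectionAsync();
```

The QueryDatabaseState frame in the stack trace above is the internal probe that determines which host is the primary; the timeout occurs while that probe is reading from the stream.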
System.InvalidOperationException: The reader is closed - 8.0.0
Npgsql.NpgsqlException (0x80004005): Exception while reading from stream - 8.0.0
Jan 19, 2024
I want to make sure we can put this issue to bed properly with 8.0.2.
@kae would you mind running the latest ci build from myget to see if things are working correctly for you? (https://www.myget.org/feed/npgsql-vnext/package/nuget/Npgsql/9.0.0-preview.1-ci.20240131T215454+sha.882265627)
Anybody else is free to try as well of course.
Hello,
To test this bug we added an integration test. The error is consistently reproduced under fairly high load (hundreds of parallel requests or more).
We have now run this test against a build with the specified package version (9.0.0-preview.1-ci.20240131T215454).
Unfortunately, the error is still reproduced.
Message: "Timeout during reading attempt".
at Npgsql.Internal.NpgsqlReadBuffer.<<Ensure>g__EnsureLong|54_0>d.MoveNext()
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.<ReadMessageLong>d__233.MoveNext()
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.Internal.NpgsqlConnector.<ReadMessageLong>d__233.MoveNext()
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.NpgsqlDataReader.<NextResult>d__52.MoveNext()
at Npgsql.NpgsqlDataReader.<NextResult>d__52.MoveNext()
at Npgsql.NpgsqlDataReader.NextResult()
at Npgsql.NpgsqlCommand.<ExecuteReader>d__119.MoveNext()
at Npgsql.NpgsqlCommand.<ExecuteReader>d__119.MoveNext()
at System.Threading.Tasks.ValueTask`1.get_Result()
at Npgsql.NpgsqlCommand.<ExecuteNonQuery>d__107.MoveNext()
at Npgsql.NpgsqlCommand.ExecuteNonQuery()
at Npgsql.NpgsqlBatch.ExecuteNonQuery()
at Red.WorkPoint.Data.TmConnectionInterceptor.ConnectionOpened(DbConnection connection, ConnectionEndEventData eventData) in D:\repos\backend\Red.WorkPoint.Data\Data\TmConnectionInterceptor.cs:line 22
at Microsoft.EntityFrameworkCore.Diagnostics.Internal.RelationalConnectionDiagnosticsLogger.ConnectionOpened(IRelationalConnection connection, DateTimeOffset startTime, TimeSpan duration)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternal(Boolean errorsExpected)
at Microsoft.EntityFrameworkCore.Storage.RelationalConnection.Open(Boolean errorsExpected)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.InitializeReader(Enumerator enumerator)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.<>c.<MoveNext>b__21_0(DbContext _, Enumerator enumerator)
at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
@zolotovvv that is not the same issue, please create a new issue with its own runnable repro.
EF does not use CommandBehavior.Sequential, which Dapper does and where our fixes have been targeted.
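To illustrate the distinction made above (a sketch; the query and connection string are made up): Dapper reads with CommandBehavior.SequentialAccess, which exercises a different buffering path in the driver than EF's default behavior, and that is where the 8.0.x fixes were targeted.

```csharp
using System.Data;
using Npgsql;

// Placeholder connection string and query, for illustration only.
await using var conn = new NpgsqlConnection(
    "Host=localhost;Username=app;Password=secret;Database=appdb");
await conn.OpenAsync();

await using var cmd = new NpgsqlCommand("SELECT id, payload FROM items", conn);

// EF Core's read path: default behavior, columns may be read in any order.
await using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.Default))
{
    while (await reader.ReadAsync()) { /* ... */ }
}

// Dapper's read path: SequentialAccess, columns must be consumed strictly
// in order, which takes a different code path through the read buffer.
await using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
{
    while (await reader.ReadAsync()) { /* ... */ }
}
```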
@kae would you mind running the latest ci build from myget to see if things are working correctly for you? (https://www.myget.org/feed/npgsql-vnext/package/nuget/Npgsql/9.0.0-preview.1-ci.20240131T215454+sha.882265627)
Checked with the repro:
there were no issues with "9.0.0-preview.1-ci.20240131T215454" and stable reproduction with "8.0.1".
Hello, was this issue really solved in version 8.0.3? I was running a performance test for 100 concurrent users, using a JMeter collection with 40 various steps/API calls. I had no issue with Npgsql version 7.0.7 (of course using Entity Framework version 7). But when I used Npgsql version 8.0.3 (with Entity Framework version 8) I got a lot of exceptions: Npgsql.NpgsqlException (0x80004005): Exception while reading from stream ---> System.TimeoutException: Timeout during reading attempt at Npgsql.Internal.NpgsqlReadBuffer. Therefore I am not able to complete the performance test for 100 concurrent users with version 8, while version 7 works without any issue. And of course the connection string and timeout settings were the same for both testing scenarios.
Hi. I verified just one API step, which was the most problematic one. The generated queries were relatively simple and exactly the same for v7 and v8. The problem was especially with insert operations into multiple tables (one API call generates inserts into multiple tables). But because our system is robust, the "Npgsql.NpgsqlException (0x80004005): Exception while reading from stream" started to appear in various places/methods, usually in one of the first 5 methods of the prepared collection. There was no chance to complete the other steps, because I tested a real application journey and each step depends on the result of the previous step.
Hello, I don't want to claim that our code/structure is optimal, but I don't think pg_stat_statements and pg_stat_activity will help me. The goal of our application is to support 100 concurrent users, but after switching to version 8 we are not able to reach this goal. We are able to successfully run the test(s) for 50 concurrent users. As I wrote above, we test a complex insurance journey with more than 40 steps (API calls). We store data in the Platform and Document databases. And when we test 100 concurrent users we get a lot of Npgsql.NpgsqlException (0x80004005): Exception while reading from stream errors.
I also ran tests with an infinite timeout (Command Timeout = 0); then the error above occurred only occasionally, but we started to get another error: Npgsql.PostgresException (0x80004005): 53300: sorry, too many clients already
So the test result is exactly the same. We are not able to support 100 concurrent users. It works for 50 concurrent users only.
Another test I made was downgrading the following NuGet packages (note that I haven't changed the framework version):
Npgsql version: 7.0.7
Npgsql.EntityFrameworkCore.PostgreSQL version: 7.0.18
Microsoft.EntityFrameworkCore version: 7.0.20
In this case I don't get any error when running the test(s) for 100 concurrent users! The code and tests are exactly the same (except for the NuGet packages mentioned above). So my experience is that performance is radically reduced when switching from Npgsql + EF Core version 7 to Npgsql + EF Core version 8, and all the exceptions we get come from the Npgsql package. All I want is for the update from version 7 to version 8 not to decrease the performance of our application.
I also compared the SQL query/script generated for the most problematic operation in our test, which is storing documents. The scripts are exactly the same for both version 7 and version 8. I attached the generated script to store one document, generated with dynamic tags. As you can see, the insert queries are very simple.
Of course we also get errors in other actions, but storing a document is probably the most time-consuming action. Just FYI, document(s) can be generated in 2 different steps of our application journey test.
Further technical details
Framework: .NET 8
Npgsql version: 8.0.3
Npgsql.EntityFrameworkCore.PostgreSQL version: 8.0.4
Microsoft.EntityFrameworkCore version: 8.0.6
PostgreSQL version: 14.5
Operating system: Windows
InsertDocument.zip
Thanks for providing more context, though these problems will always be very difficult to diagnose from a distance. The library is under performance regression testing by multiple teams (the ASP.NET team and TechEmpower for instance) so it's unlikely we have actually regressed general performance.
However there may be some lingering edge-case issue from our 8.x rewrite, and this could be what you are potentially hitting.
InsertDocument.zip
Could you provide the schema for these inserts? Essentially I want to see which database types are used, to quickly check whether there could be any suboptimal write implementations for these types.
Hello, I don't want to tell that our code/structure is optimal but I don't think pg_stat_statements and pg_stat_activity will help me.
I'm asking you to take a look at pg_stat_statements and pg_stat_activity just to confirm that the query in question is indeed fast, so the issue is on our side and not on PG's.
Looking at the query you provided, there could be multiple other reasons why it could be slow:
If there are PK/FK constraints, PG has to validate them on insert, which can result in a full table scan if there is no index on the referenced tables. This becomes slower the more rows there are in the referenced table.
By default Postgres writes query changes to the WAL immediately after a COMMIT is requested and blocks until the write is successful. That means that the more inserts into the same table there are, the slower they'll be, because they block each other (it depends on whether they're writing into the same page and a few other things, but more or less that's how it is).
Every time you insert a row, PG has to update indexes. In some cases it has to rebalance them, which results in an exclusive lock.
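The server-side check suggested above can be sketched like this (the connection string is a placeholder, and the pg_stat_statements extension must be enabled in shared_preload_libraries and created in the database):

```csharp
using Npgsql;

// Sketch: confirm whether the inserts are actually slow on the server
// side. Connection string is a placeholder; requires pg_stat_statements.
await using var conn = new NpgsqlConnection(
    "Host=localhost;Username=app;Password=secret;Database=appdb");
await conn.OpenAsync();

await using var cmd = new NpgsqlCommand(
    @"SELECT query, calls, mean_exec_time, max_exec_time
      FROM pg_stat_statements
      WHERE query ILIKE 'INSERT%'
      ORDER BY mean_exec_time DESC
      LIMIT 10", conn);

// If mean/max execution times here are low while the client times out,
// the bottleneck is on the client side, not in Postgres.
await using var reader = await cmd.ExecuteReaderAsync();
while (await reader.ReadAsync())
    Console.WriteLine($"{reader.GetDouble(2):F1} ms avg  {reader.GetString(0)}");
```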
Hello, I understand what you wrote, and it could really help in some cases. But my test generates very many different queries, and as I wrote, the same queries work on version 7 but do not work (or work only in a limited way) on version 8. Therefore I am not sure pg_stat_statements and pg_stat_activity can help solve the current issue.
I also ran tests with an infinite timeout (Command Timeout = 0); then the error above occurred only occasionally, but we started to get another error: Npgsql.PostgresException (0x80004005): 53300: sorry, too many clients already
So the test result is exactly the same
By the way, this means that across your app you're not limiting your maximum connections properly. Maybe you're opening a management connection somewhere (e.g. health checks, a background worker, etc.) while your main pool's max size hasn't been reduced to take this into account.
Hello Nino, thanks for your answer. I attached the schema for the related tables.
Schema.zip
I took a look, and it seems the schema uses common types: bool, uuid, integer, text, citext, timestamp with time zone, and bytea.
Many other users would have opened issues if these types had regressed significantly in v8.
I think we're going to need a profile of the app under load to understand where a bottleneck may have appeared since v8.
Could you collect a dotnet-trace? It won't contain a memory dump, but you can send it to my GitHub email address if you don't want it to be public.
@risulo One more thought: did you take a look at CPU stats and such to see whether v8 is using at least as much CPU as v7?
Nino, I retested it today. CPU usage was up to 65% and memory usage was between 80% and 85% on my testing machine. These values were approximately the same for both v7 and v8, and also approximately the same whether I tested 10, 50 or 100 concurrent users.
OK, one more question: are you using sync ADO.NET APIs by any chance?
If the impacted features in your app are properly doing async EF calls, I'll really have to see a profile to help you any further.
Our solution is relatively old and does not use async EF calls.
@risulo take a look at this issue #5575 (comment)
If at all possible I suggest you to try and increase your minimum threadpool threads (e.g. ThreadPool.SetMinThreads(100, 100)) to see if it brings the perf back to 7.x levels. Alternatively you can refactor your code to use async IO or decide to stay on 7.x for the coming years.
We're aware it's not ideal from a backwards compatibility standpoint. Npgsql is an evolving driver, and we have to re-evaluate trade-offs and workarounds over time to stay modern and maintainable. In 8.x we decided to remove one of our workarounds for handling synchronous batch queries at higher loads, a workaround which had a deep impact on all the async code we maintain. Keeping it any longer would have been a burden on new work we wanted to deliver, like the support for NativeAOT we landed in 8.0.0. Removing this workaround also solved some long-standing issues, such as enabling Microsoft Orleans compatibility, where previously it prevented users from using Npgsql asynchronously together with Orleans at all (see #1130).
Over time the balance shifts to favor one trade-off over another based on maintainability costs, popularity and impact. Async is really here to stay and is our recommended execution mode for anything beyond simple single-statement queries.
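The suggested workaround, in sketch form (the 100/100 values should roughly match the expected concurrency; this only mitigates threadpool starvation while sync IO blocks threads, it does not make sync IO scale like async):

```csharp
using System.Threading;

// Raise the minimum worker/IO-completion threads so the threadpool does
// not have to slowly ramp up while blocking sync calls tie up threads.
// 100/100 matches the ~100 concurrent users in the scenario above.
// Call this once at application startup.
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);
```

The longer-term fix is to switch blocking EF calls (e.g. ToList(), SaveChanges(), First()) to their async counterparts (ToListAsync(), SaveChangesAsync(), FirstAsync()) and await them throughout the call chain.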
Hello Nino. Thanks for your reply. First, the "sync" solution using the ThreadPool.SetMinThreads(100, 100) setting does not work. It is true that I don't get the timeout exception, but I get another one: Npgsql.PostgresException (0x80004005): 53300: sorry, too many clients already
But I tested the "async" solution and it works fine. Of course I haven't updated the whole solution yet; I just tested (and updated) 5 steps and executed the test with 100 concurrent users. It works fine for the async solution, while the same test fails for 100 concurrent users using the "sync" solution.
It is true that I don't get Timeout exception
That's great to hear, ThreadPool.SetMinThreads(100, 100) is precisely meant to prevent those timeout exceptions from happening.
but I get another one: Npgsql.PostgresException (0x80004005): 53300: sorry, too many clients already
This is always due to something in your app (or wider infra) holding onto more connections than your database will accept. Make sure you don't use multiple data sources, or have other places that may hold open a connection to the database. Also check whether your connection string's MaxPoolSize (which defaults to 100 connections) fits within what your DB accepts. Many cloud databases have maximum connection counts that scale with instance size, so a small instance used for testing may well accept fewer than 100 connections.
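The sizing rule above can be sketched as follows (all values are placeholders): the sum of every pool's maximum size, plus any non-pooled connections, must stay below the server's max_connections.

```csharp
// Sketch of the pool-sizing advice above; values are placeholders.
// Postgres defaults max_connections to 100 and reserves a few slots
// for superusers, so a single app pool of 100 can already overflow it.
var connectionString =
    "Host=localhost;Username=app;Password=secret;Database=appdb;" +
    "Maximum Pool Size=80;" +   // leave headroom below max_connections
    "Timeout=15";               // fail fast when the pool is exhausted

// If the app creates several data sources (main pool, health checks,
// background workers), their maximum pool sizes add up:
//   80 (main) + 10 (health checks) + 10 (workers) = 100,
// which exceeds what a server with max_connections=100 can serve once
// reserved superuser slots are subtracted.
```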
But I tested the "async" solution and it works fine. Of course I haven't updated the whole solution yet. I just tested (and updated) 5 steps and executed the test with 100 concurrent users.
If you can, please do, async is very much recommended at this point.