Periodic InvalidCastException and “The server fail

2020-02-12 04:23发布

问题:

I see this on our stage system, after it has been up for 2-3 days.

"The server failed to resume the transaction. Desc:39000000ef." (with desc:xxx increasing every time).

The stack trace shows

System.Data.SqlClient.SqlException: The server failed to resume the transaction. Desc:39000000ef.
at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
at System.Data.SqlClient.SqlDataReader.get_MetaData()
at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
at System.Data.Common.DbCommand.ExecuteReader()
at System.Data.Linq.SqlClient.SqlProvider.Execute(Expression query, QueryInfo queryInfo, IObjectReaderFactory factory, Object[] parentArgs, Object[] userArgs, ICompiledSubQuery[] subQueries, Object lastResult)
at System.Data.Linq.SqlClient.SqlProvider.ExecuteAll(Expression query, QueryInfo[] queryInfos, IObjectReaderFactory factory, Object[] userArguments, ICompiledSubQuery[] subQueries)
at System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Execute(Expression query)
at System.Data.Linq.DataQuery`1.System.Collections.Generic.IEnumerable<T>.GetEnumerator()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
at ...

After this has happened the entire server goes to heck. It seems like the metadata is no longer correctly represented in memory, because I see many InvalidCastExceptions afterwards. I'm guessing this is due to the system trying to deserialize a string into an int field because the metadata is offset incorrectly.

i.e.

System.InvalidCastException: Specified cast is not valid.
at System.Data.SqlClient.SqlBuffer.get_Int32()
at System.Data.SqlClient.SqlDataReader.GetInt32(Int32 i)
at Read_Order(ObjectMaterializer`1 )
at System.Data.Linq.SqlClient.ObjectReaderCompiler.ObjectReader`2.MoveNext()
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

What does this mean?

I'm fairly sure the database wasn't updated behind the systems back, and the database was online the entire time.

The problem persists from here on out, until the server is restarted, after which everything works nicely.

Do I need to have code that re-establishes database connection if it fails? Doesn't the framework handle this by itself?

I'm using Sql server 2008, IIS 6, and .Net 3.5SP1

UPDATE: The code structure does something like this:

var dc = new datacontext()
IList<someobject> objs = dc.GetAllItemsToProcess()
var dc2 = new datacontext();
 foreach( var o in objs ) {
    try {
         var o2 = dc2.someobjects.SingleOrDefault(x=>x.id = o.id);
          // do stuff to o2
         dc2.save();
   } catch() {
          // something failed so restart datacontext()
         dc2 = new datacontext();
    }
}

回答1:

This is likely not a problem with your code. It is a bug in SQL Server. They had a similar problem in SQL Server 2005. It only happened under conditions that were just right, so very few people ever saw it and those that did were very confused.

Having said that, here are some things to check that have worked for others with the same problem:

  • Look for DataReaders that aren't closed. Make sure you're doing myReader.Close() after you read the rows you want. Many people just rock on without closing.
  • Use the native SqlTransaction class instead of OleDbTransactions wherever possible.
  • Look at your transactions. Make sure you're committing/rolling back cleanly before you close your Connection.
  • Use Connection.BeginTransation rather than Connection.BeginDbTransaction


回答2:

  • Don't share the same DataContext among different operations. Use the same DataContext to group the set of operations that you will be submitting. If you have another set that you will be submitting separately use a separate DataContext.
  • I am not sure, but also check if you are enumerating over 2 IQueryables at the same time. If that's the case, try using a separate DataContext for the second enumeration to make sure it isn't caused by linq2sql trying to open to readers at the same time.
  • Make sure your model is up to date with your DB. The structure define in the Linq2sql designer isn't updated automatically, so when you add a field you need to make sure that's added to the designer (perhaps by re-adding the table if you haven't renamed any fields).


回答3:

Just an update:

  • I tracked the bug to our use of Linq2Sql-entities passed as datasources directly to Report Server.

I.e. we were doing something like this:

List<Order> orderList = orderRepository.getOrders();
ReportViewer rv = new ReportViewer();
rv.LocalReport.DataSources.Add("OrderList", orderList);
rv.Render();

For some reason this made the datacontext confused, even when running in the same app-domain.



回答4:

Do you use DataContext object for atomic units of work ? this will help alot with the connection issues if exist.

also sometimes you need to check for a physical damage in RAM for example.