Service error in memcache with Java Google App Eng

2020-07-22 19:28发布

问题:

In the past month, our Java Google App Engine Standard Web App started getting strange errors at seemingly random times (see stack trace below). Around this time we made the following changes:

  1. Switch from Java7 runtime to Java8/Jetty9 runtime (which allowed us more flexibility in linking to a 3rd party payments library).

  2. Switch to deploying with the Google Cloud SDK, instead of the separate Google App Engine SDK.

Yesterday we experienced 3 periods with errors. One of these occurred from 0530PST to about 0600PST on Feb 28. Suddenly all attempts to load from the database started failing:

com.google.appengine.api.memcache.LogAndContinueErrorHandler handleServiceError: Service error in memcache (LogAndContinueErrorHandler.java:50)
com.google.appengine.api.memcache.MemcacheServiceException: Memcache getIdentifiables: exception getting multiple keys
    at com.google.appengine.api.memcache.MemcacheServiceApiHelper$RpcResponseHandler.handleApiProxyException(MemcacheServiceApiHelper.java:77)
    at com.google.appengine.api.memcache.MemcacheServiceApiHelper$1.absorbParentException(MemcacheServiceApiHelper.java:122)
    at com.google.appengine.api.utils.FutureWrapper.handleParentException(FutureWrapper.java:56)
    at com.google.appengine.api.utils.FutureWrapper.get(FutureWrapper.java:95)
    at com.google.appengine.api.memcache.MemcacheServiceImpl.quietGet(MemcacheServiceImpl.java:28)
    at com.google.appengine.api.memcache.MemcacheServiceImpl.getIdentifiables(MemcacheServiceImpl.java:61)
    at com.googlecode.objectify.cache.EntityMemcache.getAll(EntityMemcache.java:215)
    at com.googlecode.objectify.cache.CachingAsyncDatastoreService.get(CachingAsyncDatastoreService.java:253)
    at com.googlecode.objectify.impl.engine.LoadEngine$Round.fetchPending(LoadEngine.java:172)
    at com.googlecode.objectify.impl.engine.LoadEngine$Round.execute(LoadEngine.java:118)
    at com.googlecode.objectify.impl.engine.LoadEngine.execute(LoadEngine.java:258)
    at com.googlecode.objectify.impl.cmd.LoaderImpl.refs(LoaderImpl.java:110)
    at com.googlecode.objectify.impl.cmd.LoaderImpl.refs(LoaderImpl.java:98)
    at com.googlecode.objectify.impl.cmd.LoaderImpl.ref(LoaderImpl.java:88)
    at com.googlecode.objectify.impl.cmd.LoadTypeImpl.refOf(LoadTypeImpl.java:92)
    at com.googlecode.objectify.impl.cmd.LoadTypeImpl.id(LoadTypeImpl.java:86)
    at com.codeavengers.LoginServlet.doPost(LoginServlet.java:62)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
    at com.codeavengers.UrlFilter.doFilter(UrlFilter.java:143)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.codeavengers.payments.PermissionsFilter.doFilter(PermissionsFilter.java:114)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.googlecode.objectify.cache.AsyncCacheFilter.doFilter(AsyncCacheFilter.java:59)
    at com.googlecode.objectify.ObjectifyFilter.doFilter(ObjectifyFilter.java:49)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.utils.servlet.ParseBlobUploadFilter.doFilter(ParseBlobUploadFilter.java:125)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.runtime.jetty9.SaveSessionFilter.doFilter(SaveSessionFilter.java:43)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.utils.servlet.JdbcMySqlConnectionCleanupFilter.doFilter(JdbcMySqlConnectionCleanupFilter.java:60)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at com.google.apphosting.utils.servlet.TransactionCleanupFilter.doFilter(TransactionCleanupFilter.java:48)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at com.google.apphosting.runtime.jetty9.AppVersionHandlerMap.handle(AppVersionHandlerMap.java:297)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:534)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at com.google.apphosting.runtime.jetty9.RpcConnection.handle(RpcConnection.java:202)
    at com.google.apphosting.runtime.jetty9.RpcConnector.serviceRequest(RpcConnector.java:81)
    at com.google.apphosting.runtime.jetty9.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:108)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.dispatchServletRequest(JavaRuntime.java:680)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.dispatchRequest(JavaRuntime.java:642)
    at com.google.apphosting.runtime.JavaRuntime$RequestRunnable.run(JavaRuntime.java:612)
    at com.google.apphosting.runtime.JavaRuntime$NullSandboxRequestRunnable.run(JavaRuntime.java:806)
    at com.google.apphosting.runtime.ThreadGroupPool$PoolEntry.run(ThreadGroupPool.java:274)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.apphosting.api.ApiProxy$UnknownException: An error occurred for the API request memcache.Get().
    at com.google.apphosting.utils.runtime.ApiProxyUtils.statusException(ApiProxyUtils.java:207)
    at com.google.apphosting.utils.runtime.ApiProxyUtils.getRpcError(ApiProxyUtils.java:128)
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.setRpcError(ApiProxyImpl.java:601)
    at com.google.apphosting.runtime.ApiProxyImpl$AsyncApiFuture.failure(ApiProxyImpl.java:589)
    at com.google.apphosting.runtime.http.HttpApiHostClient.communicationFailure(HttpApiHostClient.java:170)
    at com.google.apphosting.runtime.http.JettyHttpApiHostClient$Listener.onComplete(JettyHttpApiHostClient.java:141)
    at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:193)
    at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:185)
    at org.eclipse.jetty.client.HttpExchange.notifyFailureComplete(HttpExchange.java:269)
    at org.eclipse.jetty.client.HttpExchange.abort(HttpExchange.java:240)
    at org.eclipse.jetty.client.HttpConversation.abort(HttpConversation.java:141)
    at org.eclipse.jetty.client.HttpRequest.abort(HttpRequest.java:735)
    at org.eclipse.jetty.client.HttpDestination.abort(HttpDestination.java:267)
    at org.eclipse.jetty.client.PoolingHttpDestination.failed(PoolingHttpDestination.java:90)
    at org.eclipse.jetty.client.DuplexConnectionPool$1.failed(DuplexConnectionPool.java:159)
    at org.eclipse.jetty.util.Promise$Wrapper.failed(Promise.java:84)
    at org.eclipse.jetty.client.HttpClient$1$1.failed(HttpClient.java:587)
    at org.eclipse.jetty.client.AbstractHttpClientTransport.connectFailed(AbstractHttpClientTransport.java:152)
    at org.eclipse.jetty.client.AbstractHttpClientTransport.connect(AbstractHttpClientTransport.java:141)
    at org.eclipse.jetty.client.HttpClient$1.connect(HttpClient.java:592)
    at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:569)
    at org.eclipse.jetty.client.HttpClient$1.succeeded(HttpClient.java:562)
    at org.eclipse.jetty.util.SocketAddressResolver$Async.lambda$resolve$1(SocketAddressResolver.java:181)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    ... 1 more

From the investigation I have done so far, I believe it may have something to do with the version of Objectify we were using. We were on objectify-4.0b1.jar. I tried upgrading to objectify 4.1, and that version of the site is working at the same time that the version with 4.0 is not working. So perhaps that has something to do with it?

I am hesitant to switch the version that uses objectify 4.1 over to production because once you do that, we can't switch back to an older stable version of the site. So I want to make sure that the upgrade will actually solve our problems.

Update 1

After 22 hours of the app functioning normally, the app started failing again with more Memcache errors. We tried clearing Memcache but that didn't do anything. 24 hours later, that version of the app is still not working.

We overcame the problem by redeploying a new version with the exact same code and it started working fine again. 24 hours later that version is still functioning fine.

Update 2

We switched to a version that uses Objectify 4.1.3 and so far after 24 hours that version is doing fine.

Extra Information on Related Issues

We had other strange issues over the past month since the switch to Java8+CloudSDK.

For example, at random times (mostly on Friday afternoon PST) the site would stop working correctly on some computers. Then after 30 minutes or so it would work again. Then a week later the same problem happened.

After carefully reading the documentation for Google Cloud SDK, it said they changed the default staging option of Cloud SDK for the delete-jsps was true, whereas with App Engine SDK it was false. Our server was reading data from a couple of JSP files... so this change broke our site.

The thing was, it still worked fine on local server, and on the live server it also worked fine 99% of the time. It was as if they were AB testing the effect of changing that default parameter on some percentage of instances at random times. Which seems crazy, since they could have checked our data logs and seen that our site was loading and reading values from jsp files.

Once we removed that data from the JSP files and put it in separate config files, that problem went away.

It seems that once again Google is experimenting with some changes to Memcache at random times, and that is completely breaking our site when using Objectify 4.0b2? Is that feasible?

回答1:

IIRC there were some changes in the way Objectify uses memcache between 4.0b2 and 4.1. While it seems unlikely, it's possible that the usage pattern may tickle some sort of underlying issue in memcache.

4.0b2 is really old - more than 4 years - and there have been a large number of issues fixed since then. It would be wise to migrate up to the current version.