active_model_serializers: Caching doesn't improve performance
Expected behavior vs actual behavior
Expected: configuring a cache and using the AMS serializer cache method should improve rendering performance.
Actual: performance decreases AND more objects are allocated.
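For context, the caching under discussion is AMS's per-serializer cache: each object's serialized hash is stored under a key derived from the object and only recomputed on a miss. A plain-Ruby sketch of that pattern (this is an illustrative stand-in, not AMS's actual implementation; the cache store and key scheme are simplified):

```ruby
# Simplified stand-in for a Rails cache store (not AMS code).
class MemoryCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

CACHE = MemoryCache.new

Post = Struct.new(:id, :title, :updated_at)

# Sketch of what a "caching serializer" does: the serialized hash is
# looked up by a key derived from the object and only computed on a miss.
def serialize_post(post)
  CACHE.fetch("post/#{post.id}-#{post.updated_at}") do
    { id: post.id, title: post.title }
  end
end

post = Post.new(1, 'Hello', 1000)
serialize_post(post)  # miss: computes and stores the hash
serialize_post(post)  # hit: returns the stored hash
```

The expectation in this issue is that the hit path above should be cheaper than rebuilding the hash; the benchmarks below show the opposite.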
Steps to reproduce
current master: git co fa0bc95
bin/bench
caching on: caching serializers: gc off 606.0970710386515/ips; 1853 objects
caching off: caching serializers: gc off 526.5338285238549/ips; 1853 objects
caching on: non-caching serializers: gc off 709.8031139840541/ips; 1390 objects
caching off: non-caching serializers: gc off 746.4513428127035/ips; 1390 objects
Benchmark results:
{
"commit_hash": "fa0bc95",
"version": "0.10.0.rc4",
"benchmark_run[environment]": "2.2.3p173",
"runs": [
{
"benchmark_type[category]": "caching on: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 606.097,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1853
},
{
"benchmark_type[category]": "caching off: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 526.534,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1853
},
{
"benchmark_type[category]": "caching on: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 709.803,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1390
},
{
"benchmark_type[category]": "caching off: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 746.451,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1390
}
]
}
CACHE_ON=false bin/bench
caching on: caching serializers: gc off 664.8712562099971/ips; 1853 objects
caching off: caching serializers: gc off 613.6203762167032/ips; 1853 objects
caching on: non-caching serializers: gc off 752.267454951568/ips; 1390 objects
caching off: non-caching serializers: gc off 692.4981276214933/ips; 1390 objects
Benchmark results:
{
"commit_hash": "fa0bc95",
"version": "0.10.0.rc4",
"benchmark_run[environment]": "2.2.3p173",
"runs": [
{
"benchmark_type[category]": "caching on: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 664.871,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1853
},
{
"benchmark_type[category]": "caching off: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 613.62,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1853
},
{
"benchmark_type[category]": "caching on: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 752.267,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1390
},
{
"benchmark_type[category]": "caching off: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 692.498,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1390
}
]
}
Numbers vary somewhat across runs, but the differences are consistent.
Environment
- ActiveModelSerializers Version: 0.10.0.rc4, on ref fa0bc95
ruby -e "puts RUBY_DESCRIPTION"
ruby 2.2.3p173 (2015-08-18 revision 51636) [x86_64-darwin14]
OS Type & Version:
uname -a
Darwin mbp14 14.5.0 Darwin Kernel Version 14.5.0: Tue Sep 1 21:23:09 PDT 2015; root:xnu-2782.50.1~1/RELEASE_X86_64 x86_64
OS X Yosemite 10.10.15
Integrated application and version:
bundle show activemodel
.bundle/ruby/2.2.0/gems/activemodel-4.0.13
Backtrace
N/A
Additional helpful information
- https://blog.codeship.com/building-a-json-api-with-rails-5/
-
By making these changes, we’ve changed our response time from 30ms to 50ms… wait, what? Yes, you heard me right. By adding cache, responses in my application have actually slowed down.
- https://twitter.com/leighchalliday/status/642734572703236096 and https://twitter.com/joaomdmoura/status/642801896231727104
-
By looking at the flame graph with caching turned on, I could tell that 48 percent of the time was spent in the cache_check method or farther down in the stack trace. This seems to account for the slowdown from 30ms to 50ms.
active_model_serializers-258f116c3cf5/lib/active_model/serializer/adapter.rb:110:in `cache_check' (48 samples - 48.00%)
Here's an image of the flamegraph, which was produced by using the rack-mini-profiler gem with the flamegraph gem. I've highlighted in black the portion that's dealing with the cache.
-
Cache developments since then:
- We now support read_multi
However:
- before:
{
"commit_hash": "43312fa^",
"version": "0.10.0.rc3",
"benchmark_run[environment]": "2.2.2p95",
"runs": [
{
"benchmark_type[category]": "caching on: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 687.045,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1426
},
{
"benchmark_type[category]": "caching off: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 688.588,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1426
},
{
"benchmark_type[category]": "caching on: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 849.889,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1084
},
{
"benchmark_type[category]": "caching off: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 769.596,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1084
}
]
}
- after:
{
"commit_hash": "43312fa",
"version": "0.10.0.rc3",
"benchmark_run[environment]": "2.2.2p95",
"runs": [
{
"benchmark_type[category]": "caching on: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 635.297,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1519
},
{
"benchmark_type[category]": "caching off: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 601.3,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1519
},
{
"benchmark_type[category]": "caching on: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 782.07,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1113
},
{
"benchmark_type[category]": "caching off: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 771.094,
"benchmark_run[result][total_allocated_objects_per_iteration]": 1113
}
]
}
So maybe we should take a look at usage in bulk_cache_fetcher
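The job of a bulk cache fetcher is, roughly: compute all keys up front, fetch them in one read_multi round trip, serialize only the misses, and write those back. A standalone sketch of that shape (Hash-backed cache; the names here are illustrative and not taken from AMS's bulk_cache_fetcher):

```ruby
# Hash-backed stand-in for a cache store that supports read_multi.
class MemoryCache
  def initialize
    @store = {}
  end

  def read_multi(*keys)
    @store.slice(*keys)  # one "round trip" for all keys
  end

  def write(key, value)
    @store[key] = value
  end
end

# Sketch of a bulk fetch: one read_multi, then serialize only the misses
# and write them back so the next call is fully cached.
def bulk_fetch(cache, objects, key_for:, serialize:)
  keys   = objects.map { |o| key_for.call(o) }
  cached = cache.read_multi(*keys)
  objects.zip(keys).map do |object, key|
    cached.fetch(key) do
      serialize.call(object).tap { |hash| cache.write(key, hash) }
    end
  end
end

cache = MemoryCache.new
posts = [{ id: 1 }, { id: 2 }]
key_for   = ->(p) { "post/#{p[:id]}" }
serialize = ->(p) { { id: p[:id] } }

bulk_fetch(cache, posts, key_for: key_for, serialize: serialize)
# a second call is served entirely from the cache
```

Even in this shape, the per-object key computation and the miss-handling bookkeeping run on every request, which is one place the overhead seen above could come from.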
And with more objects it gets worse:
BENCH_STRESS=true bin/bench
Benchmark results:
{
"commit_hash": "e03c5f5",
"version": "0.10.0.rc4",
"benchmark_run[environment]": "2.2.3p173",
"runs": [
{
"benchmark_type[category]": "caching on: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 164.688,
"benchmark_run[result][total_allocated_objects_per_iteration]": 10755
},
{
"benchmark_type[category]": "caching off: caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 143.719,
"benchmark_run[result][total_allocated_objects_per_iteration]": 10755
},
{
"benchmark_type[category]": "caching on: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 232.669,
"benchmark_run[result][total_allocated_objects_per_iteration]": 6690
},
{
"benchmark_type[category]": "caching off: non-caching serializers: gc off",
"benchmark_run[result][iterations_per_second]": 211.71,
"benchmark_run[result][total_allocated_objects_per_iteration]": 6690
}
]
}
Possibly related
- Cache not working
- Modify cache key to prevent errors and incorrect data returned. Fixes #1344
- Trigger callback to set serializer#_cache when controller loaded
Flamegraph
- Flamegraph of master with
bin/serve_benchmark start
and the flamegraph gem
diff --git a/Gemfile b/Gemfile
index 3791eef..7be3d53 100644
--- a/Gemfile
+++ b/Gemfile
@@ -39,6 +39,8 @@ gem 'tzinfo-data', platforms: (@windows_platforms + [:jruby])
group :bench do
# https://github.com/rails-api/active_model_serializers/commit/cb4459580a6f4f37f629bf3185a5224c8624ca76
gem 'benchmark-ips', require: false, group: :development
+ gem 'rack-mini-profiler', require: false
+ gem 'flamegraph'
end
group :test do
diff --git a/test/benchmark/app.rb b/test/benchmark/app.rb
index ae110ec..ffbc8cc 100644
--- a/test/benchmark/app.rb
+++ b/test/benchmark/app.rb
@@ -54,6 +54,14 @@ end
require 'active_model_serializers'
+begin
+ require 'rack-mini-profiler'
+rescue LoadError # rubocop:disable Lint/HandleExceptions
+else
+ require 'flamegraph'
+ # just append ?pp=flamegraph
+end
+
# Initialize app before any serializers are defined, for running across revisions.
# ref: https://github.com/rails-api/active_model_serializers/pull/1478
Rails.application.initialize!
About this issue
- Original URL
- State: open
- Created 8 years ago
- Comments: 35 (15 by maintainers)
Apparently @joaomdmoura had already discussed this in https://github.com/rails-api/active_model_serializers/issues/1020. I missed this since the issue title was ‘Understanding caching’, but the contents were that caching made things worse. So, this has been a known issue since July 2015. Sigh.
I created a simple app to exercise the issue I was facing. https://github.com/customink/amstest
I do see "[active_model_serializers] Cache read_multi: [...entries from dalli memcached...]" in my output, so I'm assuming this means that it's performing read_multi. Perhaps the outstanding issue is something like a Russian-doll strategy for caching, where AMS would cache both the individual entries and the entire response.
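A minimal sketch of that Russian-doll idea (purely illustrative, not AMS code): the whole response is cached under a key derived from all member keys, so changing one member invalidates the outer entry while the unchanged inner entries stay warm:

```ruby
STORE = {}

def cache_fetch(key)
  return STORE[key] if STORE.key?(key)
  STORE[key] = yield
end

def post_key(post)
  "post/#{post[:id]}-#{post[:updated_at]}"
end

# Outer key is derived from every member key, so editing one post
# invalidates the collection entry but leaves other posts' entries warm.
def render_index(posts)
  index_key = "posts/#{posts.map { |p| post_key(p) }.join(',')}"
  cache_fetch(index_key) do
    posts.map { |p| cache_fetch(post_key(p)) { { id: p[:id], title: p[:title] } } }
  end
end
```

On a full outer hit this does a single cache read instead of one per object, which is the payoff the commenter is describing.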
Yeah I can work on that at some point @bf4 - thinking about this further, that issue definitely doesn’t have anything to do with cache performance (obviously actually).
I will say that I've personally encountered this now. I replaced an existing route that did not use AMS, which had a cache hit rate of ~0.66 and an average request time of 250 ms, with an AMS implementation. The AMS version has a cache hit rate of ~0.86, but the average request time actually degraded to about 450 ms.
I decided to continue using AMS for serialization, but to hit the cache myself much more directly:
By actually skipping AMS for the cache-lookup and render steps I got my average response time down to 190 ms. So - anecdotally - this looks like an issue of AMS doing too much just to compute cache keys and read them. As indicated by the first poster.
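The commenter's snippet isn't shown, but "hitting the cache directly" usually takes this shape: cache the finished JSON string at the controller level, keyed so that any record change produces a new key, and only invoke the serializer stack on a miss. A standalone sketch (the cache and serializer here are stand-ins, not the commenter's code):

```ruby
require 'json'

JSON_CACHE = {}

# Stand-in for the expensive serializer stack.
def slow_serialize(records)
  records.map { |r| { id: r[:id], name: r[:name] } }.to_json
end

# Cache the finished JSON string at the "controller" level; the key
# changes whenever any record changes, so the serializer only runs on
# a miss and a hit skips both serialization and key work per object.
def render_cached(records)
  key = "index/#{records.map { |r| "#{r[:id]}-#{r[:updated_at]}" }.join(',')}"
  JSON_CACHE[key] ||= slow_serialize(records)
end
```

The design point is that a hit costs one key computation and one read, whereas per-serializer caching still pays per-object key computation and reads even when everything is cached.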
I just noticed something that seems fishy here. I have a resource and am using the Json serializer. I set things up in the model so that I would preemptively write the cache when changes are made, so that the next user request always reads from cache. After source diving, it appeared that AMS adds the adapter name to the end of the cache key, which makes sense. So, for example, it might look like this: resource/id-updated_at/json

I found, however, that I would still get cache misses, because AMS is actually trying to read from resource/id-updated_at/attributes all the time. This happens because the Json adapter is a subclass of Attributes: Json's serializable_hash delegates to Attributes, and Attributes then calls the serializer's serializable_hash method, passing itself as the adapter. All this means, as far as I can tell, that the cache is always reading from the Attributes version. I wonder if that could be the/an issue?

Thanks @harbirg, your tests are correct; that's what I'm seeing, and as you can see, using caching is much slower than not using it. I'm trying to put more tests/benchmarks in place to help here.