Let's say I have a page at mysite.com/mypage.
The Landing Page report in GA for this URL, over a specified duration, shows a number of sessions, say 50.
For the same duration, I checked Apache's access.log with grep "GET /mypage" and got around 10x more hits, say 500.
How can we have a 10x anomaly between GA & Server Logs? Where did the hits go?
This anomaly is present for other durations too. I've compared various durations.
Before someone tells the standard reasons for this, let me point out that:
- A difference of 2x or 3x is understandable, but not 10x.
- No, this is not Bot Traffic. I extracted all unique IPs from the logs, and the IPs are 99% unique. So the traffic is all coming from different IPs.
- I also analyzed user agents, and they all look real (with various models of phones like iPhone, Samsung etc.)
- GA also says that this report is based on 100% data (sampling ruled out).
- As I pointed out, I'm only counting the GET requests to /mypage. That is, I'm not counting asset downloads, favicon hits, etc.
I performed another test. I took all the IPs, de-duplicated them, and counted how many hits came from each one. For 84% of the IPs there is no second request; they made only one request.
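For anyone who wants to reproduce the per-IP test above, here is a minimal sketch. It assumes the log is in Apache's combined format (client IP first, request line as the first quoted field); the function names are made up for illustration. Unlike a plain grep for "GET /mypage", it matches the path exactly, so something like /mypage2 would not be counted.

```python
import re
from collections import Counter

# Assumption: Apache combined log format, e.g.
# 1.2.3.4 - - [10/Oct/2015:13:55:36 +0000] "GET /mypage HTTP/1.1" 200 ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*"')

def hits_per_ip(lines, path="/mypage"):
    """Count GET requests for `path` per client IP."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        ip, method, target = m.groups()
        # Drop the query string so /mypage?fbclid=... still counts.
        if method == "GET" and target.split("?", 1)[0] == path:
            counts[ip] += 1
    return counts

def single_hit_share(counts):
    """Fraction of IPs that made exactly one request."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)
```

Usage would be something like `single_hit_share(hits_per_ip(open("access.log")))`, where the file name is whatever your server actually writes to.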
I've read Anamoly between google analytics and server hits and have taken care of everything given in the accepted answer.
What could it be? Any clues on how to debug this? The traffic is coming from Paid Facebook Ads.
I suspect the GA JS isn't loading for those "extra" sessions. If it were loading correctly, there would be one tracking hit sent for each /mypage hit.
The best way to get information on what's happening is to store a local copy of the tracking data sent to Google. When the GA JS loads correctly, there will be one tracking request sent to Google and another tracking request sent to your web server's log file.
Here's an article that explains how to configure your GA JS code to keep a copy of all tracking requests sent to Google, with syntax examples:
http://support.angelfishstats.com/entries/42575637-How-To-Process-Google-Analytics-Data-with-Angelfish
Once you have the requests, you will see a __ua.gif hit for each request sent to Google. If you see a hit to /mypage with a FB referral WITHOUT a corresponding __ua.gif hit, you'll want to investigate the details of the /mypage hit. For example:
If the hits look bogus, use your findings to complain to FB and get some money back.
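As a rough sketch of the comparison described above: once the local copies of the tracking requests land in the same access log, you could list the client IPs that requested /mypage but never produced a __ua.gif hit. This assumes the combined log format and that the local copy is requested at a path ending in __ua.gif; `untracked_ips` is a hypothetical helper name. Matching by IP is crude (mobile carriers and NAT share addresses), so treat the result as a list of hits worth inspecting, not as proof.

```python
import re

# Assumption: Apache combined log format; only GET requests are considered.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "GET (\S+)')

def untracked_ips(lines, page="/mypage", beacon="__ua.gif"):
    """Return IPs that requested `page` but sent no tracking beacon."""
    page_ips, beacon_ips = set(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ip, target = m.groups()
        path = target.split("?", 1)[0]  # ignore query parameters
        if path == page:
            page_ips.add(ip)
        elif path.endswith(beacon):
            beacon_ips.add(ip)
    return page_ips - beacon_ips
```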
You are comparing sessions to pageviews, which is not an apples-to-apples comparison.
A session indicates a period of continuous hits (e.g. pageviews, events) before 30 minutes of inactivity. Therefore 1 session could consist of many pageviews.
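To make the difference concrete, here is a small sketch that collapses one visitor's hit timestamps into sessions using only the 30-minute inactivity rule. It is a simplification: GA also starts a new session at midnight and when the campaign source changes, which this ignores. `count_sessions` is an illustrative name, not a GA API.

```python
from datetime import datetime, timedelta

def count_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Count sessions for ONE visitor from their hit timestamps."""
    sessions = 0
    last = None
    for ts in sorted(timestamps):
        # A gap longer than the timeout starts a new session.
        if last is None or ts - last > timeout:
            sessions += 1
        last = ts
    return sessions

# Four pageviews from the same visitor: the 40-minute gap splits them
# into two sessions, so pageviews (4) and sessions (2) differ.
t0 = datetime(2016, 1, 1, 10, 0)
hits = [t0,
        t0 + timedelta(minutes=10),
        t0 + timedelta(minutes=50),
        t0 + timedelta(minutes=55)]
print(count_sessions(hits))
```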
When you are doing a search for GET /mypage in your logs, you are looking at how many times that page was requested from your servers. This is equivalent to the pageview metric in Google Analytics.
I recommend you compare pageviews for /mypage against GET /mypage entries in your logs. This should give you a much closer comparison.
Keep in mind that it will be rare to get a 100% match due to scenarios where the Google Analytics tag may not fire on the user's browser. Examples of scenarios include:
Facebook has some sort of pre-load mechanism on mobile, that fetches data for a lot of external objects just in case the user might want to actually view them.
Apparently that thing is called “Facebook Liger”, check if what is described here matches the requests you’re seeing: http://inchoo.net/dev-talk/magento-website-hammering-facebook-liger/
You should be able to detect this via the User-Agent header, and maybe exclude those requests from your analytics.
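A minimal sketch of that User-Agent filtering, assuming the combined log format where the User-Agent is the last quoted field on the line. The marker list is a placeholder: "facebookexternalhit" is the User-Agent of Facebook's link crawler, but the exact string the pre-load requests carry is something to confirm in your own logs and add here.

```python
import re

# Assumption: combined log format, User-Agent is the last quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')

# Placeholder markers; replace with the substrings you actually observe.
PREFETCH_MARKERS = ("facebookexternalhit",)

def is_prefetch(line, markers=PREFETCH_MARKERS):
    """True if the line's User-Agent contains any marker (case-insensitive)."""
    m = UA_RE.search(line)
    if not m:
        return False
    ua = m.group(1).lower()
    return any(marker.lower() in ua for marker in markers)

def split_hits(lines, markers=PREFETCH_MARKERS):
    """Split log lines into (real, prefetch) lists."""
    real, prefetch = [], []
    for line in lines:
        (prefetch if is_prefetch(line, markers) else real).append(line)
    return real, prefetch
```

Comparing the count of "real" /mypage lines against GA would then show how much of the gap the pre-load requests account for.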