A “Hit” Is Not A “Visit”

But don’t tell the CBC;

Harald Gremel, who headed the Austrian unit of the investigation, said police recorded more than 8,000 visits or “hits” to the Austrian server from 2,361 computer IP addresses within a single 24-hour period. Visitors from 77 countries, ranging from Iceland to South Africa, each paid $89 US to access the illicit material.

28 Replies to “A “Hit” Is Not A “Visit””

  1. The CBC based this story on the Associated Press, so we can’t expect much intelligence to be involved in the reporting, even if it’s not completely false like so many of the AP and Reuters reports have now proven to be.
    What I don’t understand though is what these people are doing getting their jollies from looking at dirty pictures of little children. There must be some problem with the wiring of their brains. I cannot see any advantage that such behaviour provides to the species.

  2. Jack, technically speaking a “hit” is an access to any resource on a webserver – this includes the basic page, plus any images or other separate content files within it. SDA for instance has lots of images and things which means when you load the homepage once, you hit the server many times, sometimes dozens for some pages. Reload the same page and you make more hits to the server. Visit other pages on the site, and that racks up dozens more hits. Thus one actual “visit” by one person to a website can produce hundreds or thousands of hits in the server logfiles.

  3. “Consider NASA’s James Hansen who complained that he was being interfered with by the Bush Administration which saw Mr. Hansen’s views as inconvenient with respect to their policies on climate change. Dr. Hansen is, by his own admission, outside of the scientific consensus on climate change, as reflected by the IPCC. Should Dr. Hansen’s ability to speak or even hold his job be a function of the political views of the officials who happen to be in office?”
    Roger Pielke Jr. from his Prometheus blog. Feb. 7th. 2007

  4. From en.wikipedia.org/wiki/Web_analytics —
    Hit – A request for a file from the web server. Available only in log analysis. The number of hits received by a website is frequently cited to assert its popularity, but this number is extremely misleading and dramatically over-estimates popularity. A single web-page typically consists of multiple (often dozens of) discrete files, each of which is counted as a hit as the page is downloaded, so the number of hits is really an arbitrary number more reflective of the complexity of individual pages on the website than the website’s actual popularity. The total number of visitors or page views provides a more realistic and accurate assessment of popularity.
    Page View – A request for a file whose type is defined as a page in log analysis. An occurrence of the script being run in page tagging. In log analysis, a single page view may generate multiple hits as all the resources required to view the page (images, .js and .css files) are also requested from the web server.
    Visit / Session – A series of requests from the same uniquely identified client with a set timeout. A visit is expected to contain multiple hits (in log analysis) and page views.
    Visitor / Unique Visitor – The uniquely identified client generating requests on the web server (log analysis) or viewing pages (page tagging). A visitor can make multiple visits.
    Repeat Visitor – A visitor that has made at least one previous visit. The period between the last and current visit is called visitor recency and is measured in days.
    New Visitor – A visitor that has not made any previous visits.
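The definitions above can be made concrete with a small sketch. It counts hits, page views and unique visitors from a handful of made-up requests. The sample data, the `PAGE_EXTENSIONS` set and the `summarize` helper are all assumptions for illustration and are not taken from any real server log or analytics package.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical sample of (client_ip, requested_path) pairs, standing in for
# parsed web server log lines. None of this is real data.
SAMPLE_REQUESTS = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/style.css"),
    ("10.0.0.1", "/logo.png"),
    ("10.0.0.1", "/banner.jpg"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.2", "/style.css"),
    ("10.0.0.2", "/logo.png"),
    ("10.0.0.1", "/archive.html"),
    ("10.0.0.1", "/style.css"),
]

# Assumption: only these extensions (or no extension at all) count as a "page".
PAGE_EXTENSIONS = {"", ".html", ".htm", ".php"}


def summarize(requests):
    """Return hit, page-view and unique-visitor counts for the requests."""
    hits = len(requests)  # every requested file is one hit
    page_views = sum(
        1
        for _, path in requests
        if posixpath.splitext(urlparse(path).path)[1].lower() in PAGE_EXTENSIONS
    )
    # Identifying visitors by IP address alone is itself an approximation.
    unique_visitors = len({ip for ip, _ in requests})
    return {
        "hits": hits,
        "page_views": page_views,
        "unique_visitors": unique_visitors,
    }


print(summarize(SAMPLE_REQUESTS))
```

In this toy sample, two visitors viewing three pages between them produce nine hits, which is the gap the definitions describe.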

  5. A hit is recorded for every file downloaded from the server. In any given visit, you might load dozens of files – every image on this page is a separate hit, for example. Thus, when they count “hits” and present them as unique visits, the number of people the reader assumes to have accessed the site (for whatever reason) is inflated – sometimes by several orders of magnitude.
    For example, this month at SDA the internal server stats indicate an average daily:
    Hits 108978
    Visits 9947
    Think of it this way – if you sold 100 copies of a 20 page booklet, would you count that as 2,000 readers?
    There are probably better analogies, and I’m sure someone will provide them. Back to work…
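As a rough worked example of the arithmetic, the two SDA averages quoted above give the number of hits per visit. The last step applies that ratio to the 8,000 figure from the news story purely as an illustration: it assumes the 8,000 were raw hits and that the other site had a similar hits-per-visit ratio, neither of which is known.

```python
# Average daily figures quoted from the SDA server stats above.
avg_daily_hits = 108978
avg_daily_visits = 9947

# How many hits a single visit generates on average at SDA.
hits_per_visit = avg_daily_hits / avg_daily_visits
print(f"hits per visit: {hits_per_visit:.2f}")

# Illustration only. Assumption: the reported 8,000 were raw hits and the
# site had the same hits-per-visit ratio as SDA. Neither is established.
reported_hits = 8000
implied_visits = reported_hits / hits_per_visit
print(f"implied visits under that assumption: {implied_visits:.0f}")
```

With the SDA figures, one visit corresponds to roughly eleven hits, so a hit count read as a visit count overstates traffic by about an order of magnitude.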

  6. I posted the 3:12 comment on the wrong thread. It was meant to be posted with respect to the “Global Warming Gestapo”, in the readers’ comments section.

  7. There are other web analytics problems too. For example, typically all the readers behind a LAN router will appear to the web server to come from the same IP address, which has the effect of under-counting unique visitors.
    In addition, since TCP/IP is a connectionless protocol, that is, no connection is held open between page requests, there is no way to really tell when a visit ends, because there is no connection dropped to signal same. While you’re reading a page, the server can’t tell whether you’ve left or not. Usually a time-out is used, so that as long as you request at least one page every, say, ten minutes, they’re all considered to be in a single visit. Lowering the time-out will appear to produce more visits.
    Other technologies like cookies can be used to refine accuracy, but in general, unless there is some sort of user id based login and logout procedure, web analytics produces approximate results at best.
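The time-out rule described above can be sketched as follows. This is a minimal illustration, not how any particular analytics package works. The `count_visits` helper, the client labels and the timestamps are all hypothetical.

```python
def count_visits(requests, timeout_seconds):
    """Count visits, starting a new one when a client has been idle too long.

    requests: iterable of (client_id, timestamp_in_seconds) pairs.
    """
    last_seen = {}  # client_id -> timestamp of that client's latest request
    visits = 0
    for client, ts in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(client)
        # First request from this client, or a gap longer than the time-out,
        # starts a new visit.
        if previous is None or ts - previous > timeout_seconds:
            visits += 1
        last_seen[client] = ts
    return visits


# Hypothetical requests: client "a" browses with one 8-minute pause,
# client "b" comes back after about half an hour.
SAMPLE = [
    ("a", 0), ("a", 240), ("a", 720), ("a", 900),
    ("b", 100), ("b", 2000),
]

print(count_visits(SAMPLE, timeout_seconds=600))  # ten-minute time-out
print(count_visits(SAMPLE, timeout_seconds=300))  # five-minute time-out
```

The same six requests yield three visits with a ten-minute time-out and four with a five-minute one, so the visit count depends on a parameter the analyst chooses. If `client_id` is an IP address, everyone behind one router is also merged into a single client.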

  8. TCP is a connection based protocol.
    UDP is a connectionless protocol.
    IP is not a protocol, but packet format.
    HTTP travels over TCP, therefore HTTP requires connection.
    The visits are completely different from the hits, as a visit is not a network term, but an application term. There may be a visit defined as a session, and the web server would have to use some indication that the session has ended, like an explicit user-initiated sign-off or a time-out.
    Now let’s deviate into the area of types of end users of the Internet. Is it legal for the Google bot to visit a child porn site and index and cache its entire content? Does not the Google owner ultimately initiate that automated process? Is it legal for a person to visit a child porn site and store some of its pages in his browser cache? Think about that.

  9. Yes, I was a bit sloppy there, sorry. TCP (Transmission Control Protocol) does indeed simulate a connected protocol on top of the IP (Internet Protocol) layer, but the IP layer is connectionless. For more information see: en.wikipedia.org/wiki/Connectionless
    What I was trying to say is that between a web client and server there is no explicit end-of-visit signal, as there is when, say, a telephone goes on-hook. On the web, I can request a page, and close the window right after getting it, and the server doesn’t get told that. No connection information is provided to the server in-between service requests, only during service requests.

  10. soooo.. there are great liabilities for police blindly trusting IP “hit” registers as evidence of criminal activity?
    I don’t know how many times I’ve gotten onto a porn site from a Google keyword search…the IT manager at our company tells me there are lots of times he sees company computers make “hits” on porn sites but he tells the management not to use it as evidence of wrongdoing by an employee…and never to act on it unless they see large file transfers from these sites on a regular basis or catch porn being stored on company computers…I guess the idea is that anyone can hit these sites by mistake… it doesn’t mean you accessed the site.
    Is this about the gist of it?

  11. There are great risks in relying on “stray” hits to establish mens rea, Redux, including the possibility of “pre-fetching” software that requests pages for you before you even ask for them (in order to deliver them quickly if and when you do).
    Don’t forget, too, that IP addresses can be faked by software on the client side. I’m not sure about the details there, for I’m getting out of my domain. I’m not really a network specialist, I was just trying to help Jack a bit. But I’m pretty sure that it is possible for someone to maliciously simulate requests from someone else.
    As is the case with anything to do with bits, including, say, digital images, it can be very difficult to establish provenance, since bits are handled by software, and the whole purpose of software is to change and transfer bits.
    When it comes to a matter of “beyond a reasonable doubt” in criminal cases, I think most of us would agree that if an IP address that serves a single machine is making repeated page requests to a server, over time, that something’s going on with respect to that machine, unless there is reason to doubt the accuracy of the IP address, in which case, that’s what defense lawyers are for.

  12. Hey there. Aaron has a good point. When newsgroups were all the rage and when some newsgroups held photos that were clearly illicit, and the local telephone company was the internet provider… and they downloaded and cached those newsgroups, 55,000 or more… were they not in possession of illegal material?
    Is it true that a guy in Toronto was sent one e-mail with child porn and he ended up being charged even though he didn’t know it was on his machine?
    Or is that an urban legend?
    So is it possible to charge someone who may have had illegal porn on his computer unbeknownst to him?

  13. WL is right and the topic he brought up opens another whole can of worms: the linking. A web site with absolutely innocent content has ways to hit any other site (be it porn or casino or whatever for that matter) without the visitor knowing. There are endless possibilities for the web site owners to participate in such schemes for different reasons.
    My wife knows the best: she became exposed to all kinds of porn as soon as I showed her how to use Google. Every blasted site she tried to open popped up dozens of junk sites and planted trojans and bots on her machine. I then had endless hours of ‘fun’ cleaning up her machine. I still have no idea whether my LAN is clean or not; hopefully it is, judging by the lack of traffic on the switch while there is no browser activity on any of the machines and all the IM programs are down… Who knows, maybe there is a SWAT team on the way to my place as my IP is flagged somewhere in the world…

  14. In the west, classic telephony networks have enjoyed common carrier status: they were shielded from lawsuits based on content, which was considered the sole responsibility of the end users. And this made sense, because, certainly in analog days, the networks did not and could not examine the content passing through them on a regular basis (modulo your local busy-body operator in the really old days).
    Once the data is in digital form passing through software that can look at the bits, questions of public policy arise as to whether and how it should, and to what extent the carriers and other owners of software should be responsible for having it do so and liable for not having it do so.
    Of course, since public policy makers are famously ignorant of technology, the results are likely to at best be ugly. The whole network neutrality matter is similar, in that some of the things it provides are good, and some of them aren’t, and the legislative regulators are probably about as likely to keep the good parts and chuck the bad as they are to do the opposite.

  15. There’s something I think we should keep in mind here. Yes, it is possible to accidentally hit dirty picture sites; indeed, I’ve stumbled across some in my time. (Fortunately, since I run a strict Linux box, my software’s never caught anything as a result.)
    But to the best of my knowledge those over-zealous commercial pop-up sites have never presented images of young children, at least not on any pages I’ve seen. And dirty pictures between consenting adults are not, and I do not think should be, considered illegal.
    The important thing is that, to the degree that there are valid charges behind the story originally linked to by Kate, it would appear that there was quite some deliberate effort by some to access the particular content of a particular site, including the payment of a fee.
    In which case, we are talking, legally, of the concept of accessory to child abuse, which is and should be considered by the law to be just as bad as child abuse itself. I suspect the lawyers of those who are actually guilty are advising them to try to cut a deal.

  16. Aaron said: “My wife knows the best: she became exposed to all kinds of porn as soon as I showed her how to use Google. Every blasted site she tried to open popped up dozens of junk sites and planted trojans and bots on her machine. I then had endless hours of ‘fun’ cleaning up her machine”
    >> Ohhh man do I hear ya on that 😉 My wife got some bot planted on the system that redirected her to porn sites from links to legit sites she went to all the time with cued pop up windows and they shagged her computer/IP addy ID and we (it was the family computer) started getting porno spam mail….man did I have some ‘splainin’ to do as she thought either I or the boy were surfing porn sites….after learning about these bots, trojans and spy-ware that were loading through our lame firewall…and spending a fortune and time to remove them, we learned to be more careful with Google searches….all the same we still get this porn spam mail and the odd redirect….it’s a sick planet out there :-0

  17. Hopefully Canadian police are a bit more computer literate than the Austrian Gerald Hesztera who seems to have difficulty distinguishing between hits and visits. 8000 visits to a web site during a day is not that many, especially when one gets hit with some of the email harvesting bots that will crawl your whole site several times in a day.
    Relying on an IP address is very dicey because the IP address recorded at the porn site is likely not the IP address of the person who was downloading porn. Before I password protected my wireless network, I caught more than one person using it to download porn (I stuck a packet sniffer on my internet connection to see what people would use an open wireless access point for). Also, if people use IP anonymizing software to access web sites, then the connection may be made through a number of computers, making the originating IP impossible to trace. There are a number of such packages one can run on one’s computer which were originally designed to let people in dictatorships access sites to which direct access is blocked. I was running such software for a while until my girlfriend got onto me about my potential liability if my IP address came up in a child porn investigation.
    While I can’t understand why anyone is interested in child porn, I see the laws banning possession of child porn as a smokescreen for governments to exert greater control over the internet. After all, porn has always been with us and there is a huge difference between viewing a picture and actually abusing a child (a simple way to deal with pedophiles is just to stick them in with the rest of the prison population who will readily solve the problem of their existence).

  18. A couple minor things, Redux. Your infected machine didn’t use your IP address to send you mail, it shagged your email address, probably from your address book database. That’s why you can’t trust mail from your friends (especially if they run Windows), because if they are infected, they can send you bad mail masquerading as from your friend.
    The other thing is, even good firewalls aren’t designed to and can’t prevent bad stuff from coming through channels you have enabled, such as that for your browser. If you allow in an image, and that image causes a buffer overflow due to a coding error in the browser, and that overflow causes CPU instructions to be executed out of some carefully placed data in the image, then your machine is now running software you never asked for, and at the lowest level of the machine too.
    This all sounds scary perhaps, but to a degree it’s like driving a car. You not only have to avoid the pot holes, you also have to avoid the bad parts of town, and you have to avoid getting hi-jacked. Fortunately, with cars, there’s an immediacy to it.
    With computers though, it’s more subtle. Over time, the combination of accumulated human experience, better technology, and public policy changes will mitigate the situation. But the bad guys never have and never will go away.
    Here’s a tip: if you use a major service provider to host your inbound mail, go to their web site and turn on their checking for your account. I use Shaw for the final link from all my forwards, and I have it set so that they mark suspected junk mail and then send it through, and I have a filter set in my mail reader so all such marked mail automatically goes to a junk folder. I get about 300 inbound a day, and about 96.7% of them are flagged.

  19. I think that it was a CTV story, but I read somewhere today that Garth Turner’s blog gets 1 million hits a month!

  20. OK just so I got it straight – the 4,250,000 number on your site, Kate. Is that “hits” or visits?

  21. Vitruvius at February 7, 2007 4:26 PM
    Right you are on the software.
    By using Torpark (3w.torrify.com) you surf the web anonymously and show a spoofed IP address to the host site. You can also load the executable to a flash stick & run it from any pc, surfing anonymously from internet cafes, libraries, etc.
    I’ve just posted this using Torpark, which indicates I’m in Malaysia… and that just ain’t so.

  22. this ‘hits’ vs ‘visits’ reminds me of the bad old days of comparing hard drive access times; the liars in charge would include colossal memory buffers which would contain the data being read numerous times as opposed to an actual physical fetch from the drive surface and then fob off the spectacular results on an unsuspecting public..
    slanted, tilted, fixed, rigged blatant bullshyt.
