Support optional, user-directed collection of performance counters (#1114)

* Support optional, user-directed collection of performance counters

The patch allows an engineer wishing to drill into the root causes
of a regression, for example. Currently, only single threaded runs
are supported. The feature is a build-time opt in, and then a runtime
opt in.

The engineer may run the benchmark executable, passing a list of
performance counter names (using libpfm's naming scheme) at the
command line. The counter values will then be collected and reported
back as UserCounters.

This is different from #240 in that it is a benchmark user opt-in, and
the counter collection is transparent to the benchmark.

Currently, this is only supported on platforms where libpfm is
supported.

libpfm: http://perfmon2.sourceforge.net/

* 'Use' values param in Snapshot when BENCHMARK_OS_WINDOWS

This is to avoid unused parameter warning-as-error

* Added missing include for <vector> in perf_counters.cc

* Moved doc to docs

* Added license blurbs
diff --git a/src/string_util.h b/src/string_util.h
index 09d7b4b..6bc28b6 100644
--- a/src/string_util.h
+++ b/src/string_util.h
@@ -37,6 +37,8 @@
   return ss.str();
 }
 
+std::vector<std::string> StrSplit(const std::string& str, char delim);
+
 #ifdef BENCHMARK_STL_ANDROID_GNUSTL
 /*
  * GNU STL in Android NDK lacks support for some C++11 functions, including