Index: sparse_sample/damkjer_sparse_sample.tex
===================================================================
--- sparse_sample/damkjer_sparse_sample.tex	(revision 1)
+++ sparse_sample/damkjer_sparse_sample.tex	(revision 2)
@@ -10,6 +10,8 @@
 %%
 %% Revision History:
-%%    2012-JAN-01  K. Damkjer
+%%    2013-JAN-01  K. Damkjer
 %%       Initial Draft.
+%%    2013-APR-19  K. Damkjer
+%%       Expanded on Introduction and Local Statistic Attribution sections.
 %%=============================================================================
 
@@ -18,5 +20,5 @@
 %% publication is targeted.
 %%***
-\documentclass[11pt,letterpaper]{article}
+\documentclass[10pt,letterpaper,twocolumn]{article}
 
 %%***
@@ -26,4 +28,7 @@
 \usepackage{amsfonts}
 \usepackage{amssymb}
+\usepackage{bm}
+\usepackage{graphicx}
+\usepackage{latexsym}
 
 %%***
@@ -52,13 +57,21 @@
 %%% content -- one to several paragraphs which describe the work.
 \begin{abstract}
+%TODO Add Abstract.
 To do: Add abstract.
 \end{abstract}
 
 \section{Introduction}
-LiDAR mapping systems have evolved significantly over the last couple decades to include linear mode, Geiger mode, and full waveform collection formats. These collection formats are capable of producing a staggering amount of information-rich true 3D data. Modern collection systems are capable of sampling several thousand to over a million points per second at sub-meter resolutions resulting in several million to billions of point samples to be stored, processed, analyzed and distributed.\cite{Parrish:2012,Smith:2012,Young:2012} Limiting data sets to small areas of interest (AOIs) can mitigate the data volume issues inherent in individual LiDAR point clouds. However, simultaneous user demands for wide area contextual awareness and precise scene information keep data sizing considerations at the forefront of content provider concerns.
-
-Raw LiDAR data taken directly from the collection system is usually in a compact vendor-specific proprietary binary format that must be converted into a common public format to facilitate information exchange. The conversion to an exchange format has traditionally introduced a significant amount of bloat to LiDAR point sets; however, an industry standard binary exchange format---LASer File Format (LAS)---was introduced by the American Society for Photogrammetry and Remote Sensing (ASPRS) to facilitate data exchange and minimize overhead.\cite{ASPRS:2012} Recent development of a compressed version of LAS---LASzip (LAZ)---has gained wide-spread adoption and offers lossless, non-progressive, streaming, order-preserving compression, random access, and typical compression ratios between 10 and 20 percent of original file size.\cite{Isenburg:2011} While the compression achieved by the LAZ format is significant, further data reduction must be achieved to support users in bandwidth-limited and mobile device environments.
-
-Several approaches have been developed to extract salient information content from dense scene data. West et al\cite{West:2004} introduced structure-tensor eigenvalue based feature classifiers for LiDAR point sets. These basic classifiers were enhanced by Gross and Thoennessen\cite{Gross:2006} to extract strong linear features to support scene modeling from laser data. Demantk\'{e} et al also extend this basic point classification to direct optimal neighborhood scale selection for feature attribution.
+
+Light detection and ranging (LiDAR) systems developed to support mapping and surveying produce a staggering amount of information-rich, true three-dimensional (3D) data. Modern systems sample several thousand to over a million points per second, resulting in several million to billions of point samples per product to be stored, processed, analyzed, and distributed.\cite{Parrish:2012,Smith:2012,Young:2012}
+
+Managing such large data sets through a production pipeline consisting of collection, processing, analysis, storage, and dissemination presents a host of challenges to content providers. Limiting data sets to small areas of interest (AOIs) can mitigate data management issues inherent in processing and storing individual LiDAR point clouds. However, user demands for simultaneous wide area coverage and precise scene content keep data sizing considerations at the forefront of content provider concerns. Further, AOIs do not address the ultimate storage constraints imposed on processing and archival systems that must maintain large libraries of raw and processed data sets.
+
+Raw LiDAR data taken directly from the collection system is usually in a compact, vendor-specific, proprietary binary format that must be converted into a common public format to facilitate analysis and information exchange. The conversion to an exchange format has traditionally inflated the size of the raw LiDAR data holdings. A significant cause of inflation is the conversion of 1D range data into 3D spatial data. Further inflation and loss of precision may result depending on the data types and structures used to represent the 3D point clouds.
+
+An industry standard binary exchange format---LASer File Format (LAS)---was introduced by the American Society for Photogrammetry and Remote Sensing (ASPRS) to facilitate data exchange and minimize overhead.\cite{ASPRS:2012} Recent development of a compressed version of LAS---LASzip (LAZ)---has gained widespread adoption and offers lossless, non-progressive, streaming, order-preserving compression, random access, and typical compression ratios between 10 and 20 percent of original file size.\cite{Isenburg:2011}
+
+While the compression achieved by the LAZ format can be significant, achieving optimal results requires that the uncompressed input data stream be aliased to a regular point spacing. This operation may not be an appropriate modification of the data stream in all scenarios. Even with an effective compression strategy, additional data reduction may be necessary to support users in bandwidth-limited and mobile device environments or to support efficient querying and comparison of data holdings in processing and archival systems.
+
+An ideal data reduction algorithm should remove elements from a data set in an information-preserving manner. Several approaches have been developed to identify salient elements in dense scenes; the resulting salience metrics can then guide the reduction process. West et al.\ introduced structure-tensor eigenvalue based feature classifiers for LiDAR point sets.\cite{West:2004} These basic classifiers were enhanced by Gross and Thoennessen to extract strong linear features to support scene modeling from laser data.\cite{Gross:2006} Demantk\'{e} et al.\ also extended this basic point classification to direct optimal neighborhood scale selection for feature attribution.\cite{Demantke:2011}
 
 In this paper, we present a method for performing unsupervised sparse sampling of LiDAR point data while preserving scene information content.
@@ -66,42 +79,193 @@
 %TODO Provide overview of paper structure.
 
-\section{Salience Attribution}
-Discuss salience attribution approach here.
-
-\cite{Gross:2006}
-\cite{Gumhold:2001}
-\cite{West:2004}
+\section{Local Statistic Attribution}
+
+Modern LiDAR sensors produce data sets in low-dimensional spaces typically consisting of 3D position and, depending on the operational mode of the detector array, a measure of intensity proportional to the incident signal strength. Avalanche photodiode (APD) detectors operated in Geiger mode are single-photon sensitive and report only time-of-flight information, while APD detectors operated in linear mode produce an output signal proportional to the detected optical intensity that can be used to augment peak returns or full waveform data.
+
+To guide the sparse sampling process, we propose extending the attribution of the raw feature set with local statistics derived from the principal components of the structure tensor. These metrics were first developed by West to guide automated target recognition\cite{West:2004} and later extended by Gross and Thoennessen to extract linear features from LiDAR data to support automated generation of 3D models.\cite{Gross:2006}
+
+In this section we describe the computation of the principal component based attributes and generalize the attributes to arbitrary dimensional spaces.
+
+\subsection{Locality}
+
+This paper considers the analysis of real-valued multidimensional points, $\bm{x}$. Each point is assumed to take the following form, where $N$ is the set of attribute dimensions for the native point, including spatial dimensions, and $\lvert N\rvert = n$ is the dimensionality of the native point space.
+
+%TODO add footnote on converting boolean and enumerated values to real space for analysis
+
+\begin{align}
+\bm{x} \in \mathbb{R}^{n} \Leftrightarrow
+\bm{x} = \left(\begin{array}{c}x_{1}\\
+             x_{2}\\
+             \vdots\\
+             x_{n}\end{array}\right), x_{1\ldots n} \in \mathbb{R}
+\end{align}
+
+The standard concept of a point cloud database, $\mathcal{D}$, then is simply a set of real-valued multidimensional points of consistent dimension and description.
+
+\begin{align}
+\mathcal{D}\subset\mathbb{R}^{n}
+\end{align}
+
+We establish a database of query points, $\mathcal{Q}$, where $M\subseteq N$ is the search space of attributes for the determination of locality and $\lvert M\rvert = m$ is the dimensionality of the search space. In some cases, it may be desirable to restrict queries to a subset of the available native dimensions (\textit{e.g.}, spatial dimensions).
+
+\begin{align}
+\mathcal{Q}\subset\mathbb{R}^{m}
+\end{align}
+
+Our analysis is performed on neighborhoods of points from the point cloud database about the query points, $\mathcal{V}_{\bm{q}}$. Note that the query points, $\bm{q}$, are not restricted to lie in the point cloud itself, though in practice we typically perform the analysis treating each $\bm{x}\in\mathcal{D}$ as a query location. This approach effectively requires an efficient all-nearest-neighbor (ANN) search.
+
+\begin{align}
+\mathcal{V}_{\bm{q}}\subseteq\mathcal{D}
+\end{align}
+
+We investigated two neighborhood definitions and each presents merits. The $k$-nearest neighborhood, $\mathcal{V}^{k}_{\bm{q}}$, consists of the $k$ closest points to $\bm{q}$ in $\mathcal{D}$ whereas the fixed-radius neighborhood, $\mathcal{V}^{r}_{\bm{q}}$, consists of all points within the ball of radius $r$ centered at $\bm{q}$ in $\mathcal{D}$.
+
+\begin{equation}
+\begin{aligned}
+\mathcal{V}_{\bm{q}}^{k}=\left\{\mathcal{A}\subseteq\mathcal{D}\right.:&\left.\bm{q}\in\mathcal{Q},k\in\mathbb{N},\lvert\mathcal{A}\rvert=k,\right.\\
+&\left.\left(\forall\bm{x}\in\mathcal{A},\bm{x}^{\prime}\in\mathcal{D}\setminus\mathcal{A}\right)\right.\\
+&\left.\left(d\left(\bm{x},\bm{q}\right)\leq d\left(\bm{x}^{\prime},\bm{q}\right)\right)\right\}
+\end{aligned}
+\end{equation}
+
+\begin{align}
+\mathcal{V}_{\bm{q}}^{r}=\left\{\bm{x}\in\mathcal{D}:\bm{q}\in\mathcal{Q},d\left(\bm{x},\bm{q}\right)\leq r\right\}
+\end{align}
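+
+Note that, assuming distinct neighbor distances, the two neighborhood definitions coincide when the radius is chosen adaptively as the distance to the $k$th nearest neighbor:
+
+\begin{align}
+\mathcal{V}_{\bm{q}}^{k}=\mathcal{V}_{\bm{q}}^{r_{k}},\quad r_{k}=\max_{\bm{x}\in\mathcal{V}_{\bm{q}}^{k}}d\left(\bm{x},\bm{q}\right)
+\end{align}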
+
+We discuss the relative merits and demerits of each neighborhood selection strategy later in the paper.
+
+\subsection{Density}
+
+Description of density metric and interpretations.
+
+\subsection{Principal Component Analysis}
+
+The data matrix collects the $k=\lvert\mathcal{V}_{\bm{q}}\rvert$ neighborhood points as columns:
+
+\begin{align}
+\bm{X}=\left[\begin{array}{ccc}\bm{x}_{1}&\cdots&\bm{x}_{k}\end{array}\right],\forall\bm{x}\in\mathcal{V}_{\bm{q}}
+\end{align}
+
+Covariance matrix:
+
+\begin{align}
+\bm{C}=\frac{1}{k-1}\bm{X}\left(\bm{\mathrm{I}}_{k}-\frac{1}{k}\bm{\mathrm{J}}_{k}\right)\bm{X^{\mathsf{T}}}
+\end{align}
+
+Alternatively (more computationally efficient):
+
+\begin{align}
+\bar{\bm{x}} = \frac{1}{k}\sum_{p=1}^{k}{\bm{x}_{p}}
+\end{align}
+
+\begin{align}
+\bm{M}=\left[\begin{array}{cccc}
+\left(\bm{x}_{1}-\bar{\bm{x}}\right)&
+\left(\bm{x}_{2}-\bar{\bm{x}}\right)&
+\cdots&
+\left(\bm{x}_{k}-\bar{\bm{x}}\right)
+\end{array}\right]
+\end{align}
+
+\begin{align}
+\bm{C}=\frac{1}{k-1}\bm{M}\bm{M^{\mathsf{T}}}
+\end{align}
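+
+The two forms agree because right multiplication by the centering matrix subtracts the neighborhood mean from each column of the data matrix:
+
+\begin{align}
+\bm{X}\left(\bm{\mathrm{I}}_{k}-\frac{1}{k}\bm{\mathrm{J}}_{k}\right)=\bm{X}-\bar{\bm{x}}\bm{1}_{k}^{\mathsf{T}}=\bm{M}
+\end{align}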
+
+Perform the eigendecomposition:
+
+\begin{align}
+\bm{C}=\bm{U}\bm{\Lambda}\bm{U^{\mathsf{T}}}
+\end{align}
+
+\begin{align}
+\bm{U}=\left[\begin{array}{cccc}
+\bm{e}_{1}&\bm{e}_{2}&\cdots&\bm{e}_{n}
+\end{array}\right]
+\end{align}
+
+\begin{align}
+{\bm{\Lambda}}=\left[\begin{array}{cccc}
+\lambda_{1}&0&\cdots&0\\
+0&\lambda_{2}&\cdots&0\\
+\vdots&\vdots&\ddots&\vdots\\
+0&0&\cdots&\lambda_{n}
+\end{array}\right]
+\end{align}
+
+\begin{align}
+{\bm{\Lambda}}=\bm{\Sigma}\bm{\Sigma^{\mathsf{T}}}
+\end{align}
+
+\begin{align}
+\bm{\lambda}=\left(\begin{array}{c}\bm{\Lambda}_{1,1}\\
+             \bm{\Lambda}_{2,2}\\
+             \vdots\\
+             \bm{\Lambda}_{n,n}\end{array}\right)=\left(\begin{array}{c}\lambda_{1}\\
+             \lambda_{2}\\
+             \vdots\\
+             \lambda_{n}\end{array}\right)
+\end{align}
+
+\begin{align}
+\bm{\sigma}=\left(\begin{array}{c}\bm{\Sigma}_{1,1}\\
+             \bm{\Sigma}_{2,2}\\
+             \vdots\\
+             \bm{\Sigma}_{n,n}\end{array}\right)=\left(\begin{array}{c}\sigma_{1}\\
+             \sigma_{2}\\
+             \vdots\\
+             \sigma_{n}\end{array}\right)
+\end{align}
+
 
 \subsection{Normal Estimation}
-Perform normal estimation by planar fit to point data.
-
-\begin{align}
-\textbf{x}_{i} = \left(\begin{array}{ccc}x_{i}&y_{i}&z_{i}\end{array}\right)^{T}
-\end{align}
-\begin{align}
-\overline{\textbf{x}} = \dfrac{1}{n}\sum_{i=1}^{n}{\textbf{x}_i}
-\end{align}
-
-
-\cite{Dey:2005}
-\cite{Konig:2009}
-\cite{Mitra:2004}
+Perform normal estimation by a planar fit to the local point data, resolving the sign ambiguity using the sensor location.
+
+Normals\cite{Dey:2005}
+
+Normals\cite{Konig:2009}
+
+Normals\cite{Mitra:2004}
 
 \subsection{Curvature Estimation}
+
 Perform curvature estimation by planar fit to normal data.
 
-\cite{Kalogerakis:2009}
-
-\subsection{Principal Component Analysis}
-Perform attribution via eigenspace analysis.
+Curvature\cite{Gumhold:2001}
+
+Curvature\cite{Kalogerakis:2009}
+
+\subsection{Structure Features}
 
 Structure Tensor features introduced by West\cite{West:2004}:
 
-\begin{align}
-\text{Omnivariance}&=\sqrt[n]{\prod_{i=1}^{n}{\lambda_{i}}}\\
-\text{Anisotropy}&=\dfrac{\lambda_{i} - \lambda_{n}}{\lambda_{1}}\\
-\text{Dimensionality}_{i-1}&=\dfrac{\lambda_{i-1} - \lambda_{i}}{\lambda_{1}}\\
-\text{Isotropy}&=\dfrac{\lambda_{n}}{\lambda_{1}}\\
-\text{Entropy}&=-\sum_{i=1}^{n}{\lambda_{i}\log\left(\lambda_{i}\right)}
+\subsubsection{Isotropy}
+Further discussion of information provided by this classifier.
+
+\begin{align}
+Iso\left(\mathcal{V}_{\bm{q}}\right)&=\frac{\lambda_{n}}{\lambda_{1}}
+\end{align}
+
+\subsubsection{Anisotropy}
+Further discussion of information provided by this classifier. Estimate of degree of structure present in local area feature. Complement of isotropy.
+
+\begin{align}
+Ani\left(\mathcal{V}_{\bm{q}}\right)
+&=\frac{\lambda_{1} - \lambda_{n}}{\lambda_{1}}
+\end{align}
+
+\subsubsection{Dimensionality}
+Further discussion of information provided by this classifier. Estimates the embedded dimension of local area feature.
+
+\begin{align}
+Dim_{d}\left(\mathcal{V}_{\bm{q}}\right)&=\left\{\begin{array}{ll}
+\frac{\lambda_{d} - \lambda_{d+1}}{\lambda_{1}}&,\,d<n\\
+Iso\left(\mathcal{V}_{\bm{q}}\right)&,\,d=n
+\end{array}\right.
+\end{align}
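+
+As an illustration, consider a locally planar neighborhood in $n=3$ dimensions with eigenvalues $\bm{\lambda}=\left(1,0.9,0.01\right)$. Then $Dim_{1}=0.1$, $Dim_{2}=0.89$, and $Dim_{3}=Iso\left(\mathcal{V}_{\bm{q}}\right)=0.01$, correctly identifying the feature as predominantly two-dimensional.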
+
+\subsubsection{Low-Dimensional Embedding}
+
+\begin{align}
+Emb\left(\mathcal{V}_{\bm{q}}\right)&=\operatorname*{arg\,max}_{d\in\left\{1,\ldots,n\right\}}\left(Dim_{d}\left(\mathcal{V}_{\bm{q}}\right)\right)
 \end{align}
 
@@ -109,18 +273,30 @@
 Further discussion of information provided by this classifier. Estimates the number of dimensions needed to encode information content of local area feature.
 
-\subsubsection{Dimensionality}
-Further discussion of information provided by this classifier. Estimates the embedded dimension of local area feature.
+\begin{align}
+H\left(\mathcal{V}_{\bm{q}}\right)&=-\sum_{d=1}^{n}{\hat{\lambda}_{d}\log_{n}\left(\hat{\lambda}_{d}\right)}
+\end{align}
+
+where $\hat{\lambda}_{d}=\lambda_{d}/\sum_{i=1}^{n}{\lambda_{i}}$ denotes the eigenvalues normalized to unit sum.
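+
+The normalization and choice of base-$n$ logarithm bound the entropy to $\left[0,1\right]$: an isotropic neighborhood with $\hat{\lambda}_{d}=1/n$ for all $d$ yields $H=1$, while a perfectly linear feature with $\hat{\bm{\lambda}}=\left(1,0,\ldots,0\right)$ yields $H=0$ (taking $0\log_{n}0=0$).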
 
 \subsubsection{Omnivariance}
 Further discussion of information provided by this classifier. Geometric mean of structure-tensor eigenvalues. Estimates data variance across dimensions of local area feature.
 
-\subsubsection{Anisotropy}
-Further discussion of information provided by this classifier. Estimate of degree of structure present in local area feature.
-
-\subsubsection{Isotropy}
-Further discussion of information provided by this classifier. Complement of anisotropy.
-
-\subsection{Salience Weighting}
-Further discussion of information provided by this classifier. Discuss combination of feature classification into salience weights.
+\begin{align}
+Omnivar\left(\mathcal{V}_{\bm{q}}\right)&=
+\left(\prod_{d=1}^{n}{\lambda_{d}}\right)^{\frac{1}{n}}
+\end{align}
+
+\subsubsection{Fractional Anisotropy}
+
+Further discussion of information provided by this classifier. 
+
+\begin{align}
+FA\left(\mathcal{V}_{\bm{q}}\right)&=
+\left(\frac{n\sum\limits_{d=1}^{n}
+{\left(\lambda_{d}-\bar{\lambda}\right)^{2}}}
+{\left(n-1\right)\sum\limits_{d=1}^{n}{\lambda_{d}^{2}}}\right)^{\frac{1}{2}}
+\end{align}
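+
+Like the entropy, fractional anisotropy is bounded to $\left[0,1\right]$: equal eigenvalues give $FA=0$, while a single nonzero eigenvalue, $\bm{\lambda}=\left(\lambda_{1},0,\ldots,0\right)$, gives $FA=1$.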
+
+\section{Salience Weighting}
+
+Discuss combination of feature classification into salience weights.
 
 \section{Tiered Filtering}
@@ -144,5 +320,5 @@
 
 \subsection{Neighborhood Selection}
-Discuss trade of fixed radius versus \textit{k}-neighbor versus dynamic sizing.
+Discuss trade of fixed radius versus $k$-neighbor versus dynamic sizing.
 \cite{Gressin:2012}
 \cite{Sankaranarayanan:2007}
