
Techniques to Reduce Buried Pipe Point Cloud Size to One-Tenth | Classification, Thinning, and Attribute Assignment

By LRTK Team (Lefixea Inc.)


In maintenance and construction recordkeeping for buried pipelines, underground pipes are increasingly measured with laser scanners or photogrammetry and recorded as three-dimensional point cloud data. Point clouds preserve pipe shape and position with high accuracy, but because they are huge collections of points, file sizes tend to balloon. Left as raw data, they can be slow to open on a PC and difficult to share with stakeholders, so in practice it is important to organize and downsize point cloud data efficiently. This article explains techniques for reducing buried-pipe point cloud data to about one-tenth of its original size. Specifically, we cover point cloud classification (class labeling), thinning (downsampling), noise removal, information organization through attribute assignment, and file format optimization, and we detail the key points practitioners should know to make large point cloud datasets easier to handle. Master these tips for trimming the fat from your data while reliably retaining the information you need, and put buried-pipeline point clouds to effective use.


Challenges and the Need to Organize Buried Pipe Point Cloud Data

Point cloud data of buried pipes often consists of a massive number of points, tens of millions or more. For example, if a trench excavated for water, sewer, or gas pipeline installation is scanned at high density, you can obtain point clouds of millions of points within an area only a few meters (roughly 10 ft) on a side, and file sizes can easily range from hundreds of MB to several GB. The amount of information is orders of magnitude greater than conventional 2D drawings and photographs, so the raw data is often too large to handle easily.


If the file size remains large, it can take a long time to open in specialized software, and transferring it over the company network for sharing can also be slow. There are also problems such as the data being too heavy to display smoothly on a tablet for on-site checks. In addition, point clouds are just collections of points, so if they are not organized it can be hard to tell at a glance what is ground and what is piping. Even if you only intend to view the necessary parts, if the point cloud includes all the surrounding soil and structures, finding the target piping can be a major hassle.


For these reasons, post-acquisition "organization" of buried-pipe point cloud data is indispensable. "Organization" does not simply mean reducing file size; it means structuring the data, assigning it meaning, and shaping it into a form that is easy to use in practice. Specifically, this includes classifying points by type (pipes, ground surface, and so on) and removing unnecessary parts; reducing point density through thinning (downsampling) to just the necessary and sufficient number of points; removing noise such as outliers and mismeasured points to improve accuracy; and attaching pipe metadata to the point cloud as attributes. Finally, re-saving the data in an efficient file format yields a compact, streamlined dataset. Applying these organization techniques can greatly compress the original raw point cloud, in some cases to about one-tenth of its original size. Even so, essential information such as pipe location and shape is properly preserved, producing a dataset robust enough for practical use.


Below, let's go through each method for streamlining and organizing buried-pipe point cloud data.


Improving Efficiency through Classification of Point Cloud Data

Classification (class labeling) of point cloud data is the process of assigning labels such as "ground", "piping", "structures", and "noise" to each point or region within a point cloud. When point clouds are acquired as-is, points from various objects—not only underground buried pipes but also the surrounding ground surface, temporary installations, heavy equipment, people, and so on—are often mixed together, which increases data volume and reduces visibility. Therefore, by first classifying and organizing the point cloud by type, you can separate necessary point data from unnecessary point data, greatly streamlining subsequent workflows.


For example, point cloud data scanned at an open-cut excavation site may include not only the point cloud of the buried pipes themselves, but also points of the ground surface and soil, points of structures such as scaffolding and enclosures, and even points of temporary objects such as people and vehicles. Depending on the intended use, points other than the piping are often unnecessary. By performing classification processing, you can group the data like “these are ground surface points, these are pipe points, these are other structures…”. For example, if you extract only the point cloud data of the “pipe class,” you can remove all non-pipe points at once. By leaving only the point cloud of the necessary parts (the pipes) and removing unrelated parts, the data volume will be greatly reduced. Furthermore, ground point clouds may be omitted if a separate terrain model exists, and points of temporary construction fixtures and people can be discarded first, as they are unnecessary for future use. Classification enables selective retention and deletion according to purpose, making point cloud data leaner and more efficient.


Classification of point clouds is increasingly supported by automated features in recent point cloud processing software. For example, an automatic ground-detection feature can mechanically separate ground from non-ground points. For piping, segmentation functions that recognize and extract cylindrical, pipe-like shapes are available, and automatic classification technologies that train AI to identify piping point clouds are beginning to appear. An efficient workflow is to perform bulk classification with such automated processing and manually correct only what it misses. Of course, you can also manually select points in a point cloud viewer and label them "piping", "ground", and so on. Although it takes some effort, once classification information has been added, subsequent processing and information sharing with other departments become dramatically easier.
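As a rough illustration of how automatic ground detection can work, here is a minimal NumPy sketch of a RANSAC plane fit, one common approach to ground extraction. Production software uses far more robust variants (slope-aware filters, progressive TIN densification, and so on), and all names and parameters here are illustrative:

```python
import numpy as np

def classify_ground_ransac(points, dist_thresh=0.05, n_iters=200, seed=0):
    """Label points as ground (True) / non-ground (False) by fitting a
    plane with a simple RANSAC loop. `dist_thresh` is the maximum
    distance from the plane (in the cloud's units, e.g. meters) for a
    point to count as a ground inlier."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Pick 3 random points and build the plane through them.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample, skip
            continue
        normal /= norm
        # Distance of every point to the candidate plane.
        dist = np.abs((points - p0) @ normal)
        mask = dist < dist_thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask     # keep the plane with the most inliers
    return best_mask

# Synthetic scene: a flat "ground" at z ~ 0 plus a raised "pipe" cluster.
rng = np.random.default_rng(1)
ground = np.column_stack([rng.uniform(0, 10, 500),
                          rng.uniform(0, 10, 500),
                          rng.normal(0.0, 0.01, 500)])
pipe = np.column_stack([rng.uniform(4, 6, 100),
                        rng.uniform(0, 10, 100),
                        rng.uniform(0.5, 0.7, 100)])
cloud = np.vstack([ground, pipe])
is_ground = classify_ground_ransac(cloud)
print(is_ground[:500].mean())  # most ground points labeled True
print(is_ground[500:].mean())  # pipe points mostly labeled False
```

Extracting only the non-ground points (`cloud[~is_ground]`) then leaves a much smaller cloud for pipe segmentation.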


The benefits of classification are not limited to reducing storage. A major advantage is that it provides a foundation for attaching attribute information. By giving each point an indication of “what kind of point this is,” the point cloud data acquires meaning. For point clouds of buried pipes, you can group points by pipe and assign identifiers or type information such as “Water Pipe A” or “Gas Pipe B.” Once in that state, adding metadata described later (such as diameter and material) becomes smooth, and flexible visualization—such as coloring only the pipes in the viewer—is possible. Conversely, unclassified point clouds are just blobs of points, making it difficult to extract only the necessary parts or to add information. Therefore, when organizing buried-pipe point clouds, it is important to perform classification as the first step to give the data structure.


Reducing Storage Size by Thinning Point Cloud Data

After removing unnecessary parts by classification, the next step is to thin out (downsample) the remaining point cloud to further reduce storage. The "weight" of point cloud data depends heavily on the number (density) of points. Point clouds scanned at high density are very detailed, but their file sizes are correspondingly large. However, it is not always necessary to retain every point, and in most cases appropriately reducing the number of points still sufficiently represents the shape and position of the piping. Therefore, thinning is performed so as to minimize degradation of the data's geometric features, drastically shrinking storage by reducing the number of points in the point cloud itself.


There are several methods for downsampling, but the representative ones are voxel grid (spatial sampling) and uniform sampling. Voxel grid downsampling divides space into a cubic grid of a fixed size (voxels), keeps only one representative point per voxel, and deletes the other points. For example, if you set a "5 cm (2.0 in) grid", the point cloud is compressed for each 5 cm (2.0 in) cube so that only one point remains in each cube. This thins locally dense point clusters and results in a more uniform point spacing overall. The larger the voxel size, the more the number of points is reduced and the greater the storage reduction effect (however, fine shape details will be lost). On the other hand, uniform sampling selects points from the original point cloud at a fixed ratio. For example, you might "randomly keep 10% of all points" or "pick points at regular intervals according to the measurement order." This is easy to implement, but spatial uniformity is not guaranteed, so in some places points may become too sparse or remain too dense. In general, the voxel grid method is a more suitable downsampling method for representing shape evenly.
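The voxel grid method described above can be sketched in a few lines of NumPy. This is a simplified illustration; real tools offer choices such as keeping the centroid, the point nearest the centroid, or the first point per voxel (here we keep the centroid):

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep one representative point (the centroid) per voxel.
    `points` is an (N, 3) array; `voxel_size` is the cube edge length."""
    # Integer voxel index for every point.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points that share a voxel index.
    _, inverse, counts = np.unique(idx, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()  # inverse shape differs across NumPy versions
    # Sum the points in each voxel, then divide by the count -> centroid.
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# 100,000 random points in a 10 m cube, thinned with a 0.5 m voxel grid.
rng = np.random.default_rng(0)
cloud = rng.uniform(0, 10, size=(100_000, 3))
thinned = voxel_downsample(cloud, voxel_size=0.5)
print(len(cloud), "->", len(thinned))  # at most 20^3 = 8000 points remain
```

Note how the output size is bounded by the number of occupied voxels, not the input size: doubling the voxel edge length roughly cuts the retained point count by a factor of eight in volumetric data.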


For buried-pipe point clouds, it is often sufficient to retain a point density that allows the pipe diameter and curvature to be discerned. For example, for a pipe with a diameter of 200 mm (7.87 in), a few dozen points around the pipe's circumference are enough to capture its shape. However, if the source data contains thousands of points on that pipe surface, that is excessive information. By thinning the point cloud, that is, appropriately widening the spacing between points, you can reduce the data to a volume that is manageable for both humans and machines. Specifically, if the original point spacing is a few millimeters (a few tenths of an inch), downsampling with a voxel grid of around 5 mm to 1 cm (0.2 to 0.4 in) should still preserve the pipe's outline. Conversely, if downsampling is insufficient, the point cloud may look almost the same while remaining in a "wasteful" state with an unnecessarily large data size.


In practice, it is important to adjust the downsampling rate according to the required level of accuracy. When the data is used for high-precision analysis or as-built verification, coarse downsampling should be avoided, but for rough position logging or route planning it is often acceptable to thin the point cloud aggressively. For example, keeping only one-tenth of the points is often enough to grasp the location and general shape of piping; in fact, there are cases where reducing the original point cloud to less than 10% of its points posed no practical problem. Because downsampling reduces data volume roughly in proportion to the fraction of points removed, keeping 1/10 of the points makes the storage about 1/10 as well. Combined with the removal of unnecessary points through classification, downsampling can produce a dataset that is dramatically lighter than the original raw data.


Note that downsampling is also provided as a standard feature in recent point cloud editing software. Under names such as "Change point cloud density" or "Sampling tool", you can downsample point clouds at a specified interval or percentage. For example, if you perform an operation like "set the point spacing to 50 mm (1.97 in) and downsample", the points will automatically be reduced so that points across the entire area are roughly at least 50 mm (1.97 in) apart. If a preview function is available, it’s good to adjust while checking the result. The important thing is to reduce to a point density that matches your purpose. Handling unnecessarily high-density point clouds is a waste of computational resources, so by reducing to an optimal granularity you can achieve both reduced storage and improved processing efficiency.


Clean Your Data with Noise Removal

Point cloud data almost always contains noise points and outliers. Noise refers to points mistakenly measured at positions where nothing actually exists, or points obtained from reflections off extraneous objects that are not the target. Point clouds of buried pipes can pick up various types of noise depending on the measurement conditions: people or vehicles passing during a scan can leave residual points, strong sunlight or reflections can cause sensors to register phantom points in midair, and with ground-penetrating radar, metal fragments or geological noise can produce false responses. These noise points are unnecessary data that do not represent the real object, and leaving them in only obstructs analysis and inflates data size. Therefore, as part of organizing the point cloud, noise should be thoroughly removed to keep the data clean.


Noise removal begins by deleting points that are obviously wrong. When you visualize a point cloud in a 3D viewer, you may find isolated points floating away from the main cluster, or very small clumps of points. These are usually noise, so manually selecting and deleting them is the most reliable approach. In terrestrial laser scanning, points are also often scattered at positions clearly above the ground surface (points appearing in midair where nothing exists); these are noise caused by reflections or multipath (multiple reflections), so use a range selection to delete them in bulk. For buried-pipe point clouds, also check whether groups of points are concentrated at depths below the piping. If points appear within soil where no voids or objects should exist, they are likely false detections, and leaving them can create the mistaken impression that "there is something underground." Remove them promptly and eliminate them from the data.


Next, automatic noise filters based on algorithms are also useful. Typical examples are statistical outlier removal and radius-based outlier removal. Simply put, these detect and remove isolated points that have almost no other points nearby. For example, you can apply a filter that treats a point as noise and deletes it if it has fewer than N neighboring points within a given radius. Points representing actual piping or ground tend to form clusters and are therefore rarely isolated, whereas noise points are often scattered and separated; this criterion alone can eliminate a considerable number of unnecessary points. Parameters such as N and the radius need to be tuned to the data density, but dedicated software and libraries often provide recommended values and previews. Mechanically identifying and bulk-deleting noise with such statistical techniques removes even the fine noise that manual work is likely to overlook.
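A radius-based outlier filter of the kind described above can be sketched as follows. This naive version computes all pairwise distances, which is fine for small clouds; real tools use a k-d tree or similar spatial index instead:

```python
import numpy as np

def radius_outlier_removal(points, radius=0.2, min_neighbors=3):
    """Drop points that have fewer than `min_neighbors` other points
    within `radius`. O(N^2) pairwise distances: only suitable for
    small clouds, but it shows the criterion clearly."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    # Count neighbors within the radius, excluding the point itself.
    neighbors = (dist < radius).sum(axis=1) - 1
    keep = neighbors >= min_neighbors
    return points[keep], keep

# Dense "pipe" cluster plus a few isolated noise points floating nearby.
rng = np.random.default_rng(0)
pipe = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.05, size=(200, 3))
noise = np.array([[2.0, 2.0, 2.0], [-3.0, 1.0, 0.5], [0.0, 5.0, -1.0]])
cloud = np.vstack([pipe, noise])
cleaned, keep = radius_outlier_removal(cloud, radius=0.2, min_neighbors=3)
print(len(cloud), "->", len(cleaned))  # the isolated points are removed
```

The same structure applies to statistical outlier removal, except the criterion becomes "mean distance to the k nearest neighbors exceeds the global mean by more than a few standard deviations."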


Thorough noise removal also contributes to reducing data size, because removing unnecessary points decreases the number of points. However, the primary purpose of noise removal is to improve data quality rather than to save storage. If noise remains, it can cause false detections in downstream processes such as pipe modeling or cross-section creation, and when displaying the point cloud it can produce a “haze” of points that makes it hard to see. With a cleaned point cloud, pipe shapes stand out clearly and analysis accuracy improves. Therefore, in addition to reducing data size, noise removal is a necessary step for increasing accuracy and producing reliable data.


From a practical standpoint, it is recommended to perform noise removal in parallel with classification and thinning. Points that are clearly noise at the classification stage should be excluded from the start, and it is efficient to proceed by roughly removing isolated points with a filter once before the thinning process. Finally, check the completed point cloud data and, if possible, perform a visual inspection to ensure there are no remaining points of concern. Clean point cloud data positively affects both appearance and analysis results. It is important to eliminate waste not only in terms of file size but also in content, producing data that combines accuracy and compactness.


Enhancing Information Value through Attribute Assignment

One essential aspect of point cloud data organization that must not be overlooked is attribute assignment, in other words the addition of metadata. Attribute assignment means giving each point or group in a point cloud descriptive information (attribute information) other than geometric data. A typical example is in point clouds of buried pipes, where additional information is attached per pipe. For example, for a water pipe you would link attributes such as "Pipe type: ductile cast iron", "Diameter: 150 mm (5.91 in)", "Burial depth: 1.2 m (3.9 ft)", and "Installation date: March 2024" to the point cloud for that pipe. By adding attributes in this way, you can understand detailed information about the pipe just by viewing the point cloud, greatly enhancing the data's practical value.


Attribute information itself is small text and numeric data, so its impact on file size is minor. In fact, assigning attributes can indirectly contribute to reducing storage: if the necessary information is retained as numbers or text, you do not have to rely on dense geometric information in the point cloud for everything. For example, if pipe diameter is recorded as an attribute, there is no need to measure the diameter from the point cloud itself. That lowers the need to maintain a high-resolution point cloud just to determine the diameter, allowing you to thin the data more aggressively. In extreme cases, if it is acceptable for the point cloud representing the pipe's shape to be coarse as long as the diameter and material are known numerically, you can omit fine geometric detail to reduce size and instead store accurate values in the attribute fields.


Assigning attributes also makes data management easier. For example, when multiple types of pipes (water supply, sewer, gas, etc.) are included in the same point cloud data, assigning a "pipe type" attribute to each point allows you to later filter to "display only water pipes." This functions as a form of classification information, enabling quick access to the required data. Furthermore, if you assign an ID to each pipe and manage them, you can link the point cloud to asset information registered in drawings or GIS. For example, you can add an attribute such as "WaterPipeID12345" to the point cloud and use that ID in a database to manage diameter, material, age, owner, and so on. In this way, point cloud data is elevated from a mere collection of points to part of an infrastructure asset database. For local governments and infrastructure management companies, because information traditionally managed in drawings and registers can be overlaid on 3D point clouds, there is the advantage of handling field conditions and register information as a single set.


As a practical approach to assigning attributes to point clouds of buried pipes, you first need to group the points by pipe in the preceding classification process. If the point cloud is organized for each individual pipe, attributes can be assigned at that unit level. In most cases, based on drawings and construction records, you map each group with a statement such as "this point cloud group corresponds to the XX water utility's YY pipeline" and enter that into an attribute table. Some point-cloud editing software lets you attach and edit attribute tables on the point cloud itself, and with LAS files you can use classification codes or user data fields as custom attributes. In addition, some advanced software can automatically estimate and assign attributes (for example, identifying features from point color or return-intensity patterns and assigning attribute labels like road lane markings or signs). For buried pipes, automatic assignment may be difficult, but at minimum it is worthwhile to manually assign pipe IDs and names. Once these attributes are present, subsequent Excel management and GIS integration become smoother, and the point cloud data alone can function as a living management register.
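As a minimal sketch of how per-point attributes and a per-pipe metadata table can fit together, here is one way to model it with a NumPy structured array. The field names, IDs, and class codes are illustrative (LAS reserves some classification codes, e.g. 2 for ground; the pipe codes below are project-defined, not standard):

```python
import numpy as np

# Per-point data: coordinates plus two attribute fields, a class code
# and a pipe ID that keys into a separate per-pipe metadata table.
point_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("z", "f8"),
                        ("class_code", "u1"), ("pipe_id", "u4")])

points = np.zeros(6, dtype=point_dtype)
points["x"] = [0, 1, 2, 0, 1, 2]
points["z"] = [-1.2, -1.2, -1.2, -0.9, -0.9, 0.0]
points["class_code"] = [64, 64, 64, 65, 65, 2]   # 64/65: pipes (project-defined), 2: ground
points["pipe_id"] =    [12345, 12345, 12345, 67890, 67890, 0]

# Pipe metadata is stored once per pipe, not once per point.
pipe_registry = {
    12345: {"type": "water", "material": "ductile iron",
            "diameter_mm": 150, "depth_m": 1.2, "installed": "2024-03"},
    67890: {"type": "gas", "material": "PE",
            "diameter_mm": 100, "depth_m": 0.9, "installed": "2023-11"},
}

# Filter example: display only the water pipe's points.
water_ids = {pid for pid, m in pipe_registry.items() if m["type"] == "water"}
water_pts = points[np.isin(points["pipe_id"], list(water_ids))]
print(len(water_pts))  # points belonging to water pipe 12345
```

Keeping heavy metadata in a per-pipe table and only a compact ID on each point mirrors how GIS asset registers link to point clouds, and it keeps the per-point payload (and thus the file) small.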


One caution: do not attach unnecessary attribute information. More attributes are not always better; select and assign those that match the purpose. Including irrelevant information (for example, the species of nearby trees in a buried-pipe point cloud) only complicates the data, and some point cloud formats may not support it. Although the direct impact on storage size is small, too many attributes make the data harder for people to handle. For buried-pipe point clouds, the basic attribute items are pipe type, pipe diameter, burial depth, and year of installation. These items are already listed in existing buried-asset ledgers, so it is common to copy them over when assigning attributes. Once attribute assignment is complete, the point cloud data moves beyond being merely a drawing record and becomes an advanced digital asset that includes equipment information. This is a different axis from storage reduction, but it is a practice worth adopting to enhance the final value of the data.


Optimizing File Formats for Point Cloud Data

Finally, saving the processed point cloud data in an efficient file format further reduces and optimizes storage. There are various file formats for point cloud data, but from a storage-size perspective they can be broadly divided into text (ASCII) formats and binary formats. Text formats record point coordinates and attribute values as human-readable strings; common examples include CSV, TXT, PTS, and PLY (ASCII mode). Text formats are highly portable but tend to produce very large files. For example, storing the same point cloud as text CSV can take several times the space of a binary format, because writing numbers out as strings costs extra bytes for every digit, separator, and line break. In practice, huge CSV point cloud files are often too large to open in a text editor, and the meaning of each column may be unclear, making reuse difficult.
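The size gap between text and binary storage is easy to demonstrate with the Python standard library alone. The sketch below writes the same 10,000 coordinates as CSV text and as packed 32-bit floats; here the text version comes out more than twice as large, and the gap widens with more decimal places and extra attribute columns. (Real formats such as LAS actually store scaled 32-bit integers rather than floats, to keep millimeter precision even at large coordinate values.)

```python
import struct
import io

# 10,000 synthetic points with meter coordinates at millimeter precision.
points = [(10000.0 + i * 0.001, 20000.0 + i * 0.002, 15.0 + i * 0.0005)
          for i in range(10_000)]

# Text (CSV) encoding: every digit, comma, and newline costs a byte.
csv_buf = io.StringIO()
for x, y, z in points:
    csv_buf.write(f"{x:.3f},{y:.3f},{z:.3f}\n")
csv_bytes = len(csv_buf.getvalue().encode("ascii"))

# Binary encoding: three float32 values, 12 bytes per point, no separators.
bin_buf = io.BytesIO()
for p in points:
    bin_buf.write(struct.pack("<3f", *p))
bin_bytes = len(bin_buf.getvalue())

print(csv_bytes, bin_bytes, round(csv_bytes / bin_bytes, 2))
```

The same reasoning explains why LAS files load faster too: fixed-size binary records can be read directly into memory, whereas text must be parsed number by number.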


Binary formats, on the other hand, record values compactly as raw bytes, so they are highly storage-efficient. A typical example is the LAS format (LASer file format), a binary point cloud format that is standard in the surveying industry and can efficiently store coordinates, color, classification codes, and so on. When the same point cloud is saved as CSV and as LAS, the LAS file is often several times smaller and loads faster. If you are currently handling point cloud data in a text format, we recommend converting it to a binary format: that alone will reduce storage requirements and shorten processing times. Many software packages can export to LAS or binary PLY, and free conversion tools are also available.


Furthermore, if you use a binary format that includes compression functionality, you can make files even lighter. A representative example is the LAZ format. LAZ is a format that compresses the aforementioned LAS files with a dedicated algorithm—essentially a "ZIP file for point clouds." Applying LAZ compression can reduce file size to roughly 7–20% of the original LAS (compression ratios vary depending on the data, but it is often about one-tenth). For example, a 500 MB LAS file may become about 50 MB as LAZ. Because LAZ is lossless compression, you can always convert back to LAS when needed and retrieve the original point cloud data without any degradation. Today, LAZ has become the de facto standard for distributing large point cloud datasets, and even Japan's public survey data (such as the Geospatial Information Authority of Japan's airborne LiDAR data) are provided in LAZ format. Therefore, it's a good idea to store and share buried-pipe point clouds used within your company as LAZ. File sizes shrink to around one-tenth, making email attachments and cloud transfers far easier.


Converting to LAZ is not difficult. With open-source compression tools (such as LASzip) you can compress .las to .laz with a single command, and even without specialized knowledge you can often create a LAZ simply by loading and re-saving the file in compatible software. Compression is fast and the tools are freely available, so there is no added cost. The important point is to establish rules for using LAZ within your operational workflow. For example, if you adopt practices like "deliverables are provided as LAZ" and "internal storage is LAZ by default, converting back to LAS only when editing is necessary," only compressed data will circulate and storage consumption is minimized. Because LAZ can be difficult to edit directly, a flexible approach, such as keeping a master LAS while distributing LAZ, is also recommended. In any case, selecting and optimizing file formats is the final major lever for reducing storage size.


In relation to optimizing file formats, splitting data is also an effective measure. Instead of cramming an entire area and all information into a single massive point cloud file, dividing files by area or by content lets you work with only the parts you need. For buried pipes, for example, if the area is large you might split the point cloud into tiles of 50 m (164.0 ft) square, or manage files separated by piping systems. By splitting, you only need to load the files for the parts you want to open, reducing memory load and processing wait times, and making data management easier when working in teams. However, if you split too much the number of files will increase and management becomes cumbersome, so the key is to formalize rules for appropriate partitioning. Examples of splitting methods include "separating by existing drawing partition units" and "separating by pipe types in a layer-like manner," and these should be decided to suit the nature of the organization or project.
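Tiling by XY coordinates can be sketched as follows; the 50 m tile size and the dict-based return format are illustrative:

```python
import numpy as np

def tile_point_cloud(points, tile_size=50.0):
    """Split an (N, 3) point cloud into square XY tiles of `tile_size`.
    Returns a dict mapping tile index (ix, iy) -> points in that tile;
    in practice each tile would be written to its own file."""
    idx = np.floor(points[:, :2] / tile_size).astype(int)
    tiles = {}
    for key in map(tuple, np.unique(idx, axis=0)):
        mask = (idx == key).all(axis=1)
        tiles[key] = points[mask]
    return tiles

# 10,000 points spread over a 120 m x 80 m site, split into 50 m tiles.
rng = np.random.default_rng(0)
cloud = np.column_stack([rng.uniform(0, 120, 10_000),
                         rng.uniform(0, 80, 10_000),
                         rng.uniform(-2, 0, 10_000)])
tiles = tile_point_cloud(cloud, tile_size=50.0)
print(sorted(tiles))                         # 3 x 2 grid of tile indices
print(sum(len(t) for t in tiles.values()))   # every point lands in one tile
```

Naming each output file after its tile index (for example, `tile_1_0.laz`) keeps the partitioning rule self-describing for everyone on the team.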


As described above, by converting to appropriate formats and, where necessary, incorporating splitting, the processed point cloud data becomes even lighter and easier to handle. By progressively optimizing—text to binary, and binary to compressed formats—the end product should be substantially slimmer compared to the original raw data.


Summary

We have explained how to handle buried-pipe point cloud data efficiently from the perspectives of classification, thinning, noise removal, attribute assignment, and file format optimization. First, properly classifying the point cloud to extract the necessary parts and eliminate the unnecessary ones organizes the data's skeleton while reducing its size. Next, thinning lowers point density and removes redundant points while preserving information, dramatically compressing data volume. Noise removal, which deletes erroneous points and outliers, is effective both for reducing size and for improving data quality. Assigning attributes such as pipe type and dimensions turns the point cloud itself into a valuable information source, making it easier to manage and use. Finally, optimizing file formats (using compressed formats and split storage) lets you store and share the organized data in the lightest possible form. Implemented together, these measures can yield point cloud data reduced to about one-tenth of the original raw size while properly retaining the information needed for practical use.


Well-organized point cloud data of buried pipes can become an extremely valuable asset for design and maintenance work by construction consultants, surveying firms, municipal authorities, and infrastructure operators. Once streamlined, it can be handled easily on an ordinary personal computer, simplifying data sharing with other departments and long-term archiving. Point clouds in which pipe positions are secured with high precision also provide a foundation for future applications such as projecting data onto the site with AR or integrating it with other geospatial data to build digital twins of urban infrastructure.

The key in such applications is positioning accuracy (registration). However well organized the point cloud is, inaccurate base coordinates will limit its usefulness, so it is important to use high-precision positioning instruments from the field surveying stage. Compact high-precision GNSS receivers that attach to a smartphone have recently appeared, enabling easy acquisition of centimeter-level (roughly half-inch) position coordinates. With an iPhone-mounted high-precision GNSS device such as LRTK, point clouds acquired by a smartphone's LiDAR or photogrammetry can be given high-precision world coordinates instantly. Simply by walking the site with a smartphone, you can capture buried-pipe point clouds without distortion and place them accurately on map coordinates. Organize and streamline these high-precision point clouds using the methods described above, and you will obtain a dataset that combines accuracy with ease of use. By combining positional correction from high-precision GNSS equipment with the organization techniques in this article, 3D data of buried pipes will transform from mere records into assets useful in daily operations. We encourage you to streamline everything from field measurement through data organization and utilization, and to manage buried-pipe point clouds efficiently.


Next Steps:
Explore LRTK Products & Workflows

LRTK helps professionals capture absolute coordinates, create georeferenced point clouds, and streamline surveying and construction workflows. Explore the products below, or contact us for a demo, pricing, or implementation support.

LRTK supercharges field accuracy and efficiency

The LRTK series delivers high-precision GNSS positioning for construction, civil engineering, and surveying, enabling significant reductions in work time and major gains in productivity. It makes it easy to handle everything from design surveys and point-cloud scanning to AR, 3D construction, as-built management, and infrastructure inspection.
