Embedded Vision Solution Increases Construction Site Safety
Work Safely with AI-Powered Real-Time Helmet Recognition
Construction is one of the most dangerous industries in the US. Over the years, Personal Protective Equipment (PPE) has become a mandatory requirement on construction sites because of its importance to workers’ safety. PPE may include safety glasses, earplugs, gloves, or helmets. Part of the ongoing technology transformation in manufacturing involves connecting PPE to the Internet of Things (IoT) to better understand how the equipment is used and even to act when necessary, such as raising an alert when a worker enters a restricted zone.
IoT is not the only technology that applies to this field; AI is also a great match. Given the advancements in deep learning algorithms and the immense amount of data created daily, AI techniques have expanded to diverse tasks and environments, and countless industries have adopted them. Computer vision in particular has made enormous progress in recent years, producing strong solutions for scene understanding, and construction sites are a natural place to apply them. For instance, this technology can give insights into PPE compliance. PPE is essential for workers, so monitoring its adoption is critical for minimizing both the risks to workers’ health and the employer’s liability.
Tryolabs partnered with Seeed, a hardware innovation platform that works closely with technology providers of all scales to provide quality, affordable hardware. For example, Seeed offers various Nvidia products on its Jetson platform. The goal was to leverage Seeed’s hardware, mainly its reComputer edge devices built with the Jetson Xavier NX 8GB module, and develop a computer vision analytics solution that tackles a challenging task in the technology transformation field. More specifically, the teams picked the challenge of detecting safety helmets in real time.
Controlling the Usage of Safety Helmets is Very Expensive
In the construction industry, PPE such as safety helmets helps prevent and minimize injuries on construction sites and in factories. Companies want to avoid all types of injuries, especially head injuries. These injuries, which are sometimes fatal, can lead to incurable, long-term health complications, such as memory loss, broken bones, and spinal damage. In 2012, more than 65,000 cases involving days away from work were a result of head injuries in the workplace, and 1,020 workers died that same year from head injuries occurring on the job. Therefore, the use of equipment like safety helmets is mandatory in most of the world.
Many workers, however, do not recognize how dangerous it is to go without the proper gear. Whatever the reason may be, whether the environment is too hot or too cold or the helmet is uncomfortable, workers tend to take off their helmets despite the risk of getting hurt.
Unfortunately, continuously monitoring helmet use is difficult, and companies struggle to ensure that their employees follow safety rules. In most scenarios, this process is manual, making it very expensive and inefficient.
When helmet usage is controlled manually, the resulting actions are usually reactive. From a business perspective, significant cost reductions would be possible if the people in charge of monitoring and controlling construction sites could act proactively instead. Current technology invites us to explore better alternatives with the capabilities and resources we already have. That’s where the new solution comes in.
How Could AI be Leveraged to Improve the Business Process?
After understanding the problem behind the use case, Tryolabs designed and implemented the technology platform to deliver value and comply with all the business needs. The proposed solution involves a robust system in production using embedded hardware, optimized computer vision software, and data analytics tools. Combining these technologies enables the business to fully automate the process of monitoring helmet usage by having real-time statistics of the activity on the construction site.
By partnering with technology providers from hardware to the cloud, Seeed offers a wide array of hardware platforms and sensor modules ready to be integrated with existing IoT platforms. The proposed plan consists of creating an end-to-end solution that monitors the use of safety helmets in real time and deploying it to a Jetson Xavier NX module provided by Seeed. In this arrangement, Tryolabs provides the computer vision software, and Seeed provides all the edge devices needed to deploy the solution.
Solution: a Real-time Video Analytics Platform
From a technical perspective, it is essential to understand and communicate the decisions while building an end-to-end solution. Here are the details on hardware, software, and data involved in the project to give a good sense of what was needed to build a real-time video analytics platform.
The Jetson Xavier NX is a small but powerful module suited for AI applications in embedded and edge devices. It is equipped with a 384-core Nvidia Volta GPU, a 6-core Carmel ARM CPU, and two Nvidia Deep Learning Accelerators (NVDLA). It can attain an AI performance of 21 TOPS with a power consumption of 20 W (or 14 TOPS in a low-power mode with a power consumption of as little as 10 W). These specifications, combined with the 8 GB LPDDR4x memory with over 59.7 GB/s of bandwidth, make this module a suitable platform for running AI networks with accelerated libraries for deep learning and computer vision.
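As a rough illustration of the trade-off between the two power modes quoted above, the following sketch computes throughput per watt from those figures. It uses only the numbers in the paragraph; the function and dictionary names are hypothetical, and peak TOPS per watt is an idealized metric rather than a measurement of this project's workload.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Return peak throughput per unit of power, in TOPS/W."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tops / watts


# (peak TOPS, power budget in W) as quoted in the text above.
POWER_MODES = {
    "20 W mode": (21.0, 20.0),
    "10 W mode": (14.0, 10.0),
}

for name, (tops, watts) in POWER_MODES.items():
    print(f"{name}: {tops_per_watt(tops, watts):.2f} TOPS/W")
```

On these figures the low-power mode gives up a third of the peak throughput while halving the power budget, so it is the more efficient of the two per watt.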
An AI Model was Trained to Continuously Monitor the Use of Safety Helmets
Yolov5 is one of the most widely used algorithms for object detection. It produces accurate detections and runs fast, allowing its users to create real-time object detection applications. Since its beginning in 2016 with Joseph Redmon’s publication, the Yolo family of algorithms has been known for its performance, and its small size makes it practical on mobile devices. The weights of a trained Yolov5 model are notably smaller than Yolov4’s, approximately 88 percent smaller, which makes it easier to deploy Yolov5 models to embedded devices. When running on an Nvidia Tesla P100 GPU, Yolov5 can detect objects at 140 FPS, compared to its predecessor’s maximum of 50 FPS.
A Yolov5 Medium architecture was trained to continuously monitor the use of safety helmets on construction sites and in factories. The detector locates the faces of the people in a frame and classifies them into the categories “helmet” and “no helmet.” For a given person in a video, this category should be highly correlated across consecutive frames. Tryolabs’ open-source tracking library Norfair makes it possible to exploit that correlation and obtain a more robust, less noisy classification. By leveraging video tracking, we implemented a voting system that uses the labels of several consecutive detections to decide more confidently whether a person is wearing a helmet. Evidence from several frames is therefore needed to classify each person, and a single misclassified detection is not enough to change the category in which a person is placed.
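The voting idea described above can be sketched in a few lines of plain Python. This is not the project's implementation and it does not use Norfair's API: it assumes some tracker already supplies a stable integer `track_id` per person, and the class name `HelmetVoter`, the window size, and the minimum vote count are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict, deque


class HelmetVoter:
    """Majority vote over the most recent per-frame labels of each track."""

    def __init__(self, window: int = 15, min_votes: int = 5):
        self.window = window        # how many recent labels to keep per person
        self.min_votes = min_votes  # evidence required before deciding
        self._votes = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id: int, label: str) -> str:
        """Record one per-frame label and return the current decision."""
        votes = self._votes[track_id]
        votes.append(label)
        if len(votes) < self.min_votes:
            return "undecided"
        # The most frequent label in the window wins.
        return Counter(votes).most_common(1)[0][0]


voter = HelmetVoter(window=5, min_votes=3)
frame_labels = ["helmet", "helmet", "helmet", "no helmet", "helmet"]
decisions = [voter.update(track_id=7, label=lbl) for lbl in frame_labels]
print(decisions)
```

In this toy run the single "no helmet" frame does not flip the decision, because the other labels in the window outvote it. A larger window makes the decision more stable but slower to react when a worker really does remove a helmet.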
The Model can Learn Patterns that Generalize well
Quintillions of bytes of data are created daily, and AI models are taking advantage of this. The number of images uploaded to the internet every day has made public datasets possible for a wide variety of applications. Of course, having access to these images is not the only requirement for creating a dataset; in supervised learning, it also takes time and human effort to label each image with the correct annotations so that a model can recognize the patterns we need it to learn. To build a detector that works well across different environments, the images in the dataset must come from many diverse locations and be taken under differing lighting conditions. The model can then learn patterns that generalize well, rather than characteristics unique to a particular scene.
Fortunately, public datasets are already available to distinguish faces with and without helmets, such as the GDUT-Hardhat Wearing Detection dataset we selected for this project. This dataset includes 3,869 images, from which a subset of 2,916 images was selected for the training set, another 635 images were chosen for validation, and the remaining 318 images were set apart for testing purposes.
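The split sizes above correspond to roughly 75, 16, and 8 percent of the images. The source does not say how the images were assigned to each subset, so the sketch below simply assumes a seeded random shuffle; the function name `split_dataset` and the image identifiers are hypothetical stand-ins.

```python
import random


def split_dataset(items, n_train, n_val, seed=0):
    """Shuffle deterministically, then slice into train / val / test."""
    items = list(items)
    if n_train + n_val > len(items):
        raise ValueError("train and validation sizes exceed dataset size")
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # whatever remains
    return train, val, test


# Placeholder identifiers standing in for the 3,869 dataset images.
image_ids = [f"img_{i:05d}" for i in range(3869)]
train, val, test = split_dataset(image_ids, n_train=2916, n_val=635)
print(len(train), len(val), len(test))  # 2916 635 318
```

Fixing the seed matters when comparing architectures, since both models should be trained and evaluated on exactly the same subsets.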
The results: Yolov5 vs. Faster R-CNN
We compared the Yolov5 and Faster R-CNN architectures on a training job using this dataset, with mostly default settings. The training consisted of 26 epochs using multiprocessing and two Nvidia GeForce RTX 2080 Ti GPUs. Yolov5 vastly outperformed Faster R-CNN, obtaining better metrics in a much shorter time. In terms of inference time, both models performed similarly, taking around 0.08 seconds per image on the edge device (12.5 FPS).
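The frame rate quoted above follows directly from the per-image latency: 1 / 0.08 s = 12.5 FPS. The sketch below shows that conversion together with one simple way such a latency could be measured. The helper names are hypothetical, and `infer` stands for any callable that runs a model on one frame; it is not the benchmarking code used in the project.

```python
import time
from statistics import mean


def fps_from_latency(seconds_per_frame: float) -> float:
    """Convert an average per-frame latency into frames per second."""
    if seconds_per_frame <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / seconds_per_frame


def mean_latency(infer, frames, warmup: int = 2) -> float:
    """Average wall-clock time of infer(frame) over a list of frames."""
    for frame in frames[:warmup]:
        infer(frame)  # warm-up calls are excluded from the timing
    timings = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)
        timings.append(time.perf_counter() - start)
    return mean(timings)


print(f"{fps_from_latency(0.08):.1f} FPS")  # 12.5 FPS
```

Warm-up iterations are skipped because the first calls to a model often include one-off costs such as memory allocation, which would otherwise inflate the average.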
Conclusions
Monitoring helmet usage in different scenarios yields valuable insights for taking preventive action and saving time and resources. Tryolabs has shown how to monitor the usage of safety helmets in various environments from an edge device using state-of-the-art detectors, creating a more efficient and affordable alternative to the usual, coarse manual process.
Author
Nicolás Eiris
Lead Machine Learning Engineer