Research on unsupervised large-size DR image enhancement algorithm based on CLIP
DOI:
CSTR:
Author:
Affiliation:

1. AECC Harbin Dongan Engine Co., Ltd., Harbin 150066, China; 2. Industrial Computed Tomography Research Center, Chongqing University, Chongqing 400044, China; 3. College of Optoelectronic Engineering, Chongqing University, Chongqing 400044, China

Author biography:

Corresponding author:

CLC number: TH878.1

Fund Project:

Supported by the National Key Research and Development Program of China (2022YFF0706400)

    Abstract:

    X-ray digital radiography (DR) is widely used in industrial nondestructive testing. In practice, however, many workpieces have irregular structures and large thickness variations, so DR imaging tends to be underexposed in thick regions and overexposed in thin regions, which degrades image quality and causes severe loss of structural information. Moreover, as detector pixel arrays grow to 4 K×4 K and beyond, most algorithms cannot process such large DR scan images on consumer-grade hardware, and large numbers of paired labels are difficult to obtain in industrial inspection scenarios. To address large-size DR inference and label scarcity, a lightweight two-stage unsupervised enhancement framework is proposed that couples contrastive language-image pretraining (CLIP) with contrast-limited adaptive histogram equalization (CLAHE) priors, requiring neither paired data nor patch-wise processing. In the first stage, learned prompt vectors guide a frozen CLIP image encoder, with training driven by a CLIP enhancement loss, a structural consistency loss, and a CLAHE feature-map perceptual loss. In the second stage, the prompt weights are iteratively refined with a ranking loss while the enhancement network is alternately updated until visual convergence. Experimental results show that, compared with contemporary unsupervised algorithms, peak signal-to-noise ratio (PSNR), learned perceptual image patch similarity (LPIPS), and structural similarity (SSIM) improve by 1.0 dB, 1.6%, and 2.0%, respectively, outperforming the compared methods on multiple reference metrics. Inference loads only 0.279 M parameters and processes a 5 732×2 333 large-size image in about 1.5 s. A model trained on merely 380 casting images transfers directly to unseen carbon-fiber circuit boards and objects of other materials, demonstrating strong generalization and providing a real-time enhancement solution for industrial inspection deployment.
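The CLAHE prior and the ranking loss named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: a full CLAHE operates on local tiles with bilinear interpolation between tile mappings, and the paper's ranking loss compares CLIP prompt-image similarities; the function names and the simplified global (untiled) variant below are assumptions for illustration only.

```python
def clip_limited_equalize(pixels, levels=256, clip_limit=0.01):
    """Simplified contrast-limited histogram equalization (global, untiled).

    pixels: flat list of integer gray levels in [0, levels).
    Clipping the histogram before building the CDF caps contrast
    amplification in near-uniform regions -- the core idea behind a
    CLAHE prior for supervising an enhancement network.
    """
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Clip each bin at a fraction of the total count; pool the excess.
    limit = max(1, int(clip_limit * n))
    excess = 0
    for i in range(levels):
        if hist[i] > limit:
            excess += hist[i] - limit
            hist[i] = limit

    # Redistribute the pooled excess uniformly over all bins.
    bonus = excess // levels
    hist = [h + bonus for h in hist]

    # Cumulative distribution -> monotone intensity look-up table.
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    scale = (levels - 1) / cdf[-1]
    lut = [round(c * scale) for c in cdf]
    return [lut[p] for p in pixels]


def pairwise_ranking_loss(score_better, score_worse, margin=0.1):
    """Hinge-style ranking loss: drive the score of the preferred
    enhancement above the other candidate by at least `margin`."""
    return max(0.0, margin - (score_better - score_worse))
```

On a severely bimodal exposure (for example 90% dark pixels and 10% bright ones), the clipped mapping spreads the two populations toward mid-gray and white instead of letting the dominant dark peak saturate the transfer curve, which is why such a prior is a useful target for under- and overexposed DR regions.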

Cite this article

陈明飞, 廖望, 王广文, 吴义顺, 沈宽. Research on unsupervised large-size DR image enhancement algorithm based on CLIP [J]. Chinese Journal of Scientific Instrument, 2026, 47(2): 309-321.

History
  • Received:
  • Revised:
  • Accepted:
  • Published online: 2026-04-08
  • Publication date: