Abstract: X-ray digital radiography (DR) has been widely used in industrial nondestructive testing. However, in practical applications many workpieces have irregular structures and large thickness variations, which makes DR detection prone to underexposure in thick regions and overexposure in thin regions. On the one hand, for a detector pixel array of 4K×4K, most algorithms can hardly process such large DR scan images on consumer-grade devices. On the other hand, it is difficult to obtain large numbers of paired labels for industrial inspection. To address the problems of large-size DR inference and label scarcity, a lightweight unsupervised enhancement framework is proposed that couples the contrastive language-image pretraining (CLIP) vision-language model with contrast-limited adaptive histogram equalization (CLAHE) priors. The first stage learns prompt vectors to guide a frozen CLIP image encoder using a CLIP enhancement loss, a structural consistency loss, and a CLAHE feature-map perception loss. The second stage refines the prompts iteratively through a ranking loss and alternately updates the enhancement network until visual convergence. Experimental results show that peak signal-to-noise ratio (PSNR), learned perceptual image patch similarity (LPIPS), and structural similarity (SSIM) are improved by 1.0 dB, 1.6%, and 2.0%, respectively, outperforming other unsupervised algorithms on multiple metrics. Additionally, inference requires only 0.279 M parameters and processes a 5732×2333 image in 1.5 s. Furthermore, a model trained with merely 380 casting images generalizes directly to unseen carbon-fiber circuit boards and other materials, demonstrating strong potential for industrial deployment.
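To illustrate the CLAHE prior mentioned above, the following is a minimal sketch of contrast-limited histogram equalization in pure Python. It is a simplification: it processes a single tile (flat list of 8-bit gray levels) rather than the tiled grid with bilinear interpolation used in full CLAHE, and the function names (`clip_histogram`, `clahe_tile`) and the default `clip_limit` are illustrative choices, not taken from the paper.

```python
def clip_histogram(hist, clip_limit):
    # Clip bins above clip_limit and redistribute the excess uniformly.
    # This caps local contrast amplification (the "contrast-limited" part).
    excess = sum(max(0, h - clip_limit) for h in hist)
    clipped = [min(h, clip_limit) for h in hist]
    bonus = excess // len(clipped)
    return [h + bonus for h in clipped]


def clahe_tile(pixels, clip_limit=40, levels=256):
    # Simplified single-tile CLAHE: clipped-histogram equalization only,
    # without the per-tile grid and interpolation of the full algorithm.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    hist = clip_histogram(hist, clip_limit)

    # Build the cumulative distribution and map it onto [0, levels - 1].
    total = sum(hist)
    cdf, acc = [], 0
    for h in hist:
        acc += h
        cdf.append(acc)
    return [round(cdf[p] / total * (levels - 1)) for p in pixels]
```

For example, a dark, low-contrast tile such as `[10, 10, 20, 20, 30, 30]` is stretched across the full dynamic range, which is the behavior the CLAHE feature-map perception loss exploits to expose detail in under- and overexposed DR regions.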