In practice, rolling bearing vibration data collected under normal and fault states are imbalanced, which lowers fault diagnosis accuracy. To address this, an improved deep Q network (DQN) fault diagnosis method for rolling bearings is proposed, based on deep reinforcement learning. The short-time Fourier transform is applied to the vibration data to build sample sets of time-frequency graphs. The distance between each sample and the center point in the K-means algorithm is used as the bias of the return value, and the imbalance ratio serves as the benchmark for formulating a personalized reward function for the training set. A residual network (ResNet-18) performs deep feature extraction. The agent takes the new reward function and the time-frequency graphs as input, executes a diagnosis action at each time step, and receives the reward judged for that action. In this way, the agent learns a fault diagnosis strategy under imbalanced data. Experimental results show that the improved diagnostic model outperforms other methods by 5% to 8% under imbalanced conditions. It also performs well under combined imbalanced and variable load conditions. The imbalance index score can reach about 0.982, which indicates better generalization.
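
The reward design described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the per-class magnitude is taken as the ratio of the rarest class count to the class count, the K-means distance enters as an additive bias scaled by a hypothetical coefficient `beta`, and the function names `class_weights` and `reward` as well as the class labels are invented for the example.

```python
from collections import Counter


def class_weights(labels):
    """Per-class reward magnitude derived from the imbalance ratio.

    Assumption: the rarest class gets 1.0 and every other class gets
    n_min / n_class, so errors on minority classes cost more.
    """
    counts = Counter(labels)
    n_min = min(counts.values())
    return {c: n_min / n for c, n in counts.items()}


def reward(action, label, weights, dist, max_dist, beta=0.1):
    """Signed reward for one diagnosis step.

    dist is the sample's distance to the K-means center point and
    max_dist normalizes it. The additive form and beta are assumptions,
    not the formula used in the paper.
    """
    bias = beta * (dist / max_dist) if max_dist > 0 else 0.0
    magnitude = weights[label] + bias
    # Correct diagnosis is rewarded, a wrong one is penalized equally.
    return magnitude if action == label else -magnitude


# Hypothetical training set with a 10:1 imbalance ratio.
labels = ["normal"] * 100 + ["inner_race"] * 10
weights = class_weights(labels)

r_hit = reward("inner_race", "inner_race", weights, dist=2.0, max_dist=4.0)
r_miss = reward("normal", "inner_race", weights, dist=2.0, max_dist=4.0)
```

Under these assumptions, a misdiagnosed minority-class sample is penalized roughly ten times more heavily than a misdiagnosed majority-class sample, which is the kind of pressure a DQN agent would need to avoid collapsing onto the majority class.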