we construct a fine-grained landmark dataset of real urban street scenes and use it for fine-tuning. Specifically, starting from the GSV dataset~\cite{Ali_bey_2022}, which is built from Google Street View imagery, we have annotators outline common landmarks in the urban street-view images and record the type of each landmark, which yields images annotated with landmark bounding boxes.