Evaluating AI Models Under New Standards