Experts find flaws in hundreds of tests that check AI safety and effectiveness