Limitations of refinement methods for weak to strong generalization