Field Testing

As part of the continued development of the IPT testing system, the test items created during Ballard & Tighe’s item writing workshops underwent nationwide field testing to ensure validity and reliability.

Test Item Preparation
The thousands of items created for the IPT testing system were reviewed by testing experts for technical appropriateness, by current classroom teachers across the United States for grade-level appropriateness, and by a select panel of educators who screened the items for bias and cultural/language sensitivity. Based on the recommendations and feedback from these stringent reviews, the items were refined further in preparation for field testing.

Field Testing Process
Field testing in April, May, and June 2004 gathered valuable information about how the specially created test items perform with real students. Thousands of English language learners at elementary, middle, and high schools in 15 states across the country took part, responding to items designed to evaluate Listening, Speaking, Reading, and Writing skills at the following four IPT levels: Level 2 (Grades 1-2); Level 3 (Grades 3-5); Level 4 (Grades 6-8); and Level 5 (Grades 9-12). Specifications and items for the Level 1 (Pre-K/K) tests are currently in development, with field testing to follow.

Horizontal & Vertical Scaling
The No Child Left Behind Act (NCLB) requires that English language proficiency assessments show incremental growth to determine whether students have met their Annual Measurable Achievement Objectives (AMAOs). The IPT is designed to measure the developmental path of English language abilities, ranging from zero ability, through all the intermediate stages, and up to grade-level English language ability. Thus, the IPT tests are both horizontally and vertically scaled to provide one standardized scale that shows student progress from Pre-K through Grade 12.

  • To facilitate horizontal scaling, students participating in the field tests took items from at least two of the four tests (for example, reading and writing, reading and speaking, or writing and speaking). After analysis, the reading, writing, listening, and speaking scores were expressed on the same scale.
  • To facilitate vertical scaling, participants in these field tests took items written specifically for students in their grade span as well as additional items written for students in the grade span just below theirs. As a result, the scores from all the different grade span tests can be expressed on the same scale, making it possible to report annual progress even when a student takes one test (e.g., Level 3) one year and another test (e.g., Level 4) the next year.
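
The vertical scaling design described above relies on items shared between adjacent grade spans. The sketch below illustrates one common way such shared ("anchor") items can be used to place two tests on a single scale: mean-sigma linear linking. It is an illustration only. The source does not state which linking method was used for the IPT, and the function names and item difficulty values here are hypothetical.

```python
from statistics import mean, pstdev


def mean_sigma_link(anchor_source, anchor_target):
    """Return (slope, intercept) that maps the source scale onto the target scale.

    Both arguments hold difficulty estimates for the SAME anchor items,
    one list per calibration, in the same item order.
    """
    slope = pstdev(anchor_target) / pstdev(anchor_source)
    intercept = mean(anchor_target) - slope * mean(anchor_source)
    return slope, intercept


def to_target_scale(value, slope, intercept):
    """Convert a score or difficulty from the source scale to the target scale."""
    return slope * value + intercept


# Hypothetical difficulties for four anchor items written for the lower
# grade span, as estimated from the lower-level and upper-level field tests.
lower_level_anchor = [-1.0, 0.0, 1.0, 2.0]
upper_level_anchor = [-2.0, -1.0, 0.0, 1.0]  # same items look easier to older students

# Map the upper-level scale onto the lower-level scale.
slope, intercept = mean_sigma_link(upper_level_anchor, lower_level_anchor)

# A hypothetical upper-level score of 0.0, re-expressed on the lower-level scale.
linked_score = to_target_scale(0.0, slope, intercept)
print(slope, intercept, linked_score)
```

Because the anchor items are identical in both calibrations, any difference in their estimated difficulties reflects the difference between the two scales, which is what the slope and intercept capture. Operational programs typically use more elaborate item response theory methods; this linear version is only meant to show the role the shared items play.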

Field Test Analysis
Field testing provided crucial information about how the actual items function in an authentic test setting. Ballard & Tighe’s assessment team analyzed the field test results during the summer of 2004 and went on to construct the first IPT operational test forms and support materials in preparation for the final phase of the project: pilot testing. With the successful completion of pilot testing, the refined IPT test forms were made available by the summer of 2005.