AN ITEM ANALYSIS OF ENGLISH SUMMATIVE TEST IN EFL CLASSROOM (A case study at Elementary School in Indonesia)

This research was conducted to determine whether the test items are good or weak and in need of revision, to identify possible causes of the weak items, and to propose alternative revisions for the weak items of the second-semester summative test for first-grade students of SMP Negeri 1 Maiwa in the 2018/2019 academic year. The research employed quantitative and qualitative methods. The population was the first-grade students of SMP Negeri 1 Maiwa, comprising two classes: 20 students in the Automotive Department and 20 students in the Machine-Technique Department, 40 students in total. For the qualitative part, the researcher used recorded semi-structured interviews. The results showed that, for the multiple-choice section, which consisted of 10 items, the index of difficulty averaged 81.43% for the seven items at the very easy level, 68.75% for the two items at the medium level, and 22.50% for the one item at the difficult level. The discriminating power averaged 28.18%. The effectiveness of the distractors for the four answer options (A, B, C, and D), listed in order for items 1 to 10, was as follows: A (very poor, good, poor, very poor, answer key, good, poor, answer key, answer key, answer key); B (poor, answer key, answer key, answer key, poor, good, answer key, poor, very poor, very poor); C (answer key, poor, poor, poor, less good, answer key, good, poor, poor, poor); D (poor, good, poor, poor, good, less good, poor, poor, poor, poor). For the essay section, which consisted of 15 items, the index of difficulty averaged 85.91% for the five items at the very easy level, 78.64% for the five items at the easy level, and 54.00% for the five items at the medium level. The discriminating power averaged 29.37%.


INTRODUCTION
Item analysis is a process that examines students' responses to each test item in order to evaluate the quality of the items and of the test as a whole. It is particularly important for improving items that will be reused in later tests, but it can also be used within a single test administration to eliminate ambiguous or misleading items. Furthermore, item analysis helps teachers improve their test-construction skills and identify specific areas of content that need more emphasis or clarification. It is widely believed that "evaluation drives the curriculum." It can therefore be argued that assessment is the obvious starting point if the quality of teaching and learning is to be improved. Improving assessment, however, is a continual process: the cycle of planning and constructing assessment instruments, trying them out, validating them, and evaluating them must go on continuously. Whether assessments are designed for teaching, for measuring the outcomes of educational programs, or for educational research, conducting item and test analyses is essential. Such analyses evaluate the quality of the items and of the test as a whole, and they can be used to review and improve both. Item analysis helps determine the quality and usefulness of an item, in part by identifying distractors, or response options, that are underperforming. Item analysis procedures are intended to improve the reliability of the test. Because reliability is maximized by examining the relationship between the individual items and the test as a whole, it is important to ensure that the overall test measures what it is supposed to measure. If it does not, the total score is a poor criterion for evaluating each item. In the teaching and learning process, teachers usually give tests to measure students' achievement and to gauge the success of the English teaching program.
A test also reveals students' weaknesses, their progress, and their ways of learning. Suharsimi Arikunto (1999: 53) states that a test is an instrument or procedure used to find out or measure something in a given setting, by a method and rules that have been determined in advance. H. Douglas Brown (1970: 384) states that a test, in plain words, is a method of measuring a person's ability or knowledge in a given domain. Based on these statements, a test is an important factor in the teaching and learning process. One kind of test is the summative test, which is held at the end of a term or semester for the purpose of assigning grades. According to Norman E. Gronlund (1971: 105), the summative test is given at the end of a course or a unit of instruction, and the results are used for assigning grades or for certifying pupil mastery of the instructional objectives. Meanwhile, Daryanto (2001: 05) says that a summative test is usually given at the end of a level of education, although its meaning has been broadened to cover the final test of a quarter or semester, and even the test at the end of a topic. To find out whether the English summative test for first-grade students of SMP Negeri 1 Maiwa in the second semester of the 2018/2019 academic year is good or not, the writer carried out an item analysis. The writer was interested in analyzing the items in terms of difficulty level, discriminating power, and the effectiveness of the distractors of each item.

Population and Sample
The population of this research was the first-grade students of SMP Negeri 1 Maiwa, comprising two classes: the Automotive Department with 20 students and the Machine-Technique Department with 20 students. The sampling technique used was total sampling, so both classes were taken as respondents. With 20 students in each class, the total number of respondents was 40.

Instrument of the Research
To analyze the data in the quantitative part, the researcher applied descriptive statistics, using formulas to assess the index of difficulty and the discriminating power of the test.
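
The formulas themselves are not stated in the text. A common convention in item analysis, assumed here rather than taken from the source, expresses both indices as percentages. The symbols are introduced only for illustration: $B$ is the number of students who answer the item correctly, $N$ is the number of test takers, $B_U$ and $B_L$ are the numbers of correct answers in the upper and lower score groups, and $n$ is the size of each group.

```latex
P = \frac{B}{N} \times 100\%, \qquad
D = \frac{B_U - B_L}{n} \times 100\%
```

Under this convention, an item answered correctly by 36 of 40 students has $P = 90\%$.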

Procedures of Collecting Data
To check whether a test is reliable, we can determine the Discrimination Index. Statistical programs can calculate the extent to which a test item differentiates between advanced students and beginners. Split-half reliability is a quantitative method for determining the internal consistency of a test. Item analysis is a systematic procedure for obtaining information about the test. To obtain these data, the researcher used ANATES V4, a program that was very useful for tabulating the data.
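
The idea of a discrimination index can be sketched in a few lines of Python. This is a minimal illustration of the conventional upper-lower group method, not a description of how ANATES V4 works internally. The function name, the 27% group fraction, and the sample data are all assumptions made for the example. The group fraction is at least plausible for this study: 27% of 40 students rounds to 11, and reported values such as 9.09% and 54.55% are consistent with groups of 11 students (1/11 and 6/11).

```python
def discrimination_index(records, group_fraction=0.27):
    """Return the upper-lower discrimination index of one item, in percent.

    records: list of (total_score, item_correct) pairs, one per student,
             where item_correct is 1 for a correct answer and 0 otherwise.
    group_fraction: share of students placed in each extreme group
                    (27% is an assumed convention, not taken from the source).
    """
    if not records:
        raise ValueError("records must not be empty")
    # Rank students from highest to lowest total score.
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    # Size of the upper and lower groups, at least one student each.
    n = max(1, round(group_fraction * len(ranked)))
    upper_correct = sum(correct for _, correct in ranked[:n])
    lower_correct = sum(correct for _, correct in ranked[-n:])
    return 100 * (upper_correct - lower_correct) / n


# Hypothetical data: 40 students with total scores 1..40.
# Only students scoring above 20 answer this item correctly.
students = [(score, 1 if score > 20 else 0) for score in range(1, 41)]
print(discrimination_index(students))  # 100.0
```

An item that every student answers correctly yields 0.0, because the upper and lower groups do not differ. Students with tied total scores are ordered arbitrarily in this sketch, so a real analysis would need an explicit tie-breaking rule.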

RESULTS
The results of the multiple-choice test
a. Analysis of the index of difficulty, the discriminating power, and the effectiveness of the distractors
In this analysis, the researcher used the statistical program ANATES V4 to assess the index of difficulty and the discriminating power. In this way, the researcher obtained the data for the second-semester summative test of the first-grade students of SMP Negeri 1 Maiwa. The items fall into three groups.
1. The first group is very easy ("sangat mudah"):
a. Item no. 1 has an index of difficulty of 90.00%, with 36 correct answers out of a total population of 40 students.
b. Item no. 2 has an index of difficulty of 95.00%, with 38 correct answers out of 40 students.
c. Item no. 3 has an index of difficulty of 95.00%, with 38 correct answers out of 40 students.
d. Item no. 4 has an index of difficulty of 95.00%, with 38 correct answers out of 40 students.
e. Item no. 7 has an index of difficulty of 95.00%, with 38 correct answers out of 40 students.
f. Item no. 8 has an index of difficulty of 97.50%, with 39 correct answers out of 40 students.
g. Item no. 9 has an index of difficulty of 97.50%, with 39 correct answers out of 40 students.
2. The second group is medium ("sedang"):
a. Item no. 5 was answered correctly by 27 of the 40 students, giving an index of difficulty of 67.50%.
b. Item no. 10 was answered correctly by 28 of the 40 students, giving an index of difficulty of 70.00%.
3. The third group is difficult ("sukar"): item no. 6 has an index of difficulty of only 22.50%, with 9 correct answers out of 40 students.
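
The per-item figures above can be reproduced from the numbers of correct answers. The sketch below is an illustration only: the function names are hypothetical, and the category cut-offs are an assumption chosen to be consistent with the labels reported here, not a documented ANATES V4 specification.

```python
def difficulty_index(correct, total):
    """Percentage of students who answered the item correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * correct / total


def difficulty_label(index):
    """Map a difficulty index (in percent) to a category.

    The cut-offs are assumed; they agree with the labels reported
    for these ten items but are not taken from the source.
    """
    if index <= 15:
        return "very difficult"
    if index <= 30:
        return "difficult"
    if index <= 70:
        return "medium"
    if index <= 85:
        return "easy"
    return "very easy"


TOTAL_STUDENTS = 40
# Correct answers per multiple-choice item, as reported in the results.
correct_answers = {1: 36, 2: 38, 3: 38, 4: 38, 5: 27,
                   6: 9, 7: 38, 8: 39, 9: 39, 10: 28}

for item, correct in correct_answers.items():
    index = difficulty_index(correct, TOTAL_STUDENTS)
    print(f"item {item}: {index:.2f}% ({difficulty_label(index)})")
```

Dividing after multiplying by 100 keeps values such as 67.5 and 70.0 exact in floating point, which matters for an item that sits exactly on a category boundary, as item 10 does.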

CONCLUSION AND SUGGESTION
A. Conclusion
Based on the findings and the discussion presented in the previous chapter, which focused on analyzing the items in terms of difficulty level, discriminating power, and the effectiveness of the distractors in the multiple-choice and essay sections of the summative test, the researcher concluded that the test prepared and administered by the English teacher is still far from the standard of a good test. This conclusion is supported by the following data.
The discriminating power of the multiple-choice items was 0.00% for item number 2; 9.09% for item number 3; 9.09% for item number 4; 90.91% for item number 5; 54.55% for item number 6; 9.09% for item number 7; 9.09% for item number 8; 9.09% for item number 9; and 54.55% for item number 10.
The discriminating power of the essay items was 9.09% for item number 1; 31.82% for item number 2; 9.09% for item number 3; 22.73% for item number 4; -4.55% for item number 5; 22.73% for item number 6; 9.09% for item number 7; 13.64% for item number 8; 9.09% for item number 9; -4.05% for item number 10; 47.27% for item number 11; 76.36% for item number 12; 72.73% for item number 13; 72.73% for item number 14; and 52.73% for item number 15.
The effectiveness of the distractors in the multiple-choice items was as follows:
Item number 1: option A was "very poor"; option B was "poor"; option C was the correct answer; option D was "poor".
Item number 2: option A was "good"; option B was the correct answer; option C was "poor"; option D was "good".
Item number 3: option A was "poor"; option B was the correct answer; option C was "poor"; option D was "poor".
Item number 4: option A was "very poor"; option B was the correct answer; option C was "poor"; option D was "poor".
Item number 5: option A was the correct answer; option B was "poor"; option C was "less good"; option D was "good".
Item number 6: option A was "good"; option B was "good"; option C was the correct answer; option D was "less good".
Item number 7: option A was "poor"; option B was the correct answer; option C was "good"; option D was "poor".
Item number 8: option A was the correct answer; option B was "poor"; option C was "poor"; option D was "poor".
Item number 9: option A was the correct answer; option B was "very poor"; option C was "poor"; option D was "poor".
Item number 10: option A was the correct answer; option B was "very poor"; option C was "poor"; option D was "poor".
B. Suggestion
Based on the conclusion presented in the previous sub-chapter, the researcher offers the following suggestion for future work on English teaching. Teachers and other test writers should routinely evaluate and revise the tests they have written or plan to administer, so that the tests develop toward the quality of a good test. In this way, the goals of improving the quality of students and the professionalism of educators can be realized.