Background: Teams that adopt a software development process do not follow it consistently, so we need a method to measure their fidelity to that process. Objective: To evaluate Rozinat and van der Aalst's metrics for process conformance to a state-based model on noisy data (Rozinat and van der Aalst, 2008). Method: We instructed 14 teams that were developing a software system using Extreme Programming (XP) to record the events of their project (for example, writing code or testing). We calculated the values of the proposed metrics by comparing the collected data to a process model of XP. Results: 13 teams recorded data, which we treat as a multiple case study. The fitness metric varied across teams, and its values corresponded to the number of event types each team used in the correct order. The appropriateness metrics gave the same values for all teams. Conclusion: The fitness metric is useful for measuring fidelity, but the appropriateness metrics do not measure overfitting well on noisy data. In addition, neither metric gave useful information about other aspects of the process, such as iteration.