Patterns of Evolution in the Practice of Distributed Software Development: Quantitative Results from a Systematic Review

Distributed Software Development is a recent research area. As its practice has evolved, more questions have emerged and more research has been conducted, which has increased the existing literature. At the same time, the diversity of industry experience in the last ten years has been used to develop successful practices. We lack, however, knowledge of the patterns of evolution in the practice of distributed software development that have been identified and proposed in the literature. In this paper, we present findings from the quantitative analysis of a systematic review of the distributed software development literature. The goal of the review was to identify papers that either describe existing models of patterns of evolution in the practice of distributed software development, or discuss the need for such models.


INTRODUCTION
As globalization efforts currently pervade our society, software project teams have become geographically distributed (Aspray et al, 2006; Boehm, 2006), a phenomenon that has been referred to as Distributed Software Development (DSD). When the distance becomes global, however, companies experience what has been referred to as Global Software Development (GSD). In Software Engineering (SE), DSD has grown dramatically in the last decade, as many companies are distributing their software development facilities, looking for competitive advantages in terms of cost, quality, and qualified professionals (Robinson & Kalakota, 2004). According to Carmel & Tija (2005), the DSD phenomenon started in the early 90's, but only during the last ten years has it been recognized as a powerful competitive strategy. No matter whether DSD involves local or global partners, or whether it is within the same company or in a third-party relationship, organizations are facing several important challenges from an SE perspective (Meyer, 2006). Therefore, software practitioners interested in embarking on a DSD journey could benefit from knowledge of how current DSD practices have evolved in organizations that are already involved in DSD. Most of the existing literature on DSD evolution tackles strategic aspects of the phenomenon, such as establishing distributed software development centers (e.g. Carmel & Agarwal, 2002; Höfner & Mani, 2007), project allocation decisions (e.g. Ebert, 2007), and client-vendor relationships (e.g. Mirani, 2006). There is a need, however, to consider the technical aspects of the DSD practice in organizations (Sengupta et al, 2006; Meyer, 2006; Ramasubbu et al, 2005). What is needed is a literature review that systematically surveys papers that discuss or propose models of DSD evolution, from business as well as technical perspectives, together with the details of these models.
In this paper we report on a systematic review of the DSD literature, in which we searched for papers that either describe existing models of patterns of evolution in the practice of DSD, or discuss the need for such models. Due to the page limit, we discuss here the identification and categorization of papers, as well as insights from a quantitative analysis of our findings.
The paper is organized as follows: in Section 2 we present the concepts involved in the identification of patterns of DSD evolution. In Section 3 we present the taxonomy used in this systematic review, while in Section 4 the protocol for the systematic review is presented. In Section 5 we present the analysis of the results, while Section 6 discusses our findings and the limitations of our review.

PATTERNS OF EVOLUTION
Patterns of evolution, in our study, mean a set of standard steps (or stages) that were successfully followed in the past by individuals, project teams, or organizations, and were documented and shared to be followed by peers as a successful practice. Carmel (2005) defines stage models as powerful frameworks for understanding a phenomenon, because they capture evolution and growth, and also reflect learning curves and diffusion. Carmel argues that such models are useful for both research and practice: practitioners can use stage models to understand where they are, where the competition is, and what they can do to evolve. Researchers, on the other hand, can not only identify and propose the patterns, but also use them to better understand the behaviors behind a given phenomenon. Such evolution patterns (or stages) can also be defined as maturity and capability levels in an evolution model. Chrissis et al (2006) define capability as the predictability of the process and its outcomes, or the range of expected results that can be achieved by following a process. The authors define maturity as the growth in process capability, a well-defined evolutionary path toward achieving a mature process, where each maturity level provides a layer in the foundation for continuous process improvement. Achieving each level of a maturity framework means an increase in process capability.
Despite the utility of such models, however, they have always been an easy target for criticism, as stated by Carmel (2005). Some critics indicate that they are heuristically developed and usually not validated, that they are incomplete, and that they assume a linear evolution through each stage. While this criticism is valid, Carmel also states that the collective understanding of a phenomenon would be poorer if these patterns were not identified. Furthermore, these models seem to be more useful at the early stages of a phenomenon. Once the phenomenon is mature, the interest is not so evident.
The use of evolution patterns or stage models is not new in Computer Science. Such models are also very common in the Social Sciences, where Tuckman (1965) proposed a well-known model to describe the stages (or sequences) of group development. In Computer Science, within the Information Systems (IS) domain, one of the first stage models was proposed by Nolan (1973), with the purpose of analyzing the evolution of managing the computer resource. In SE, it is possible to find the influence of Nolan's thoughts on the development of models such as the SW-CMM and CMMI (Chrissis et al, 2006), among many others. Nolan (1973) also argues that stage theories have proved to be useful for developing knowledge in diverse fields during their formative periods, which is exactly the case of Distributed Software Development.

TAXONOMY USED IN THIS SYSTEMATIC REVIEW
The DSD literature is heterogeneous, and studies often use different terms for the same concepts. One example is the use of "distributed software development" and "global software engineering" interchangeably, although DSD refers to development that includes geographically distributed teams that are not necessarily globally distributed (as in the case of global software development). Similarly, the term "offshore outsourcing" is often used interchangeably with "offshoring", although the two are not necessarily the same, as described in Prikladnicki et al (2007). This diversity in terminology stems from the fact that the area is still in its early stages, and the terms are still being defined and standardized. To avoid problems caused by this lack of standardization, we defined a common taxonomy to be used in our systematic review. This taxonomy was based on previous studies (Kumar & Willcocks, 1996; Robinson & Kalakota, 2004; Carmel & Tija, 2005; Sakthivel, 2007; Prikladnicki et al, 2007; Szymanski & Prikladnicki, 2007), and guided us throughout the entire review process. Its two dimensions are (1) DSD business model and (2) scope of the study (or management level), and they are described in the next two subsections.

Business models for Distributed Software Development
As described in Kumar & Willcocks (1996), DSD options are classified based on the geographic location of the personnel and the relationship of the organizations involved. Later, Robinson and Kalakota (2004) referred to these options as DSD business models, and a detailed description of four of them is summarized in Prikladnicki et al (2007). Two were considered for this systematic review:
- Offshore Outsourcing model, which involves a relationship with an external company (outsourcing) for software development. This external vendor is not located in the client's country (offshore);
- Internal Offshoring model, where a company creates its own software development center (subsidiary) to supply the internal software demand (insourcing). This subsidiary is located in a different country than the company's headquarters.
Because of the lack of standardization in the terms describing the relationship between the outsourcing organization and the vendor, we were aware that some papers could explore either of these two business models without clearly defining the type of distribution or the details of this relationship. In this case, we used the term offshoring to refer to distribution that was global, although it may not always be clear whether the paper considers internal offshoring or offshore outsourcing. All other papers were classified as "others".

Scope of the systematic review
Developing software in a distributed customer-vendor relationship involves a number of business and technical decisions. When embarking on a DSD journey, the outsourcing companies (referred to in this study as client or headquarters) have to make very important business decisions, such as the number of distributed sites (referred to in this study as vendors or subsidiaries), the geographical location of the distributed sites, and the organizational structure.
Once the sites are established, other equally important decisions of a technical nature relate to the operational environment at the distributed sites, such as project structure, development process, project management, architectural strategies for each project or portfolio, and project modularity.
In an offshore outsourcing model, the vendor might have more autonomy regarding technical decisions, while in internal offshoring approaches technical decisions are often made jointly with the headquarters. In our review we classified each paper according to its scope, that is, business or technical. Papers exploring both business and technical levels were classified under both.

SYSTEMATIC REVIEW PROTOCOL
The main purpose of the systematic review was to find evidence regarding models that describe patterns of evolution in the practice of DSD. We followed the recommendations provided by Kitchenham (2004), and other experiences documented in both the SE and IS literature (Brereton et al, 2007; Neto et al, 2007; Dyba et al, 2007; Dibbern et al, 2004). Our review protocol was based on the one used in Neto et al (2007), and the research question that guided the systematic review was: What evolution patterns for DSD (also capability, maturity, and stage models) or descriptions of the need for such patterns have been published, and what are the details of each paper?
The keywords were defined based on two main categories of terms: those related to DSD, and those related to the evolution of DSD practice. Table 1 outlines the keywords used in the search. The search was a combination of A and B. Category A has more keywords, reflecting the fact that the area is still maturing and there are many variations of the same term. By identifying many keywords, we adopted a high sensitivity strategy, as defined by Dieste & Padua (2007), and understand that many papers can be found, with only a few of them being relevant to answer our research question (low precision).
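The combination of the two keyword categories can be sketched as a boolean search string. This is only an illustration: the keywords below are placeholders rather than the actual entries of Table 1, the function name build_query is hypothetical, and we assume that category A holds the DSD terms, that category B holds the evolution terms, and that the terms within a category are alternatives (OR) while the two categories must both match (AND).

```python
def build_query(dsd_terms, evolution_terms):
    """Join the terms of each category with OR, then combine both categories with AND."""
    def any_of(terms):
        # Quote each term so multi-word keywords are searched as phrases.
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return any_of(dsd_terms) + " AND " + any_of(evolution_terms)


# Placeholder keywords, for illustration only (not the content of Table 1).
category_a = ["distributed software development", "global software development", "offshoring"]
category_b = ["maturity model", "capability model", "stage model"]

query = build_query(category_a, category_b)
print(query)
```

Adding more variants to category A widens the OR group, which is what raises sensitivity at the cost of precision.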
The search covered the available digital libraries and included papers published in journals and in conference and workshop proceedings. We searched eight digital libraries, as follows:
-IEEEXplore (http://ieeexplore.ieee.
We searched for industrial experience reports, theoretical and empirical papers, and experimental papers. To be included in the analysis, a paper must have been available online, must have been written in English, and must have described (1) evolution patterns for DSD or (2) a need for the development of such patterns. The papers were classified following a two-step approach. First, based on a reading of each paper's title and abstract, the papers were classified into two categories:
- [Incl], indicating the papers collected and possibly related to DSD evolution
- [Excl], indicating the papers collected but not related to DSD evolution
All the papers in category [Excl] were excluded, while the papers in category [Incl] were analyzed more carefully, based on a reading of the introduction, conclusion, and specific parts related to the main contribution. Then a subset of the papers in [Incl] was selected, keeping only those discussing DSD evolution. A pilot of this protocol showed that in some cases the reading of title and abstract was not enough to classify each paper properly.
One researcher applied the search strategy to identify the primary papers, and filtered the identified papers by reading the abstract, to produce an initial categorization. This was followed by a reading of the full text and a second classification step, which checked whether the inclusion/exclusion criteria were satisfied. In case of any conflict, a second researcher verified the classification. After this process, both researchers reached an agreement about which papers to select. The papers were classified according to three general categories of information:
- General information: digital library, title, authors, source (e.g. journal or conference proceedings), type of source (i.e. journal, conference, workshop, technical report), and category ([Incl] or [Excl]).
- Research-related information: type of paper (i.e. theoretical, industrial experience report, empirical study, or experimental study), empirical research strategy (i.e. case study, survey, experiment, ethnography, action research, combination), data collection methods (i.e. interview, observation, questionnaire, document inspection), type of data analysis (i.e. qualitative, quantitative), and data analysis method (i.e. statistics, grounded theory, content analysis). For papers reporting empirical work, the type of study was classified according to the proposal in Neto et al (2007). Research strategy, data collection, and type and method of data analysis were classified according to the terminology used by Oates (2006).
- Content-related information: business model (i.e. offshore outsourcing, internal offshoring, offshoring, other), scope of the study (i.e. business, technical), outcome (i.e. model proposal, need for a model), evolution type (i.e. maturity, capability, stage, other), focus of the study (i.e. people, project, organization), site (i.e. client/headquarters, vendor/subsidiary), attributes, and general comments. Attributes are related to specific themes explored in each study, and general comments are a brief summary of each selected paper to guide the qualitative analysis.
After the information was extracted, the papers were classified into one of the categories illustrated in Figure 1.
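The three categories of extracted information can be pictured as one record per paper. The sketch below is an assumption about how such an extraction form might be represented, not the instrument actually used in the review; the class name PaperRecord, the field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PaperRecord:
    # General information
    digital_library: str
    title: str
    authors: List[str]
    source: str                     # e.g. name of the journal or proceedings
    source_type: str                # journal, conference, workshop, technical report
    category: str                   # "Incl" or "Excl"
    # Research-related information
    paper_type: Optional[str] = None         # theoretical, experience report, empirical, experimental
    research_strategy: Optional[str] = None  # case study, survey, experiment, ...
    data_collection: List[str] = field(default_factory=list)
    analysis_type: Optional[str] = None      # qualitative, quantitative
    analysis_method: Optional[str] = None    # statistics, grounded theory, content analysis
    # Content-related information
    business_model: Optional[str] = None     # offshore outsourcing, internal offshoring, offshoring, other
    scope: List[str] = field(default_factory=list)  # business, technical, or both
    outcome: Optional[str] = None            # model proposal, need for a model
    evolution_type: Optional[str] = None     # maturity, capability, stage, other
    focus: Optional[str] = None              # people, project, organization
    site: List[str] = field(default_factory=list)   # client/headquarters, vendor/subsidiary
    attributes: List[str] = field(default_factory=list)
    comments: str = ""


# A made-up record, used only to show how the form would be filled in.
record = PaperRecord(
    digital_library="Example library",
    title="Placeholder title",
    authors=["A. Author"],
    source="Placeholder proceedings",
    source_type="conference",
    category="Incl",
    paper_type="empirical",
    research_strategy="case study",
    data_collection=["interview"],
    analysis_type="qualitative",
    business_model="offshoring",
    scope=["business", "technical"],
    outcome="need for a model",
    evolution_type="capability",
    focus="project",
)
print(record.category, record.scope)
```

Keeping scope and site as lists allows a paper to be recorded under both values, as happens for papers that cover business and technical levels or both sites.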

Figure 1. Paper categorization
Both quantitative and qualitative analyses were employed for each of the categories outlined in Figure 1. While our quantitative analysis includes descriptive statistics, the qualitative analysis provides information about the characteristics, strengths, and weaknesses of each study. In this paper we present the quantitative results of our analysis.

RESULTS
The systematic review was conducted from October to December 2007. A total of 227 papers were found, as presented in Table 2. After the initial filtering, 63 papers were selected for the second filtering, in which 26 were selected for an in-depth analysis (Table 3). As one can observe, the lack of standard terminology in DSD resulted in a large number of papers to start with, but only a few were selected, confirming the high sensitivity and low precision of our search, as described in subsection 4.3 and suggested by Dieste & Padua (2007). One paper found in IEEEXplore had previously been published in a conference found in the AIS eLibrary, and was classified as repeated. Two other papers proposed maturity models related to pure outsourcing. Since pure outsourcing does not necessarily characterize DSD, they were not selected for further analysis. Moreover, four more papers were included in the list of papers to be analyzed. One is a journal paper cited as a reference in some of the selected papers (Carmel & Agarwal, 2002). The other three are related to the research question and were selected based on our knowledge of the area (Meyer, 2006; Mirani, 2007; Höfner & Mani, 2007). In total, 30 papers were selected for analysis and can be found in the Appendix. Next we present the results based on the quantitative analysis of the data.
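The selection funnel reported above can be checked with a few lines of arithmetic. This is a sketch under one assumption: that precision is read as the share of retrieved papers kept after the second filtering, which is our interpretation and not a figure reported in the review. The variable names are illustrative.

```python
found = 227                # papers returned by the searches
after_first_filter = 63    # kept after reading title and abstract
after_second_filter = 26   # kept after the in-depth reading
added_manually = 4         # one cited journal paper plus three known to the authors

selected = after_second_filter + added_manually
precision = after_second_filter / found  # assumed definition: kept / retrieved

print(selected)
print(f"{precision:.1%}")
```

Under this reading, roughly one retrieved paper in nine survived both filters, which is consistent with the high sensitivity, low precision strategy.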

Analysis
The findings of the quantitative analysis were divided into general findings, analysis by year, research-related information, and content-related information.
General findings: most of the papers report empirical studies (19, or 63%). We also found seven industrial experience reports (23%) and four papers that were not classified. In general, most of the papers found were related to offshore outsourcing or offshoring. The papers classified as others were related to DSD, but not necessarily global. We also found more papers from a business perspective: 14 (47%) focused on business decisions, 10 (33%) focused on technical decisions, and 6 (20%) focused on both. Out of the 30 papers found, 11 describe models of DSD evolution, while 19 argue for the need for such models.
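The percentages in the general findings follow directly from the counts over the 30 selected papers. The helper below is a minimal sketch for reproducing them; the function name shares and the dictionary labels are ours, while the counts are the ones reported above.

```python
def shares(counts):
    """Return each count as a whole-number percentage of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


paper_type = {"empirical": 19, "experience report": 7, "not classified": 4}
scope = {"business": 14, "technical": 10, "both": 6}
outcome = {"model described": 11, "need for a model": 19}

for name, counts in [("type", paper_type), ("scope", scope), ("outcome", outcome)]:
    print(name, shares(counts))
```

Because each share is rounded independently, a group of percentages may not add up to exactly 100.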
An important observation is that 70% of the papers address aspects of the offshore outsourcing and offshoring business models. This indicates that internal offshoring, although important, is still not being studied as much as it could be.
Another interesting pattern is that some of the research currently being done in DSD doesn't explicitly explain the distribution. In our review, 30% of the papers claim to study globally distributed development (offshoring), but give no evidence of which business model is involved. As stated by Herbsleb (2007), the processes employed in offshore outsourcing might be different from those employed in internal offshoring, and the characterization in this case could make a difference for the practice of DSD. Moreover, research conducted in one type of distribution is not necessarily valid for all types of DSD. Another observation is that the number of papers classified as offshoring and others indicates that almost half of the papers don't define the relationship between the companies. The findings therefore give a good indication that better contextualization is needed in all papers, in order to understand the practices that apply to each type of DSD (considering the relationship between organizations and geographic location).
Analysis by year: given the growing interest in distributed software development in the last decade, it was particularly interesting to identify the number of papers on DSD in each year since the first paper was published in 1998. The trend in our review is shown in Table 4. One can observe that no model was proposed in the literature regarding evolution within the internal offshoring of software development from a technical point of view.
Research-related information: all the papers were also classified based on the research methods employed, as well as data collection and analysis. We didn't find any information explaining the research methodology in the four papers whose type was not identified. Among the experience reports, only one paper employed and explained a research methodology (Arora et al, 2001). The authors planned a case study with observations and interviews at some Indian vendors, producing qualitative data that were analyzed using content analysis.
From the analysis of the 19 empirical papers, we found 15 case studies (79%), one survey, one focus group, and two papers with a multi-method approach (one used a focus group with a survey, while the other used a case study with action research). The data analysis methods were mostly qualitative (14 papers), while three papers employed quantitative methods, and one used both qualitative and quantitative methods. Regarding data collection methods, 10 papers used only interviews (53%), 2 used only questionnaires, 1 used only documentation, 2 used both interviews and questionnaires, 1 used both interviews and observations, and 2 used a triangulation among documentation, observation, and interviews, while for one of the papers it was impossible to identify the data collection method.
Content-related information: we present the main results based on the content of each paper in Table 6. First, the papers were classified regarding the type of evolution (maturity or capability model, or a stage model not explicitly defined as maturity or capability). In Table 6, "C" stands for capability, "M" for maturity, and "S" for stage. Based on the table, one can observe that 2 papers explored the concept of stage models (7%), 7 papers (23%) explored the concept of maturity models, and 21 papers (70%) explored the concept of capability models. Analyzing offshore outsourcing and offshoring, one can observe a common pattern: both the need for models and the models proposed relate to the capability and maturity categories. In the internal offshoring category, all papers suggesting the need for a model are related to capability and maturity models, and the same is true for papers describing models. Finally, in the category of other papers, we found four papers arguing in favor of the identification of patterns of evolution, proposing the development of either capability or stage models. However, we didn't find any model proposals in this category.
As mentioned above, 11 out of the 30 papers found describe models of DSD evolution. Table 7 shows information about the level of analysis in these models (i.e. people level, project level, or organization level) and the site (client or vendor for offshore outsourcing, headquarters or subsidiary for internal offshoring, or any of these combinations for offshoring).
One paper considered a model developed for both the client/headquarters and vendor/subsidiary sides, and for this reason the table shows 12 and not 11 models. Most of the models have the organization as their focus (10, or 83%), and most of them also have the vendor/subsidiary as their focus (7, or 58%). Table 8 presents the same information for the other 19 papers, which explore the need for models describing patterns of DSD evolution. Among the four papers describing the need for models in the "others" category, three are related to projects and one to people; three of them don't have the site clearly defined, and one explores both sides. Based on Table 7, one can observe that more than half of the models are related to the vendor/subsidiary side. Moreover, we found twice as many papers arguing for the need for models on the vendor/subsidiary side (Table 8). Regarding the focus, an interesting observation is that although the majority of papers describing models focus on the organization level (10), the papers describing the need for such models are balanced between organizations and projects (in one study the focus is on people).

Discussion
A number of conclusions can be drawn from the quantitative analysis in this systematic review, as follows:

Conclusion 1: There is a need for more studies related to the internal offshoring model
The internal offshoring business model, also known as offshore insourcing, captive subsidiaries, or wholly-owned subsidiaries, appears to be the least studied model. This is surprising, given the large number of companies involved with this strategy. According to Ramamani (2006), of the more than 900 companies associated with NASSCOM (National Association of Software and Service Companies), an Indian organization that represents all the companies in the Information Technology industry, more than 300 are wholly owned subsidiaries. Consequently, the challenges and practices of this type of DSD should also be understood, which is another opportunity for DSD researchers.

Conclusion 2: There is a need for more studies that focus the analysis at the level of projects, not only organizations
Since most of the models are oriented to a business perspective, it is not surprising that most of the papers focus the analysis at the level of the organization, not people or projects. But there is also a need to further our understanding of DSD evolution within a project, or a set of projects, and not only from a strategic perspective. As with the first conclusion, this is also an opportunity. It could include the analysis of existing maturity or capability models that have projects as part of their scope (CMMI, for example), interpreting how they can be adapted to a DSD environment with several participating stakeholders (e.g. more than one subsidiary, several teams, and many locations). Some research already included in this review has been conducted in this direction (Ramasubbu et al, 2005), and some papers have already shared these ideas (Sengupta et al, 2006; Meyer, 2006).

Conclusion 3: There is a need for more studies that address the technical aspects of DSD evolution
Most of the models proposed are related to a business perspective. This creates the opportunity for SE researchers to explore and understand DSD evolution from a technical perspective as well. Several papers in this direction have already been published in the SE literature (e.g. Sengupta et al, 2006; Ramasubbu & Balan, 2007).

Conclusion 4: There is an opportunity for studies to employ quantitative data analysis methods
Most of the capability, maturity, or stage models that have been proposed so far are largely based on the analysis of qualitative data. This is the case for CMMI (Chrissis et al, 2006) and the eSCM-SP (Hyder et al, 2006), for example. This was also found in our systematic review, where a significant number of papers (half of them) conducted the research based on qualitative methods. One reason for this is that most of the time the phenomenon is not known beforehand, and so an exploratory strategy is followed, for example through case studies using interviews or other qualitative data collection methods. Quantitative data analysis, however, offers the opportunity to statistically evaluate the findings identified through qualitative methods.

Conclusion 5: There is a need for more studies of DSD at the vendor side
In a literature review of information systems outsourcing, Gonzalez et al (2006) found 131 papers published in IS journals, of which only 16% explored outsourcing from the perspective of the service provider (or vendor). In our systematic review, most of the models proposed (Table 7) focused on the phenomenon at the vendor site (58%). We also found twice as many papers arguing for the need for such models on the vendor side (Table 8). In total, 67% of the papers we found explored the vendor side (33% of them exploring both sides). This is clearly a difference between the two reviews, which also had different purposes. While Gonzalez et al (2006) searched for any type of paper exploring IS outsourcing, looking into IS journals only, we focused on globally distributed development, and searched both the SE and IS domains, including conferences and workshops as well. For this reason, the reader should understand that our review was based on distributed software development, and that many outsourcing arrangements end up creating a distributed environment (local or global). Gonzalez et al (2006), on the other hand, analyzed outsourcing from an IS perspective, focusing on business drivers and decisions. An interesting conclusion is that the study of outsourcing in the IS domain is not only more concentrated on strategic decisions and the outsourcing relationship, but is also more client oriented. Based on our results, there is a need to better understand the vendor's side as well, from a technical perspective.

Conclusion 6: Distributed software development should be better contextualized
Almost half of the studies found in this systematic review don't explain the type of DSD environment, as presented in Table 6. With the development of this area, however, it is becoming necessary to better contextualize the type of distribution under study (Herbsleb, 2007). A successful practice executed in a locally distributed environment might not work well in a globally distributed setting. As DSD becomes more mature, it is also necessary to differentiate the many types of distribution and their implications.

Limitations of this Systematic Review
A systematic review is a useful method that, based on a research question and detailed planning, searches for primary papers within a specific domain. But as with any other method, there are some limitations. We discuss three main limitations: the first is related to the number and the sources (libraries) selected, the second refers to the reliability of the paper classification method, and the third is related to the quality of the search engines.
First of all, we didn't look into every possible source. Eight digital libraries were selected based on experiences shared by other groups (Neto et al, 2007; Brereton et al, 2007; Dyba et al, 2007; Dyba et al, 2005) and on the subject under review. By selecting these libraries, we increased our range of search within the SE domain. Since the DSD literature is documented in both the SE and IS domains, we added two libraries from the IS domain (AIS eLibrary and the proceedings of ECIS) to cover another significant number of primary papers, and important IS conferences such as ECIS, ICIS, and AMCIS, as mentioned in Gonzalez et al (2006). Other IS papers were covered by looking into the HICSS proceedings (using the IEEEXplore DL), Compendex, INSPEC, and Elsevier ScienceDirect. However, we searched neither books nor other sources of IS papers that could focus on studies from a business perspective. Nevertheless, we believe that the results presented give a good indication of the state of the art and the state of the practice of DSD evolution in global settings.
Second, the classification process, being based on a set of criteria, could be subjective. To minimize this limitation, a two-step approach was planned for paper selection, as explained in Section 4, and another two-step approach was planned for paper categorization. All papers were reviewed at least three times by the same researcher. To define the criteria as well as the concepts for paper categorization/classification, we were involved in many other interactions with at least two other researchers outside the systematic review. The second step was the review of the categorization with at least one other researcher.
Third, with regard to the quality of the search engines, we could not use the same search string in all digital libraries. We found two of the search engines too simplistic (AIS eLibrary and the website with ECIS papers), with no support for logical operators or no clear instructions on how to execute the search. For this reason, we had to search for each keyword individually. Another search engine that we used (ACM DL) did not support complex search strings, and thus we combined a subset of keywords and split the search into several searches. The result was positive, although less effort would have been required had we had better support from some of the search engines chosen.
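Splitting one long search string into several shorter searches can be sketched as follows. This is a hypothetical illustration, not the exact procedure used with any particular library: the function names, the placeholder keywords, and the limit on terms per query are assumptions, and the results of the partial searches would still need to be merged and de-duplicated by hand.

```python
def chunks(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_queries(dsd_terms, evolution_terms, max_terms):
    """Build several small 'A AND B' queries that together cover every A/B pair."""
    def any_of(terms):
        return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    return [
        any_of(a_group) + " AND " + any_of(b_group)
        for a_group in chunks(dsd_terms, max_terms)
        for b_group in chunks(evolution_terms, max_terms)
    ]


# Placeholder keywords, for illustration only.
queries = split_queries(["a1", "a2", "a3"], ["b1", "b2"], max_terms=2)
for q in queries:
    print(q)
```

Since AND distributes over OR, the union of the partial result sets matches what the single combined string would have returned. With max_terms=1 the same function degenerates to one search per keyword pair.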

Table 5 presents the analysis by year, comparing the three types of business models.