In December of 2020 at Coalesce, nearly 800 people tuned in live to hear David Murray, director of data and analytics at Snaptravel, share his team’s experience with data team org structure. Over the last four years, the data team at Snaptravel has grown from one analyst to almost a dozen, and they have tried five different data team structures over the course of nine months. This is a legitimately difficult problem!
Org structure is challenging for everyone in the industry, not just for fast-growing teams. I lead the customer success team at Fishtown Analytics. We’ve worked with data teams of all shapes and sizes—from data teams of one to massive enterprise teams comprising dozens of engineers and hundreds of analysts. We get asked questions about the ideal data team structure all the time, so I, like the rest of the dbt community, was interested in this talk.
This article recaps some of David’s key insights from the talk along with my own commentary and some thoughts from the dbt community. I highly recommend watching his talk or reading his blog post on the topic.
Why is data team structure so difficult? #
This topic clearly resonated with a lot of folks, and I think it’s worth considering why that is. This problem is not unique to Snaptravel in any way. It’s something we’re all thinking about right now. My take? Data team structure is difficult because data technology has changed so rapidly over the past five years, and this has had a cascading effect on what data people do.
Ten years ago, the most challenging problem a data team faced was managing compute and storage resources. James Densmore, currently the director of data infrastructure at HubSpot, described the old challenges of data management in a recent blog post:
“…getting a columnar database in the early 2010s meant investing in some serious bare-metal. For the young folks out there, that means you had to buy a physical server rack filled with what we now refer to as compute and storage. (as an aside, if you’re never had the pleasure of managing physical hardware in a freezing cold server room, find a way to visit a data center. It makes what we do feel more “real”)”
At this time, data analysts had no choice but to request changes to the data warehouse and patiently wait for the data engineers to deliver. Modern cloud warehousing completely upended this relationship. James goes on to write:
“…this new breed of data warehouses made it possible (and economical) to store and query far higher volumes of data than ever before. From there, data engineers found they could focus their efforts on efficient ingestion (Extract-Load) of data into warehouses where data analysts could flex their SQL muscles to model the data (Transform) on their own. ELT not only saved the data warehouse, but it lead to the restructuring of data teams and the emergence of a new role, the analytics engineer.”
The challenging problems of managing compute and storage resources have largely been solved. The biggest challenges today are around speed: How can we help data engineers and analysts collaborate more effectively? How can we empower analysts to move quickly without sacrificing data quality? How can we empower analysts, engineers, and business users to make sense of the data in our warehouse? These aren’t questions about technology; they’re questions about humans and how we can all work better together.
The big question in data team structure: centralized vs. decentralized #
What I’ve seen working with companies of varying sizes, and what I’ve learned from folks in the dbt Community, is that where a team sits on the spectrum from centralized to decentralized is one of the key decisions in data team org structure. David used the word “embedded” to describe the more decentralized model; others call it “distributed.” We’re all talking about pretty much the same thing. Here is how David defined the two ends of this spectrum:
Picture courtesy of Snaptravel presentation
A quick poll in the dbt Slack channel for this talk showed that a majority of teams use a centralized model as opposed to an embedded model.
The centralized model #
In the fully centralized data team model, all data resources – people (data analysts, analytics engineers, data engineers, data scientists, etc.) and technology (data warehouse, transform, ingest, BI tools) – are owned by one central data team. If someone from product or finance has a data-related request, they submit it to the data team for prioritization.
Picture courtesy of Medium
A few benefits of this model…
- Alignment of data resources to company need: When you are a small, growing data team, like Snaptravel was, company alignment is particularly important. A small company doesn’t have the bandwidth to do all the things. It’s important to focus data resources on the highest-impact areas of the business.
- Knowledge-sharing: By placing analysts and engineers in close alignment, the centralized model prioritizes knowledge-sharing. This makes it easier to build cultural data norms together like naming conventions, syntax, or even how to write and review pull requests.
- Mentorship: In the centralized model, analysts get to learn from more senior analysts as well as data engineers. This is incredibly valuable for analysts new to the analytics engineering workflow.
The biggest issue with a centralized model is speed. If marketing needs support adjusting their attribution model, that request will likely have to wait until end-of-month reporting is wrapped for the finance team.
The decentralized model #
In the decentralized model, you’ll typically see a central core group of data engineers who own the data warehouse with analysts being decentralized, or embedded, within a business function such as finance or product.
Picture courtesy of Medium
The biggest advantage to the embedded model is speed. Data resources are aligned with department needs (instead of company needs). So if a business user has a request, they don’t need to wait for that request to be prioritized against all of the other needs of the business. Faster time to insights!
Speed also comes from having greater context. In a centralized model, work tends to be assigned in a more “round robin” fashion. In a decentralized model, the marketing analyst owns all marketing requests. They understand that function’s KPIs, know the metric definitions, and are familiar with the quirks of the data. This is often a benefit to both business users (who spend less time explaining themselves) and analysts (who get to go deep in a given function).
However, what we see is that this speed is highly dependent on just how empowered analysts are. If analysts are empowered to own the analytics engineering workflow, this model can work quite well (see examples from JetBlue and HubSpot). If analysts in a decentralized model spend most of their time in the BI tool and rely on data engineers for data transformation and modeling work, then analytics velocity will slow as analysts wait in the data engineering queue.
One of the biggest downsides that we see with the decentralized model is how challenging it can be to keep analysts working closely together and improving their shared knowledge of data analytics. Let’s say your head of finance hires a finance analyst. It’s very possible (likely!) that person will continue to work in the spreadsheets that finance teams are traditionally accustomed to rather than adopt the modern data stack used by your centralized team. David frames this trade-off as “knowledge share”: in a centralized model, the knowledge that is most easily shared is data-related; in a decentralized model, it’s domain-related.
Snaptravel’s five data team structures #
Five data team structures in nine months is a lot, but the potential efficiency gains for their team felt important enough to make these efforts worthwhile. “We actually tried out about five different structures, which, if anyone on my team is here, we apologize,” David said to conference attendees. “That’s a lot of work structures. We would not recommend that. There’s a lot of change, and there’s a lot of reasons that organizations should not do that.”
Each of the five structures that Snaptravel tried was a different mix of centralized vs. decentralized. Ultimately they landed where we see more and more companies land – a hybrid version. The question for data teams is no longer “centralized vs. decentralized?” The question is “what, exactly, should be centralized and what should be decentralized?”
Here are the five structures Snaptravel’s data team used:
- Growth Team: When Snaptravel received Series A funding, they launched their growth team and began to embed their data analysts to better serve other departments.
- Agile: While vacationing in London, England, Nehil discovered dbt, which allowed Snaptravel to keep track of all their data models. To allow the analysts to work together, they quickly centralized analysts onto one team, switching to an agile approach.
- Full-Stack: Snaptravel’s agile approach led to a ton of problems within the organization. The priorities of data engineers and analysts were not aligned at the company level, and that needed to change. Snaptravel quickly changed this approach and merged four data engineers with four analysts to form a full-stack team. They were finally able to prioritize tasks at the company level while improving knowledge-sharing between both roles.
- Pod: Their full-stack team quickly grew from eight team members to 12 in March 2020, and team meetings became a waste of time for most members, because only one or two people were needed to make a decision. Their solution to this problem was to create multiple pods that specifically owned a full-stack problem in a given area of the business.
- Domain Structure: While their pod solution solved an initial problem, it eventually led to a bigger problem that slowed down their team’s progress. The full-stack pod structure lacked ownership over objectives and, at times, there were four to six people all trying to come to a decision. The final change they made to their structure is referred to as a Domain structure.
The perfect structure (for now): Domain-based data teams #
Finally, after nine months of constant change, Snaptravel landed on a hybrid setup that they call a “domain-based” team structure. In this structure, a senior member of the team is designated “domain lead” for a specific business area. They are then responsible for assigning work to other data engineers and analysts on an individual basis to support business priorities.
Picture courtesy of Medium
This filled some critical gaps for them:
- Ownership: “One of the reasons domain leaders really really really like this structure is because they have ownership over all the outcomes of a given area of the business,” David said. Data team members aren’t just order takers; they get to see the way their work impacts the results of a given team.
- Domain Expertise: David pointed out that this ownership creates something valuable for business users as well – domain expertise. When business users have a data need, they’re always working with the same people and have confidence that those people already know how their core data sets work and understand the unique nuances of their function.
- Collaboration: Data analysts and engineers are able to work on tasks that fit their skill set while sharing best practices with one another. Because every analyst and engineer has their own responsibilities, each is held accountable for completing tasks in a reasonable amount of time.
While this process currently works for Snaptravel, they recognize that it will evolve as their data team and organization grow. “One of the things that I’ve heard from people is that it won’t scale,” David said. “And to be honest, I don’t know. I’ve never worked at a large data organization. What we know is that this works for 10 people, and we think it could probably work for 20 people. Beyond that, we don’t know.”
Closing thoughts #
I found this talk fascinating. Over the four-plus years I’ve been on the Fishtown Analytics team, I’ve witnessed growing data teams struggle with this firsthand. It cannot be overstated how important it is to reassess team structure with some regularity. What works for a data team of 2 will not work for a team of 20. And what worked in 2015 will not work in 2025.
It’s rare to get such a thorough, deep dive into how teams think about organizational design and the problems they encountered along the way. During the talk, folks in the Slack channel shared a few other fantastic resources on this topic. I’m adding them here so we can keep learning together:
- Models for integrating data science teams within organizations by Pardis Noorzad (h/t Jussi Kämäräinen)
- How should our company structure our data team? by David Murray
- Big Data, Bigger Impact by Ken Rudin, former Director of Analytics, Facebook (h/t Tim Jenkins)
If you missed David’s talk at Coalesce, you can still watch it here, and I highly recommend doing so.
Coalesce 2021 is taking place from December 6-10! Register for Coalesce here. We hope you can join us.
Last modified on: Apr 26, 2022