Managing Large-scale Instances in Government Agencies

Feb 19, 2021

Ken Urban
Pre-Sales Solutions Engineer, Federal, Atlassian

As agencies increase the speed of their IT transformation initiatives, they’re finding even more ways to leverage Atlassian applications. Many government agencies are expanding from one or two successful Atlassian-managed projects, scaling their instances to meet a much broader, agency-wide use case. More and more, I’ve been asked how to manage large instances of Atlassian products across an entire government agency. Having managed multiple, extremely large systems (and having lived to tell about it), I thought it might be useful to list some of the best practices and self-protections my team has developed along the way.

Establish a scaling plan for all self-hosted products

When managing self-hosted Atlassian products, no matter how much you feel your agency has overprovisioned things, it’s probably not enough—agency IT needs can grow rapidly overnight. Plan early, add extra, and set up your underlying infrastructure to allow it to scale rapidly. Here are two key tips:

  • Use a virtualized environment: For mission-critical Atlassian solutions, sluggish performance and downtime are not options. A good virtualization system will move a node to new hardware if the underlying hardware fails (with zero downtime for that node). Architecting a solution in AWS GovCloud can help meet stringent security and resilience requirements for agency workloads.
  • Manage infrastructure configuration as code: All configurations, even networks, should use some sort of version control system and configuration management. This ensures that any unintentional changes can be backed out automatically.

Develop agency-wide guidelines

As instances increase in scale, letting individual project owners have instance-wide administrative rights quickly becomes untenable. Establish policies early on to guide how each instance will be managed to maintain compliance. Incorporate agency control, governance, and auditing compliance requirements. Be certain that the policies cover all aspects of each Atlassian instance and make them accessible to users to ensure complete transparency. In addition:

  • Utilize project templates: Create a set of templates with your agency’s most common configurations to greatly speed up project creation.
  • Delegate control to project administrators: Utilize plug-ins to relieve IT staff of the burden of changing project-specific screens. Ensure that changes are limited to the single project scope (or multiple, if owned by the same team).
  • Establish and enforce global items: Global items bring stability to the platform. The best course of action is to keep “out-of-the-box” values and add new global items sparingly. Globals to watch out for include statuses, resolutions, issue types, etc.
  • Control the use of custom fields: Establish a methodology for determining when a custom field is necessary: Does the field already have an equivalent in the system? Is the proposed new field overly specific? Does it bring value for one project or the entire agency? Consider scoping fields to limit their impact. Use the custom field optimizer on existing fields to uncover and eliminate fields that are not being used.
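The custom field questions above can be turned into a simple triage pass. The following is a minimal sketch, assuming you have already exported per-field usage counts by some means; the `FieldUsage` record, its attributes, and the `triage_fields` helper are hypothetical names used for illustration and are not part of any Atlassian product or API.

```python
from dataclasses import dataclass


@dataclass
class FieldUsage:
    """Hypothetical usage record for one custom field."""
    name: str
    issues_with_value: int  # issues where the field is populated
    projects_using: int     # distinct projects with at least one populated issue


def triage_fields(fields, total_projects):
    """Sort fields into 'remove', 'scope', and 'keep' buckets."""
    report = {"remove": [], "scope": [], "keep": []}
    for field in fields:
        if field.issues_with_value == 0:
            # Never populated: candidate for elimination.
            report["remove"].append(field.name)
        elif field.projects_using == 1 and total_projects > 1:
            # Only one project benefits: candidate for scoping to that project.
            report["scope"].append(field.name)
        else:
            report["keep"].append(field.name)
    return report
```

The output is only a starting list for human review; whether a field duplicates an existing one or is overly specific still needs a judgment call.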

TIP: While agencies are finding success consolidating their instances at scale, this may not work for all systems. An agency may need to federate some workflows, whether to isolate secure environments or to maintain controls on personally identifiable information. In these cases, set a high bar for creating a special, separate instance. Limit the number of “nonagency” instances as much as possible to keep the focus on the main system.

Let go of old data

In today’s world, we’re all digital pack rats. Government agencies, with their complex records disposition schedules (RDS), may find releasing data even more difficult. Sit down with agency stakeholders and legal teams (if necessary) to review and update the agency’s RDS. Include guidance on what to do with instance records (a ticket, Confluence page, etc.) and how long you should keep them. Once you have that, you can start detecting and dealing with old data.

Continue to review and optimize instances

Routinely check your agency’s system for unused projects, spaces, repositories, and builds that never change. I once found a defunct Jira project and associated Bitbucket repository that had automated hourly builds enabled. The build ran for several months after the final commit into the repository, which was weeks after the project had formally ended. This was a huge waste of resources both in terms of compute (for doing builds) and storage. Establish a policy for each product around how often something must be in use. Here are some good places to start:


Jira projects and Confluence spaces: Place a project/space into read-only mode after 30 days with no new tickets or changes to existing tickets. Send emails notifying project owners that it will be removed from the system after 90 days in read-only mode. If records must still be online per the RDS, make a note to revisit them on the date the RDS indicates. For Jira projects, use the Data Center issue/project archiving feature 90 days after this.

Consider moving the project/space to archive 90 days after it’s moved into read-only. Lock it down to admins only. This starts another 90-day timer for actual deletion and helps identify and clean out abandoned projects/spaces that may have been serving as a “read-only repo of knowledge.” Another option is to export the project/space to a safe location (for example, a shared drive for admins) 90 days after moving it to the hidden state. Check to see if the agency RDS allows deletion or long-term storage, and offload as appropriate.
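The timers described above can be expressed as a small decision function. This is a minimal sketch, assuming the 30-day and 90-day thresholds from this section and a last-activity date that you obtain yourself; the `Stage` names, the `lifecycle_stage` function, and the `rds_hold` flag are illustrative and do not correspond to any built-in Jira or Confluence feature.

```python
from datetime import date, timedelta
from enum import Enum


class Stage(Enum):
    ACTIVE = "active"
    READ_ONLY = "read-only"
    ARCHIVED = "archived"
    ELIGIBLE_FOR_DELETION = "eligible for deletion"


# Thresholds from the policy above; adjust them to match your agency's RDS.
READ_ONLY_AFTER = timedelta(days=30)
ARCHIVE_AFTER = READ_ONLY_AFTER + timedelta(days=90)
DELETE_AFTER = ARCHIVE_AFTER + timedelta(days=90)


def lifecycle_stage(last_activity: date, today: date, rds_hold: bool = False) -> Stage:
    """Return the stage a project/space should be in, based on idle time."""
    idle = today - last_activity
    if idle < READ_ONLY_AFTER:
        return Stage.ACTIVE
    if idle < ARCHIVE_AFTER:
        return Stage.READ_ONLY
    if idle < DELETE_AFTER or rds_hold:
        # An RDS hold keeps records archived instead of deleting them.
        return Stage.ARCHIVED
    return Stage.ELIGIBLE_FOR_DELETION
```

For example, `lifecycle_stage(date(2021, 1, 1), date(2021, 2, 19))` returns `Stage.READ_ONLY`, because the project has been idle for 49 days. Running a check like this on a schedule and emailing owners at each transition keeps the policy from depending on anyone's memory.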

Bitbucket and Bamboo: Bitbucket often contains intellectual property, and it’s harder to determine if Bamboo builds are fresh. I’d follow advice similar to that for Jira projects and Confluence spaces, but I wouldn’t recommend deleting a repo. Out of an abundance of caution, I’d only export the code and disable the build plans.
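A check like the following would have caught the hourly build that kept running months after the final commit. It is a minimal sketch, assuming you can list each build plan together with the date of the last commit to its repository; the `BuildPlan` record, the `stale_plans` helper, and the 30-day default are assumptions for illustration, not Bamboo or Bitbucket APIs.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BuildPlan:
    """Hypothetical summary of one build plan and its repository."""
    key: str
    enabled: bool
    last_commit: date


def stale_plans(plans, today, max_idle=timedelta(days=30)):
    """Return keys of enabled plans whose repository has had no recent commits."""
    return [
        plan.key
        for plan in plans
        if plan.enabled and today - plan.last_commit > max_idle
    ]
```

The result is a list of candidates to disable, not delete, in keeping with the cautious approach above.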

Consider each app and customization carefully

Apps are third-party add-ons that an agency has either purchased from the marketplace or written itself. Apps can add to the complexity and cost of an instance, so agencies should limit their usage to apps that provide value to the majority of agency users. Before any app gets installed, your agency IT staff should do a thorough security evaluation of both the app and the company that wrote it. Consider your risk profile and conduct tests appropriately. Ensure that no new ports are opened, security controls remain intact, and data isn’t exposed.

Customizations can be changes to the product itself, such as modifying templates and dialog boxes, or integrations via the REST API. Be cautious when applying any changes to the products at the command line. Every change will need to be carried over into each new version manually, causing the support tail to grow very quickly. REST API integrations are typically safe, but enable rate limiting to prevent denial-of-service (DoS) attacks, whether intentional or inadvertent. Only allow integrations to use the public REST API.
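To illustrate what rate limiting does for an integration, here is a minimal token bucket sketch. It is a generic illustration of the technique, not the products' own implementation; the `TokenBucket` class, its parameters, and the injectable `clock` are assumptions made for this example. In practice, prefer the rate limiting controls your products or reverse proxy already provide.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate_per_sec`."""

    def __init__(self, rate_per_sec: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock             # injectable so the limiter can be tested
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the request."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would keep one bucket per integration account and return an HTTP 429 response whenever `allow()` is false, so a runaway script slows itself down instead of the whole instance.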

Clean up existing “messy” instances

Perhaps your agency is already at scale and would like to optimize its Atlassian instances. Is it better to start over or clean up the current instances?

Unless there’s no value in the current data, starting over is probably the wrong answer. The following is a high-level action plan for cleaning up existing large-scale government instances:

  • First: Establish a single team as the sole system administrators. Remove admin rights from everyone else (keeping appropriate delegation, as mentioned earlier).
  • Second: Create guidelines for all new projects/spaces/etc., moving forward.
  • Third: Review, triage, and address problems with the highest impact to the agency.
  • Fourth: Build a plan for tackling each problem. Use Agile methodologies to get it done!
  • Fifth: Repeat as necessary. Your agency may choose to retain some “mess” because it’s a legacy project that will “age off” the system in a few months. That’s OK!
  • Sixth: Celebrate with a team pizza party. You just did a lot of hard work, and everyone involved should celebrate.

Summary

Applying these best practices to enterprise-wide Atlassian deployments can improve management and performance as your agency continues to add more users and applications. Given the scale and mission-critical nature of modern government workflows, the time to establish agency-wide governance and control is early in the modernization process. The end result is a more efficient and effective process, with a lot less risk.
