Distributing Internal Packages
By the end of this lesson, you will be able to:
- Explain why organizations use Package Manager to distribute internally developed packages
- Describe how Package Manager builds packages from a Git repository
- Identify the role of sources, git-builders, repositories, and subscriptions in a Git-backed workflow
- Recognize the prerequisites for building R and Python packages from Git
Introduction
Many teams develop internal packages to share data analysis methods, data connections, and common processes across their organization. Distributing these packages by hand, whether by emailing tarballs or by pointing colleagues at a Git URL, quickly becomes error-prone: people end up on different versions, builds are inconsistent, and there is no single source of truth.
Package Manager solves this by serving internal packages from the same infrastructure that already serves CRAN and PyPI. It can build packages directly from a Git repository, so that every time new code is pushed, the package is rebuilt and made available to users automatically. This gives your data science teams a governed, reproducible way to consume internal code, and gives you a single place to manage it.
This lesson explains the concepts behind Git-backed package distribution. In the hands-on lab that follows, you will configure Package Manager to build a package from Git and serve it to users in your organization.
- Reading time: 10 minutes
- Documentation reading time: 15 minutes
- Hands-on exercise time: 10 minutes
Build Requirements: R and Python
To build a package from Git, Package Manager needs a working installation of the language the package is written in. To build R packages, R must be installed; to build Python packages, Python must be installed.
By default, Package Manager scans the file system to detect available versions of R and Python. You can also explicitly configure the path to the version you want it to use. Specifying the version explicitly is recommended, because it removes any ambiguity about which interpreter performs the build and ensures Package Manager uses the version you expect.
This is why the installation lab had you install R: that same R installation is what Package Manager will use to build internal R packages from Git.
How Package Manager Builds Packages from Git
Once a language is available, Package Manager can clone a Git repository and run a job that transforms the repository contents into an installable R or Python package. Users then install that package from a repository exactly as they would any other package, using install.packages for R or pip for Python.
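As a rough illustration of the Python side of that statement, the sketch below assembles an ordinary pip command that points at a Package Manager repository. The server name, repository name, URL path layout, and package name are all hypothetical placeholders, not values from this lesson; use the URL that your own Package Manager instance shows for the repository.

```python
import sys
from typing import List


def pip_install_command(index_url: str, package: str) -> List[str]:
    """Build a pip command that installs `package` from the given index URL."""
    # --index-url is pip's standard option for choosing a package index.
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


# Hypothetical server, repository name, URL layout, and package name.
INDEX_URL = "https://packagemanager.example.com/internal/latest/simple"
command = pip_install_command(INDEX_URL, "acme-tools")

# Print the command rather than running it, since the server above is fictional.
print(" ".join(command))
```

The point is that nothing about the command is specific to internal packages: only the index URL changes.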
Package Manager continuously polls the Git endpoint to watch for changes. Depending on how you configure the build, it watches either for new commits on a branch or for new tags. When an update is detected, Package Manager automatically pulls the latest changes and builds the new packages in the background, so distribution stays current without manual intervention.
Versioning behaves differently by language. For R packages, previous versions are archived as new ones are built. For Python packages, all versions remain available. Choosing to build from tags rather than commits gives you control over exactly which states of the repository become published package versions, which is usually what you want for stable releases.
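The difference between the two triggers can be sketched with a toy model. This is not Package Manager's implementation; the class names, fields, and the shape of the polled Git state are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Set


@dataclass
class GitState:
    """What one poll of the Git endpoint observed (hypothetical shape)."""
    branch_head: str                         # commit at the tip of the watched branch
    tags: FrozenSet[str] = frozenset()       # all tags currently in the repository


@dataclass
class Builder:
    """Toy stand-in for a git-builder with a 'commits' or 'tags' trigger."""
    trigger: str
    last_head: Optional[str] = None
    seen_tags: Set[str] = field(default_factory=set)

    def poll(self, state: GitState) -> List[str]:
        """Return the refs that would be built after observing `state`."""
        if self.trigger == "commits":
            # Any change to the branch tip triggers a build.
            if state.branch_head != self.last_head:
                self.last_head = state.branch_head
                return [state.branch_head]
            return []
        if self.trigger == "tags":
            # Only tags not seen before trigger a build; new commits alone do not.
            new_tags = sorted(state.tags - self.seen_tags)
            self.seen_tags.update(new_tags)
            return new_tags
        raise ValueError(f"unknown trigger: {self.trigger!r}")


tag_builder = Builder(trigger="tags")
first = tag_builder.poll(GitState("a1"))                            # no tags yet: []
second = tag_builder.poll(GitState("b2", frozenset({"v1.0.0"})))    # ["v1.0.0"]
third = tag_builder.poll(GitState("c3", frozenset({"v1.0.0"})))     # new commit, no new tag: []
```

With the tags trigger, the commit from `b2` to `c3` publishes nothing, which is the control over releases described above.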
Sources, Git-builders, Repositories, and Subscriptions
A Git-backed workflow in Package Manager is assembled from four building blocks. Understanding how they fit together makes the configuration steps in the lab easier to follow.
- A source is where packages originate. For internal packages, you create a Git source that links one or more Git repositories to Package Manager. A single source can hold multiple Git repositories, and you can create multiple sources to organize repositories by team, project, or any scheme that suits your organization.
- A git-builder attaches a specific Git repository to a source and defines what triggers a build, either new commits on a branch or new tags.
- A repository is what users actually point their R or Python tools at and install from. It is the consumer-facing endpoint.
- A subscription connects a source to a repository, making the packages built from that source available in that repository.
In short: a git-builder feeds packages into a source, and a subscription exposes that source through a repository that users can install from.
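One way to keep the four pieces straight is to model their relationships in a few lines of code. The sketch below is a conceptual model only: the class names, fields, and example URLs are hypothetical and do not correspond to Package Manager's internal data structures or CLI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GitBuilder:
    """Attaches one Git repository to a source and names its build trigger."""
    url: str
    trigger: str  # "commits" or "tags"


@dataclass
class Source:
    """Holds the packages produced by one or more git-builders."""
    name: str
    builders: List[GitBuilder] = field(default_factory=list)


@dataclass
class Repository:
    """The consumer-facing endpoint that users install from."""
    name: str
    subscriptions: List[Source] = field(default_factory=list)

    def subscribe(self, source: Source) -> None:
        """A subscription connects a source to this repository."""
        if source not in self.subscriptions:
            self.subscriptions.append(source)

    def served_git_urls(self) -> List[str]:
        """Git repositories whose built packages are visible through this repository."""
        return [b.url for s in self.subscriptions for b in s.builders]


# Hypothetical names and URLs, for illustration only.
source = Source("internal-src")
source.builders.append(GitBuilder("https://git.example.com/team/pkg-a.git", "tags"))
source.builders.append(GitBuilder("https://git.example.com/team/pkg-b.git", "commits"))

repo = Repository("internal")
before = repo.served_git_urls()   # nothing is served until a subscription exists
repo.subscribe(source)
after = repo.served_git_urls()    # both Git repositories are now reachable
```

Until the subscription is created, the source can contain built packages that no user is able to install, which is why the subscription step matters in the lab.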
Internal package distribution is one of the most common requests administrators receive once Posit products are in place. Data science teams want their shared code available everywhere their work runs, including Workbench sessions and Connect deployments. Git-backed distribution through Package Manager meets that need in three ways:
- Single source of truth: Everyone installs the same internal packages from the same place, eliminating version drift between team members.
- Automation: Git-backed builds mean you configure the pipeline once, and new releases are published automatically as code is pushed or tagged. There is no manual rebuild step to forget.
- Governance and reproducibility: Internal packages sit alongside CRAN and PyPI in the same managed infrastructure, benefiting from the same versioning and snapshotting story you rely on for external packages.