Finding and choosing files (index and PackageFinder)
The pip._internal.index sub-package in pip is responsible for deciding
what file to download and from where, given a requirement for a project. The
package’s functionality is largely exposed through and coordinated by the
package’s PackageFinder class.
Overview
Here is a rough description of the process that pip uses to choose what file to download for a package, given a requirement:
1. Collect together the various network and file system locations containing project package files. These locations are derived, for example, from pip’s --index-url setting (default https://pypi.org/simple/) and any configured --extra-index-url locations. Each of the project page URLs is an HTML page of anchor links, as defined in PEP 503, the “Simple Repository API.”
2. For each project page URL, fetch the HTML and parse out the anchor links, creating a Link object from each one. The LinkCollector class is responsible for both the previous step and fetching the HTML over the network.
3. Determine which of the links are minimally relevant, using the LinkEvaluator class. Create an InstallationCandidate object (aka candidate for install) for each of these relevant links.
4. Further filter the collection of InstallationCandidate objects (using the CandidateEvaluator class) to a collection of “applicable” candidates.
5. If there are applicable candidates, choose the best candidate by sorting them (again using the CandidateEvaluator class).
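The link-parsing part of the process above can be illustrated with a minimal sketch that uses only the standard library. This is not pip’s implementation: the AnchorCollector class, the example.invalid URL and the sample page are hypothetical, and the sketch only shows how anchors on a PEP 503 project page could be turned into link records.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Hypothetical helper: collect (url, requires_python) pairs from anchors."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href:
            # hrefs may be relative, so resolve them against the page URL.
            url = urljoin(self.base_url, href)
            self.links.append((url, attr_map.get("data-requires-python")))


# Made-up project page in the PEP 503 style: one anchor per file.
PAGE = """
<!DOCTYPE html>
<html><body>
<a href="../../packages/demo-1.0.tar.gz#sha256=abc">demo-1.0.tar.gz</a>
<a href="../../packages/demo-1.1-py3-none-any.whl#sha256=def"
   data-requires-python="&gt;=3.8">demo-1.1-py3-none-any.whl</a>
</body></html>
"""

collector = AnchorCollector("https://example.invalid/simple/demo/")
collector.feed(PAGE)
links = collector.links
```

Each entry in links plays the role that a Link object plays in pip: a resolved URL plus the metadata carried on the anchor, ready to be evaluated for relevance.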
The remainder of this section documents some of the classes inside the
index package, in the following order: PackageFinder, LinkCollector,
LinkEvaluator, CandidateEvaluator, CandidatePreferences, and BestCandidateResult.
The PackageFinder class is the primary way through which code in pip
interacts with the index package. It is an umbrella class that encapsulates and
groups together various package-finding functionality.
The PackageFinder class is responsible for searching the network and file
system for what versions of a package pip can install, and also for deciding
which version is most preferred, given the user’s preferences, target Python
environment, and so on.
The pip commands that use the PackageFinder class are: pip download, pip install, pip list, and pip wheel.
The pip commands requiring use of the PackageFinder class generally
instantiate PackageFinder only once for the whole pip invocation. In
fact, pip creates this PackageFinder instance when command options
are first parsed.
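With the exception of pip list, each of the above commands is
implemented as a Command class inheriting from RequirementCommand
(for example pip download is implemented by DownloadCommand), and the
PackageFinder instance is created by calling the RequirementCommand class’s
_build_package_finder() method. pip list, on the other hand, constructs its
PackageFinder instance by calling the ListCommand class’s
_build_package_finder() method. (This difference may simply be historical and
may not actually be necessary.)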
Each of these commands also uses the PackageFinder class for pip’s
“self-check” (i.e. to check whether a pip upgrade is available). In this
case, the PackageFinder instance is created by the self_outdated_check.py
module’s pip_self_version_check() function.
The PackageFinder class is responsible for doing all of the things listed
in the Overview section, like fetching and parsing
PEP 503 simple repository HTML pages, evaluating which links in the simple
repository pages are relevant for each requirement, and further filtering and
sorting by preference the candidates for install coming from the relevant
links.
One of PackageFinder’s main top-level methods is find_best_candidate(). This method does the following two things:
1. Calls its find_all_candidates() method, which gathers all possible package links by reading and parsing the index URLs and locations provided by the user (the LinkCollector class’s collect_sources() method), constructs a LinkEvaluator object to filter out some of those links, and then returns a list of InstallationCandidates (aka candidates for install). This corresponds to steps 1-3 of the Overview above.
2. Constructs a CandidateEvaluator object and uses that to determine the best candidate. It does this by calling the CandidateEvaluator class’s compute_best_candidate() method on the return value of find_all_candidates(). This corresponds to steps 4-5 of the Overview.
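The two-phase shape of find_best_candidate() can be sketched with a toy class. Everything here is a simplified assumption made for illustration: ToyFinder, newest_wins and the use of plain version tuples as candidates are hypothetical, and only the method names find_all_candidates() and find_best_candidate() are borrowed from the description above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, ...]
Evaluator = Callable[[List[Version]], Optional[Version]]


class ToyFinder:
    """Hypothetical stand-in showing the gather-then-evaluate structure."""

    def __init__(
        self,
        versions_by_project: Dict[str, List[Version]],
        make_evaluator: Callable[[str], Evaluator],
    ) -> None:
        self._versions_by_project = versions_by_project
        self._make_evaluator = make_evaluator

    def find_all_candidates(self, project: str) -> List[Version]:
        # Phase 1: gather every candidate known for the project.
        return list(self._versions_by_project.get(project, []))

    def find_best_candidate(self, project: str) -> Optional[Version]:
        candidates = self.find_all_candidates(project)
        # Phase 2: build an evaluator for this project and let it decide.
        evaluator = self._make_evaluator(project)
        return evaluator(candidates)


def newest_wins(project: str) -> Evaluator:
    # Toy policy: the highest version tuple wins; None if nothing is available.
    return lambda candidates: max(candidates, default=None)


finder = ToyFinder({"demo": [(1, 0), (1, 1), (0, 9)]}, newest_wins)
best = finder.find_best_candidate("demo")
missing = finder.find_best_candidate("absent")
```

Keeping the gathering and the decision in separate steps means the selection policy can change without touching the code that discovers candidates.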
PackageFinder also has a
process_project_url() method (called by
find_best_candidate()) to process a PEP 503 “simple repository”
project page. This method fetches and parses the HTML from a PEP 503 project
page URL, extracts the anchor elements and creates
Link objects from
them, and then evaluates those links.
The CandidateEvaluator class contains the business logic for evaluating
which InstallationCandidate objects should be preferred. This can be
viewed as a determination that is finer-grained than that performed by the
LinkEvaluator class.
In particular, the
CandidateEvaluator class uses the whole set of
InstallationCandidate objects when making its determinations, as opposed
to evaluating each candidate in isolation, as
LinkEvaluator does. For
example, whether a pre-release is eligible for selection or whether a file
whose hash doesn’t match is eligible depends on properties of the collection
as a whole.
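To make the point about collection-level decisions concrete, here is a toy rule for pre-releases. It is an assumption-laden sketch rather than pip’s logic: eligible_versions is a hypothetical function, and versions are modelled as (release tuple, is_prerelease) pairs. Under this rule a pre-release is kept only when explicitly allowed or when the set contains no final release at all, so the same candidate can be eligible in one collection and not in another.

```python
from typing import List, Tuple

VersionInfo = Tuple[Tuple[int, ...], bool]  # (release tuple, is_prerelease)


def eligible_versions(
    versions: List[VersionInfo], allow_prereleases: bool = False
) -> List[VersionInfo]:
    """Toy rule: drop pre-releases unless allowed or nothing else exists."""
    finals = [v for v in versions if not v[1]]
    if allow_prereleases or not finals:
        # Either the user opted in, or pre-releases are all there is.
        return list(versions)
    return finals


mixed = [((1, 0), False), ((2, 0), True)]
only_pre = [((2, 0), True)]

from_mixed = eligible_versions(mixed)
from_only_pre = eligible_versions(only_pre)
from_opt_in = eligible_versions(mixed, allow_prereleases=True)
```

The pre-release ((2, 0), True) is filtered out of mixed but kept in only_pre, which is a decision that cannot be made by looking at that candidate alone.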
The CandidateEvaluator class uses information like the list of PEP 425
tags compatible with the target Python interpreter, hashes provided by the
user, and other user preferences.
Specifically, the class has a get_applicable_candidates() method.
This accepts the InstallationCandidate objects resulting from the links
accepted by the LinkEvaluator class’s evaluate_link() method, filters
them to a list of “applicable” candidates and orders them by preference.
The CandidateEvaluator class also has a sort_best_candidate() method
that returns the best (i.e. most preferred) candidate.
Finally, the class has a compute_best_candidate() method that calls
get_applicable_candidates() followed by sort_best_candidate(), and
then returns a BestCandidateResult
object encapsulating both the intermediate and final results of the decision.
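The filter, then pick, then wrap flow described here can be sketched as follows. This is a toy under stated assumptions, not pip’s code: ToyCandidate, ToyResult and ToyEvaluator are hypothetical, the applicability test is reduced to a maximum version, and the method names (including the helper that picks the best candidate) are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ToyCandidate:
    version: Tuple[int, ...]
    is_wheel: bool


@dataclass(frozen=True)
class ToyResult:
    """Data-only container: intermediate values plus the final choice."""
    all_candidates: List[ToyCandidate]
    applicable_candidates: List[ToyCandidate]
    best_candidate: Optional[ToyCandidate]


class ToyEvaluator:
    def __init__(self, max_version: Tuple[int, ...], prefer_binary: bool = False):
        self._max_version = max_version
        self._prefer_binary = prefer_binary

    def _sort_key(self, candidate: ToyCandidate):
        # With prefer_binary, any wheel outranks any non-wheel.
        if self._prefer_binary:
            return (candidate.is_wheel, candidate.version)
        return (candidate.version, candidate.is_wheel)

    def get_applicable_candidates(self, candidates):
        applicable = [c for c in candidates if c.version <= self._max_version]
        return sorted(applicable, key=self._sort_key)

    def sort_best_candidate(self, candidates):
        return max(candidates, key=self._sort_key, default=None)

    def compute_best_candidate(self, candidates):
        applicable = self.get_applicable_candidates(candidates)
        best = self.sort_best_candidate(applicable)
        return ToyResult(list(candidates), applicable, best)


candidates = [
    ToyCandidate((1, 0), True),
    ToyCandidate((2, 0), False),
    ToyCandidate((3, 0), True),
]
result = ToyEvaluator(max_version=(2, 0)).compute_best_candidate(candidates)
binary_result = ToyEvaluator(
    max_version=(2, 0), prefer_binary=True
).compute_best_candidate(candidates)
```

Returning the intermediate lists alongside the final choice lets a caller explain why a candidate was or was not selected, for example when reporting that versions exist but none are applicable.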
Instances of CandidateEvaluator are created by the PackageFinder class’s
make_candidate_evaluator() method on a per-requirement basis.
The CandidatePreferences class is a simple container class that groups
together some of the user preferences that PackageFinder uses to
construct CandidateEvaluator objects (via the PackageFinder class’s
make_candidate_evaluator() method).
A PackageFinder instance has a _candidate_prefs attribute whose value
is a CandidatePreferences instance. Since PackageFinder has a number
of responsibilities and options that control its behavior, grouping the
preferences specific to CandidateEvaluator helps maintainers know which
attributes are needed only for CandidateEvaluator.
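The grouping idea can be sketched with a small data-only class. The names below are hypothetical (ToyPreferences, ToyEvaluator, ToyFinder), and the two preference fields are assumptions chosen for illustration rather than a statement of what pip’s CandidatePreferences contains. Only _candidate_prefs and make_candidate_evaluator() are taken from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyPreferences:
    # Assumed example fields: preferences that only the evaluator cares about.
    prefer_binary: bool = False
    allow_all_prereleases: bool = False


@dataclass(frozen=True)
class ToyEvaluator:
    project_name: str
    prefer_binary: bool
    allow_all_prereleases: bool


class ToyFinder:
    def __init__(self, candidate_prefs: ToyPreferences, index_urls: List[str]):
        # Evaluator-only settings travel together in one attribute,
        # separate from options the finder itself uses (such as index URLs).
        self._candidate_prefs = candidate_prefs
        self.index_urls = index_urls

    def make_candidate_evaluator(self, project_name: str) -> ToyEvaluator:
        prefs = self._candidate_prefs
        return ToyEvaluator(
            project_name=project_name,
            prefer_binary=prefs.prefer_binary,
            allow_all_prereleases=prefs.allow_all_prereleases,
        )


finder = ToyFinder(
    ToyPreferences(prefer_binary=True),
    index_urls=["https://example.invalid/simple/"],
)
evaluator = finder.make_candidate_evaluator("demo")
```

Because a new evaluator is built per requirement, the shared preferences object is read each time while the project name varies.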
The BestCandidateResult class is a convenience “container” class that
encapsulates the result of finding the best candidate for a requirement.
(By “container” we mean an object that simply contains data and has no
business logic or state-changing methods of its own.) It stores not just the
final result but also intermediate values used to determine the result.
The class is the return type of both the CandidateEvaluator class’s
compute_best_candidate() method and the PackageFinder class’s
find_best_candidate() method.