Modules

As your program gets longer, you will want to store it in a file, or split it into several files for easier maintenance.

Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module.

  • The file name is the module name with the suffix .py appended.

  • Within a module, the module’s name (as a string) is available as the value of the global variable __name__.

    Whenever the Python interpreter reads a source file, it does two things:

    • it sets a few special variables like __name__, and then
    • it executes all of the code found in the file.

    The import statement does the same, so importing a module executes the code in it. See Stack Overflow and GeeksforGeeks for more.

    # source of ptest.py
    print('ptest is imported')

    def badcode():
        print('badcode is executed')

    print('real bad code should be like this.')
    print(__name__)  # '__main__' when run directly by the interpreter; otherwise the name of the module
    # source of callptest.py
    import ptest

    print(__name__)
    $ python3 ptest.py
    ptest is imported
    real bad code should be like this.
    __main__
    $ python3 callptest.py
    ptest is imported
    real bad code should be like this.
    ptest
    __main__
  • The import statement does not enter the names of the functions defined in ModuleName directly in the current symbol table; it only enters the module name ModuleName there. You then access the functions through the module name.

    >>> import ptest
    ptest is imported
    real bad code should be like this.
    ptest
    >>> badcode()
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'badcode' is not defined
    >>> ptest.badcode()
    badcode is executed
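
The behaviour shown above can be reproduced without creating files by hand. This is a minimal sketch: the module name demo_mod_a and the temporary directory are made up for illustration and are not part of the original example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name: demo_mod_a) into a temp directory.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_mod_a.py").write_text(
    "loaded_as = __name__\n"
    "def greet():\n"
    "    return 'hello from ' + __name__\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_mod_a  # the module body is executed here

print(demo_mod_a.loaded_as)   # demo_mod_a
print(demo_mod_a.greet())     # hello from demo_mod_a
print("greet" in globals())   # False: only the module name was bound
```

Inside the imported file __name__ is the module name, and greet is reachable only as demo_mod_a.greet.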

    6.1. More on Modules

  • An import statement executes the code in a module only the first time the module is imported in an interpreter session; later imports reuse the module that is already loaded.

    >>> import callptest
    ptest is imported
    real bad code should be like this.
    ptest
    callptest
    >>> import ptest
    >>>
    # in a new interpreter session, importing in the opposite order:
    >>> import ptest
    ptest is imported
    real bad code should be like this.
    ptest
    >>> import callptest
    callptest
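
The "only the first time" rule can also be checked programmatically. The sketch below assumes a made-up module demo_mod_b whose body counts how often it runs through an environment variable (DEMO_MOD_B_RUNS is likewise invented for the demo). importlib.reload is not covered in the notes above; it is the standard way to force re-execution.

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical module whose body bumps a counter every time it is executed.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_mod_b.py").write_text(
    "import os\n"
    "os.environ['DEMO_MOD_B_RUNS'] = str(int(os.environ.get('DEMO_MOD_B_RUNS', '0')) + 1)\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
os.environ.pop("DEMO_MOD_B_RUNS", None)

import demo_mod_b   # first import: the body runs
import demo_mod_b   # second import: served from sys.modules, the body does not run again
runs_after_imports = int(os.environ["DEMO_MOD_B_RUNS"])

importlib.reload(demo_mod_b)  # explicit re-execution of the module body
runs_after_reload = int(os.environ["DEMO_MOD_B_RUNS"])

print(runs_after_imports, runs_after_reload)  # 1 2
```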
  • Each module has its own private symbol table, which is used as the global symbol table by all functions defined in the module. This avoids name collisions with the user's global variables.

    >>> def badcode():
    ...     print('badcode in main')
    ...
    >>> badcode()
    badcode in main
    >>> ptest.badcode()
    badcode is executed
    >>> ptest.badcode = badcode  # a module's names can be reassigned from outside through the module object
    >>> ptest.badcode()
    badcode in main
  • Other variants of import

    • from ModuleName import FunctionName, ItemName

      This does not bind the name ModuleName in the current symbol table; only the listed names are bound.

    • from ModuleName import *

      This imports all names except those beginning with an underscore (_). It is frowned upon because it makes code harder to read and may mask names that were defined before.

  • Use as to bind the imported module or name to an alias.

    >>> import ptest as p
    ptest is imported
    real bad code should be like this.
    ptest
    >>> ptest
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'ptest' is not defined
    >>> p
    <module 'ptest' from '/Users/rubbish/test/ptest.py'>
    # in a new interpreter session:
    >>> from ptest import badcode as bc
    ptest is imported
    real bad code should be like this.
    ptest
    >>> ptest
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'ptest' is not defined
    >>> badcode
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'badcode' is not defined
    >>> bc
    <function badcode at 0x107325ca0>

    6.1.1. Executing modules as scripts

Add a check on __name__ to make your module both executable as a script and importable.

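# add this condition at the end of the module
if __name__ == "__main__":
    # the code below runs only when the file is executed as a script
    import sys
    fib(int(sys.argv[1]))  # fib is assumed to be defined earlier in this module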
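
For reference, a complete module built around that condition might look like the sketch below. The fib function and the default limit of 50 are assumptions made for the demo, since the snippet above does not show how fib is defined.

```python
import sys


def fib(n):
    """Return the Fibonacci numbers smaller than n as a list."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b
    return result


def main(argv):
    # Use the first command-line argument as the limit when it is a number;
    # otherwise fall back to an arbitrary default chosen for this demo.
    limit = int(argv[1]) if len(argv) > 1 and argv[1].isdigit() else 50
    print(fib(limit))


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    main(sys.argv)
```

Importing this file gives access to fib without printing anything, while running it as a script with an argument of 10 prints the numbers below 10.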

6.1.2. The Module Search Path

  1. Built-in modules with that name are searched first.

  2. a list of directories given by the variable sys.path. It is initialized by:

    2.1 The directory containing the input script (or the current directory when no file is specified).

    2.2 PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).

    2.3 The installation-dependent default.

  3. If the module is not found in any of these places, the import fails with ModuleNotFoundError.

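  • On file systems that support symlinks, the directory containing the input script is calculated after the symlink is followed, so the directory containing the symlink is NOT added to the module search path.

  • The sys.path list is editable, so you can add a directory manually.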

    >>> import sys
    >>> import desk
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'desk'
    >>> sys.path.append('/Users/username/Desktop')
    >>> import desk
    Desktop!!!
  • If you run the source file as a script, the directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory. This is an error unless the replacement is intended.

  • Subdirectories are NOT searched! (To import from a subdirectory, make it a package.)

    /Users/username/test
    └── mode
        └── ptest.py
    >>> import sys
    >>> dirs = sys.path  # assumption: dirs is an alias for sys.path
    >>> dirs.append('/Users/username/test')
    >>> import ptest
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'ptest'
    >>> dirs.append('/Users/username/test/mode')
    >>> import ptest
    ptest is imported
    real bad code should be like this.
    ptest
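
The same experiment can be scripted. This is a minimal sketch with made-up names (demo_mod_d, a temporary directory standing in for /Users/username/test); importlib.util.find_spec is used here only to ask whether a module can be found, without importing it.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Layout: <root>/mode/demo_mod_d.py (hypothetical names for this demo).
root = Path(tempfile.mkdtemp())
sub = root / "mode"
sub.mkdir()
(sub / "demo_mod_d.py").write_text("value = 42\n")

# Adding the parent directory is not enough: subdirectories are not searched.
sys.path.append(str(root))
importlib.invalidate_caches()
found_from_parent = importlib.util.find_spec("demo_mod_d") is not None

# Adding the directory that actually holds the file makes the module visible.
sys.path.append(str(sub))
importlib.invalidate_caches()
found_from_subdir = importlib.util.find_spec("demo_mod_d") is not None

print(found_from_parent, found_from_subdir)  # False True
```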

    6.1.3. “Compiled” Python files

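To speed up loading modules, Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file (it generally contains the Python version number).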

$ ls __pycache__
callptest.cpython-39.pyc
  • Python checks the modification date of the source against the compiled version to see if it’s out of date and needs to be recompiled.

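  • The compiled modules are platform-independent, so the same library can be shared among systems with different architectures.

  • Python does not check the cache in two circumstances:

    • Modules loaded directly from the command line: they are always recompiled and the result is not stored.
    • Modules with no source file. (To support a non-source (compiled only) distribution, the compiled module must be in the source directory, and there must not be a source module.)
  • A program does not run any faster when it is read from a .pyc file than from a .py file; the only thing that is faster is the speed with which it is loaded.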

  • The module compileall can create .pyc files for all modules in a directory.

  • The module py_compile compiles the specified files.

    $ python3 -m py_compile ptest.py
    $ ls __pycache__/
    ptest.cpython-39.pyc
  • Compile with optimizations by passing the -O or -OO switch.

    $ python3 -O -m py_compile ptest.py
    $ ls __pycache__/
    ptest.cpython-39.opt-1.pyc
    -O     : remove assert and __debug__-dependent statements; add .opt-1 before
             .pyc extension; also PYTHONOPTIMIZE=x
    -OO    : do -O changes and also discard docstrings; add .opt-2 before
             .pyc extension
  • See PEP 3147 for more details.
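
The same compilation steps are available from Python code. The sketch below uses a made-up source file demo_mod_e.py in a temporary directory. The exact cache file name depends on the interpreter version and optimization level, so no fixed output is shown.

```python
import compileall
import importlib.util
import py_compile
import tempfile
from pathlib import Path

# Hypothetical source file used only for this demo.
src = Path(tempfile.mkdtemp()) / "demo_mod_e.py"
src.write_text("x = 1\n")

# Where the import system expects the cached bytecode for this source file.
expected = importlib.util.cache_from_source(str(src))

# Compile one file; the return value is the path of the .pyc that was written.
compiled = py_compile.compile(str(src))
print(Path(compiled).name)

# Compile every module in a directory; a true result means all files compiled.
all_ok = compileall.compile_dir(str(src.parent), quiet=1)
print(bool(all_ok))
```

With default settings the .pyc file typically lands in a __pycache__ directory next to the source.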

6.2. Standard Modules

  • Python comes with a library of standard modules. Some are built into the interpreter; the set of such modules is a configuration option that also depends on the underlying platform (for example, winreg is only available on Windows).

  • sys.ps1 and sys.ps2 define the primary and the secondary prompt. These two variables are only defined if the interpreter is in interactive mode.

    >>> import sys
    >>> sys.ps1
    '>>> '
    >>> sys.ps2
    '... '
    >>> sys.ps1 = '$$$'
    $$$
    $$$
    $$$
    $$$sys.ps1 = '😀'
    😀
    😀
    😀
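
Both points can be checked from code. This is a small sketch: testing for sys.ps1 is a common way to detect an interactive session, and importlib.util.find_spec is used here only to test whether the platform-specific module winreg is available.

```python
import importlib.util
import sys

# sys.ps1 exists only when the interpreter runs interactively.
interactive = hasattr(sys, "ps1")
print("interactive session:", interactive)

# Modules compiled into the interpreter are listed in sys.builtin_module_names.
print("sys" in sys.builtin_module_names)  # True

# winreg is only present on Windows builds.
has_winreg = importlib.util.find_spec("winreg") is not None
print("winreg available:", has_winreg)
```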

    6.3. The dir() Function

  • The built-in function dir() is used to find out which names a module defines. It returns a sorted list of strings.

  • Without arguments, dir() lists the names you have currently defined.

>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> import ptest
ptest is imported
real bad code should be like this.
ptest
>>> dir(ptest)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'badcode']
>>> a = 1
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'a', 'ptest']
  • Note that it lists all types of names: variables, modules, functions, etc. It does not list the names of built-in functions and variables.

  • The built-in names are defined in the standard module builtins:

    import builtins
    dir(builtins)
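
A common use of dir() is to filter a module's names. The sketch below builds a throwaway in-memory module with types.ModuleType (the name demo_mod_g and its attributes are invented for the demo) so that the result is predictable.

```python
import builtins
import types

# Hypothetical in-memory module with one public and one private name.
demo = types.ModuleType("demo_mod_g")
demo.answer = 42
demo._hidden = "x"

names = dir(demo)
print(names == sorted(names))  # True: dir() returns a sorted list

# Keep only the names that do not start with an underscore.
public = [name for name in names if not name.startswith("_")]
print(public)  # ['answer']

# Built-in names live in the builtins module, not in the module being inspected.
print("len" in dir(builtins))  # True
print("len" in names)          # False
```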

    6.4. Packages

Packages are a way of structuring Python’s module namespace by using “dotted module names”.

Or you can take it as a collection of modules.

  • The use of dotted module names saves the authors of multi-module packages from having to worry about each other’s module names.

  • When importing the package, Python searches through the directories on sys.path looking for the package subdirectory.

  • The __init__.py files are required to make Python treat directories containing the file as packages.

  • __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable.

  • Users of the package can import individual modules from the package

    import sound.effects.echo # This loads the submodule sound.effects.echo.
    # or
    from sound.effects import echo
  • When using from package import item, the item can be either a submodule (or subpackage) of the package, or some other name defined in the package, like a function, class or variable.

  • The import statement first tests whether the item is defined in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an ImportError exception is raised.

  • When using syntax like import item.subitem.subsubitem, each item except for the last must be a package; the last item can be a module or a package but can’t be a class or function or variable defined in the previous item.

    >>> import mypkg.pkg.greet.tell # wrong
    Hello, I am a package
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'mypkg.pkg.greet.tell'; 'mypkg.pkg.greet' is not a package
    >>> from mypkg.pkg.greet import tell # right
    >>> tell()
    hello
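
The package rules above can be tried on a package created on the fly. This is a minimal sketch: the package name demo_pkg_h and its layout are invented to mirror the mypkg.pkg.greet example, which is not shown in full here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Layout (hypothetical): demo_pkg_h/__init__.py, demo_pkg_h/sub/__init__.py,
# demo_pkg_h/sub/greet.py defining a function tell().
root = Path(tempfile.mkdtemp())
sub = root / "demo_pkg_h" / "sub"
sub.mkdir(parents=True)
(root / "demo_pkg_h" / "__init__.py").write_text("")
(sub / "__init__.py").write_text("")
(sub / "greet.py").write_text("def tell():\n    return 'hello'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg_h.sub.greet            # the last item is a module: allowed
from demo_pkg_h.sub.greet import tell  # a function must be imported with from

print(demo_pkg_h.sub.greet.tell())  # hello
print(tell())                       # hello

error_message = None
try:
    import demo_pkg_h.sub.greet.tell  # the last item is a function: not allowed
except ModuleNotFoundError as exc:
    error_message = str(exc)
print(error_message)
```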

    6.4.1. Importing * From a Package

If a package’s __init__.py code defines a list named __all__, it is taken to be the list of module names that should be imported when from package import * is encountered.

  • If __all__ is not defined, the statement from sound.effects import * does not import all submodules from the package sound.effects into the current namespace; it only ensures that the package sound.effects has been imported (possibly running any initialization code in __init__.py) and then imports whatever names are defined in the package.
  • from package import specific_submodule is the recommended notation.
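
The effect of __all__ can be observed by running the star import inside a scratch namespace. In this sketch the package demo_pkg_i and its submodules echo and reverse are made up for the demo, and exec is used only so the imported names can be listed afterwards.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package with two submodules; only echo is listed in __all__.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg_i"
pkg.mkdir()
(pkg / "__init__.py").write_text("__all__ = ['echo']\n")
(pkg / "echo.py").write_text("name = 'echo'\n")
(pkg / "reverse.py").write_text("name = 'reverse'\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Run the star import in its own namespace and see which names arrive.
namespace = {}
exec("from demo_pkg_i import *", namespace)
imported = sorted(n for n in namespace if not n.startswith("__"))

print(imported)                              # ['echo']
print("demo_pkg_i.reverse" in sys.modules)   # False: reverse was never loaded
```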

6.4.2. Intra-package References

Use leading dots to write relative imports.

from . import ModuleName  # import a sibling module from the current package
from .. import PackageName  # import a sibling package from the parent package
from ..AnotherPkgName import ModuleName  # import a module from a sibling package
  • Note that relative imports are based on the name of the current module.
  • Since the name of the main module is always "__main__", modules intended for use as the main module of a Python application must always use absolute imports.
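
The three relative forms can be seen working together in a small generated package. All names in this sketch (demo_pkg_j, effects, filters, echo, surround, equalizer) are assumptions, loosely modelled on the sound package mentioned earlier.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Hypothetical package layout, written as {relative path: source code}.
files = {
    "demo_pkg_j/__init__.py": "",
    "demo_pkg_j/effects/__init__.py": "",
    "demo_pkg_j/effects/surround.py": "NAME = 'surround'\n",
    "demo_pkg_j/effects/echo.py": (
        "from . import surround\n"            # sibling module, same package
        "from .. import filters\n"            # sibling package via the parent
        "from ..filters import equalizer\n"   # module inside the sibling package
    ),
    "demo_pkg_j/filters/__init__.py": "",
    "demo_pkg_j/filters/equalizer.py": "NAME = 'equalizer'\n",
}
for rel_path, source in files.items():
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg_j.effects.echo as echo

print(echo.surround.NAME)      # surround
print(echo.equalizer.NAME)     # equalizer
print(echo.filters.__name__)   # demo_pkg_j.filters
```

The relative imports resolve because echo is imported as part of the package; running echo.py directly as a script would make its name "__main__" and the leading dots would have nothing to refer to.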

6.4.3. Packages in Multiple Directories

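  • Packages support one more special attribute, __path__. It is initialized to a list containing the name of the directory holding the package’s __init__.py before the code in that file is executed.

  • This variable can be modified; doing so affects future searches for modules and subpackages contained in the package, so it can be used to extend the set of modules found in a package.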
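
Extending __path__ can be sketched as follows. The package demo_pkg_k, the extra directory extra_modules and the module plugin are all invented for the demo. Here __path__ is modified from outside the package, although the same append could be done inside __init__.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# A regular package (hypothetical name) with an empty __init__.py.
pkg = root / "demo_pkg_k"
pkg.mkdir()
(pkg / "__init__.py").write_text("")

# A module stored outside the package directory.
extra = root / "extra_modules"
extra.mkdir()
(extra / "plugin.py").write_text("NAME = 'plugin'\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg_k

print(demo_pkg_k.__path__)  # a list holding the package directory

# Add a second directory to the package's own search path.
demo_pkg_k.__path__.append(str(extra))

from demo_pkg_k import plugin  # now found through the extended __path__

print(plugin.NAME)      # plugin
print(plugin.__name__)  # demo_pkg_k.plugin
```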