@@ -50,11 +50,12 @@ URL Parsing
 The URL parsing functions focus on splitting a URL string into its components,
 or on combining URL components into a URL string.
 
-.. function:: urlparse(urlstring, scheme='', allow_fragments=True)
+.. function:: urlsplit(urlstring, scheme=None, allow_fragments=True)
 
-   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
-   corresponds to the general structure of a URL:
-   ``scheme://netloc/path;parameters?query#fragment``.
+   Parse a URL into five components, returning a 5-item :term:`named tuple`
+   :class:`SplitResult` or :class:`SplitResultBytes`.
+   This corresponds to the general structure of a URL:
+   ``scheme://netloc/path?query#fragment``.
    Each tuple item is a string, possibly empty. The components are not broken up
    into smaller parts (for example, the network location is a single string), and %
    escapes are not expanded. The delimiters as shown above are not part of the
@@ -64,15 +65,15 @@ or on combining URL components into a URL string.
    .. doctest::
       :options: +NORMALIZE_WHITESPACE
 
-      >>> from urllib.parse import urlparse
-      >>> urlparse("scheme://netloc/path;parameters?query#fragment")
-      ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
+      >>> from urllib.parse import urlsplit
+      >>> urlsplit("scheme://netloc/path?query#fragment")
+      SplitResult(scheme='scheme', netloc='netloc', path='/path',
                   query='query', fragment='fragment')
-      >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
+      >>> o = urlsplit("http://docs.python.org:80/3/library/urllib.parse.html?"
       ...              "highlight=params#url-parsing")
       >>> o
-      ParseResult(scheme='http', netloc='docs.python.org:80',
-                  path='/3/library/urllib.parse.html', params='',
+      SplitResult(scheme='http', netloc='docs.python.org:80',
+                  path='/3/library/urllib.parse.html',
                   query='highlight=params', fragment='url-parsing')
       >>> o.scheme
       'http'
@@ -85,23 +86,23 @@ or on combining URL components into a URL string.
       >>> o._replace(fragment="").geturl()
       'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'
 
-   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
+   Following the syntax specifications in :rfc:`1808`, :func:`!urlsplit` recognizes
    a netloc only if it is properly introduced by '//'. Otherwise the
    input is presumed to be a relative URL and thus to start with
    a path component.
 
    .. doctest::
       :options: +NORMALIZE_WHITESPACE
 
-      >>> from urllib.parse import urlparse
-      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
-      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
-                  params='', query='', fragment='')
-      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
-      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
-                  params='', query='', fragment='')
-      >>> urlparse('help/Python.html')
-      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
+      >>> from urllib.parse import urlsplit
+      >>> urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
+      SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
+                  query='', fragment='')
+      >>> urlsplit('www.cwi.nl/%7Eguido/Python.html')
+      SplitResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
+                  query='', fragment='')
+      >>> urlsplit('help/Python.html')
+      SplitResult(scheme='', netloc='', path='help/Python.html',
                   query='', fragment='')
 
    The *scheme* argument gives the default addressing scheme, to be
@@ -126,12 +127,9 @@ or on combining URL components into a URL string.
    +------------------+-------+-------------------------+------------------------+
    | :attr:`path`     | 2     | Hierarchical path       | empty string           |
    +------------------+-------+-------------------------+------------------------+
-   | :attr:`params`   | 3     | Parameters for last     | empty string           |
-   |                  |       | path element            |                        |
-   +------------------+-------+-------------------------+------------------------+
-   | :attr:`query`    | 4     | Query component         | empty string           |
+   | :attr:`query`    | 3     | Query component         | empty string           |
    +------------------+-------+-------------------------+------------------------+
-   | :attr:`fragment` | 5     | Fragment identifier     | empty string           |
+   | :attr:`fragment` | 4     | Fragment identifier     | empty string           |
    +------------------+-------+-------------------------+------------------------+
    | :attr:`username` |       | User name               | :const:`None`          |
    +------------------+-------+-------------------------+------------------------+
@@ -155,26 +153,30 @@ or on combining URL components into a URL string.
    ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
    decomposed before parsing, no error will be raised.
 
+   Following some of the `WHATWG spec`_ that updates :rfc:`3986`, leading C0
+   control and space characters are stripped from the URL. ``\n``,
+   ``\r`` and tab ``\t`` characters are removed from the URL at any position.
+
    As is the case with all named tuples, the subclass has a few additional methods
    and attributes that are particularly useful. One such method is :meth:`_replace`.
-   The :meth:`_replace` method will return a new ParseResult object replacing specified
-   fields with new values.
+   The :meth:`_replace` method will return a new :class:`SplitResult` object
+   replacing specified fields with new values.
 
    .. doctest::
       :options: +NORMALIZE_WHITESPACE
 
-      >>> from urllib.parse import urlparse
-      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
+      >>> from urllib.parse import urlsplit
+      >>> u = urlsplit('//www.cwi.nl:80/%7Eguido/Python.html')
       >>> u
-      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
-                  params='', query='', fragment='')
+      SplitResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
+                  query='', fragment='')
       >>> u._replace(scheme='http')
-      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
-                  params='', query='', fragment='')
+      SplitResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
+                  query='', fragment='')
 
    .. warning::
 
-      :func:`urlparse` does not perform validation. See :ref:`URL parsing
+      :func:`urlsplit` does not perform validation. See :ref:`URL parsing
       security <url-parsing-security>` for details.
 
    .. versionchanged:: 3.2
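Review note on this hunk: the paragraphs above document input cleanup, the NFKC check on the netloc, and `_replace`. A minimal sketch of how that behaviour could be spot-checked, assuming an interpreter that already removes tabs and newlines (3.10, or an earlier release with the fix backported). The `example.com` and `evil.example` hosts are made up for illustration.

```python
from urllib.parse import urlsplit

# Tab, CR and LF are removed wherever they occur in the URL.
parts = urlsplit("http://example.com/some\tpa\nth?q=1")
print(parts.path)  # the control characters no longer appear in the path

# _replace returns a new SplitResult; the original is left untouched.
secure = parts._replace(scheme="https")
print(secure.geturl())

# A netloc character that NFKC-normalizes to '#' is rejected.
rejected = False
try:
    urlsplit("http://example.com\uff03@evil.example/")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

The leading C0 control and space stripping is left out of the sketch on purpose, since it only applies on interpreters that carry the 3.12 change.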
@@ -193,6 +195,14 @@ or on combining URL components into a URL string.
       Characters that affect netloc parsing under NFKC normalization will
       now raise :exc:`ValueError`.
 
+   .. versionchanged:: 3.10
+      ASCII newline and tab characters are stripped from the URL.
+
+   .. versionchanged:: 3.12
+      Leading WHATWG C0 control and space characters are stripped from the URL.
+
+   .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
+
 
 .. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
 
@@ -283,93 +293,35 @@ or on combining URL components into a URL string.
       separator key, with ``&`` as the default separator.
 
 
-.. function:: urlunparse(parts)
+.. function:: urlunsplit(parts)
 
-   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
-   argument can be any six-item iterable. This may result in a slightly
+   Construct a URL from a tuple as returned by ``urlsplit()``. The *parts*
+   argument can be any five-item iterable. This may result in a slightly
    different, but equivalent URL, if the URL that was parsed originally had
    unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
    states that these are equivalent).
 
 
-.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)
-
-   This is similar to :func:`urlparse`, but does not split the params from the URL.
-   This should generally be used instead of :func:`urlparse` if the more recent URL
-   syntax allowing parameters to be applied to each segment of the *path* portion
-   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
-   separate the path segments and parameters. This function returns a 5-item
-   :term:`named tuple`::
-
-      (addressing scheme, network location, path, query, fragment identifier).
-
-   The return value is a :term:`named tuple`, its items can be accessed by index
-   or as named attributes:
-
-   +------------------+-------+-------------------------+----------------------+
-   | Attribute        | Index | Value                   | Value if not present |
-   +==================+=======+=========================+======================+
-   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`netloc`   | 1     | Network location part   | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`query`    | 3     | Query component         | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`username` |       | User name               | :const:`None`        |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`password` |       | Password                | :const:`None`        |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
-   +------------------+-------+-------------------------+----------------------+
-   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
-   |                  |       | if present              |                      |
-   +------------------+-------+-------------------------+----------------------+
-
-   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
-   an invalid port is specified in the URL. See section
-   :ref:`urlparse-result-object` for more information on the result object.
-
-   Unmatched square brackets in the :attr:`netloc` attribute will raise a
-   :exc:`ValueError`.
-
-   Characters in the :attr:`netloc` attribute that decompose under NFKC
-   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
-   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
-   decomposed before parsing, no error will be raised.
-
-   Following some of the `WHATWG spec`_ that updates RFC 3986, leading C0
-   control and space characters are stripped from the URL. ``\n``,
-   ``\r`` and tab ``\t`` characters are removed from the URL at any position.
-
-   .. warning::
-
-      :func:`urlsplit` does not perform validation. See :ref:`URL parsing
-      security <url-parsing-security>` for details.
+.. function:: urlparse(urlstring, scheme=None, allow_fragments=True)
 
-   .. versionchanged:: 3.6
-      Out-of-range port numbers now raise :exc:`ValueError`, instead of
-      returning :const:`None`.
+   This is similar to :func:`urlsplit`, but additionally splits the *path*
+   component on *path* and *params*.
+   This function returns a 6-item :term:`named tuple` :class:`ParseResult`
+   or :class:`ParseResultBytes`.
+   Its items are the same as for the :func:`!urlsplit` result, except that
+   *params* is inserted at index 3, between *path* and *query*.
 
-   .. versionchanged:: 3.8
-      Characters that affect netloc parsing under NFKC normalization will
-      now raise :exc:`ValueError`.
+   This function is based on obsoleted :rfc:`1738` and :rfc:`1808`, which
+   listed *params* as the main URL component.
+   The more recent URL syntax allows parameters to be applied to each segment
+   of the *path* portion of the URL (see :rfc:`3986`).
+   :func:`urlsplit` should generally be used instead of :func:`urlparse`.
+   A separate function is needed to separate the path segments and parameters.
 
-   .. versionchanged:: 3.10
-      ASCII newline and tab characters are stripped from the URL.
-
-   .. versionchanged:: 3.12
-      Leading WHATWG C0 control and space characters are stripped from the URL.
-
-   .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser
-
-.. function:: urlunsplit(parts)
+.. function:: urlunparse(parts)
 
-   Combine the elements of a tuple as returned by :func:`urlsplit` into a
-   complete URL as a string. The *parts* argument can be any five-item
+   Combine the elements of a tuple as returned by :func:`urlparse` into a
+   complete URL as a string. The *parts* argument can be any six-item
    iterable. This may result in a slightly different, but equivalent URL, if the
    URL that was parsed originally had unnecessary delimiters (for example, a ?
    with an empty query; the RFC states that these are equivalent).
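Review note on this hunk: the difference between the two functions, and the "slightly different, but equivalent URL" caveat, may be easier to follow with a short sketch. The URL and its `;x=1` style parameters are invented for illustration.

```python
from urllib.parse import urlparse, urlsplit, urlunparse, urlunsplit

url = "http://example.com/a;x=1/b;y=2?q=1#frag"

five = urlsplit(url)  # params stay inside the path
six = urlparse(url)   # params of the last path segment are split off

print(five.path)             # /a;x=1/b;y=2
print(six.path, six.params)  # /a;x=1/b y=2

# Both results recombine into the original string.
assert urlunsplit(five) == url
assert urlunparse(six) == url

# An empty query loses its "?" delimiter on the way back.
trimmed = urlunsplit(urlsplit("http://example.com/path?"))
print(trimmed)  # http://example.com/path
```

Note that `urlparse` only separates the parameters of the final path segment; the `;x=1` on the first segment stays in `path` either way, which is why the text says a separate function is needed for per-segment parameters.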
@@ -387,7 +339,7 @@ or on combining URL components into a URL string.
       'http://www.cwi.nl/%7Eguido/FAQ.html'
 
    The *allow_fragments* argument has the same meaning and default as for
-   :func:`urlparse`.
+   :func:`urlsplit`.
 
    .. note::
 
@@ -527,7 +479,7 @@ individual URL quoting functions.
 Structured Parse Results
 ------------------------
 
-The result objects from the :func:`urlparse`, :func:`urlsplit` and
+The result objects from the :func:`urlsplit`, :func:`urlparse` and
 :func:`urldefrag` functions are subclasses of the :class:`tuple` type.
 These subclasses add the attributes listed in the documentation for
 those functions, the encoding and decoding support described in the
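Review note on this hunk: since the result objects are tuple subclasses with encoding and decoding support, a brief sketch of what that looks like in practice may help readers of the reordered sentence. The URL is an arbitrary example.

```python
from urllib.parse import SplitResult, SplitResultBytes, urlsplit

text_result = urlsplit("http://example.com:8080/path?q=1")
bytes_result = urlsplit(b"http://example.com:8080/path?q=1")

# Plain tuple behaviour: unpacking and indexing both work.
scheme, netloc, path, query, fragment = text_result
assert isinstance(text_result, tuple) and text_result[2] == path

# str input gives SplitResult, bytes input gives SplitResultBytes.
assert isinstance(text_result, SplitResult)
assert isinstance(bytes_result, SplitResultBytes)

# encode() and decode() convert between the two result types.
assert bytes_result.decode() == text_result
assert text_result.encode() == bytes_result

# Extra attributes sit on top of the five tuple fields.
print(text_result.hostname, text_result.port)
```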