---
c: Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
Protocol:
  - All
Added-in: 7.62.0
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle of a URL object, this function extracts an individual
piece or the full URL from it.

The *part* argument specifies which part to extract (see list below) and
*content* points to a 'char *' to get updated to point to a newly
allocated string with the contents.

The *flags* argument is a bitmask with individual features.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The flags argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the used scheme.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of an error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) to not return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning them. It
does not decode the scheme, the port number or the full URL.

The query component also gets plus-to-space conversion as a bonus when this
bit is set.

Note that this URL decoding is charset unaware and you get a zero terminated
string back with data that could be intended for a particular encoding.

If there are byte values lower than 32 in the decoded string, the get
operation returns an error instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the hostname part when a full URL is
retrieved. If not set (default), libcurl returns the URL with the hostname raw
so that IDN names appear as-is. IDN hostnames typically use non-ASCII bytes
that would otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the hostname remains valid.

## CURLU_PUNYCODE

If set and *CURLU_URLENCODE* is not set, and asked to retrieve the
**CURLUPART_HOST** or **CURLUPART_URL** parts, libcurl returns the
hostname in its punycode version if it contains any non-ASCII octets (and is
an IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and asked to retrieve the **CURLUPART_HOST** or **CURLUPART_URL**
parts, libcurl returns the hostname in its IDN (International Domain Name)
UTF-8 version if it otherwise is a punycode version. If the punycode name
cannot be converted to IDN correctly, libcurl returns
*CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname is using
punycode.

(Added in curl 8.3.0)

## CURLU_GET_EMPTY

When this flag is used in curl_url_get(), it makes the function return empty
query and fragment parts, both when extracted individually and when included
in the full URL. By default, libcurl otherwise considers empty parts
non-existing.

An empty query part is one where there is nothing following the question mark
(before the possible fragment). An empty fragment part is one where there is
nothing following the hash sign.

(Added in curl 8.8.0)

## CURLU_NO_GUESS_SCHEME

When this flag is used in curl_url_get(), it treats the scheme as non-existing
if it was set as the result of a previous guess, that is, when
CURLU_GUESS_SCHEME was used when parsing a URL.

Using this flag when getting CURLUPART_SCHEME if the scheme was set as the
result of a guess makes curl_url_get() return CURLUE_NO_SCHEME.

Using this flag when getting CURLUPART_URL if the scheme was set as the result
of a guess makes curl_url_get() return the full URL without the scheme
component. Such a URL can then only be parsed with curl_url_set() if
CURLU_GUESS_SCHEME is used.

(Added in curl 8.9.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a normalized and
possibly cleaned up version using all available URL parts.

We advise using the *CURLU_PUNYCODE* option to get the URL as "normalized" as
possible since IDN allows hostnames to be written in many different ways that
still end up as the same punycode version.

Zero-length queries and fragments are excluded from the URL unless
CURLU_GET_EMPTY is set.

## CURLUPART_SCHEME

Scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized/used when parsing URLs for the following
schemes: pop3, smtp and imap. The URL API still allows users to set and get
this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The hostname. If it is an IPv6 numeric address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. IPv6 numerical addresses
are returned within brackets ([]).

IPv6 names are normalized when set, which should make them as short as
possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the hostname is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. This number is returned in a string just
like all other parts. That string is guaranteed to hold a valid port number in
ASCII using base 10.

## CURLUPART_PATH

The returned path is always at least a slash ('/') even if no path was
supplied in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

A not-present query returns *content* set to NULL.

A zero-length query returns *content* as NULL unless CURLU_GET_EMPTY is set.

The query part gets pluses converted to space when asked to URL decode on get
with the CURLU_URLDECODE bit.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

A not-present fragment returns *content* set to NULL.

A zero-length fragment returns *content* as NULL unless CURLU_GET_EMPTY is set.

# %PROTOCOLS%

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1; /* could not allocate a handle */
  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      curl_free(scheme); /* returned strings are freed with curl_free() */
    }
  }
  curl_url_cleanup(url); /* release the handle on every path */
  return rc ? 1 : 0;
}
~~~

# %AVAILABILITY%

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with
descriptions.

If this function returns an error, no URL part is returned.
