-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path06-strings.tex
More file actions
997 lines (815 loc) · 29.9 KB
/
Copy path06-strings.tex
File metadata and controls
997 lines (815 loc) · 29.9 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
% LaTeX source for ``Introduction to Computer Science (Python Edition)''
% Copyright (c) 2021- David S. Read, All Rights Reserved
% Built upon, ``Python for Informatics: Exploring Information''
% Copyright (c) 2010- Charles R. Severance, All Rights Reserved
\chapter{Strings}
\label{strings}
\section{A string is a sequence}
\index{sequence}
\index{character}
\index{bracket operator}
\index{operator!bracket}
A string is a {\bf sequence} of characters.
You can access the characters one at a time with the
bracket operator:
\beforeverb
\begin{verbatim}
>>> fruit = 'banana'
>>> letter = fruit[1]
\end{verbatim}
\afterverb
%
\index{index}
The second statement extracts the character at index position 1 from the
{\tt fruit} variable and assigns it to the {\tt letter} variable.
The expression in brackets is called an {\bf index}.
The index indicates which character in the sequence you
want (hence the name).
But you might not get what you expect:
\beforeverb
\begin{verbatim}
>>> print(letter)
a
\end{verbatim}
\afterverb
%
For most people, the first letter of \verb"'banana'" is {\tt b}, not
{\tt a}. But in Python, the index is an offset from the
beginning of the string, and the {\bf offset of the first letter is zero}.
\beforeverb
\begin{verbatim}
>>> letter = fruit[0]
>>> print(letter)
b
\end{verbatim}
\afterverb
%
So {\tt b} is the 0th letter (``zero-eth'') of \verb"'banana'", {\tt a}
is the 1th letter (``one-eth''), and {\tt n} is the 2th (``two-eth'')
letter.
\beforefig
\centerline{\includegraphics[height=0.50in]{figs2/string.eps}}
\afterfig
\index{index!starting at zero}
\index{zero, index starting at}
You can use any expression, including variables and operators, as an
index, but the value of the index has to be an integer. Otherwise you
get:
\index{index}
\index{exception!TypeError}
\index{TypeError}
\beforeverb
\begin{verbatim}
>>> letter = fruit[1.5]
TypeError: string indices must be integers
\end{verbatim}
\afterverb
%
\section{Getting the length of a string using {\tt len}}
\index{len function}
\index{function!len}
{\tt len} is a built-in function that returns the number of characters
in a string:
\beforeverb
\begin{verbatim}
>>> fruit = 'banana'
>>> len(fruit)
6
\end{verbatim}
\afterverb
%
To get the last letter of a string, you might be tempted to try something
like this:
\index{exception!IndexError}
\index{IndexError}
\beforeverb
\begin{verbatim}
>>> length = len(fruit)
>>> last = fruit[length]
IndexError: string index out of range
\end{verbatim}
\afterverb
%
The reason for the {\tt IndexError} is that there is no letter in {\tt
'banana'} with the index 6. Since we started counting at zero, the
six letters are numbered 0 to 5. To get the last character, you have
to subtract 1 from {\tt length}:
\beforeverb
\begin{verbatim}
>>> last = fruit[length-1]
>>> print(last)
a
\end{verbatim}
\afterverb
%
Alternatively, you can use negative indices, which count backward from
the end of the string. The expression {\tt fruit[-1]} yields the last
letter, {\tt fruit[-2]} yields the second to last, and so on.
\index{index!negative}
\index{negative index}
\section{Traversal through a string with a loop}
\label{for}
\index{traversal}
\index{loop!traversal}
\index{for loop}
\index{loop!for}
\index{statement!for}
\index{traversal}
\index{while loop}
\index{loop!while}
\index{statement!while}
A lot of computations involve processing a string one character at a
time. Often they start at the beginning, select each character in
turn, do something to it, and continue until the end. This pattern of
processing is called a {\bf traversal}.
\label{ch6StringTraversal}
One way to write a traversal
is with a {\tt while} loop:
\beforeverb
\begin{verbatim}
index = 0
while index < len(fruit):
letter = fruit[index]
print(letter)
index = index + 1
\end{verbatim}
\afterverb
%
This loop traverses the string and displays each letter on a line by
itself. The loop condition is {\tt index < len(fruit)}, so
when {\tt index} is equal to the length of the string, the
condition is false, and the body of the loop is not executed. The
last character accessed is the one with the index {\tt len(fruit)-1},
which is the last character in the string.
Another way to write a traversal is with a {\tt for} loop:
\beforeverb
\begin{verbatim}
for char in fruit:
print(char)
\end{verbatim}
\afterverb
%
Each time through the loop, the next character in the string is assigned
to the variable {\tt char}. The loop continues until no characters are
left.
\section{String slices}
\label{slice}
\index{slice operator}
\index{operator!slice}
\index{index!slice}
\index{string!slice}
\index{slice!string}
A segment of a string is called a {\bf slice} or {\bf substring}. Selecting a slice is
similar to selecting a character:
\beforeverb
\begin{verbatim}
>>> s = 'Monty Python'
>>> print(s[0:5])
Monty
>>> print(s[6:12])
Python
\end{verbatim}
\afterverb
%
The operator {\tt [n:m]} returns the part of the string from the
``n-eth'' character to the ``m-eth'' character, including the first {\bf but
excluding the last}.
If you omit the first index (before the colon), the slice starts at
the beginning of the string. If you omit the second index, the slice
goes to the end of the string:
\beforeverb
\begin{verbatim}
>>> fruit = 'banana'
>>> fruit[:3]
'ban'
>>> fruit[3:]
'ana'
\end{verbatim}
\afterverb
%
If the first index is greater than or equal to the second the result
is an {\bf empty string}, represented by two quotation marks:
\index{quotation mark}
\beforeverb
\begin{verbatim}
>>> fruit = 'banana'
>>> fruit[3:3]
''
\end{verbatim}
\afterverb
%
An empty string contains no characters and has length 0, but other
than that, it is the same as any other string.
\index{copy!slice}
\index{slice!copy}
When you take a slice from a string, the slice is a new string. It contains a copy of the characters from the range of the slice.
The slice (or substring) convention of including the first index character and excluding the last is common when working with strings in many languages. There are two advantages to this approach. First, you may use the length of the string as the second value in the slice to get the last character in the string. Second, if you subtract the starting index from the ending index the result is the number of characters placed in the slice.
\section{Strings are immutable}
\index{mutability}
\index{immutability}
\index{string!immutable}
It is tempting to use the {\tt []} operator on the left side of an
assignment, with the intention of changing a character in a string.
For example:
\index{TypeError}
\index{exception!TypeError}
\beforeverb
\begin{verbatim}
>>> greeting = 'Hello, world!'
>>> greeting[0] = 'J'
TypeError: object does not support item assignment
\end{verbatim}
\afterverb
%
The ``object'' in this case is the string and the ``item'' is
the character you tried to assign. For now, an {\bf object} is
the same thing as a value, but we will refine that definition
later. An {\bf item} is one of the values in a sequence.
\index{object}
\index{item assignment}
\index{assignment!item}
\index{immutability}
\index{concatenate}
The reason for the error is that
strings are {\bf immutable}, which means you can't change an
existing string. The best you can do is create a new string
that is a variation on the original:
\beforeverb
\begin{verbatim}
>>> greeting = 'Hello, world!'
>>> new_greeting = 'J' + greeting[1:]
>>> print(new_greeting)
Jello, world!
\end{verbatim}
\afterverb
%
This example concatenates a new first letter onto
a slice of {\tt greeting}. It has no effect on
the original string.
\index{concatenation}
\section{Looping and counting}
\label{counter}
\index{counter}
\index{counting and looping}
\index{looping and counting}
\index{looping!with strings}
\label{ch6LetterCount}
The following program counts the number of times the letter {\tt a}
appears in a string:
\beforeverb
\begin{verbatim}
word = 'banana'
count = 0
for letter in word:
if letter == 'a':
count = count + 1
print(count)
\end{verbatim}
\afterverb
%
This program demonstrates another pattern of computation called a {\bf
counter}. The variable {\tt count} is initialized to 0 and then
incremented each time an {\tt a} is found.
When the loop exits, {\tt count}
contains the result---the total number of {\tt a}'s.
\section{The {\tt in} operator}
\label{inboth}
\index{in operator}
\index{operator!in}
\index{boolean operator}
\index{operator!boolean}
The word {\tt in} is a boolean operator that takes two strings and
returns {\tt True} if the first appears as a substring in the second:
\beforeverb
\begin{verbatim}
>>> 'a' in 'banana'
True
>>> 'seed' in 'banana'
False
\end{verbatim}
\afterverb
%
\section{String comparison}
\index{string!comparison}
\index{comparison!string}
The comparison operators work on strings. To see if two strings are equal:
\beforeverb
\begin{verbatim}
if word == 'banana':
print('All right, bananas.')
\end{verbatim}
\afterverb
%
Other comparison operations are useful for putting words in alphabetical
order:
\beforeverb
\begin{verbatim}
if word < 'banana':
print('Your word,' + word + ', comes before banana.')
elif word > 'banana':
print('Your word,' + word + ', comes after banana.')
else:
print('All right, bananas.')
\end{verbatim}
\afterverb
%
Python does not handle uppercase and lowercase letters the same way
that people do. All the uppercase letters come before all the
lowercase letters, so:
\beforeverb
\begin{verbatim}
Your word, Pineapple, comes before banana.
\end{verbatim}
\afterverb
%
A common way to address this problem is to convert strings to a
standard format, such as all lowercase, before performing the
comparison. Keep that in mind in case you have to defend yourself
against a man armed with a Pineapple.
To understand how we could convert a string to lowercase characters we need to delve into accessing a string's methods.
\section{{\tt string} methods}
Strings are an example of Python {\bf objects}. An object is also known as an {\bf instance}. Each object contains
data (in this case the actual string itself) and has access to {\bf methods}, which
are effectively functions that are built for each instance to use. The data and methods are defined by the type\footnote{We will spend a great deal of time later in the semester creating our own types and objects}.
We can list the methods available to an object (instance) using Python's {\tt dir} function, which lists the methods available
for an object. We've already seen that the {\tt type} function shows the type of an object. We can use the {\tt help} function to read documentation for a method, if the programmer of the method included such documentation.
\beforeverb
\begin{verbatim}
>>> stuff = 'Hello world'
>>> type(stuff)
<type 'str'>
>>> dir(stuff)
['capitalize', 'center', 'count', 'decode', 'encode',
'endswith', 'expandtabs', 'find', 'format', 'index',
'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip',
'partition', 'replace', 'rfind', 'rindex', 'rjust',
'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines',
'startswith', 'strip', 'swapcase', 'title', 'translate',
'upper', 'zfill']
>>> help(str.capitalize)
Help on method_descriptor:
capitalize(...)
S.capitalize() -> string
Return a copy of the string S with only its first character
capitalized.
>>>
\end{verbatim}
\afterverb
%
While the {\tt dir} function lists the methods, and you
can use {\tt help} to get some simple documentation on a method,
a better source of documentation for string methods would be
\url{https://docs.python.org/3/library/stdtypes.html#string-methods}.
Calling a {\bf method} is similar to calling a function---it
takes arguments and returns a value---but the syntax is different.
We call a method by appending the method name to the variable name
using the period as a delimiter.
For example, the
method {\tt upper} returns a new string with
all uppercase letters using the string it is run on as the original value:
\index{method}
\index{string!method}
Instead of the function syntax {\tt upper(word)}, it uses
the method syntax {\tt word.upper()}.
\index{dot notation}
\beforeverb
\begin{verbatim}
>>> word = 'banana'
>>> new_word = word.upper()
>>> print(new_word)
BANANA
\end{verbatim}
\afterverb
%
This form of dot notation specifies the name of the method, {\tt
upper}, and the name of the string to apply the method to, {\tt
word}. The empty parentheses indicate that this method takes no
argument.
\index{parentheses!empty}
A method call is called an {\bf invocation}; in this case, we would
say that we are invoking {\tt upper} on the {\tt word} object (instance).
\index{invocation}
For example, there is a string method named {\tt find} that
searches for the position of one string within another:
\beforeverb
\begin{verbatim}
>>> word = 'banana'
>>> index = word.find('a')
>>> print(index)
1
\end{verbatim}
\afterverb
%
In this example, we invoke {\tt find} on {\tt word} and pass
the letter we are looking for as a parameter.
The {\tt find} method can find substrings as well as characters:
\beforeverb
\begin{verbatim}
>>> word.find('na')
2
\end{verbatim}
\afterverb
%
It can take as a second argument the index where it should start:
\index{optional argument}
\index{argument!optional}
\beforeverb
\begin{verbatim}
>>> word.find('na', 3)
4
\end{verbatim}
\afterverb
%
One common task is to remove white space (spaces, tabs, or newlines) from
the beginning and end of a string using the {\tt strip} method:
\beforeverb
\begin{verbatim}
>>> line = ' Here we go '
>>> line.strip()
'Here we go'
\end{verbatim}
\afterverb
%
Some methods such as {\bf startswith} return boolean values.
\beforeverb
\begin{verbatim}
>>> line = 'Please have a nice day'
>>> line.startswith('Please')
True
>>> line.startswith('p')
False
\end{verbatim}
\afterverb
%
\label{ch6StartsWithExample}
You will note that {\tt startswith} requires case to match, so sometimes
we take a line and map it all to lowercase before we do any checking
using the {\tt lower} method.
\beforeverb
\begin{verbatim}
>>> line = 'Please have a nice day'
>>> line.startswith('p')
False
>>> line.lower()
'please have a nice day'
>>> line.lower().startswith('p')
True
\end{verbatim}
\afterverb
%
In the last example, the method {\tt lower} is called
and then we use {\tt startswith}
to see if the resulting lowercase string
starts with the letter ``p''. As long as we are careful
with the order, we can make multiple method calls in a
single expression.
\section{Parsing strings}
Often, we want to look into a string and find a substring. For example
if we were presented a series of lines formatted as follows:
\beforeverb
\begin{alltt}
From stephen.marquard@{\bf uct.ac.za} Sat Jan 5 09:14:16 2008
\end{alltt}
\afterverb
and we wanted to pull out only the second half of the address (i.e.,
{\tt uct.ac.za}) from each line, we can do this by using the {\tt find}
method and string slicing.
First, we will find the position of the at-sign (@) in the string. Then we will
find the position of the first space \emph{after} the at-sign. And then we
will use string slicing to extract the portion of the string which we
are looking for.
\beforeverb
\begin{verbatim}
>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1:sppos]
>>> print(host)
uct.ac.za
>>>
\end{verbatim}
\afterverb
%
We use a version of the {\tt find} method which allows us to specify
a position in the string where we want {\tt find} to start looking.
When we slice, we extract the characters
from ``one beyond the at-sign through up to \emph{but not including} the
space character''.
The documentation for the {\tt find} method is available at
\url{https://docs.python.org/3/library/stdtypes.html#string-methods}.
\section{Format strings}
\index{format strings}
\index{strings!format}
A {\bf format string} contains text and {\bf replacement fields}, which are delineated with curly braces, \{\}. Format strings allow us to create strings that combine text with other values, including literals and variables.
As an example, if we were writing a game and wanted to display the current score we might display the message:
\beforeverb
\begin{verbatim}
Current score is 5
\end{verbatim}
\afterverb
However, the score value, 5 as shown above, can't be part of the hardcoded string's text. Instead it is probably being stored in a variable, since it can change as the user plays the game. The current score needs to be included in the string where the 5 is shown.
That's where a format string becomes useful, allowing us to combine text with replacement fields. Here is a format string that could be used for the score example. It starts with the text {\tt Current score is} and then includes a replacement field to specify that some value should be inserted at that point in the string:
\beforeverb
\begin{verbatim}
'Current score is {}'
\end{verbatim}
\afterverb
Later, we'll see that we can include values in the replacement field to control the formatting of data. If the braces are left empty, {\tt \{\}}, the default representation for the data will be used. In all cases, the result is a string.
We call the {\tt format} method to provide the values used in place of the replacement fields in the format string. Let's put the score example together:
\beforeverb
\begin{verbatim}
>>> score = 6
>>> message = 'Current score is {}'.format(score)
>>> print(message)
Current score is 6
\end{verbatim}
\afterverb
The {\tt message} variable is assigned a value using a format string. The resulting value is \verb"'Current score is 6'", where the {\tt 6} is the {\tt score} variable's value being passed to the {\tt format} method. If you think about all the programs that you use and all the ways they show you information, mixing data with text, this technique becomes very handy.
We can place more than one replacement field in a string. Here is a basic example which uses two variables' values as part of the string being printed:
\beforeverb
\begin{verbatim}
>>> camels = 42
>>> elk = 25
>>> message = 'There are {} camels and {} elk.'.format(camels, elk)
>>> print(message)
There are 42 camels and 25 elk.
\end{verbatim}
\afterverb
%
Again, the {\tt message} variable is assigned a value using a format string. The resulting value is \verb"'There are 42 camels and 25 elk.'", where {\tt 42} and {\tt 25} are values taken from the variables passed to the {\tt format} method.
Replacement fields may appear anywhere in a format string,
so you can embed values anywhere in a sentence. Here is an example using three replacement fields, one at the begining, one near the middle, and one at the very end:
\beforeverb
\begin{verbatim}
>>> label_id = 1
>>> camels = 42
>>> mood = 'happy'
>>> print('{}: The {} camels are {}'.format(label_id, camels, mood))
1: The 42 camels are happy
\end{verbatim}
\afterverb
%
Values passed to the {\tt format} method may be any sort of data, such as integers, floats, and strings. By default, for each replacement field in the string, there must be a variable passed to {\tt format}. The values are used in left to right order.
You can control the formatting of data as it is included in the string. For instance, you can limit the number of decimal places displayed for a number or set the value to be left or right justified. Formatting is controlled by adding {\bf format specifications} within the curly braces of the replacement field.
Let's demonstrate some formatting capabilities. We'll begin by formatting numeric values, limiting the width of a value and the number of decimal places displayed.
To set the desired width (number of characters occupied by the number) and the number of decimal places, we define a format specification starting with a colon, followed by the total number of characters to display (width), a dot, the number of digits after the decimal point to display, and the letter {\bf f}, indicating a floating point value.
Our next examples will set the number of decimal places without limiting the width (we'll place no number before the dot in the format specification of the replacement field). Look carefully at the output and be sure you understand what is happening.
\beforeverb
\begin{verbatim}
>>> print('{:.0f}'.format(3000.14159))
3000
>>> print('{:.1f}'.format(3000.14159))
3000.1
>>> print('{:.2f}'.format(3000.14159))
3000.14
>>> print('{:.5f}'.format(3000.14159))
3000.14159
>>> print('{:.10f}'.format(3000.14159))
3000.1415900000
>>>
\end{verbatim}
\afterverb
%
Now let's try setting the width as well. \emph{Note that the width is a minimum.} If the whole number portion of a value is too large for the width, it will not be truncated. Instead, the width will be increased to include all the digits to the left of the decimal point.
\beforeverb
\begin{verbatim}
>>> print('{:1.0f}'.format(3000.14159))
3000
>>> print('{:4.0f}'.format(3000.14159))
3000
>>> print('{:10.0f}'.format(3000.14159))
3000
>>> print('{:10.2f}'.format(3000.14159))
3000.14
\end{verbatim}
\afterverb
%
{\bf Why do the last two examples start with empty spaces?} Hint: the third printed value contains 6 spaces before the 3000 and the fourth printed value contains 3 spaces before the 3000.14
You can format string values in a similar way, controlling the width and also the text justification (left, right, and centered). The characters \textbf{$>$}, \textbf{$<$}, and \textbf{\^} are used to control right, left, and center justification, respectively. Justification works with numeric values as well. As with numbers, the width given in the replacement field is the minimum number of characters Python will use. If the string is longer, then the width will grow to accomodate.
\newpage
\beforeverb
\begin{verbatim}
>>> print('{:2}'.format('Hi!'))
Hi!
>>> print('{:3}'.format('Hi!'))
Hi!
>>> print('{:20}'.format('Hi!'))
Hi!
>>> print('{:<20}'.format('Hi!'))
Hi!
>>> print('{:>20}'.format('Hi!'))
Hi!
>>> print('{:^20}'.format('Hi!'))
Hi!
\end{verbatim}
\afterverb
%
{\bf Based on the last 4 print statements, what is the default justification for text values?} Now review the numeric formatting we discussed earlier and determine what the default justification is for numbers.
If you include the precision when formatting a string, then the string {\bf will be truncated}, if it is longer than the precision value. For example:
\beforeverb
\begin{verbatim}
>>> print('{:.3}'.format('Hi!'))
Hi!
>>> print('{:.2}'.format('Hi!'))
Hi
>>> print('{:20.2}'.format('Hi!'))
Hi
>>> print('{:>20.2}'.format('Hi!'))
Hi
>>> print('{:^20.1}'.format('Hi!'))
H
\end{verbatim}
\afterverb
%
Again, review the replacement field and the output to be sure you can explain what is happening.
A couple of common errors you may encounter with format strings are having more replacement fields than values being passed to the {\tt format} method, and defining the replacement field for one type of information but passing a different type of value to the {\tt format} method. Here are examples of those situations:
\index{exception!IndexError}
\index{IndexError}
\index{exception!ValueError}
\index{ValueError}
\beforeverb
\begin{verbatim}
>>> print('{} {} {}'.format('Hello', 'World!'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: Replacement index 2 out of range for positional args tuple
>>> print('{:.2f}'.format('Hi'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: Unknown format code 'f' for object of type 'str'
\end{verbatim}
\afterverb
%
In the first example, there aren't enough values being passed, so Python displays an \textbf{IndexError}. In the
second example, the value being passed is the wrong type; the replacement field was defined for a floating point value and the {\tt format} method is being passed a string, so Python displays a \textbf{ValueError}.
Format strings and replacement field options are powerful. They allow you to make the output of your Python programs look organized and professional. However, they can be overwhelming as you start using them. Be patient, they'll become second nature as you write more sophisticated programs. You
can read more about format strings at
\url{https://docs.python.org/3/library/string.html#formatstrings}
\section{Debugging}
\index{debugging}
A skill that you should cultivate as you program is always
asking yourself, ``What could go wrong here?'' or alternatively,
``What crazy thing might our user do to crash our (seemingly)
perfect program?''
For example, look at the program which we used to demonstrate
the {\tt while} loop in the chapter on iteration:
\beforeverb
\begin{verbatim}
while True:
line = input('> ')
if line[0] == '#' :
continue
if line == 'done':
break
print(line)
print 'Done!'
\end{verbatim}
\afterverb
%
Look what happens when the user enters an empty line of input:
\beforeverb
\begin{verbatim}
> hello there
hello there
> # don't print this
> print this!
print this!
>
Traceback (most recent call last):
File "copytildone.py", line 3, in <module>
if line[0] == '#' :
\end{verbatim}
\afterverb
%
The code works fine until it is presented an empty line. Then
there is no zero-th character, so we get a traceback. There are two
solutions to this to make line three ``safe'' even if the line is
empty.
One possibility is to simply use the {\tt startswith} method
which returns {\tt False} if the string is empty.
\beforeverb
\begin{verbatim}
if line.startswith('#') :
\end{verbatim}
\afterverb
%
\index{guardian pattern}
\index{pattern!guardian}
Another way is to safely write the {\tt if} statement using the {\bf guardian}
pattern and make sure the second logical expression is evaluated
only where there is at least one character in the string.:
\beforeverb
\begin{verbatim}
if len(line) > 0 and line[0] == '#' :
\end{verbatim}
\afterverb
%
\section{Glossary}
\begin{description}
\item[counter:] A variable used to count something, usually initialized
to zero and then incremented.
\index{counter}
\item[empty string:] A string with no characters and length 0, represented
by two quotation marks.
\index{empty string}
\item[format operator:] An operator, {\tt \%}, that takes a format
string and a tuple and generates a string that includes
the elements of the tuple formatted as specified by the format string.
\index{format operator}
\index{operator!format}
\item[format sequence:] A sequence of characters in a format string,
like {\tt \%d}, that specifies how a value should be formatted.
\index{format sequence}
\item[format string:] A string, used with the format operator, that
contains format sequences.
\index{format string}
\item[flag:] A boolean variable used to indicate whether a condition
is true.
\index{flag}
\item[invocation:] A statement that calls a method.
\index{invocation}
\item[immutable:] The property of a sequence whose items cannot
be assigned.
\index{immutability}
\item[index:] An integer value used to select an item in
a sequence, such as a character in a string.
\index{index}
\item[item:] One of the values in a sequence.
\index{item}
\item[method:] A function that is associated with an object and called
using dot notation.
\index{method}
\item[object:] Something a variable can refer to. For now,
you can use ``object'' and ``value'' interchangeably.
\index{object}
\item[search:] A pattern of traversal that stops
when it finds what it is looking for.
\index{search pattern}
\index{pattern!search}
\item[sequence:] An ordered set; that is, a set of
values where each value is identified by an integer index.
\index{sequence}
\item[slice:] A part of a string specified by a range of indices.
\index{slice}
\item[traverse:] To iterate through the items in a sequence,
performing a similar operation on each.
\index{traversal}
\end{description}
\section{Exercises}
\begin{ex}
Given that {\tt fruit} is a string, what does
{\tt fruit[:]} mean?
\end{ex}
\begin{ex}
Write a {\tt while} loop that starts at the last character in the string
and works its way backwards to the first character in the string,
printing each letter on a separate line, except backwards. A similar program can be found on page \pageref{ch6StringTraversal}.
\end{ex}
\begin{ex}
\index{encapsulation}
Encapsulate the code from the letter counting example program (page \pageref{ch6LetterCount}) in a function named {\tt count}, and generalize it so that it accepts the string and the
letter as arguments.
You should then call the function a couple of times passing a different string and letter to count. Make sure the results are correct.
\end{ex}
\begin{ex}
\index{count method}
\index{method!count}
There is a string method called {\tt count} that is similar
to the function shown in the example on page \pageref{ch6StartsWithExample}. Read the documentation
of this method at
\url{https://docs.python.org/3/library/stdtypes.html#string-methods}
and write an invocation that counts the number of times the
letter {\tt a} occurs
in \verb"'banana'". Be sure your code works regardless of the case of the letters. That means the count should work for {\tt A} and \verb"'banANA'" as well as {\tt a} and \verb"'BANAna'". In all cases it should find 3.
\end{ex}
\begin{ex}
Take the following Python code that stores a string:`
\beforeverb
\begin{alltt}
str = 'X-DSPAM-Confidence: {\bf 0.8475}'
\end{alltt}
\afterverb
Use {\tt find} and string slicing to extract the portion
of the string after the colon character and then use the
{\tt float} function to convert the extracted string
into a floating point number.
\end{ex}
\begin{ex}
\index{string method}
\index{method!string}
Read the documentation of the string methods at
\url{https://docs.python.org/3/library/stdtypes.html#string-methods}.
You might want to experiment with some of them to make sure
you understand how they work. {\tt strip} and
{\tt replace} are particularly useful.
The documentation uses a syntax that might be confusing.
For example, in \verb"find(sub[, start[, end]])", the brackets
indicate optional arguments. So {\tt sub} is required, but
{\tt start} is optional, and if you include {\tt start},
then {\tt end} is optional.
\end{ex}