If a string is changed to become bigger, consider trying to find the maximum size of the final string early on in a quick pass (strive for \(O(n)\))
- Consider working from the tail of the string and iterating backwards if, say, 1 character needs to be replaced with 2.
Traditional deletion of 1 character in a string is a \(O(n)\) process because elements to the right of the deleted index has to be shifted to the left, and this has a worst case scenario of iterating through ~\(n\) elements.
- If deleting entries in a string, consider swapping indices to make it seem like entries are being deleted.
- Do note that you cannot do index assignments with strings in Python. This is because strings are immutable.
Consider the consequences of
+=
and=
operators- This is naturally ~\(O(n)\) in time complexity because you are allocating a new string and copying over \n\ elements, where \n\ represents the amount of characters in the new string.
- This entails that using
+=
with strings in a single for-loop would result in \(O(n^2)\) performance, which isn't good! - But... the standard Python interpreter (CPython) does use a bit of magic (using
bytearray
) to try to reduce this down to \(O(n)\). However this performance reduction depends on some edge cases, and in general, we shouldn't rely on implementation details. See here.
String slicing does copy
- A statement like
new_s = s[3:7]
does cost \(O(n)\).
- A statement like
bytearray: a way to work with mutable strings
- We can use
bytearray
to manipulate strings in a fashion similar to lists. - String index assignment can be done with
bytearray
. i.e.s[0] = ord('A')
ord
returns an integer representing the Unicode code point of that character. bytearray
can easily be decoded to a Python 3 string and encoded back to a bytearray. For example:b = bytearray("hell0", "utf-8")
encodes the unicode string into a bytearray.b.decode()
will output 'hell0' with type 'str'.- Usage of
bytearray
is uncommon because majority of the time when we work with strings, we either compare or format them instead of modifying them.
- We can use