@rkenmi - Python String Tricks - Performance considerations

Python String Tricks - Performance considerations

Python String Tricks - Performance considerations

Back to Top

Updated on September 21, 2017
  • If a string is changed to become bigger, consider trying to find the maximum size of the final string early on in a quick pass (strive for \(O(n)\))

    • Consider working from the tail of the string and iterating backwards if, say, 1 character needs to be replaced with 2.
  • Traditional deletion of 1 character in a string is a \(O(n)\) process because elements to the right of the deleted index has to be shifted to the left, and this has a worst case scenario of iterating through ~\(n\) elements.

    • If deleting entries in a string, consider swapping indices to make it seem like entries are being deleted.
    • Do note that you cannot do index assignments with strings in Python. This is because strings are immutable.
  • Consider the consequences of += and = operators

    • This is naturally ~\(O(n)\) in time complexity because you are allocating a new string and copying over \n\ elements, where \n\ represents the amount of characters in the new string.
    • This entails that using += with strings in a single for-loop would result in \(O(n^2)\) performance, which isn't good!
    • But... the standard Python interpreter (CPython) does use a bit of magic (using bytearray) to try to reduce this down to \(O(n)\). However this performance reduction depends on some edge cases, and in general, we shouldn't rely on implementation details. See here.
  • String slicing does copy

    • A statement like new_s = s[3:7] does cost \(O(n)\).
  • bytearray: a way to work with mutable strings

    • We can use bytearray to manipulate strings in a fashion similar to lists.
    • String index assignment can be done with bytearray. i.e. s[0] = ord('A') ord returns an integer representing the Unicode code point of that character.
    • bytearray can easily be decoded to a Python 3 string and encoded back to a bytearray. For example: b = bytearray("hell0", "utf-8") encodes the unicode string into a bytearray. b.decode() will output 'hell0' with type 'str'.
    • Usage of bytearray is uncommon because majority of the time when we work with strings, we either compare or format them instead of modifying them.

Article Tags: