Working with large integers in libpython-clj

Context


libpython-clj is an enormously useful library for bridging the Python and Clojure (and Java) worlds. Its data-oriented interface to Python classes and objects makes discovering features of Python libraries simple and convenient from a Clojure REPL. This article documents a problem I encountered when using it recently and the solution I put together for it.

As Chris Nuernberger, the library's author, described it to me, this is "one of the darker corners of libpython-clj," so I hope this offers some guidance to others who encounter similar issues.

(require
  '[libpython-clj2.python :as py :refer [py. py.. py.-]]
  '[libpython-clj2.require :refer [require-python]]
  '[libpython-clj2.python.protocols :as py-proto])

(require-python '[builtins :as python])


The problem

I was unable to successfully convert very large integer values - those larger than can be represented by a signed 64-bit integer - between Python and Clojure. Here's an example in each direction:

Python -> Clojure: Overflow

(-> 29389942368720948710978341N str py/as-python python/int py/as-jvm)

-585323467629284571


Clojure -> Python: Error

(try
  (libpython-clj2.python.copy/->py-long 29389942368720948710978341N)
  (catch java.lang.IllegalArgumentException e
    ;; Keep only the informative parts of the exception map.
    (select-keys (Throwable->map e) [:cause :via])))
{:cause "Value out of range for long: 29389942368720948710978341", :via [{:type java.lang.IllegalArgumentException, :message "Value out of range for long: 29389942368720948710978341", :at [clojure.lang.RT longCast "RT.java" 1265]}]}


Background

From Python 3 onwards, integer values are arbitrarily sized. On the JVM, integer values too large for 64 bits can be represented by clojure.lang.BigInt or java.math.BigInteger objects. However, although both platforms support such values, the copy pathway used by libpython-clj converts Python ints to 64-bit integers by default, which produces the overflow seen above.

When converting numeric JVM types to Python types, integers are cast to Long values (i.e. 64-bit, Clojure's default for integers), which triggers the exception seen above.
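Both failure modes come down to the signed 64-bit range. The following is a minimal Python sketch of that arithmetic, not libpython-clj's actual conversion code; the helper names fits_in_int64 and wrap_to_int64 are hypothetical and exist only for illustration. Keeping just the low 64 bits of the example value and reading them as a two's-complement number gives the same negative result shown in the overflow example.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def fits_in_int64(value: int) -> bool:
    """Return True when value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def wrap_to_int64(value: int) -> int:
    """Keep the low 64 bits of value and read them as a signed two's-complement number."""
    low_bits = value & ((1 << 64) - 1)
    return low_bits - (1 << 64) if low_bits > INT64_MAX else low_bits


large = 29389942368720948710978341
print(large.bit_length())      # 85 bits, well past the 64 available
print(fits_in_int64(large))    # False: a range-checked cast has to reject this value
print(wrap_to_int64(large))    # -585323467629284571, matching the overflow above
```

A range-checked cast (the Clojure -> Python direction) rejects the value outright, while an unchecked truncation (the Python -> Clojure direction) silently wraps it.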

Creating a workaround

The solution suggested to me by Chris Nuernberger, the author of libpython-clj, was to create a custom Python class for the integer values I needed to work with - effectively my own boxed integer. This would allow me to bypass libpython-clj's default conversion pathways.

Python class definition

class BoxedInt(int):

    def __init__(self, num):
        # Accept either an int or its decimal string form, so that
        # self.num is always an int (required by __int__ below).
        self.num = int(num)

    def __call__(self, arg):
        '''Set the stored value. Optionally converts from string.'''
        if isinstance(arg, str):
            self.num = int(arg)
        else:
            self.num = arg

    def __int__(self):
        return self.num

    def __str__(self):
        return str(self.num)

    def __repr__(self):
        return 'BoxedInt(' + str(self.num) + ')'

Because strings convert to and from Python in an identical manner, they can be used as an escape hatch for conversion for values larger than the natively-supported :int64 datatype. To make sure that the class behaves in a manner consistent with other integers in Python (e.g. it can be used for selection, slicing, etc), the BoxedInt class inherits from the built-in int type.
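To illustrate why inheriting from int matters, here is a small sketch using a simplified stand-in for the class above (only the constructor and __str__ are reproduced, and the constructor is assumed to accept either an int or a decimal string). The list named letters is made up for the example.

```python
class BoxedInt(int):
    """Simplified stand-in: an int subclass built from an int or a decimal string."""

    def __init__(self, num):
        self.num = int(num)

    def __str__(self):
        return str(self.num)


big = BoxedInt("29389942368720948710978341")  # built from the string form
letters = ["a", "b", "c", "d"]

print(isinstance(big, int))        # True: accepted anywhere an int is expected
print(str(big))                    # 29389942368720948710978341
print(int(str(big)) == big)        # True: the string form loses no digits
print(letters[BoxedInt("2")])      # c  (usable for selection)
print(letters[:BoxedInt(3)])       # ['a', 'b', 'c']  (usable for slicing)
```

Because the value travels as a string and is only parsed on the Python side, its size is never constrained by the 64-bit conversion path.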

With the class definition above held in a Clojure string (boxed-int-class-str below), executing it creates an environment from which a reference to the Python class object can be extracted:


(def boxed-py-int
  (get-in (py/run-simple-string boxed-int-class-str)
          [:globals :BoxedInt]))
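For readers more familiar with Python, the same idea can be sketched in pure Python: run a source string against a namespace dictionary, then look the class up by name. This is only a rough analogy under the assumption that a plain dict can stand in for the globals returned on the Clojure side; it is not how libpython-clj implements run-simple-string, and the class body is a shortened version of the one above.

```python
boxed_int_source = '''
class BoxedInt(int):
    def __init__(self, num):
        self.num = int(num)

    def __str__(self):
        return str(self.num)
'''

# Execute the source with a fresh dict acting as the global namespace.
namespace = {}
exec(boxed_int_source, namespace)

# The class object is now an ordinary entry in that namespace.
boxed_int_class = namespace["BoxedInt"]
value = boxed_int_class("29389942368720948710978341")

print(boxed_int_class.__name__)  # BoxedInt
print(str(value))                # 29389942368720948710978341
```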

With this class defined, libpython-clj provides the rest of the elements necessary for a solution through the multimethods and protocols defined in the libpython-clj2.python.protocols namespace (aliased as py-proto above).

The pyobject->jvm multimethod dispatches on the Python type of the object, allowing for the construction of a BigInt from the BoxedInt's string representation.

(defmethod py-proto/pyobject->jvm :boxed-int
   [pyobj & args] (bigint (py/as-jvm (python/str pyobj))))

Going the other way, the PCopyToPython protocol can be extended to new types, including the two JVM types used for larger-than-64-bit integers.

(extend-protocol py-proto/PCopyToPython
  ;; Both arbitrary-precision integer types are sent across as strings.
  clojure.lang.BigInt
  (->python [item opts] (boxed-py-int (str item)))
  java.math.BigInteger
  (->python [item opts] (boxed-py-int (str item))))

Another integer value can be used to verify the roundtrip behavior:

(let [large-val  29289384293856920729839229839285108N
      after-conv (-> large-val
                     py/->python
                     py/->jvm)]
  (assert (= large-val after-conv)
          "Values should successfully roundtrip to/from boxed int type"))


Concluding remarks

When still working through this problem, I tried to define the BoxedInt class using libpython-clj2.python/create-class, but I couldn't define the constructor and __call__ methods using Clojure functions without again encountering the Python -> JVM conversion that prompted the overflow in the first place. Perhaps there's a way to achieve the same result using create-class that I couldn't figure out, but I knew I'd have complete control over the Python side of things by just using the class definition.

I may not have fully implemented this solution; I didn't implement the protocols PBridgeToPython or PCopyToJVM, which might have shown how to store large integers in a pointer format instead of copying them between the platforms. My existing solution isn't meant for performance- or memory-intensive code, so I didn't feel it was necessary to figure out those aspects to move forward with my specific problem. But a more general solution to the problem of how to represent integers greater than 64 bits in libpython-clj might involve more consideration of how to implement those types and protocols.

I was also struck by the design and implementation of libpython-clj, and by how comprehensible its codebase is. I was able to rework its behavior to operate on new data types by just reading through the implementations of multimethods and protocols it uses for its core data types. Though I struggled to solve this problem, it was never because it was difficult to figure out what libpython-clj was doing.

Thanks again to Chris Nuernberger for pointing me in the right direction on the Clojurians Slack when I was still figuring out what to do.