Working with large integers in libpython-clj

Context


libpython-clj is an enormously useful library for bridging the Python and Clojure (and Java) worlds. Its data-oriented interface to Python classes and objects makes discovering features of Python libraries simple and convenient from a Clojure REPL. This article documents a problem I encountered when using it recently and the solution I put together for it.

As Chris Nuernberger, the library's author, described it to me, this is "one of the darker corners of libpython-clj," so I hope this offers some guidance to others who encounter similar issues.

(require
  '[libpython-clj2.python :as py :refer [py. py.. py.-]]
  '[libpython-clj2.require :refer [require-python]]
  '[libpython-clj2.python.protocols :as py-proto])

(require-python '[builtins :as python])


The problem

I was unable to successfully convert very large integer values - those larger than can be represented by a signed 64-bit integer - between Python and Clojure. Here's an example:

Python -> Clojure: Overflow

 (-> 29389942368720948710978341 str py/as-python python/int py/as-jvm)


Clojure -> Python: Error

 (try (libpython-clj2.python.copy/->py-long 29389942368720948710978341)
        (catch java.lang.IllegalArgumentException e
          [:pre [:code {:class "language-clojure"} (select-keys (Throwable->map e) [:cause :via])]])) 


Background

From Python 3 onwards, integer values have arbitrary precision. On the JVM, integer values larger than 64 bits can be represented by clojure.lang.BigInt or java.math.BigInteger objects. However, despite such values being supported on both platforms, the copy pathway used by libpython-clj defaults to converting Python ints to 64-bit integers, resulting in the overflow seen above.

When converting numeric JVM types to Python types, integers get cast to Long values (i.e. 64-bit, Clojure's default for integers), triggering the exception seen above.
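
To make the size limit concrete, here is a minimal Python sketch of the signed 64-bit range and of what keeping only the low 64 bits of a larger value looks like. The helper names (fits_in_int64, wrap_to_int64) are made up for this illustration, and the wrap-around arithmetic is a generic two's-complement model; it is an assumption, not a reproduction of what libpython-clj's copy pathway computes.

```python
# Bounds of a signed 64-bit integer (what a JVM Long can hold).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_in_int64(value: int) -> bool:
    """True when the value survives a trip through a signed 64-bit slot."""
    return INT64_MIN <= value <= INT64_MAX


def wrap_to_int64(value: int) -> int:
    """Keep only the low 64 bits and read them as two's complement.

    Illustrative model of an overflow, not the library's actual code path.
    """
    return (value - INT64_MIN) % 2**64 + INT64_MIN


large = 29389942368720948710978341  # the value used in the article
wrapped = wrap_to_int64(large)

print(fits_in_int64(large))    # the article's value is out of range
print(fits_in_int64(wrapped))  # the wrapped value fits, but is a different number
```

Values that already fit are left unchanged by wrap_to_int64, which is why the problem only shows up once a number crosses the 64-bit boundary.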

Creating a workaround

The solution suggested to me by Chris Nuernberger, the author of libpython-clj, was to create a custom Python class for the integer values I needed to work with - effectively my own boxed integer. This would allow me to bypass libpython-clj's default conversion pathways.

Python class definition

class BoxedInt(int):

    def __init__(self, num):
        # The constructor may be handed a string (the escape hatch for
        # large values), so normalize to a Python int here.
        self.num = int(num)

    def __call__(self, arg):
        '''Initialize the value. Optionally converts from string.'''
        if isinstance(arg, str):
            self.num = int(arg)
        else:
            self.num = arg

    def __int__(self):
        return self.num

    def __str__(self):
        return str(self.num)

    def __repr__(self):
        return 'BoxedInt(' + str(self.num) + ')'

Because strings convert to and from Python in an identical manner, they can serve as an escape hatch for converting values larger than the natively supported :int64 datatype. To make sure that the class behaves in a manner consistent with other integers in Python (e.g. it can be used for selection, slicing, etc.), the BoxedInt class inherits from the built-in int type.
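
The two properties described here, lossless conversion through strings and int-like behavior, can be checked from plain Python. The sketch below uses a condensed copy of the class so that it runs on its own; the sample lists and variable names are made up for the illustration.

```python
class BoxedInt(int):
    """Condensed copy of the article's class, just enough for this check."""

    def __init__(self, num):
        self.num = int(num)

    def __str__(self):
        return str(self.num)


big = 29389942368720948710978341

# Strings carry the value across without any 64-bit limit.
boxed = BoxedInt(str(big))
round_tripped = int(str(boxed))

# Because BoxedInt subclasses int, it works wherever an int is expected.
letters = ["a", "b", "c"]
numbers = [10, 20, 30, 40]
picked = letters[BoxedInt("1")]
sliced = numbers[BoxedInt("1"):BoxedInt("3")]

print(round_tripped == big, picked, sliced)
```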

Running this class definition as a string of Python code (held here in boxed-int-class-str) yields an environment from which a reference to the Python class object can be extracted:

(def boxed-py-int
  (get-in
   (py/run-simple-string  boxed-int-class-str ) [:globals :BoxedInt])) 
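
For readers more at home in Python, the same idea can be sketched with the built-in exec: run the source string against a dictionary of globals, then look the class up by name. This is only an analogy for the run-simple-string call above, not a description of how libpython-clj implements it; boxed_int_src and env are hypothetical names, and the class body is condensed.

```python
# Python source held in a string, as boxed-int-class-str is on the Clojure side.
boxed_int_src = '''
class BoxedInt(int):
    def __init__(self, num):
        self.num = int(num)

    def __str__(self):
        return str(self.num)
'''

# exec populates the dictionary passed as globals with everything the code defines.
env = {}
exec(boxed_int_src, env)

# The class object can now be pulled out by name and called like any class.
BoxedInt = env["BoxedInt"]
value = BoxedInt("29389942368720948710978341")

print(type(value).__name__, str(value))
```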

With this class defined, libpython-clj provides the rest of the elements necessary for a solution through the multimethods and protocols defined in the libpython-clj2.python.protocols namespace.

The pyobject->jvm multimethod dispatches on the Python type of the object, allowing for the construction of a BigInt from the BoxedInt's string representation.

(defmethod py-proto/pyobject->jvm :boxed-int
   [pyobj & args] (bigint (py/as-jvm (python/str pyobj))))
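
The dispatch-on-type idea can be pictured in Python as a lookup table keyed by type name, with a default converter for everything that has no entry. Everything in this sketch (the converters table, register, convert, and the 64-bit default) is hypothetical and only mirrors the shape of the multimethod; it does not use or reproduce libpython-clj's own code.

```python
# Hypothetical dispatch table standing in for a multimethod.
converters = {}


def register(type_name):
    """Decorator that installs a converter for one type name."""
    def wrap(fn):
        converters[type_name] = fn
        return fn
    return wrap


def to_int64(obj):
    """Default in this sketch: squeeze the value into a signed 64-bit range."""
    return (int(obj) + 2**63) % 2**64 - 2**63


class BoxedInt(int):
    """Bare int subclass; only its type name matters for dispatch here."""


@register("BoxedInt")
def boxed_to_big(obj):
    # Go through the string representation, as the defmethod above does.
    return int(str(obj))


def convert(obj):
    """Pick a converter by type name, falling back to the 64-bit default."""
    return converters.get(type(obj).__name__, to_int64)(obj)


big = 29389942368720948710978341
print(convert(BoxedInt(big)) == big)  # the registered converter keeps the value
print(convert(big) == big)            # the default path does not
```

Registering a converter for one type leaves the default path untouched for every other type, which is what makes this kind of extension low-risk.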

Going the other way, the PCopyToPython protocol can be extended to new types, including the two JVM types used for larger-than-64-bit integers.

 (extend-protocol py-proto/PCopyToPython
    clojure.lang.BigInt
    (->python [item opts] (boxed-py-int (str item)))
    java.math.BigInteger
    (->python [item opts] (boxed-py-int (str item))))

Another integer value can be used to verify the roundtrip behavior:

(let [large-val 29289384293856920729839229839285108
       after-conv (-> large-val
                      py/->python
                      py/->jvm)]

   (assert (= large-val after-conv)
           "Values should successfully roundtrip to/from boxed int type"))


Concluding remarks

When still working through this problem, I tried to define the BoxedInt class using libpython-clj2.python/create-class, but I couldn't define the constructor and __call__ methods using Clojure functions without again encountering the Python -> JVM conversion that prompted the overflow in the first place. Perhaps there's a way to achieve the same result using create-class that I couldn't figure out, but I knew I'd have complete control over the Python side of things by just using the class definition.

This solution may not be complete: I didn't implement the protocols PBridgeToPython or PCopyToJVM, which might have shown how to store large integers in a pointer format instead of copying them between the platforms. My existing solution isn't meant for performance- or memory-intensive code, so I didn't feel it was necessary to figure out those aspects to move forward with my specific problem. But a more general solution to the problem of how to represent integers greater than 64 bits in libpython-clj might involve more consideration of how to implement those types and protocols.

I was also struck by the design and implementation of libpython-clj, and by how comprehensible its codebase is. I was able to rework its behavior to operate on new data types by just reading through the implementations of multimethods and protocols it uses for its core data types. Though I struggled to solve this problem, it was never because it was difficult to figure out what libpython-clj was doing.

Thanks again to Chris Nuernberger for pointing me in the right direction on the Clojurians Slack when I was still figuring out what to do.