Working with large integers in libpython-clj
Context
libpython-clj is an enormously useful library for bridging the Python and Clojure (and Java) worlds. Its data-oriented interface to Python classes and objects makes discovering features of Python libraries simple and convenient from a Clojure REPL. This article documents a problem I encountered when using it recently and the solution I put together for it.
As Chris Nuernberger, the library's author, described it to me, this is "one of the darker corners of libpython-clj," so I hope this offers some guidance to others who encounter similar issues.
(require
  '[libpython-clj2.python :as py :refer [py. py.. py.-]]
  '[libpython-clj2.require :refer [require-python]]
  '[libpython-clj2.python.protocols :as py-proto])
(require-python '[builtins :as python])
The problem
I was unable to convert very large integer values - those larger than a signed 64-bit integer can represent - between Python and Clojure. Here is an example in each direction:
Python -> Clojure: Overflow
(-> 29389942368720948710978341 str py/as-python python/int py/as-jvm)
Clojure -> Python: Error
(try (libpython-clj2.python.copy/->py-long 29389942368720948710978341)
     (catch java.lang.IllegalArgumentException e
       [:pre [:code {:class "language-clojure"} (select-keys (Throwable->map e) [:cause :via])]]))
Background
From Python 3 onwards, integer values are arbitrarily sized. On the JVM, integer values larger than 64 bits can be represented by clojure.lang.BigInt or java.math.BigInteger objects. However, despite being supported on both platforms, the copy pathway used by libpython-clj defaults to converting Python ints to 64-bit integers, resulting in the overflow seen above.
When converting numeric JVM types to Python types, integers get cast to Long values (i.e. 64-bit, Clojure's default for integers), triggering the exception seen above.
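To make the size limit concrete, here is a minimal Python sketch of the range check that a 64-bit conversion implicitly imposes. It is an illustration only and not libpython-clj's conversion code; the helper name fits_in_int64 is hypothetical.

```python
# Illustration only: plain Python, not libpython-clj's conversion code.
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_in_int64(n: int) -> bool:
    """Return True when n can be stored in a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX


large = 29389942368720948710978341  # the value used in the examples above

print(fits_in_int64(INT64_MAX))  # the largest value a JVM long can hold
print(fits_in_int64(large))      # the example value is outside that range
print(large.bit_length())        # bits needed to store the magnitude

# A decimal string carries the value without loss, whatever its size.
assert int(str(large)) == large
```

Python itself handles the value without complaint; the limit only appears once the value has to pass through a fixed-width 64-bit type.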
Creating a workaround
The solution suggested to me by Chris Nuernberger, the author of libpython-clj, was to create a custom Python class for the integer values I needed to work with - effectively my own boxed integer. This would allow me to bypass libpython-clj's default conversion pathways.
Python class definition
class BoxedInt(int):
    def __init__(self, num):
        # Accept an int or a numeric string, but always store an int
        self.num = int(num)
    def __call__(self, arg):
        '''Initialize the value. Optionally converts from string.'''
        if isinstance(arg, str):
            self.num = int(arg)
        else:
            self.num = arg
    def __int__(self):
        return self.num
    def __str__(self):
        return str(self.num)
    def __repr__(self):
        return 'BoxedInt(' + str(self.num) + ')'
Because strings convert to and from Python in an identical manner, they can be used as an escape hatch for converting values larger than the natively supported :int64 datatype. To make sure that the class behaves consistently with other integers in Python (e.g. it can be used for selection, slicing, etc.), the BoxedInt class inherits from the built-in int type.
Running this Python code as a string (held here in boxed-int-class-str) with py/run-simple-string creates an environment from which a reference to the Python class object can be extracted:
(def boxed-py-int
  (get-in
    (py/run-simple-string boxed-int-class-str) [:globals :BoxedInt]))
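The reason for inheriting from int can be checked in plain Python, independently of libpython-clj. The sketch below uses a trimmed-down copy of the class (constructor and __str__ only, with the constructor always storing an int) and assumes the value arrives as a decimal string, as it does in the conversion code that follows.

```python
class BoxedInt(int):
    """Trimmed-down sketch: an int subclass that also keeps the value in num."""

    def __init__(self, num):
        # int.__new__ has already parsed num; keep an int copy as well
        self.num = int(num)

    def __str__(self):
        return str(self.num)


digits = "29389942368720948710978341"
boxed = BoxedInt(digits)           # built from a decimal string

assert isinstance(boxed, int)      # accepted anywhere an int is expected
assert str(boxed) == digits        # the string form round-trips exactly
assert boxed + 1 == int(digits) + 1

letters = ["a", "b", "c", "d"]
assert letters[BoxedInt("2")] == "c"          # selection
assert letters[:BoxedInt("2")] == ["a", "b"]  # slicing
```

Indexing and arithmetic work because int.__new__ parses the string before __init__ runs, so the object carries the full value as an ordinary int in addition to the num attribute.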
With this class defined, libpython-clj provides the rest of the elements necessary for a solution through the multimethods and protocols defined in the libpython-clj2.python.protocols namespace.
The pyobject->jvm multimethod dispatches on the Python type of the object, allowing for the construction of a BigInt from the BoxedInt's string representation.
(defmethod py-proto/pyobject->jvm :boxed-int
  [pyobj & args]
  (bigint (py/as-jvm (python/str pyobj))))
Going the other way, the PCopyToPython protocol can be extended to new types, including the two JVM types used for larger-than-64-bit integers.
(extend-protocol py-proto/PCopyToPython
  clojure.lang.BigInt
  (->python [item opts] (boxed-py-int (str item)))
  java.math.BigInteger
  (->python [item opts] (boxed-py-int (str item))))
Another integer value can be used to verify the roundtrip behavior:
(let [large-val 29289384293856920729839229839285108
      after-conv (-> large-val
                     py/->python
                     py/->jvm)]
  (assert (= large-val after-conv)
          "Values should successfully roundtrip to/from boxed int type"))
Concluding remarks
While still working through this problem, I tried to define the BoxedInt class using libpython-clj2.python/create-class, but I couldn't define the constructor and __call__ methods using Clojure functions without again encountering the Python -> JVM conversion that prompted the overflow in the first place. Perhaps there's a way to achieve the same result using create-class that I couldn't figure out, but I knew I'd have complete control over the Python side of things by just using the class definition.
I may not have fully implemented this solution; I didn't implement the protocols PBridgeToPython or PCopyToJVM, which might have shown how to store large integers in a pointer format instead of copying them between the platforms. My existing solution isn't intended for performance- or memory-intensive code, so I didn't feel it was necessary to figure out those aspects to move forward with my specific problem. But a more general solution to the problem of how to represent integers greater than 64 bits in libpython-clj might involve more consideration of how to implement those types and protocols.
I was also struck by the design and implementation of libpython-clj, and by how comprehensible its codebase is. I was able to rework its behavior to operate on new data types by just reading through the implementations of multimethods and protocols it uses for its core data types. Though I struggled to solve this problem, it was never because it was difficult to figure out what libpython-clj was doing.
Thanks again to Chris Nuernberger for pointing me in the right direction on the Clojurians Slack when I was still figuring out what to do.