<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>

<channel>
	<title>The GITS Blog &#187; python</title>
	<atom:link href="http://www.ginstrom.com/scribbles/category/python/feed/" rel="self" type="application/rss+xml" />
	<link>http://ginstrom.com/scribbles</link>
	<description>Random scribbling about programming, translation, and Japan</description>
	<pubDate>Mon, 25 Aug 2008 05:54:18 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
			<item>
		<title>WxPython 2.8.8.0 quietly introduces true ActiveX hosting for Windows</title>
		<link>http://ginstrom.com/scribbles/2008/08/20/wxpython-2880-quietly-introduces-true-activex-hosting-for-windows/</link>
		<comments>http://ginstrom.com/scribbles/2008/08/20/wxpython-2880-quietly-introduces-true-activex-hosting-for-windows/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 10:46:49 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/08/20/wxpython-2880-quietly-introduces-true-activex-hosting-for-windows/</guid>
		<description><![CDATA[Version 2.8.8.0 of WxPython uses the new activex class to host ActiveX controls on Windows. This means that unlike previous implementations of the wrapper for the IE HTML ActiveX control, this version has full access to the browser events and DOM.
This is a huge advance for Windows GUI programming with WxPython. Now WxPython applications on [...]]]></description>
			<content:encoded><![CDATA[<p>Version 2.8.8.0 of <a href="http://www.wxpython.org/">WxPython</a> uses the new activex class to host ActiveX controls on Windows. This means that unlike previous implementations of the wrapper for the IE HTML ActiveX control, this version has full access to the browser events and DOM.</p>
<p>This is a huge advance for Windows GUI programming with WxPython. Now WxPython applications on Windows can host an IE window, and control the contents programmatically via the document property.</p>
<p>The WxPython team has been pretty quiet about it. The changes don't seem to have made it into the docs (although they're mentioned in the <a href="http://www.wxpython.org/recentchanges.php">change log</a> and the sample code has been updated).</p>
<p>Here's a simple example of a dialog box with an HTML window. The dialog intercepts clicks on the links, and uses them to set the color of the text by setting the cssText property of the element's style.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
ColorWindow</p>
<p>Demonstrates controlling DOM of IEHtmlWindow<br />
&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">import</span> wx<br />
<span class="kw1">from</span> wx.<span class="me1">lib</span> <span class="kw1">import</span> iewin<br />
<span class="kw1">from</span> wx.<span class="me1">lib</span> <span class="kw1">import</span> sized_controls as sc</p>
<p>HTML_DOCUMENT = u<span class="st0">&quot;&quot;</span><span class="st0">&quot;&lt;html&gt;<br />
&nbsp; &nbsp; &lt;body&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;p id=&quot;</span>text<span class="st0">&quot;&gt;Change the color of the text&lt;/p&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;p&gt;Actions:&lt;/p&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;ul&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &lt;li&gt;&lt;a href=&quot;</span>/red<span class="st0">&quot;&gt;Make it red&lt;/a&gt;&lt;/li&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &lt;li&gt;&lt;a href=&quot;</span>/blue<span class="st0">&quot;&gt;Make it blue&lt;/a&gt;&lt;/li&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &lt;li&gt;&lt;a href=&quot;</span>/green<span class="st0">&quot;&gt;Make it green&lt;/a&gt;&lt;/li&gt;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &lt;/ul&gt;<br />
&nbsp; &nbsp; &lt;/body&gt;<br />
&lt;/html&gt;&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">class</span> ColorWindow<span class="br0">&#40;</span>sc.<span class="me1">SizedDialog</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Hosts an IEHtmlWindow, and responds to clicks on links<br />
&nbsp; &nbsp; by setting the color of the text.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">ie</span> = <span class="kw2">None</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; flag = wx.<span class="me1">DEFAULT_DIALOG_STYLE</span>|wx.<span class="me1">RESIZE_BORDER</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sc.<span class="me1">SizedDialog</span>.<span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;parent,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="nu0">-1</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="st0">&quot;Color Window&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;style=flag,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;size=<span class="br0">&#40;</span><span class="nu0">300</span>,<span class="nu0">300</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">layout</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> layout<span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Performs the layout of GUI widgets&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; pane = <span class="kw2">self</span>.<span class="me1">GetContentsPane</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">ie</span> = iewin.<span class="me1">IEHtmlWindow</span><span class="br0">&#40;</span>pane<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">ie</span>.<span class="me1">SetSizerProps</span><span class="br0">&#40;</span>expand=<span class="kw2">True</span>, proportion=<span class="nu0">1</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">ie</span>.<span class="me1">LoadString</span><span class="br0">&#40;</span>HTML_DOCUMENT<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">ie</span>.<span class="me1">AddEventSink</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">SetButtonSizer</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">CreateStdDialogButtonSizer</span><span class="br0">&#40;</span>wx.<span class="me1">OK</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; pane.<span class="me1">Fit</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> BeforeNavigate2<span class="br0">&#40;</span><span class="kw2">self</span>, this, pDisp, URL, Flags,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; TargetFrameName, PostData, Headers,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Cancel<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; This is a callback from the HTML window before<br />
&nbsp; &nbsp; &nbsp; &nbsp; navigating to a clicked link. We'll use it to set the<br />
&nbsp; &nbsp; &nbsp; &nbsp; color, then cancel.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; color = URL<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="st0">&quot;/&quot;</span><span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">-1</span><span class="br0">&#93;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; elem = <span class="kw2">self</span>.<span class="me1">ie</span>.<span class="me1">document</span>.<span class="me1">getElementById</span><span class="br0">&#40;</span><span class="st0">&quot;text&quot;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; elem.<span class="me1">style</span>.<span class="me1">cssText</span> = <span class="st0">&quot;color: %s;&quot;</span> % color<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># cancel it so it doesn't actually try to navigate there</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; Cancel<span class="br0">&#91;</span><span class="nu0">0</span><span class="br0">&#93;</span> = <span class="kw2">True</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">'__main__'</span>:<br />
&nbsp; &nbsp; application = wx.<span class="me1">PySimpleApp</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; window = ColorWindow<span class="br0">&#40;</span><span class="kw2">None</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; window.<span class="me1">ShowModal</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; window.<span class="me1">Destroy</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; application.<span class="me1">MainLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<p><a href="/code/htmlwin.zip">Here's the code (htmlwin.zip)</a>.</p>
<p>The method BeforeNavigate2 intercepts clicks on links, and uses the link information to get the desired color. Then it finds the element in the DOM with an id of "text", and sets that element's color to the link color.</p>
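<p>The string handling in that callback is easy to try outside of a GUI. Here's a minimal sketch, with no wx or ActiveX involved; the helper names color_from_url and css_for_color are made up for illustration and are not part of the sample above (in the real handler the URL arrives wrapped in a sequence, which is why the code reads URL[0]):</p>

```python
def color_from_url(url):
    """Return the last path segment of a clicked link, e.g. 'red'."""
    # Same split the dialog's callback does on URL[0].
    return url.split("/")[-1]


def css_for_color(color):
    """Build the value that gets assigned to elem.style.cssText."""
    return "color: %s;" % color


# The exact URL IE reports for a relative link is an assumption here;
# only the final segment matters to the split.
print(css_for_color(color_from_url("about:/red")))
```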
<p>The possibilities of this technique are huge. Although it has the disadvantage of tying you to Windows, if your application is going to be Windows-only anyway, it lets you write an application with many of the benefits of a Web app, and very few of the drawbacks.</p>
<p>Here's a screenshot of the above code in action:<br />
<img src="/img/htmlwin_screenshot.png" alt="Screenshot of the HTML dialog" /></p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/08/20/wxpython-2880-quietly-introduces-true-activex-hosting-for-windows/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Using custom functions with SQLAlchemy and SQLite</title>
		<link>http://ginstrom.com/scribbles/2008/08/10/using-custom-functions-with-sqlalchemy-and-sqlite/</link>
		<comments>http://ginstrom.com/scribbles/2008/08/10/using-custom-functions-with-sqlalchemy-and-sqlite/#comments</comments>
		<pubDate>Sat, 09 Aug 2008 14:43:10 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/08/10/using-custom-functions-with-sqlalchemy-and-sqlite/</guid>
		<description><![CDATA[I recently developed a Web-based translation memory (TM) application in Python. One thing the application does is fuzzy glossary matching: given a source sentence, it'll find all terms in the glossary that are fuzzy substrings of that sentence (using my fuzzy substring matching module, which is based on the Levenshtein distance algorithm), and return the [...]]]></description>
			<content:encoded><![CDATA[<p>I recently developed a <a href="http://felix-cat.com/tools/memory-serves/">Web-based translation memory (TM) application</a> in Python. One thing the application does is fuzzy glossary matching: given a source sentence, it'll find all terms in the glossary that are fuzzy substrings of that sentence (using my <a href="http://pypi.python.org/pypi/subdist/0.2.1">fuzzy substring matching module</a>, which is based on the <a href="http://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a> algorithm), and return the terms along with their translations.</p>
<p>Here's how I created a custom function for fuzzy glossary searches, using <a href="http://www.sqlalchemy.org/">SQLAlchemy</a> for the ORM, with <a href="http://www.sqlite.org/">SQLite</a> as the database engine. Assuming you've got your <a href="http://www.sqlalchemy.org/docs/04/session.html">SessionClass object</a>, create a session, and get a connection object:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> subdist</p>
<p><span class="kw1">def</span> make_gloss_func<span class="br0">&#40;</span>haystack<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Creates a fuzzy substring matcher using haystack<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; get_score = subdist.<span class="me1">get_score</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> gloss_func<span class="br0">&#40;</span>needle<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> get_score<span class="br0">&#40;</span>needle, haystack<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> gloss_func</p>
<p><span class="kw1">class</span> TM<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Represents a translation memory (TM)/glossary&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># stuff skipped&#8230;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> gloss_search<span class="br0">&#40;</span><span class="kw2">self</span>, query, minscore<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Do a fuzzy glossary search.<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; session = <span class="kw2">self</span>.<span class="me1">SessionClass</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Create the custom function</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; gloss_func = make_gloss_func<span class="br0">&#40;</span>query<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; conn = session.<span class="me1">bind</span>.<span class="me1">connect</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; conn.<span class="me1">connection</span>.<span class="me1">create_function</span><span class="br0">&#40;</span><span class="st0">&quot;gloss_score&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="nu0">1</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; gloss_func<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Execute the query</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; search_string = <span class="st0">&quot;&quot;</span><span class="st0">&quot;SELECT * FROM records<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WHERE gloss_score(source)&gt;=:minscore&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> conn.<span class="me1">execute</span><span class="br0">&#40;</span>search_string,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">dict</span><span class="br0">&#40;</span>minscore=minscore<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">finally</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">SessionClass</span>.<span class="me1">remove</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
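<p>If you want to play with the idea without SQLAlchemy or subdist installed, here's a rough, self-contained sketch using only the standard library. It registers the function on a plain sqlite3 connection (the code above reaches the same kind of object through conn.connection), and it swaps in a difflib-based scorer as a stand-in for subdist.get_score. The stand-in, its 0-to-1 score scale, the table layout, and the sample rows are all assumptions for illustration:</p>

```python
import difflib
import sqlite3


def make_score_func(haystack):
    """Return a one-argument scorer bound to haystack.

    Stand-in for subdist.get_score: the best difflib ratio of needle
    against any same-length window of haystack (1.0 = exact substring).
    """
    def score(needle):
        n = len(needle)
        if n == 0 or n > len(haystack):
            return 0.0
        windows = (haystack[i:i + n] for i in range(len(haystack) - n + 1))
        return max(difflib.SequenceMatcher(None, needle, w).ratio()
                   for w in windows)
    return score


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (source TEXT, trans TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("translation memory", "honyaku memori"),
     ("glossary", "yougoshuu"),
     ("spaceship", "uchuusen")])

# Register the Python callable as a one-argument SQL function.
query = "search the glossary and translation memory"
conn.create_function("gloss_score", 1, make_score_func(query))

rows = conn.execute(
    "SELECT source FROM records"
    " WHERE gloss_score(source) >= :minscore ORDER BY source",
    {"minscore": 0.8}).fetchall()
matches = [row[0] for row in rows]
print(matches)
```

<p>Because the query sentence is baked into the closure, the SQL function only needs one argument (the column value), which is the same trick make_gloss_func uses above.</p>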
<h3>Speedup</h3>
<p>Just for fun, I compared the speed of (1) using custom functions in SQLite with (2) keeping the records in a Python list, and getting the matches in pure Python using a list comprehension. I found that the SQLAlchemy version is about 8 times faster. In the test, I created a glossary of 44,732 records using random word pairs, and got the fuzzy substrings for a query sentence.</p>
<table>
<tr>
<th>version</th>
<th>time</th>
</tr>
<tr>
<td>native Python</td>
<td>0.7837 s</td>
</tr>
<tr>
<td>SQLAlchemy</td>
<td>0.0966 s</td>
</tr>
</table>
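<p>For anyone who wants to run a similar comparison, here's a sketch of a timing harness. It is not the test code linked below: the scorer is a trivial pure-Python stand-in (exact substring check), the record set is tiny and synthetic, and the function names are invented for illustration. With a pure-Python scorer you shouldn't expect the timings to resemble the ones in the table; the point is only the shape of the comparison.</p>

```python
import sqlite3
import time


def score(needle, haystack):
    """Hypothetical stand-in scorer: 1.0 if needle occurs in haystack."""
    return 1.0 if needle in haystack else 0.0


def search_python(records, query, minscore):
    """Pure-Python version: filter the list with a comprehension."""
    return [src for src in records if score(src, query) >= minscore]


def search_sqlite(conn, query, minscore):
    """SQLite version: let the database call the scorer row by row."""
    conn.create_function("gloss_score", 1,
                         lambda needle: score(needle, query))
    sql = "SELECT source FROM records WHERE gloss_score(source) >= ?"
    return [row[0] for row in conn.execute(sql, (minscore,))]


def timed(func, *args):
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


records = ["term%d" % i for i in range(1000)]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (source TEXT)")
conn.executemany("INSERT INTO records VALUES (?)", [(r,) for r in records])

query = "look up term42 and term7 please"
py_hits, py_secs = timed(search_python, records, query, 1.0)
db_hits, db_secs = timed(search_sqlite, conn, query, 1.0)
print(sorted(py_hits) == sorted(db_hits), py_secs, db_secs)
```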
<p>Since the fuzzy-matching code and database code are written in C, the SQLAlchemy version is probably running at close to C speed, with the only slowdown being the overhead of calling them from Python (which is pretty minimal; most of the work is done elsewhere).</p>
<p>More importantly, the SQLAlchemy version easily meets my performance target of a 50,000-record search in 0.25 seconds, while the native Python version falls pretty far short.</p>
<p>Also interestingly, I found that <a href="http://psyco.sourceforge.net/">psyco</a> didn't speed up either version at all, and in fact made both slightly slower. It's another demonstration that you should profile rather than treat psyco as a panacea.</p>
<p>Here's the <a href="/code/speed_test.gz">code used for the test</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/08/10/using-custom-functions-with-sqlalchemy-and-sqlite/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Book Review: CherryPy Essentials</title>
		<link>http://ginstrom.com/scribbles/2008/08/05/book-review-cherrypy-essentials/</link>
		<comments>http://ginstrom.com/scribbles/2008/08/05/book-review-cherrypy-essentials/#comments</comments>
		<pubDate>Tue, 05 Aug 2008 05:12:39 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/08/05/book-review-cherrypy-essentials/</guid>
		<description><![CDATA[


I recently created a server application to share Felix memories/glossaries over a local network. After a simple test application, I was confident that CherryPy would serve my needs, so I bought CherryPy Essentials (Sylvain Hellegouarch) and started hacking.

The story of a website
The book uses a practical-minded, show-me-the-code style that I personally like. The book tells [...]]]></description>
			<content:encoded><![CDATA[<div style="float:left; margin-right: 10px; margin-bottom: 10px">
<a href="http://www.cherrypyessentials.com/" style="border:none;" title="CherryPy Essentials website"><img src="/img/cherrypyessentialscover.jpg" alt="CherryPy Essentials book cover" border="0" /></a>
</div>
<p>I recently created a <a href="http://felix-cat.com/tools/memory-serves/">server application</a> to share <a href="http://felix-cat.com/">Felix</a> memories/glossaries over a local network. After a <a href="http://felix-cat.com/tools/wordcount/">simple test application</a>, I was confident that <a href="http://www.cherrypy.org/">CherryPy</a> would serve my needs, so I bought <a href="http://www.cherrypyessentials.com/">CherryPy Essentials</a> (Sylvain Hellegouarch) and started hacking.</p>
<p><br clear="all" /></p>
<h3>The story of a website</h3>
<p>The book uses a practical-minded, show-me-the-code style that I personally like. The book tells the story of a website, starting with an introduction to CherryPy, then walking us through the creation of a sophisticated Web 2.0 website, adding more elements as our understanding grows.</p>
<p>While I found the style of taking us through the development of a website useful, this approach has an obvious weakness: if you don't use the same components as the author, the book will be less relevant. For example, the author chooses <a href="http://www.aminus.net/dejavu">Dejavu</a> for his <a href="http://en.wikipedia.org/wiki/Object-relational_mapping">ORM</a>, while I prefer <a href="http://www.sqlalchemy.org/">SQLAlchemy</a>; he chooses <a href="http://www.kid-templating.org/">Kid</a> for his templating engine, while I prefer <a href="http://www.makotemplates.org/">Mako</a>; and he chooses <a href="http://mochikit.com/">MochiKit</a> for his JavaScript library, while I prefer <a href="http://jquery.com/">jQuery</a>.</p>
<p>While this prevented me from using the book's code wholesale, it did show me how to build a website the CherryPy way, using CherryPy's fantastic features and integrating them with the other technologies used in a modern Web application. In all, I'm quite pleased with the value I got out of the book.</p>
<h3>Contents</h3>
<p>Chapter 1: Introduction to CherryPy<br />
Chapter 2: Download and Install CherryPy<br />
Chapter 3: Overview of CherryPy<br />
Chapter 4: CherryPy in Depth<br />
Chapter 5: A Photoblog Application<br />
Chapter 6: Web Services<br />
Chapter 7: The Presentation Layer<br />
Chapter 8: Ajax<br />
Chapter 9: Testing<br />
Chapter 10: Deployment</p>
<p><strong>Chapter 1: Introduction to CherryPy</strong></p>
<p>This chapter provides some introduction to CherryPy's history and community. It can be safely skipped by the impatient.</p>
<p><strong>Chapter 2: Download and Install CherryPy</strong></p>
<p>Not too much of interest here either. If you can find the book, I'm sure you can find CherryPy.</p>
<p><strong>Chapter 3: Overview of CherryPy</strong></p>
<p>Here's where things start getting interesting. We start with a simple "shout" style application, then walk through the anatomy of a basic CherryPy application: configuration, static files, URL routing, and page handlers.</p>
<p>The author also provides a list of the built-in library modules, which I found to be fairly useless since he doesn't tell us how to use them or even really much about what they might be good for.</p>
<p><strong>Chapter 4: CherryPy in Depth</strong></p>
<p>True to its title, this chapter talks in depth about the various capabilities and tools provided with CherryPy, including material on extending and hooking the CherryPy engine. If you've got experience developing websites, chapters 3 and 4 are really all you need to get cracking on a cool website with all the fixings.</p>
<p><strong>Chapter 5: A Photoblog Application</strong></p>
<p>For the remainder of the book, we'll be building a photo blogging application, fully Web 2.0 buzzword compliant with web services, JavaScript, and AJAX goodness.</p>
<p>The chapter provides a high-level outline of the application we'll be building, and then discusses some of the various ORMs available for Python.</p>
<p><strong>Chapter 6: Web Services</strong></p>
<p>This cool chapter describes how to build Web services (including RESTful services) using CherryPy. This served as a major inspiration for creating the API to my server, although I decided not to use REST.</p>
<p><strong>Chapter 7: The Presentation Layer</strong></p>
<p>This chapter goes over templating libraries for Python (the only thing more numerous in Python than templating libraries are Web frameworks) and how to integrate them with CherryPy, then settles on Kid as the engine to use for the application.</p>
<p><strong>Chapter 8: Ajax</strong></p>
<p>Although the author uses MochiKit as his JavaScript library, he describes the AJAX communication process at a fairly low level, so it was easy to transfer the concepts over to jQuery. After briefly describing JSON as the mechanism for transferring data between JavaScript and Python, the chapter goes on to describe how we'll be using AJAX in our photo blogging application.</p>
<p><strong>Chapter 9: Testing</strong></p>
<p>You've got to give props to a framework book that devotes a chapter to testing. As with other components, although I prefer <a href="http://www.somethingaboutorange.com/mrl/projects/nose/">nose</a> and <a href="http://wwwsearch.sourceforge.net/mechanize/">mechanize</a> for unit/web testing to the author's <a href="http://docs.python.org/lib/module-unittest.html">unittest</a> and <a href="http://pythonpaste.org/webtest/">webtest</a>, the concepts were what mattered and I had no trouble following along.</p>
<p>I also appreciated the section on performance/load testing using <a href="http://funkload.nuxeo.org/">FunkLoad</a>, as well as functional testing with <a href="http://selenium.openqa.org/">Selenium</a>, which I wasn't very familiar with.</p>
<p><strong>Chapter 10: Deployment</strong></p>
<p>This chapter is also pure gold in a framework book. It discusses the various deployment and configuration options, including deploying behind Apache and lighttpd. I found the section on supporting SSL to be particularly helpful, as my next step for my server will be hosting it myself via SSL.</p>
<h3>Conclusion</h3>
<p>This book is a great practical guide to building websites with CherryPy, using best practices for the framework and getting the most out of the huge array of functionality that it provides.</p>
<h3>The Packt Publishing Ebook</h3>
<p>I bought this book as a PDF from <a href="http://www.packtpub.com/">Packt Publishing</a>. I prefer having programming books in electronic format, because I typically read them at my computer while experimenting with the code in the book. And since I live in Japan, the electronic format is also a great way to get my grubby hands on technical books quickly and cheaply.</p>
<p>Packt ebooks, I found, have an annoying security feature: copy and paste is disabled, thwarting my preferred tactic of copying and pasting code snippets to run, and terms to google. Sure, the code samples are provided, but that adds a lot of hassle to what should be a simple operation.</p>
<p>I wish Packt would discontinue this practice. It only hurts paying customers, since the pirates will no doubt have little trouble stripping out that protection and repackaging the book in any format they like.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/08/05/book-review-cherrypy-essentials/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Python is for people who want to program</title>
		<link>http://ginstrom.com/scribbles/2008/07/24/python-is-for-people-who-want-to-program/</link>
		<comments>http://ginstrom.com/scribbles/2008/07/24/python-is-for-people-who-want-to-program/#comments</comments>
		<pubDate>Wed, 23 Jul 2008 22:48:55 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/07/24/python-is-for-people-who-want-to-program/</guid>
		<description><![CDATA[Saw a great quote the other day on comp.lang.python, in response to a troll questioning Python's usefulness in the "real" world:
Python is for people who want to program, not REAL WORLD programmers.
By Mensanator in comp.lang.python (Google groups link)
(Python encourages a sense of fun, and people on the comp.lang.python group tend to like to have fun [...]]]></description>
			<content:encoded><![CDATA[<p>Saw a great quote the other day on <a href="http://groups.google.com/group/comp.lang.python/topics">comp.lang.python</a>, in response to a troll questioning Python's usefulness in the "real" world:</p>
<blockquote><p>Python is for people who want to program, not REAL WORLD programmers.</p></blockquote>
<div style="text-align:right">By Mensanator <a href="http://groups.google.com/group/comp.lang.python/msg/71c069f822528251">in comp.lang.python (Google groups link)</a></div>
<p>(Python encourages a sense of fun, and people on the comp.lang.python group tend to like to have fun taking the piss out of trolls.)</p>
<p>Not that "real world" programmers don't use Python &#8212; just that people who place a lot of value on being a "real world" programmer are probably using Java or C# or something. Meanwhile, people use Python because it's a great language and they love programming in it, not because it will make people think they're "real world" programmers.</p>
<p>This goes back to the <a href="http://www.paulgraham.com/pypar.html">Python Paradox</a> described by Paul Graham: Python programmers are generally good hires, not because Python is (necessarily) a better language than Java or C#, but because</p>
<blockquote><p>&#8230;people don't learn Python because it will get them a job; they learn it because they genuinely like to program and aren't satisfied with the languages they already know.</p></blockquote>
<p>This is becoming less of a sure thing as Python gains popularity and starts to look good on "real world" resumes, but I think it's still true that people use Python because they like it.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/07/24/python-is-for-people-who-want-to-program/feed/</wfw:commentRss>
		</item>
		<item>
		<title>The mysterious &#8220;ImportError: cannot import name cache&#8221;</title>
		<link>http://ginstrom.com/scribbles/2008/06/17/the-mysterious-importerror-cannot-import-name-cache/</link>
		<comments>http://ginstrom.com/scribbles/2008/06/17/the-mysterious-importerror-cannot-import-name-cache/#comments</comments>
		<pubDate>Tue, 17 Jun 2008 10:47:03 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/06/17/the-mysterious-importerror-cannot-import-name-cache/</guid>
		<description><![CDATA[The scenario: I'm packaging a CherryPy server with py2exe, using Mako as my template engine. So I create my exe file, and fire up the app, and get this 500 Mako error:

Traceback (most recent call last):
&#160;&#160;File "cherrypy\_cprequest.pyo", line 551, in respond
&#160;&#160;File "cherrypy\_cpdispatch.pyo", line 24, in __call__
&#160;&#160;File "main.py", line 210, in index
&#160;&#160;File "mako\lookup.pyo", line 70, in [...]]]></description>
			<content:encoded><![CDATA[<p>The scenario: I'm packaging a <a href="http://www.cherrypy.org/">CherryPy</a> server with <a href="http://www.py2exe.org/">py2exe</a>, using <a href="http://www.makotemplates.org/">Mako</a> as my template engine. So I create my exe file, and fire up the app, and get this 500 Mako error:<br />
<span style="color:red"><br />
Traceback (most recent call last):<br />
&nbsp;&nbsp;File "cherrypy\_cprequest.pyo", line 551, in respond<br />
&nbsp;&nbsp;File "cherrypy\_cpdispatch.pyo", line 24, in __call__<br />
&nbsp;&nbsp;File "main.py", line 210, in index<br />
&nbsp;&nbsp;File "mako\lookup.pyo", line 70, in get_template<br />
&nbsp;&nbsp;File "mako\lookup.pyo", line 112, in __load<br />
&nbsp;&nbsp;File "mako\template.pyo", line 74, in __init__<br />
&nbsp;&nbsp;File "&#8230;", line 1, in &lt;module&gt;<br />
ImportError: cannot import name cache</span></p>
<p>Since the app was working fine without py2exe, I brilliantly deduced that maybe something was getting broken in the packaging process.</p>
<p>It turns out that I need to include "mako.cache" in my packages, <a href="http://koobmeej.blogspot.com/2008/03/cherrypy-mako-and-py2exe.html">as pointed out by this blog post</a>.</p>
<p>So the relevant section of my setup dictionary now looks like this:</p>
<div class="dean_ch" style="white-space: wrap;">
&nbsp; &nbsp; excludes = <span class="br0">&#91;</span><span class="st0">&quot;pywin&quot;</span>, <span class="st0">&quot;pywin.debugger&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;pywin.debugger.dbgcon&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;pywin.dialogs&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;pywin.dialogs.list&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;win32com.server&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;Tkinter&quot;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; packages=<span class="br0">&#91;</span><span class="st0">&quot;email&quot;</span>, <span class="st0">&quot;lxml&quot;</span>, <span class="st0">&quot;mako.cache&quot;</span><span class="br0">&#93;</span></p>
<p>&nbsp; &nbsp; options = <span class="kw2">dict</span><span class="br0">&#40;</span>optimize=<span class="nu0">2</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dist_dir=comp_name,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;excludes=excludes,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;packages=packages<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; setup_dict<span class="br0">&#91;</span><span class="st0">'options'</span><span class="br0">&#93;</span> = <span class="br0">&#123;</span><span class="st0">&quot;py2exe&quot;</span>:options<span class="br0">&#125;</span></div>
<p>Everything's working now &#8212; lovely stuff!</p>
<p>Googling on this error failed to produce any hits, so I had to (<em>gasp</em>!) do some actual research to figure this out. Here's hoping that I can save the next poor soul from that horror.</p>
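<p>My best guess at why py2exe misses the module (an assumption; I haven't checked the py2exe source): bundlers of this kind find dependencies by scanning code for import statements, while Mako generates its template modules at run time, and it is the generated module that imports <code>cache</code>. The sketch below shows the general effect with the standard library's <code>modulefinder</code>. It uses Python 3 syntax, and <code>colorsys</code> is just an arbitrary stand-in module.</p>

```python
import os
import tempfile
from modulefinder import ModuleFinder


def modules_seen(source):
    """Return the names of the modules ModuleFinder discovers for a script."""
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(source)
        finder = ModuleFinder()
        finder.run_script(path)
        return set(finder.modules)
    finally:
        os.remove(path)


# A plain import statement is visible to static analysis...
STATIC_SCRIPT = "import colorsys\n"

# ...but an import that only exists inside generated source text is not.
DYNAMIC_SCRIPT = 'exec(compile("import colorsys", "generated", "exec"))\n'

if __name__ == "__main__":
    print("colorsys" in modules_seen(STATIC_SCRIPT))
    print("colorsys" in modules_seen(DYNAMIC_SCRIPT))
```

<p>The first script should report the module as found and the second should not, which is the situation the explicit <code>packages</code> entry works around.</p>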
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/06/17/the-mysterious-importerror-cannot-import-name-cache/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Counting words (etc.) in an HTML file with Python</title>
		<link>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</link>
		<comments>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/#comments</comments>
		<pubDate>Sat, 17 May 2008 00:50:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/</guid>
		<description><![CDATA[In a previous post, I wrote about how to count words, characters, and Asian characters using python.
In this post I want to pull that together with code to get a word count from an HTML file.
What needs counting
What needs counting depends to some extent on what you need the word count for, but here I'm [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous post, I wrote about <a href="/scribbles/2007/10/06/counting-words-characters-and-asian-characters-with-python/">how to count words, characters, and Asian characters using python</a>.</p>
<p>In this post I want to pull that together with code to get a word count from an HTML file.</p>
<h2>What needs counting</h2>
<p>What needs counting depends to some extent on what you need the word count for, but here I'm assuming that the count will be used to measure billable/localizable content.</p>
<p>In that scenario, you've got to count the text in the title tag, as well as the visible text in the body, and certain other localizable content: <code>img</code> <code>alt</code> attributes, <code>a</code> <code>title</code> attributes, and <code>input</code> <code>value</code> attributes (am I missing any?).</p>
<h2>The Code</h2>
<p>The code for counting the actual text is in the post linked above. Here we need code to extract the text from the HTML file, and to accumulate the counts for all the chunks we've extracted.</p>
<p>Here's the Segment class for accumulating counts:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">class</span> Segment<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Represents a text segment.<br />
&nbsp; &nbsp; (For bookkeeping)<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, text=<span class="st0">&quot;&quot;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot; text is the segment of text we will calculate.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Leave it empty if this will be a master count for a document</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; @param text: The text of the segment<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> = <span class="kw2">len</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; num_spaces = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> x.<span class="me1">isspace</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> = <span class="kw2">self</span>.<span class="me1">characters</span> - num_spaces</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> = <span class="kw2">len</span><span class="br0">&#40;</span><span class="br0">&#91;</span>x <span class="kw1">for</span> x <span class="kw1">in</span> text <span class="kw1">if</span> is_asian<span class="br0">&#40;</span>x<span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> = non_j_len<span class="br0">&#40;</span>text<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> = <span class="kw2">self</span>.<span class="me1">non_asian_words</span> + <span class="kw2">self</span>.<span class="me1">asian_chars</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> accumulate<span class="br0">&#40;</span><span class="kw2">self</span>, seg<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Add the stats from &lt;seg&gt; to this one.<br />
&nbsp; &nbsp; &nbsp; &nbsp; Use this to keep a count for the entire document;<br />
&nbsp; &nbsp; &nbsp; &nbsp; use another for the whole batch of documents</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; @param seg: The segment to accumulate</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg = Segment(u&quot;</span><span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg2 = Segment(u&quot;</span>abc<span class="st0">&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.accumulate(seg2)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.words<br />
&nbsp; &nbsp; &nbsp; &nbsp; 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &gt;&gt;&gt; seg.characters<br />
&nbsp; &nbsp; &nbsp; &nbsp; 3<br />
&nbsp; &nbsp; &nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">words</span> += seg.<span class="me1">words</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">characters</span> += seg.<span class="me1">characters</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">chars_no_spaces</span> += seg.<span class="me1">chars_no_spaces</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">asian_chars</span> += seg.<span class="me1">asian_chars</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">non_asian_words</span> += seg.<span class="me1">non_asian_words</span></div>
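<p>Segment relies on <code>is_asian</code> and <code>non_j_len</code> from the earlier post, which aren't repeated here. To try the arithmetic on its own, here is a rough sketch with simplified stand-ins for both. The code-point ranges and the tokenizing rule are assumptions made for illustration, not the originals, and the syntax is Python 3.</p>

```python
# Rough stand-in ranges (assumption): kana, CJK ideographs, full-width forms.
ASIAN_RANGES = (
    (0x3040, 0x30FF),   # hiragana and katakana
    (0x4E00, 0x9FFF),   # CJK unified ideographs
    (0xFF00, 0xFFEF),   # half-width and full-width forms
)


def is_asian(char):
    """Hypothetical stand-in: True if char falls in one of ASIAN_RANGES."""
    return any(low <= ord(char) <= high for low, high in ASIAN_RANGES)


def non_j_len(text):
    """Hypothetical stand-in: count whitespace-separated non-Asian words."""
    blanked = "".join(" " if is_asian(c) else c for c in text)
    return len(blanked.split())


def segment_stats(text):
    """Compute the same five counts that Segment.__init__ stores."""
    characters = len(text)
    num_spaces = sum(1 for c in text if c.isspace())
    asian_chars = sum(1 for c in text if is_asian(c))
    non_asian_words = non_j_len(text)
    return {
        "characters": characters,
        "chars_no_spaces": characters - num_spaces,
        "asian_chars": asian_chars,
        "non_asian_words": non_asian_words,
        "words": non_asian_words + asian_chars,
    }


if __name__ == "__main__":
    print(segment_stats("spam eggs"))
    # Three kanji, a space, and one English word.
    print(segment_stats("\u65e5\u672c\u8a9e spam"))
```

<p>Each Asian character counts as one word here, mirroring how Segment adds <code>asian_chars</code> to <code>non_asian_words</code>.</p>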
<p>Next, the code for extracting (segmenting) the text from an HTML file. For this, you'll need <a href="http://www.crummy.com/software/BeautifulSoup/">the excellent Beautiful Soup module</a>.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulSoup as bsoup<br />
<span class="kw1">from</span> BeautifulSoup <span class="kw1">import</span> BeautifulStoneSoup<br />
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> normalize<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Normalize whitespace in C{text}.</p>
<p>&nbsp; &nbsp; &gt;&gt;&gt; normalize(u&quot;</span> &nbsp; spam\\n\\tspam &nbsp; SPAM<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam spam SPAM'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> u<span class="st0">' '</span>.<span class="me1">join</span><span class="br0">&#40;</span>text.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">class</span> Segmenter<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Html segmenter<br />
&nbsp; &nbsp; Retrieves the editable/translatable text from an HTML document.<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Set up various regular expressions for splitting the text&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;|&quot;</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;body[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/body&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;a[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/a&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;img[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/img&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;input[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/input&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;script[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*?&lt;/script&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;form[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;|&lt;/form&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strip out unsightly tags before heading to the splitter&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">splitter</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'|'</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span>u<span class="st0">&quot;&lt;p(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/p&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;div(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/div&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;td(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/td&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;li(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/li&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;h<span class="es0">\d</span>(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/h<span class="es0">\d</span>&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dd(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/dd&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;dt(?:<span class="es0">\s</span>[^&gt;]*)?&gt;|&lt;/dt&gt;&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;u<span class="st0">&quot;&lt;br(?:<span class="es0">\s</span>[^&gt;]*)?/?&gt;&quot;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Split segments by certain tags (removing tags in bargain)<br />
&nbsp; &nbsp; &nbsp; &nbsp; These tags indicate a segment boundary&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">charset_finder</span> = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">'[<span class="es0">\s</span><span class="es0">\S</span>]*&lt;meta[<span class="es0">\s</span><span class="es0">\S</span>]*?charset<span class="es0">\s</span>*=<span class="es0">\s</span>*([<span class="es0">\S</span>]+)&quot;[<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;[<span class="es0">\s</span><span class="es0">\S</span>]*'</span>, <span class="kw3">re</span>.<span class="me1">I</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Find the charset if necessary&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = <span class="kw2">None</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__str__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;So we can tell which segger we have (assuming multiple segmenter classes)&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="st0">&quot;HTML&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> get_chunks<span class="br0">&#40;</span><span class="kw2">self</span>, html_text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Extract the text from the HTML file&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">soup</span> = bsoup<span class="br0">&#40;</span>html_text, fromEncoding=<span class="kw2">self</span>.<span class="me1">getEncoding</span><span class="br0">&#40;</span>html_text<span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># document title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; title = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">head</span>.<span class="me1">title</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> title:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> title.<span class="kw3">string</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># image alt attributes, anchor title attributes, input value attributes</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag, attr <span class="kw1">in</span> <span class="br0">&#40;</span><span class="br0">&#40;</span>u<span class="st0">&quot;img&quot;</span>, u<span class="st0">&quot;alt&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;a&quot;</span>, u<span class="st0">&quot;title&quot;</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span>u<span class="st0">&quot;input&quot;</span>, u<span class="st0">&quot;value&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">getAttributes</span><span class="br0">&#40;</span>tag, attr<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw3">chunk</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> <span class="kw3">chunk</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Parse the body text</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; text = <span class="kw2">self</span>.<span class="me1">pre_parse_stripper</span>.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, <span class="kw2">unicode</span><span class="br0">&#40;</span><span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">body</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> <span class="kw2">self</span>.<span class="me1">splitter</span>.<span class="me1">split</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; normal = normalize<span class="br0">&#40;</span>html2plain<span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> normal:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">yield</span> normal</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> getAttributes<span class="br0">&#40;</span><span class="kw2">self</span>, tagName, attrName<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get all attrName values for tagName tags&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; attrs = <span class="br0">&#91;</span><span class="br0">&#93;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; tags = <span class="kw2">self</span>.<span class="me1">soup</span>.<span class="me1">findAll</span><span class="br0">&#40;</span>tagName<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> tag <span class="kw1">in</span> tags:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">try</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attr = tag<span class="br0">&#91;</span>attrName<span class="br0">&#93;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> attr:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; attrs.<span class="me1">append</span><span class="br0">&#40;</span>attr<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">except</span> <span class="kw2">KeyError</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="co1">#print &quot;Tag %s does not have attribute %s&quot; % (tagName, attrName)</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">pass</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> attrs</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> getEncoding<span class="br0">&#40;</span><span class="kw2">self</span>, text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Retrieve the encoding META tag, if present&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; m = <span class="kw2">self</span>.<span class="me1">charset_finder</span>.<span class="me1">match</span><span class="br0">&#40;</span>text<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> m:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> m.<span class="me1">group</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">None</span></p>
<p>
TAG_STRIPPER = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>u<span class="st0">&quot;&lt;[!<span class="es0">\w</span>/][<span class="es0">\s</span><span class="es0">\S</span>]*?&gt;&quot;</span>, <span class="kw3">re</span>.<span class="me1">I</span> | <span class="kw3">re</span>.<span class="me1">M</span><span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> strip_tags<span class="br0">&#40;</span>line<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;strip the HTML tags from the line</p>
<p>&nbsp; &nbsp; &gt;&gt;&gt; strip_tags(u&quot;</span>&lt;b&gt;spam&lt;/b&gt;<span class="st0">&quot;)<br />
&nbsp; &nbsp; u'spam'</p>
<p>&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> TAG_STRIPPER.<span class="me1">sub</span><span class="br0">&#40;</span>u<span class="st0">&quot;&quot;</span>, line<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> html2plain<span class="br0">&#40;</span>text<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Strips out tags from HTML text</p>
<p>&nbsp; &nbsp; &gt;&gt;&gt; html2plain('spam &lt;b&gt;eggs&lt;/b&gt;')<br />
&nbsp; &nbsp; u'spam<span class="es0">\\</span>xa0eggs'<br />
&nbsp; &nbsp; &gt;&gt;&gt; html2plain('&#8211;&gt;')<br />
&nbsp; &nbsp; u'&#8211;&gt;'<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; entities = BeautifulStoneSoup.<span class="me1">HTML_ENTITIES</span><br />
&nbsp; &nbsp; text = <span class="kw2">unicode</span><span class="br0">&#40;</span>BeautifulStoneSoup<span class="br0">&#40;</span>strip_tags<span class="br0">&#40;</span>text<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; convertEntities=entities<span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> text.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;#38;gt;&quot;</span>, <span class="st0">&quot;&gt;&quot;</span><span class="br0">&#41;</span>.<span class="me1">replace</span><span class="br0">&#40;</span>u<span class="st0">&quot;&amp;#38;lt;&quot;</span>, <span class="st0">&quot;&lt;&quot;</span><span class="br0">&#41;</span></div>
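<p>Tag-matching regexes like the ones in the splitter are easy to get subtly wrong, so it's worth checking them against tags with attributes, self-closing tags, and look-alike tag names. Here is a small sketch of such a check, in Python 3 syntax. <code>tag_pattern</code> and <code>split_segments</code> are helpers invented for this example and aren't part of htmlseg.</p>

```python
import re


def tag_pattern(name):
    """Regex source matching an opening or closing tag called name.

    The optional group allows attributes after the tag name. Requiring
    whitespace before them keeps the pattern for one tag from matching
    longer tag names that start the same way (p versus pre).
    """
    return r"<%s(?:\s[^>]*)?/?>|</%s>" % (name, name)


SPLITTER = re.compile("|".join(tag_pattern(n) for n in ("p", "br", "li")),
                      re.I)


def split_segments(html):
    """Split html at the boundary tags and drop empty pieces."""
    pieces = (piece.strip() for piece in SPLITTER.split(html))
    return [piece for piece in pieces if piece]


if __name__ == "__main__":
    sample = '<p class="intro">spam <b>and</b> eggs<br />ham</p>'
    print(split_segments(sample))
```

<p>The sample covers the three cases that matter most: a paragraph tag with an attribute, a self-closing line break, and a bold tag that should stay inside its segment.</p>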
<p>And here's some code to get the actual wordcount:</p>
<div class="dean_ch" style="white-space: wrap;">
&nbsp; &nbsp; wordcount = docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; segger = htmlseg.<span class="me1">Segmenter</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> segger.<span class="me1">get_chunks</span><span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;thefile.html&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wordcount.<span class="me1">accumulate</span><span class="br0">&#40;</span>docstats.<span class="me1">Segment</span><span class="br0">&#40;</span><span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<p>Here are the <a href="/code/html_wordcount.tar.gz">docstats and htmlseg modules</a>, and here is an <a href="http://felix-cat.com/tools/wordcount/">online tool using the code for the HTML word counts</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/05/17/counting-words-etc-in-an-html-file-with-python/feed/</wfw:commentRss>
		</item>
		<item>
		<title>What price elegance?</title>
		<link>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/#comments</comments>
		<pubDate>Fri, 21 Mar 2008 04:03:03 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/21/what-price-elegance/</guid>
		<description><![CDATA[In a recent post, I gave some code for counting the top n most frequent words in an arbitrary text file using itertools.groupby.
The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative [...]]]></description>
			<content:encoded><![CDATA[<p><a href="/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/">In a recent post</a>, I gave some code for counting the top n most frequent words in an arbitrary text file using <a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby.</a></p>
<p>The code is written in a somewhat functional style. It's short and, dare I say, kind of elegant. But it turns out that this code is quite a bit slower than an imperative version using <a href="http://docs.python.org/lib/defaultdict-objects.html">collections.defaultdict</a>.</p>
<p>Here are the two functions:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict</p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs =<span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></div>
<p>Here are the helper functions:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">re</span></p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
<p>The groupby version is shorter than the defaultdict version, and I'd say that it's simpler and more readable as well. Shorter code offers fewer places for bugs to hide. In particular, the defaultdict version has a mutable local variable (used as an accumulator in the for loop), which is a classic source of bugs. For the same reasons, the groupby version is also likely to be easier to maintain.</p>
<p>But the defaultdict version of the function winds up being considerably faster.</p>
<p>The times it took to run these functions 10 times on my computer, retrieving the top 50 most frequent words for "/python25/readme.txt", are as follows (seconds rounded to 4 decimal places).</p>
<table>
<tr>
<th>&nbsp;</th>
<th>Without psyco</th>
<th>With psyco</th>
</tr>
<tr>
<th align="left">groupby version</th>
<td align="center"><font color="red">0.3133 s</font></td>
<td align="center"><font color="red">0.2193 s</font></td>
</tr>
<tr>
<th align="left">defaultdict version</th>
<td align="center"><font color="green">0.2852 s</font></td>
<td align="center"><font color="green">0.1818 s</font></td>
</tr>
<tr>
<th align="left">groupby / defaultdict</th>
<td align="center">1.41</td>
<td align="center">1.58</td>
</tr>
</table>
<p>The defaultdict version is 1.4x faster than the groupby version. This gap grows even further when psyco is used, making the defaultdict version nearly 1.6x as fast. I'd say that most of the reason for the slowness is that the groupby version of the function performs two sorts, compared to one sort in the defaultdict version.</p>
<p>(The psyco speedup for the defaultdict version comes from the for loop; changing <code>get_words</code> to return a generator expression eliminates the speedup. The speedup for the groupby version comes from the <code>freqs</code> <a href="http://docs.python.org/tut/node7.html#SECTION007140000000000000000">list comprehension</a>; changing this to a generator expression eliminates its speedup.)</p>
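<p>As a rough sketch of how the extra sort can be avoided on newer Pythons: <code>collections.Counter</code> (added in Python 2.7 and 3.1, so it was not available when these timings were taken) does the counting in one pass, and its <code>most_common</code> method returns the top entries directly. The function name <code>get_top_freqs_counter</code> and the sample words are made up for illustration, the syntax is Python 3, and this version has not been timed against the two above.</p>

```python
from collections import Counter


def get_top_freqs_counter(words, num):
    """Return the num most frequent words as (word, freq) tuples.

    Takes an iterable of words rather than a filename, so it can be
    tried without a file on disk (an assumption for this sketch).
    """
    # Counter tallies each word in a single pass, like the defaultdict loop
    counts = Counter(words)
    # most_common(num) returns (word, count) pairs, highest count first
    return counts.most_common(num)


sample = "the cat and the dog and the bird".split()
top_two = get_top_freqs_counter(sample, 2)
print(top_two)
```

<p>Words with equal counts may come out in a different order than they do from <code>get_top</code>, so the two results are not guaranteed to compare equal when there are ties.</p>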
<h3>So which one should I use?</h3>
<p>It's pretty common for Python code written in a functional style to be slower than equivalent code written in an imperative style. Nevertheless, I tend to prefer the more functional style of programming, switching to a more imperative style (or <a href="/scribbles/2007/12/02/extending-python-with-c-a-case-study/">other forms of optimization</a>) if performance isn't satisfactory.</p>
<blockquote><p>It is easier to optimize correct code, than correct optimized code.
</p></blockquote>
<p align="right"><em>&#8211;Yves Deville</em></p>
<p>A big question here is how to tell if the functional version is fast enough. My general rule of thumb is that the user would be prepared to wait up to two seconds for a typical "grovel through these files and tell me something interesting" command that's performed infrequently (how frequently do you need to get word frequencies from files?). For a more common action, the wait time should be under a second, with less than 0.5 seconds being optimal (this includes GUI responsiveness but not Web page loading).</p>
<p>Given the times above, and assuming that the user will search no more than 50 files of sizes comparable to <a href="http://svn.python.org/view/python/branches/release25-maint/README?rev=59483">Python's README file</a>, then either version of the function is sufficient. If we assume that the user will search up to 100 files, or files substantially larger than the README file, then only the imperative version is acceptable (and we may need to optimize this further if our demands are higher than this).</p>
<p>That's why it's so important to profile and test Python programs from the very beginning. I keep a suite of test cases that I profile with every build (performed at least daily), noting trends in performance and optimizing when the code has gelled and bottlenecks remain.</p>
<p>Here is the test code:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd</p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
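<p>For readers on a current Python: <code>time.clock</code> was removed in Python 3.8 and <code>xrange</code> does not exist in Python 3, so the timing helper needs small changes there. A minimal sketch of the same helper using <code>time.perf_counter</code> follows; <code>count_to</code> is a hypothetical stand-in workload, not one of the functions from this post.</p>

```python
from time import perf_counter


def time_func(func, iterations, *args, **kwargs):
    """Return the seconds it takes to call func iterations times."""
    start = perf_counter()
    for _ in range(iterations):
        func(*args, **kwargs)
    return perf_counter() - start


def count_to(n):
    """Stand-in workload (hypothetical), only here to have something to time."""
    return sum(range(n))


seconds = time_func(count_to, 10, 1000)
print("%s: %.4f" % (count_to.__name__, seconds))
```

<p>The printed figure will vary from machine to machine and from run to run, so it is best read as a trend across builds rather than as an absolute number.</p>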
<p>The whole shebang:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="co1">#coding: UTF8</span><br />
<span class="st0">&quot;&quot;</span><span class="st0">&quot;<br />
Testing functional programming stuff<br />
&quot;</span><span class="st0">&quot;&quot;</span></p>
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
<span class="kw1">from</span> <span class="kw3">collections</span> <span class="kw1">import</span> defaultdict<br />
<span class="kw1">import</span> <span class="kw3">re</span><br />
<span class="kw1">from</span> <span class="kw3">time</span> <span class="kw1">import</span> clock</p>
<p><span class="kw1">def</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the words from filename&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; split = <span class="kw3">re</span>.<span class="kw2">compile</span><span class="br0">&#40;</span>r<span class="st0">&quot;<span class="es0">\b</span><span class="es0">\w</span>+<span class="es0">\b</span>&quot;</span><span class="br0">&#41;</span>.<span class="me1">findall</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span>word<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> line <span class="kw1">in</span> <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> word <span class="kw1">in</span> split<span class="br0">&#40;</span>line.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> get_top_freqs_gb<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using itertools.groupby<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> get_top_freqs_dd<span class="br0">&#40;</span>filename, num<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples, using collections.defaultdict<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; freq_dict = defaultdict<span class="br0">&#40;</span><span class="kw2">int</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> word <span class="kw1">in</span> get_words<span class="br0">&#40;</span>filename<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; freq_dict<span class="br0">&#91;</span>word<span class="br0">&#93;</span> += <span class="nu0">1</span><br />
&nbsp; &nbsp; freqs =<span class="br0">&#91;</span><span class="br0">&#40;</span>v, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> k, v <span class="kw1">in</span> freq_dict.<span class="me1">iteritems</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> get_top<span class="br0">&#40;</span>freqs, num<span class="br0">&#41;</span></p>
<p><span class="kw1">def</span> time_func<span class="br0">&#40;</span>func, iterations, *args, **kwargs<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Return the time it takes to execute func<br />
&nbsp; &nbsp; iterations times.&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; start = clock<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span>iterations<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; func<span class="br0">&#40;</span>*args, **kwargs<span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> clock<span class="br0">&#40;</span><span class="br0">&#41;</span> - start</p>
<p><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; filename = <span class="st0">&quot;/python25/readme.txt&quot;</span><br />
&nbsp; &nbsp; top_gb = get_top_freqs_gb<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; top_dd = get_top_freqs_dd<span class="br0">&#40;</span>filename, <span class="nu0">100</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">assert</span> top_gb == top_dd</p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;With psyco&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">import</span> psyco<br />
&nbsp; &nbsp; psyco.<span class="me1">full</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> func <span class="kw1">in</span> <span class="br0">&#91;</span>get_top_freqs_gb, get_top_freqs_dd<span class="br0">&#93;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; name = func.__name__<br />
&nbsp; &nbsp; &nbsp; &nbsp; seconds = time_func<span class="br0">&#40;</span>func, <span class="nu0">10</span>, filename, <span class="nu0">50</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span>name, seconds<span class="br0">&#41;</span></p>
<p><span class="kw1">if</span> __name__ == <span class="st0">&quot;__main__&quot;</span>:<br />
&nbsp; &nbsp; main<span class="br0">&#40;</span><span class="br0">&#41;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/21/what-price-elegance/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Counting occurrences in a sequence with itertools.groupby</title>
		<link>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 05:29:38 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/</guid>
		<description><![CDATA[itertools.groupby is a great tool for counting the numbers of occurrences in a sequence.
Here are some examples from the interactive interpreter.
A list of numbers

&#62;&#62;&#62; # Create a random list of numbers
&#62;&#62;&#62; from random import random
&#62;&#62;&#62; numbers = &#91;int&#40;random&#40;&#41; * 10&#41; for x in range&#40;20&#41;&#93;
&#62;&#62;&#62; numbers
&#91;8, 0, 3, 2, 3, 9, 8, 2, 8, 3, 0, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://docs.python.org/lib/itertools-functions.html#l2h-1064">itertools.groupby</a> is a great tool for counting the numbers of occurrences in a sequence.</p>
<p>Here are some examples from the interactive interpreter.</p>
<h3>A list of numbers</h3>
<div class="dean_ch" style="white-space: wrap;">
&gt;&gt;&gt; <span class="co1"># Create a random list of numbers</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">random</span> <span class="kw1">import</span> <span class="kw3">random</span><br />
&gt;&gt;&gt; numbers = <span class="br0">&#91;</span><span class="kw2">int</span><span class="br0">&#40;</span><span class="kw3">random</span><span class="br0">&#40;</span><span class="br0">&#41;</span> * <span class="nu0">10</span><span class="br0">&#41;</span> <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">range</span><span class="br0">&#40;</span><span class="nu0">20</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; numbers<br />
<span class="br0">&#91;</span><span class="nu0">8</span>, <span class="nu0">0</span>, <span class="nu0">3</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">9</span>, <span class="nu0">8</span>, <span class="nu0">2</span>, <span class="nu0">8</span>, <span class="nu0">3</span>, <span class="nu0">0</span>, <span class="nu0">2</span>, <span class="nu0">3</span>, <span class="nu0">8</span>, <span class="nu0">6</span>, <span class="nu0">5</span>, <span class="nu0">3</span>, <span class="nu0">6</span>, <span class="nu0">1</span>, <span class="nu0">8</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># Now create a dictionary of numbers and numbers</span><br />
&gt;&gt;&gt; <span class="co1"># of occurrences. Feed generator expression of</span><br />
&gt;&gt;&gt; <span class="co1"># (number, frequency) pairs to dict().</span><br />
&gt;&gt;&gt; <span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby<br />
&gt;&gt;&gt; valdict = <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>k, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>numbers<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> key, val <span class="kw1">in</span> valdict.<span class="me1">items</span><span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> key, <span class="st0">&quot;:&quot;</span>, val</p>
<p>
<span class="nu0">0</span> : <span class="nu0">2</span><br />
<span class="nu0">1</span> : <span class="nu0">1</span><br />
<span class="nu0">2</span> : <span class="nu0">3</span><br />
<span class="nu0">3</span> : <span class="nu0">5</span><br />
<span class="nu0">5</span> : <span class="nu0">1</span><br />
<span class="nu0">6</span> : <span class="nu0">2</span><br />
<span class="nu0">8</span> : <span class="nu0">5</span><br />
<span class="nu0">9</span> : <span class="nu0">1</span></div>
<p>And a function that does this for any iterable:</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> count_occurrences<span class="br0">&#40;</span>iterable<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;return a dictionary with items and numbers of occurrences<br />
&nbsp; &nbsp; in iterable&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">dict</span><span class="br0">&#40;</span><span class="br0">&#40;</span>item, <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>group<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> item, group<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>iterable<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
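<p>The call to <code>sorted</code> matters here, because <code>groupby</code> only groups <em>consecutive</em> equal items. The short sketch below (Python 3 syntax, with a shortened sample list chosen for illustration) shows what comes back without the sort, next to the result of <code>count_occurrences</code>.</p>

```python
from itertools import groupby


def count_occurrences(iterable):
    """Return a dictionary of items and their numbers of occurrences."""
    return dict((item, len(list(group)))
                for item, group in groupby(sorted(iterable)))


numbers = [8, 0, 3, 2, 3, 9, 8, 2, 8, 3]

# Without sorting, groupby starts a new group at every change of value,
# so a repeated value shows up as several separate runs.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(numbers)]

counts = count_occurrences(numbers)
print(unsorted_runs)
print(counts)
```

<p>In this sample no two neighbouring values are equal, so the unsorted call yields ten runs of length one, while the sorted version collapses them into five counts.</p>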
<h3>Top 20 most frequent words in a file</h3>
<div class="dean_ch" style="white-space: wrap;">
&gt;&gt;&gt; <span class="co1"># get a wordlist from the Python README</span><br />
&gt;&gt;&gt; text = <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;/python25/readme.txt&quot;</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&gt;&gt;&gt; words<span class="br0">&#91;</span>:<span class="nu0">5</span><span class="br0">&#93;</span><br />
<span class="br0">&#91;</span><span class="st0">'this'</span>, <span class="st0">'is'</span>, <span class="st0">'python'</span>, <span class="st0">'version'</span>, <span class="st0">'2.5.2&#8242;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># get the frequency list, using DSU to sort top words</span><br />
&gt;&gt;&gt; freqs = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&gt;&gt;&gt; <span class="co1"># sort the freqs, get last 20, and reverse</span><br />
&gt;&gt;&gt; <span class="co1"># to put most frequent first</span><br />
&gt;&gt;&gt; <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span><span class="nu0">-20</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">print</span> <span class="st0">&quot;%s %s&quot;</span> % <span class="br0">&#40;</span>b.<span class="me1">ljust</span><span class="br0">&#40;</span><span class="nu0">7</span><span class="br0">&#41;</span>, <span class="kw2">str</span><span class="br0">&#40;</span>a<span class="br0">&#41;</span>.<span class="me1">rjust</span><span class="br0">&#40;</span><span class="nu0">3</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>
the &nbsp; &nbsp; <span class="nu0">442</span><br />
to &nbsp; &nbsp; &nbsp;<span class="nu0">227</span><br />
<span class="kw1">is</span> &nbsp; &nbsp; &nbsp;<span class="nu0">127</span><br />
<span class="kw1">and</span> &nbsp; &nbsp; <span class="nu0">127</span><br />
you &nbsp; &nbsp; <span class="nu0">118</span><br />
a &nbsp; &nbsp; &nbsp; <span class="nu0">117</span><br />
of &nbsp; &nbsp; &nbsp;<span class="nu0">110</span><br />
<span class="kw1">in</span> &nbsp; &nbsp; &nbsp;<span class="nu0">107</span><br />
<span class="kw1">for</span> &nbsp; &nbsp; &nbsp;<span class="nu0">94</span><br />
python &nbsp; <span class="nu0">81</span><br />
on &nbsp; &nbsp; &nbsp; <span class="nu0">79</span><br />
<span class="kw1">if</span> &nbsp; &nbsp; &nbsp; <span class="nu0">77</span><br />
this &nbsp; &nbsp; <span class="nu0">72</span><br />
<span class="kw1">or</span> &nbsp; &nbsp; &nbsp; <span class="nu0">62</span><br />
be &nbsp; &nbsp; &nbsp; <span class="nu0">58</span><br />
with &nbsp; &nbsp; <span class="nu0">56</span><br />
it &nbsp; &nbsp; &nbsp; <span class="nu0">53</span><br />
are &nbsp; &nbsp; &nbsp;<span class="nu0">53</span><br />
that &nbsp; &nbsp; <span class="nu0">52</span><br />
as &nbsp; &nbsp; &nbsp; <span class="nu0">47</span></div>
<p>Here's a function that will do this.</p>
<div class="dean_ch" style="white-space: wrap;">
<p><span class="kw1">from</span> <span class="kw3">itertools</span> <span class="kw1">import</span> groupby</p>
<p><span class="kw1">def</span> get_top_freqs<span class="br0">&#40;</span>filename, num=<span class="nu0">20</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Get the top num words from filename as a list<br />
&nbsp; &nbsp; of (word, freq) tuples<br />
&nbsp; &nbsp; &quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; text = <span class="kw2">open</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; words = text.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; freqs = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#40;</span>g<span class="br0">&#41;</span><span class="br0">&#41;</span>, k<span class="br0">&#41;</span> <span class="kw1">for</span> k, g <span class="kw1">in</span> groupby<span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>words<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#40;</span>b, a<span class="br0">&#41;</span> <span class="kw1">for</span> a, b <span class="kw1">in</span> <span class="kw2">reversed</span><span class="br0">&#40;</span><span class="kw2">sorted</span><span class="br0">&#40;</span>freqs<span class="br0">&#41;</span><span class="br0">&#91;</span>num*<span class="nu0">-1</span>:<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></div>
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/13/counting-occurrences-in-a-sequency-with-itertoolsgroupby/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Using chardet to convert arbitrary byte strings to Unicode</title>
		<link>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</link>
		<comments>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/#comments</comments>
		<pubDate>Sat, 08 Mar 2008 02:24:36 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/</guid>
		<description><![CDATA[chardet is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a BOM to pretty reliably turn them into Unicode.
Edit: Thanks to Kirit's comment below, I added code to check for UTF-32.

import chardet
def bytes2unicode&#40;bytes, errors='replace'&#41;:
&#160; &#160; &#34;&#34;&#34;Convert a byte string into Unicode.
&#160; &#160; First checks [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://chardet.feedparser.org/">chardet</a> is a fantastic module for finding the encoding of arbitrary byte strings. You can combine this with a check for a <a href="http://en.wikipedia.org/wiki/Byte_Order_Mark">BOM</a> to pretty reliably turn them into Unicode.</p>
<p><strong>Edit:</strong> Thanks to Kirit's comment below, I added code to check for UTF-32.</p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> chardet</p>
<p><span class="kw1">def</span> bytes2unicode<span class="br0">&#40;</span>bytes, errors=<span class="st0">'replace'</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Convert a byte string into Unicode.<br />
&nbsp; &nbsp; First checks for a BOM, and if one is found returns<br />
&nbsp; &nbsp; the Unicode text minus the BOM. If there is no BOM,<br />
&nbsp; &nbsp; falls back to chardet.&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; encoding_map = <span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ef<span class="es0">\x</span>bb<span class="es0">\x</span>bf'</span>, <span class="st0">'utf-8'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe<span class="es0">\0</span><span class="es0">\0</span>'</span>, <span class="st0">'utf-32-le'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\0</span><span class="es0">\0</span><span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-32BE'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>ff<span class="es0">\x</span>fe'</span>, <span class="st0">'utf-16-le'</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'<span class="es0">\x</span>fe<span class="es0">\x</span>ff'</span>, <span class="st0">'UTF-16BE'</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">for</span> bom, encoding <span class="kw1">in</span> encoding_map:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> bytes.<span class="me1">startswith</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes<span class="br0">&#91;</span><span class="kw2">len</span><span class="br0">&#40;</span>bom<span class="br0">&#41;</span>:<span class="br0">&#93;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;encoding,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;errors=errors<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="co1"># No BOM found, so use chardet</span><br />
&nbsp; &nbsp; detection = chardet.<span class="me1">detect</span><span class="br0">&#40;</span>bytes<span class="br0">&#41;</span><br />
&nbsp; &nbsp; encoding = detection.<span class="me1">get</span><span class="br0">&#40;</span><span class="st0">'encoding'</span><span class="br0">&#41;</span> <span class="kw1">or</span> <span class="st0">'utf-16'</span><br />
&nbsp; &nbsp; <span class="kw1">return</span> <span class="kw2">unicode</span><span class="br0">&#40;</span>bytes, encoding, errors=errors<span class="br0">&#41;</span></div>
<p>Usage:</p>
<div class="dean_ch" style="white-space: wrap;">
text = bytes2unicode<span class="br0">&#40;</span><span class="kw2">open</span><span class="br0">&#40;</span>filename, <span class="st0">'rb'</span><span class="br0">&#41;</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>, <span class="st0">'replace'</span><span class="br0">&#41;</span></div>
<h3>Discussion: Why check for a BOM?</h3>
<p>You might ask why we check for a BOM if chardet already does this. The reason is that although chardet correctly detects the BOM, it doesn't tell you that it found one, so you won't know to chop it off before processing the text. In most cases, you'd have to check for a BOM anyway.</p>
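<p>To make the point concrete, here is a minimal sketch using only the standard library's <code>codecs</code> module. It assumes Python 3 (the function above is Python 2), and <code>BOMS</code> and <code>strip_bom</code> are names made up for this illustration. It shows that a plain UTF-8 decode leaves the BOM in the text as U+FEFF, and why the longer UTF-32 BOMs have to be tested before the UTF-16 ones.</p>

```python
import codecs

# Longer BOMs must be tested first: the UTF-32 LE BOM begins with
# the same two bytes as the UTF-16 LE BOM.
assert codecs.BOM_UTF32_LE.startswith(codecs.BOM_UTF16_LE)

# Illustrative table; the codec names spell out the byte order because
# the BOM that would normally signal it is removed before decoding.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def strip_bom(data):
    """Return (payload, encoding); encoding is None when no BOM is present."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return data[len(bom):], encoding
    return data, None


raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Decoding the raw bytes as plain UTF-8 keeps the BOM as a U+FEFF character.
naive = raw.decode("utf-8")

# Stripping the BOM first yields just the text.
payload, encoding = strip_bom(raw)
clean = payload.decode(encoding)
```

<p>When <code>strip_bom</code> returns <code>None</code> for the encoding, that is the point where you would fall back to a detector such as chardet.</p>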
]]></content:encoded>
			<wfw:commentRss>http://ginstrom.com/scribbles/2008/03/08/using-chardet-to-convert-arbitrary-byte-strings-to-unicode/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Python GUI programming platforms for Windows</title>
		<link>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</link>
		<comments>http://ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/#comments</comments>
		<pubDate>Tue, 26 Feb 2008 06:00:57 +0000</pubDate>
		<dc:creator>Ryan Ginstrom</dc:creator>
		
		<category><![CDATA[programming]]></category>

		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.ginstrom.com/scribbles/2008/02/26/python-gui-programming-platforms-for-windows/</guid>
		<description><![CDATA[[Edit]
By popular demand, I've added a section on PyGTK. See bottom of post.
There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.
Tkinter
Tkinter is the ubiquitous [...]]]></description>
			<content:encoded><![CDATA[<p><b>[Edit]</b><br />
By popular demand, I've added a section on PyGTK. See bottom of post.</p>
<p>There are several platforms for programming Windows GUI applications in Python. Below I outline a few of them, with a simple "hello world" example for each. Where I've lifted the example from another site, there's a link to the source.</p>
<h2>Tkinter</h2>
<p>Tkinter is the ubiquitous GUI toolkit for Python. It's cross platform and easy to use, but it looks non-native on just about every platform. There are various add-ons and improvements you can find to improve the look and feel, but the basic problem is that the toolkit implements its own widgets, rather than using the native ones provided on the platform.</p>
<h3>Pros</h3>
<ul>
<li>Most portable GUI toolkit for Python</li>
<li>Very easy to use, with pythonic API</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Non-native look and feel out of the box</li>
</ul>
<p>Hello world example <a href="http://www.shido.info/py/tkinter1.html" title="source of code snippet">(code source)</a>:<br />
<img src="/img/hello-tkinter.png" border="0"/></p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">Tkinter</span> as Tk<br />
la = Tk.<span class="me1">Label</span><span class="br0">&#40;</span><span class="kw2">None</span>, text=<span class="st0">'Hello World!'</span>, font=<span class="br0">&#40;</span><span class="st0">'Times'</span>, <span class="st0">'18'</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
la.<span class="me1">pack</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
la.<span class="me1">mainloop</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>wxPython</h2>
<p><a href="http://www.wxpython.org/">wxPython</a> is probably the most popular GUI toolkit for Python. It's a wrapper for the <a href="http://www.wxwidgets.org/">wxWidgets</a> C++ toolkit, and as such it betrays a few unpythonic edges (like lumpy case, getters and setters, and funky C++ errors creeping up occasionally). There are a few pythonification efforts on top of wxPython, such as <a href="http://dabodev.com/">dabo</a> and (the now apparently moribund) <a href="http://sourceforge.net/projects/waxgui">wax</a>.</p>
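<p>As a rough idea of what "pythonification" can mean here, the sketch below wraps a getter/setter pair in a property. Both classes are hypothetical stand-ins written for this illustration; neither is a real wx, dabo, or wax class, and the real projects may well work differently.</p>

```python
class LegacyLabel(object):
    """Hypothetical stand-in for a C++-style widget; not a real wx class."""

    def __init__(self, text=""):
        self._text = text

    def GetLabel(self):
        return self._text

    def SetLabel(self, text):
        self._text = text


class Label(LegacyLabel):
    """Thin wrapper exposing the getter/setter pair as a single attribute."""

    label = property(LegacyLabel.GetLabel, LegacyLabel.SetLabel)


widget = Label("Hello")
# Attribute access replaces the explicit GetLabel()/SetLabel() calls.
widget.label = widget.label + " World!"
```

<p>The underlying methods stay available, so code written against the original API keeps working.</p>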
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Relatively mature and robust</li>
<li>Uses native Windows widgets for authentic look and feel</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Must include large wx runtime when packaging with py2exe (adds ~7 MB)</li>
<li>Cross platform nature makes accessing some native platform features (like ActiveX) difficult to impossible</li>
</ul>
<p>Hello world example <a href="http://www.goldb.org/goldblog/PermaLink,guid,d109ef8a-c3ea-4a2b-8ab7-9081c4dcc912.aspx" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-wxpython.png" border=0 /></p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> wx</p>
<p><span class="kw1">class</span> Application<span class="br0">&#40;</span>wx.<span class="me1">Frame</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; wx.<span class="me1">Frame</span>.<span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, parent, <span class="nu0">-1</span>, <span class="st0">'My GUI'</span>, size=<span class="br0">&#40;</span><span class="nu0">300</span>, <span class="nu0">200</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel = wx.<span class="me1">Panel</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer = wx.<span class="me1">BoxSizer</span><span class="br0">&#40;</span>wx.<span class="me1">VERTICAL</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; panel.<span class="me1">SetSizer</span><span class="br0">&#40;</span>sizer<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; txt = wx.<span class="me1">StaticText</span><span class="br0">&#40;</span>panel, <span class="nu0">-1</span>, <span class="st0">'Hello World!'</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; sizer.<span class="me1">Add</span><span class="br0">&#40;</span>txt, <span class="nu0">0</span>, wx.<span class="me1">TOP</span>|wx.<span class="me1">LEFT</span>, <span class="nu0">20</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Centre</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Show</span><span class="br0">&#40;</span><span class="kw2">True</span><span class="br0">&#41;</span></p>
<p>app = wx.<span class="me1">App</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="br0">&#41;</span><br />
Application<span class="br0">&#40;</span><span class="kw2">None</span><span class="br0">&#41;</span><br />
app.<span class="me1">MainLoop</span><span class="br0">&#40;</span><span class="br0">&#41;</span></div>
<h2>.NET with IronPython</h2>
<p><a href="http://www.codeplex.com/IronPython">IronPython</a> is a .NET implementation of Python. As of 1.0 it has full support for Python 2.4 features, and the 2.0 version will duplicate the Python 2.5 feature set. Although there are many CPython libraries/modules that won't run under IronPython (namely, the ones relying on compiled extensions that have not yet been ported), this lack is partially made up by the huge .NET library.</p>
<p>One cool thing about IronPython is that you can easily create lightweight .exe files that you can ship off to your friends &#8212; although you pay for this with a dependency on the .NET runtime, which you can't count on random Windows users to have installed.</p>
<p>Of course, when you go the IronPython route, you take all that comes with it: the good things, like access to .NET libraries and possibly the easiest/cleanest optimization path of any Python implementation (C#); and the bad things, like dependence on the .NET runtime and danger of getting caught on the MS upgrade treadmill.</p>
<p>Another way of getting at the .NET libraries is <a href="http://pythonnet.sourceforge.net/">Python.NET</a>, which adds two files to your Python directory to enable you to call the CLR from CPython.</p>
<h3>Pros</h3>
<ul>
<li>Leverage .NET libraries</li>
<li>Easily create .exe files</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Depends on .NET runtime</li>
</ul>
<p>Hello world example <a href="http://www.voidspace.org.uk/ironpython/winforms/part2.shtml" title="snippet source">(code source)</a>:<br />
<img src="/img/hello-ipy.png" border=0 /></p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw3">sys</span>.<span class="me1">path</span>.<span class="me1">append</span><span class="br0">&#40;</span>r<span class="st0">'C:<span class="es0">\P</span>ython24<span class="es0">\L</span>ib'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">import</span> clr<br />
clr.<span class="me1">AddReference</span><span class="br0">&#40;</span><span class="st0">&quot;System.Windows.Forms&quot;</span><span class="br0">&#41;</span></p>
<p><span class="kw1">from</span> System.<span class="me1">Windows</span>.<span class="me1">Forms</span> <span class="kw1">import</span> Application, Form</p>
<p><span class="kw1">class</span> HelloWorldForm<span class="br0">&#40;</span>Form<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Text</span> = <span class="st0">'Hello World'</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">Name</span> = <span class="st0">'Hello World'</span></p>
<p>form = HelloWorldForm<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
Application.<span class="me1">Run</span><span class="br0">&#40;</span>form<span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>PyQt</h2>
<p><a href="http://www.riverbankcomputing.co.uk/pyqt/">PyQt</a> is probably the third most widely used GUI toolkit, after wxPython and Tkinter. It has a dual commercial/GPL license (<ins datetime="2008-02-27T22:23:05+00:00">Edit: but it does let you use other open-source licenses; see comments below</ins>). I have to admit that this made it a non-starter for me: I don't want to pay for my toolkit when there are others just as good or better that are free; <del datetime="2008-02-27T22:23:05+00:00">and when I do release open-source software, I want to choose my own license</del>. For others, the GPL might be a non-issue or a plus, so I've left it off my pro/con list.</p>
<h3>Pros</h3>
<ul>
<li>Highly cross platform</li>
<li>Very easy to use</li>
<li>Highly mature</li>
<li>Decent looking widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Somewhat non-native look and feel (though much better than Tkinter)</li>
<li>Must include large runtime when packaging with py2exe</li>
</ul>
<p>Hello world example (from PyQt docs):</p>
<div><img src="/img/hello-qt.png" alt="PyQt screenshot" /></div>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">import</span> <span class="kw3">sys</span><br />
<span class="kw1">from</span> PyQt4 <span class="kw1">import</span> QtGui</p>
<p>app = QtGui.<span class="me1">QApplication</span><span class="br0">&#40;</span><span class="kw3">sys</span>.<span class="me1">argv</span><span class="br0">&#41;</span></p>
<p>hello = QtGui.<span class="me1">QPushButton</span><span class="br0">&#40;</span><span class="st0">&quot;Hello world!&quot;</span><span class="br0">&#41;</span><br />
hello.<span class="me1">resize</span><span class="br0">&#40;</span><span class="nu0">100</span>, <span class="nu0">30</span><span class="br0">&#41;</span></p>
<p>hello.<span class="me1">show</span><span class="br0">&#40;</span><span class="br0">&#41;</span></p>
<p><span class="kw3">sys</span>.<span class="me1">exit</span><span class="br0">&#40;</span>app.<span class="me1">exec_</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></div>
<h2>Pyglet</h2>
<p><a href="http://www.pyglet.org/">Pyglet</a> is kind of the new kid on the block in terms of GUI toolkits, but it sure made a splash. It implements its own windowing system, but with no dependencies other than Python (for Python 2.5 users). You will need <a href="http://www.opengl.org/">OpenGL</a> to do decent 3D graphics, but that's hardly a black mark for pyglet &#8212; other libraries would love to make it this easy.</p>
<h3>Pros</h3>
<ul>
<li>High degree of freedom for GUI creation</li>
<li>Only depends on Python</li>
<li>Large number of widgets</li>
</ul>
<h3>Cons</h3>
<ul>
<li>Purposely doesn't duplicate the native platform look and feel</li>
<li>Although there are a lot of widgets, you'll have to roll your own for many things the platform gives you for free.</li>
</ul>
<p>Hello world example (slightly modified from <a href="http://www.pyglet.org/doc/programming_guide/hello_world.html">code source</a>):<br />
<img src="/img/hello-pyglet.png" alt="hello world with pyglet screenshot" border=0 /></p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> pyglet <span class="kw1">import</span> font<br />
<span class="kw1">from</span> pyglet <span class="kw1">import</span> window</p>
<p>win = window.<span class="me1">Window</span><span class="br0">&#40;</span>width=<span class="nu0">300</span>, height=<span class="nu0">150</span>, caption=<span class="st0">&quot;Hello World&quot;</span><span class="br0">&#41;</span></p>
<p>ft = font.<span class="me1">load</span><span class="br0">&#40;</span><span class="st0">'Arial'</span>, <span class="nu0">36</span><span class="br0">&#41;</span><br />
text = font.<span class="me1">Text</span><span class="br0">&#40;</span>ft, <span class="st0">'Hello, World!'</span><span class="br0">&#41;</span></p>
<p><span class="kw1">while</span> <span class="kw1">not</span> win.<span class="me1">has_exit</span>:<br />
&nbsp; &nbsp; win.<span class="me1">dispatch_events</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">clear</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; text.<span class="me1">draw</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; win.<span class="me1">flip</span><span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp;</div>
<h2>Win32 with ctypes</h2>
<p>Of course, all you really need to write GUI applications on Windows with Python is your trusty ctypes module and a well-worn copy of <a href="http://www.charlespetzold.com/pw5/">Petzold</a>. The benefit of this style is that you're working right down at the system API level, with nothing to get in your way. The disadvantage is that you're working right down at the system API level, with nothing to relieve you from all that boilerplate (unless you write your own abstraction layer on top; see Venster, below&#8230;).</p>
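<p>Much of that boilerplate is declaring C structures by hand. Here is a small sketch of what one declaration looks like, runnable on any platform because it makes no Win32 calls. The class name <code>Rect</code> is mine, chosen for illustration; the layout mirrors the Win32 RECT structure.</p>

```python
import ctypes


class Rect(ctypes.Structure):
    # Same field layout as the Win32 RECT structure: four LONG values.
    _fields_ = [("left", ctypes.c_long),
                ("top", ctypes.c_long),
                ("right", ctypes.c_long),
                ("bottom", ctypes.c_long)]


# ctypes generates a constructor that accepts fields positionally
# or by keyword; fields that are not given start out as zero.
rect = Rect(10, 20, right=110, bottom=70)
width = rect.right - rect.left
height = rect.bottom - rect.top

# byref() produces what you would pass where a function expects a RECT*.
ref = ctypes.byref(rect)
```

<p>Because ctypes supplies that constructor, a hand-written <code>__init__</code> is only needed when you want non-zero defaults or extra setup.</p>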
<h3>Pros</h3>
<ul>
<li>Enables high level of control</li>
<li>Straightforward if familiar with Win32 API</li>
<li>No added complexity or buried functionality due to need to be cross-platform</li>
<li>Lightest of all Windows GUI programming methods using Python</li>
</ul>
<h3>Cons</h3>
<ul>
<li>All the complexity and inconsistency of Win32 API in gory detail</li>
<li>Lack of high-level libraries (have to write more code)</li>
</ul>
<p>Hello world example (long, ain't it?):<br />
<img src="/img/hello-win32.png" alt="Win32 GUI screen shot" /></p>
<div class="dean_ch" style="white-space: wrap;">
<span class="kw1">from</span> ctypes <span class="kw1">import</span> *<br />
<span class="kw1">import</span> win32con</p>
<p>WNDPROC = WINFUNCTYPE<span class="br0">&#40;</span>c_long, c_int, c_uint, c_int, c_int<span class="br0">&#41;</span></p>
<p>NULL = c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><br />
_user32 = windll.<span class="me1">user32</span></p>
<p><span class="kw1">def</span> ErrorIfZero<span class="br0">&#40;</span>handle<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="kw1">if</span> handle == <span class="nu0">0</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> WinError<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; <span class="kw1">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">return</span> handle</p>
<p>CreateWindowEx = _user32.<span class="me1">CreateWindowExW</span><br />
CreateWindowEx.<span class="me1">argtypes</span> = <span class="br0">&#91;</span>c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_wchar_p,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#93;</span><br />
CreateWindowEx.<span class="me1">restype</span> = ErrorIfZero</p>
<p>
<span class="kw1">class</span> WNDCLASS<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'style'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpfnWndProc'</span>, WNDPROC<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbClsExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'cbWndExtra'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hInstance'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hIcon'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hCursor'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'hbrBackground'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszMenuName'</span>, c_wchar_p<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lpszClassName'</span>, c_wchar_p<span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndProc,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;style=win32con.<span class="me1">CS_HREDRAW</span> | win32con.<span class="me1">CS_VREDRAW</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;clsExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;wndExtra=<span class="nu0">0</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;menuName=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;className=u<span class="st0">&quot;PythonWin32&quot;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;instance=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;icon=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cursor=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;background=<span class="kw2">None</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#41;</span>:</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> instance:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; instance = windll.<span class="me1">kernel32</span>.<span class="me1">GetModuleHandleW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> icon:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; icon = _user32.<span class="me1">LoadIconW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDI_APPLICATION</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> cursor:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cursor = _user32.<span class="me1">LoadCursorW</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">NULL</span><span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;c_int<span class="br0">&#40;</span>win32con.<span class="me1">IDC_ARROW</span><span class="br0">&#41;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="kw1">not</span> background:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; background = windll.<span class="me1">gdi32</span>.<span class="me1">GetStockObject</span><span class="br0">&#40;</span>c_int<span class="br0">&#40;</span>win32con.<span class="me1">WHITE_BRUSH</span><span class="br0">&#41;</span><span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpfnWndProc</span>=wndProc<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">style</span>=style<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbClsExtra</span>=clsExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">cbWndExtra</span>=wndExtra<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hInstance</span>=instance<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hIcon</span>=icon<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hCursor</span>=cursor<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hbrBackground</span>=background<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszMenuName</span>=menuName<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">lpszClassName</span>=className</p>
<p><span class="kw1">class</span> RECT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'left'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'top'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'right'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'bottom'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, left=<span class="nu0">0</span>, top=<span class="nu0">0</span>, right=<span class="nu0">0</span>, bottom=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">left</span> = left<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">top</span> = top<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">right</span> = right<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">bottom</span> = bottom</p>
<p><span class="kw1">class</span> PAINTSTRUCT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hdc'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fErase'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rcPaint'</span>, RECT<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fRestore'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'fIncUpdate'</span>, c_int<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'rgbReserved'</span>, c_byte * <span class="nu0">32</span><span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">class</span> POINT<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'x'</span>, c_long<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'y'</span>, c_long<span class="br0">&#41;</span><span class="br0">&#93;</span><br />
&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span> <span class="kw2">self</span>, x=<span class="nu0">0</span>, y=<span class="nu0">0</span> <span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">x</span> = x<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">y</span> = y</p>
<p><span class="kw1">class</span> MSG<span class="br0">&#40;</span>Structure<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="co1"># HWND, WPARAM and LPARAM are pointer-sized, so use types that also fit 64-bit Windows</span><br />
&nbsp; &nbsp; _fields_ = <span class="br0">&#91;</span><span class="br0">&#40;</span><span class="st0">'hwnd'</span>, c_void_p<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'message'</span>, c_uint<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'wParam'</span>, c_size_t<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'lParam'</span>, c_ssize_t<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'time'</span>, c_ulong<span class="br0">&#41;</span>,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="br0">&#40;</span><span class="st0">'pt'</span>, POINT<span class="br0">&#41;</span><span class="br0">&#93;</span></p>
<p><span class="kw1">def</span> pump_messages<span class="br0">&#40;</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Runs the message loop until WM_QUIT and returns its exit code&quot;</span><span class="st0">&quot;&quot;</span><br />
&nbsp; &nbsp; msg = MSG<span class="br0">&#40;</span><span class="br0">&#41;</span><br />
&nbsp; &nbsp; pMsg = pointer<span class="br0">&#40;</span>msg<span class="br0">&#41;</span></p>
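<p>&nbsp; &nbsp; <span class="co1"># GetMessageW returns 0 on WM_QUIT and -1 on error; stop in both cases</span><br />
&nbsp; &nbsp; <span class="kw1">while</span> _user32.<span class="me1">GetMessageW</span><span class="br0">&#40;</span>pMsg, NULL, <span class="nu0">0</span>, <span class="nu0">0</span><span class="br0">&#41;</span> &gt; <span class="nu0">0</span>:<br />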
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">TranslateMessage</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; _user32.<span class="me1">DispatchMessageW</span><span class="br0">&#40;</span>pMsg<span class="br0">&#41;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">return</span> msg.<span class="me1">wParam</span></p>
<p>
<span class="kw1">class</span> Window<span class="br0">&#40;</span><span class="kw2">object</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; <span class="st0">&quot;&quot;</span><span class="st0">&quot;Wraps an HWND handle&quot;</span><span class="st0">&quot;&quot;</span></p>
<p>&nbsp; &nbsp; <span class="kw1">def</span> <span class="kw4">__init__</span><span class="br0">&#40;</span><span class="kw2">self</span>, hwnd=NULL<span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>.<span class="me1">hwnd</span> = hwnd</p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw2">self</span>._event_handlers = <span class="br0">&#123;</span><span class="br0">&#125;</span></p>
<p>&nbsp; &nbsp; &nbsp; &nbsp; <span class="co1"># Register event handlers</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">for</span> key <span class="kw1">in</span> <span class="kw2">dir</span><span class="br0">&#40;</span><span class="kw2">self</span><span class="br0">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; method = <span class="kw2">getattr</span><span class="br0">&#40;</span><span class="kw2">self</span>, key<span class="br0">&#41;</span><br />
&nbsp; &nbsp;