THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Peter DeBetta's SQL Programming Blog

Peter DeBetta's blog about programming in SQL Server using technologies such as T-SQL, .NET, CLR, C#, VB, Visual Studio, and SQL Server Management Studio.

iPod and iTunes - Parsing the XML Library

So I was trying to figure out a way to get my iTunes/iPod music library into SQL Server. Unfortunately, the XML in the library is not the easiest with which to work. Here is an abridged version of that XML:

<plist version="1.0">
   <
dict
>
      <
key>Major Version</key
>
      <
integer>1</integer
>
      <key>Application Version</key
>
      <
string>7.0.2</string
>
      <key>Show Content Ratings</key
>
      <
true
/>
      <key>Tracks</key
>
      <dict
>
         <
key>1288</key
>
         <
dict>
            <
key>Track ID</key
>
            <
integer>1288</integer
>
            <
key>Name</key
>
            <
string>Brighter Than Sunshine</string
>
            <
key>Artist</key
>
            <
string>Aqualung</string
>
            <
key>Album Artist</key
>
            <
string>Aqualung</string
>
            <
key>Album</key
>
         </dict
>
      </dict
>
   </dict
>
</plist>

It uses a name-value pairing of the nodes, which would not have been my first choice is the schema. I would have gone with something more like this:

<plist version="1.0">
   <
key name="Major Version" value="1" type="integer"
/>
   <
key name="Application Version" value="7.0.2" type="string"
/>
   <
key name="Show Content Ratings" value="true" type="boolean"
/>
   <
tracks
>
      <
track id="1288"
>
         <
key name="TrackID" value="1288" type="integer"
/>
         <
key name="Name" value="Brighter Than Sunshine" type="string"
/>
         <
key name="Artist" value="Aqualung" type="string"
/>
         <
key name="Album Artist" value="Aqualung" type="string"
/>
         <
key name="Album" value="Strange &amp; Beautiful" type="string"
/>
      </
track
>
   </
tracks
>
</
plist>

I digress. The point here is how do I get the data from the exported library file (File... Export Library... in iTunes). It was an interesting task and the solution used the XML, CTE, and PIVOT features of SQL Server 2005

DECLARE @x xml
CREATE
TABLE #t (LibraryID int identity(1, 1), Library xml)

-- Load the XML, you will need to do the convert to drop the DTD refs
-- and load without errors

SET @X = (SELECT CONVERT(xml, BulkColumn, 2) FROM
OPENROWSET
   (BULK N'C:\Library.xml',
 
   SINGLE_BLOB) AS Document)

-- Use the nodes method to extract the tracks
INSERT
INTO #t (library
)
SELECT T.C.query('.') AS
SongInfo
FROM @x.nodes('/plist[1]/dict[1]/dict[1]/dict') AS T(C
)

-- Generate the name-value pairs
;
WITH kvps AS
   (SELECT library.value('/dict[1]/integer[1]', 'int') AS [ID]
,
      t.c.value('local-name(.)', 'varchar(255)') AS [type]
,
      t.c.value('.', 'varchar(255)') AS [value]
,
      t.c.value('for $s in . return count(../*[. << $s]) + 1', 'int') 
AS
RowNum
      -- this last one generates a row number for the returned nodes 
   FROM
#t
      CROSS APPLY library.nodes('/dict/*') AS T(C
))
, final AS
 
   (SELECT k.id
,
      k2.RowNum / 2 AS [RowNum],
 
      k.value AS [Key],
 
      CASE WHEN k2.[value] IS NOT NULL AND k2.[value] != ''
 
         THEN k2.[value] ELSE k2.type END AS [Value]
,
      CASE WHEN k2.[value] IS NOT NULL AND k2.[value] != ''
 
         THEN k2.[type] ELSE 'bit' END AS
[Type]
   FROM kvps AS
k
      INNER JOIN kvps AS
k2 
         ON k.rownum + 1 = k2.rownum AND k.id = k2.id
   WHERE k2.rownum >AND k2.rownum % 2 = 0
)
SELECT
*
INTO
#t2
FROM final

-- Pivot the results
;
WITH t2 AS (SELECT [id], [key], [value] FROM #t2)
SELECT [id], [Name], [Artist], [Album Artist], [Album], [Genre],
 
   [Kind]
, [Size], [Total Time],
 
   IsNull([Disc Number], 1) AS [Disc Number], 
   IsNull([Disc Count], 1) AS [Disc Count],
 
   IsNull([Track Number], 1) AS [Track Number], 
   [Track Count],
 [Year], [Date Modified],
 
   [Date Added], [Bit Rate],
[Sample Rate]
INTO
#t3
FROM
t2
PIVOT
 
   (MAX([Value]
)
   FOR [key] IN
 
      ([Name], [Artist], [Album Artist], [Album], [Genre], [Kind], 
      [Size], [Total Time],
 [Disc Number], [Disc Count], 
      [Track Number], [Track Count], [Year], [Date Modified],
 
      [Date Added], [Bit Rate], [Sample Rate]
)
   ) AS
pvt

-- Viola - you have a list of tracks
SELECT
* FROM #t3
ORDER BY [Artist], [Album], [Disc Number], [Track Number]

--DROP TABLE #t
--DROP TABLE #t2
--DROP TABLE #t3

If you want to make the data more permanent, don't use temporary tables.

Now this will take some time to run, depending on how many songs you actually have. My collection of 13,851 songs took 5 1/2 minutes to run this code and generate my results. Mind you, the XML file is 20 MB and the joins are awful. However, some indexes would make this go faster.

The cool thing is that you could take this concept to the next level using other technologies (such as SSIS) in order to keep your SQL database up to date.

FYI: I included the SQL file that contains all the code in this post.

Enjoy!

--Peter

Published Tuesday, December 05, 2006 1:55 PM by Peter W. DeBetta
Filed under: , ,

Attachment(s): itunes xml for sqlblog.sql

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Matt Greenslade said:

Very cool. I had been looking at doing the same thing in order to learn the capabilities o SQL 2005 with XML (we're upgrading next motnh from 2000). I was trying to figure out if theres a way to do the reversse - i.e. amend the data once its in SQL Server and then somehow impor it back to iTunes. Searches revealed very little on this.
December 7, 2006 10:27 AM
 

Lisa Slater Nicholls said:

I'll go on-topic to your digression: I would have agreed with you about the schema *except* that (having used both sorts of name-value pairings, and having tried lots of variants) your preferred version assumes simple types as values.  

Apple's dict structure allows a key to contain more complex information more easily.  For example,  full specifications for album layout, an image blob, internal values such as royalty breakdown.

If I am doing it and have no constraints, my preference looks something like

<key name="...">
<!-- additional optional subs, such as a hash or signature specific to the key, can be either attribs if simple or more child nodes if complex -->
<value>
...
</value>
</key>

... but in any case it doesn't really matter.  Mapping what you need for the import into SQL using XSLT should be a trivial task, no matter what format they start with.  20 megs shouldn't be a problem and it shouldn't take 5 minutes to run.  (The other way around too, Matt G!).  

Everybody should use whatever private XML schema they like best, it's NBD.  That's exactly why XML is so great to work with.  Everybody represents their universe of data just as it appears to them, just like people express their thoughts in their own words in a productive conversation <s>.
December 11, 2006 9:08 AM
 

Peter W. DeBetta said:

I suppose my digression point was to focus more on why Apple choose to use DTD (which is actually not standard XML - it's carry-over from SGML) instead of XML Schema. Alas, I didn't make that point very clear.

FYI - you can view Apple's DTD definition here:
http://www.apple.com/DTDs/PropertyList-1.0.dtd

That being said, I agree with you about the schema. My example was definitely limiting and the version you propose is definitely better.

--Peter
December 12, 2006 8:44 AM
 

Lisa Slater Nicholls said:

OIC.  Well, Apple might haveused DTDs because there are (still) many processors and parsers out there that are better at using DTDs than XSDs.  Alternatively, they may have done it could be because there are one or two concepts that are just easier to express in DTDs. Not that I can name one off the top of my head, since it's a long time since I worried about this -- I *like* XSDs <g>.

IAC, Peter, you're talking as if XML Schema is the latest and greatest and everybody should understand that. Tons of heavy-weight people would say you're just behind the times, you or Apple should be using RELAX NG instead <g>.  There's no real consensus -- except maybe in MS-centric tools.  

IOW: just as I believe it shouldn't matter all that much what a private schema is, I'm not terribly fussed about how its validating description is expressed.  It's just another mapping problem, except for the fact that DTDs aren't XML, which causes some annoyance occasionally (RELAX has both an XML and non-XML syntax).

My real point in writing was: never mind whether Apple's version, your simplified version, or mine, I assume you have an optimal target XML syntax in mind that SQL Server will use for bulk uploading purposes?

20 megs isn't that big a file... does the bulk loader have the ability to apply XSLT on the fly, in which case point me to it? Otherwise I'd be interested to test a preprocessing transform + bulk load against the procedure you chose.  

TIA. FWIW, this is not my area of expertise.  SQL Server XML syntax looks really contorted to me, frankly.  I'm trying to decide why I would use it instead of what looks like a trivial mapping exercise.
December 14, 2006 1:54 PM
 

Kevin said:

I'm not an XML-er, so this was a great find, no matter how long it takes to run.

Many thanks :)

January 19, 2007 4:56 AM
 

Frederik said:

Nice Script Peter!!

I struggled with this one while preparing my presentation for TechEd Barcelona. In the end I used XSLT and Integration Services to import the iTunes library into SQL Server.  I also created a cube in Analysis Services and built some reports in Reporting Services.

The presentation is called "biTunes - Building an end-to-end Business Intelligence (BI) application with SQL Server 2005 and … iTunes!".    I posted the source code on my blog.

http://www.vandeputte.org/2006/11/bitunes-source-code.html

Frederik

March 23, 2007 1:07 PM
 

vplakhi said:

I was very impressed with your script -- until I realized that you can copy paste all visible columns from the iTunes library or any playlist. I ended up copy/pasting to Excel, and then using ODBC to auto export to Oracle. So for anyone looking for a one-off capture for personal analysis (rather than a programmatic solution for software you're working on), this might be a quicker approach.

August 9, 2007 9:41 PM

Leave a Comment

(required) 
(required) 
Submit

About Peter W. DeBetta

Peter DeBetta works for Microsoft as an Application Platform Technology Strategist, specializing in strategies, design, implementation, and deployment of Microsoft SQL Server and .NET solutions. Peter writes courseware, articles, and books – most recently the title Introducing SQL Server 2008 from Microsoft Press. Peter speaks at conferences around the world, including TechEd, SQL PASS Community Summit, DevTeach, SQL Connections, DevWeek, and VSLive!

When Peter isn’t working, you can find him singing and playing guitar (click here to hear an original song by Peter), taking pictures, or simply enjoying life with his wife, son, and daughter.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement