Web Scraping TED - information about a specific talk
In my ideas for TED I mentioned that currently they don't give access to their API to new users. This should not necessarily stop us from getting the data from the web site. We can use good old web scraping.
I've picked one of the videos almost randomly: The year open data went worldwide. If you look at the page you'll see that it has "32 subtitle languages" (or maybe more by the time you look at it). If you click on that text you'll see a modal display showing the list of the languages. Clearly it is some JavaScript code that generates this modal display.
I looked at the source of the page (just by right-clicking on the page in the browser) trying to locate the data. At first I searched for "subtitle languages", but that did not lead me to the list of languages.
Then I searched for 'Chinese', the name of one of the translations that I suspected would not show up in any other part of the page, and I found it embedded in a JSON structure inside a JavaScript function call embedded in a <script> tag.
Equipped with this information I started to write a small script that would fetch the page, find all the 'script' tags and print the content of these script tags. At first I used Web::Query to fetch the page, find the 'script' tags and extract their content. The first two steps went well, but the function I was expecting to extract the 'text' inside the 'script' tags did not return anything. So I filed a bug-report/question.
I did not want to wait for a reply, so to have a faster solution I turned to a regular expression. Normally parsing HTML using regexes is considered a sin, but in this case we had to extract a single tag that does not have any other tags in it and is very unlikely to contain a string that looks like a closing 'script' tag.
examples/scraping_ted/parse_with_regex.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;

    use LWP::Simple qw(get);

    my $url = 'http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide';

    my $html = get $url;

    foreach my $script ($html =~ m{<script>(.*?)</script>}gs) {
        say $script;
        say '------';
    }
I used the get function of LWP::Simple to fetch the page and then a regex to extract the content of every 'script' tag. In this regex I've used .*? for a minimal match, and the s modifier to change the behavior of . so that it matches any character, including newlines. The g modifier makes the match global, so it returns all the possible matches. Note that this regex only matches 'script' tags that have no attributes.
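To see what the minimal match and the two modifiers each contribute, here is a small sketch that runs the same regex, and two weakened variants of it, on a made-up HTML snippet. The snippet and the variable names are only for illustration; they are not taken from the TED page.

```perl
use strict;
use warnings;
use 5.010;

# A made-up page with two script tags; the second one spans several lines.
my $html = <<'HTML';
<html><head>
<script>var a = 1;</script>
<script>
var b = 2;
</script>
</head></html>
HTML

# Minimal match with /s and /g: one item per script tag.
my @minimal = $html =~ m{<script>(.*?)</script>}gs;
say scalar @minimal;    # 2

# Greedy match: .* runs up to the last closing tag, so we get one big item.
my @greedy = $html =~ m{<script>(.*)</script>}gs;
say scalar @greedy;     # 1

# Without /s the dot stops at newlines, so the multi-line tag is missed.
my @no_s = $html =~ m{<script>(.*?)</script>}g;
say scalar @no_s;       # 1
```

Only the first variant returns the content of both tags separately, which is what the loop in the script relies on.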
Extract JSON
The next step was to extract the JSON from within the 'script' tag. For this I had to use a Regex again as the JSON is embedded in a JavaScript function called 'q'. It looked something like this:
q("talkPage.init",{"talks" ... })
except that there was a lot more data in place of the 3 dots.
For this I used the following expression:
my ($json) = $script =~ /^q\("talkPage\.init",(\{"talks".*)\)/s;
The left-hand side of the assignment must be in parentheses to create LIST context, and we again use the s modifier to change the behavior of . so that it matches newlines as well.
There are many 'script' tags on this page, but only one is expected to match this regex and that is expected to return a JSON string. So we can skip the rest of the loop if $json is empty.
If $json is not empty then we would like to convert it to a Perl data structure. For that we can use the decode_json function of any of the JSON modules, resulting in this code:
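The difference between the two contexts, and the way non-matching tags are skipped, can be tried out without fetching anything. The following is only a sketch: the two strings in @scripts are made-up stand-ins for the content of 'script' tags, and the second one is a heavily shortened imitation of the real call.

```perl
use strict;
use warnings;
use 5.010;

# Made-up stand-ins for the content of two script tags.
my @scripts = (
    'var unrelated = 1;',
    'q("talkPage.init",{"talks":[{"id":788}]})',
);

my $regex = qr/^q\("talkPage\.init",(\{"talks".*)\)/s;

# Scalar context: we only learn whether the match succeeded.
my $matched = $scripts[1] =~ $regex;
say $matched;    # 1

my @found;
foreach my $script (@scripts) {
    # List context: the match returns the captured text, or an
    # empty list (leaving $json undefined) if nothing matched.
    my ($json) = $script =~ $regex;
    next if not $json;
    push @found, $json;
}
say for @found;    # {"talks":[{"id":788}]}
```

Because .* is greedy, the capture runs up to the last closing parenthesis, which is the one that ends the call to q.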
examples/scraping_ted/extract_json.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;

    use LWP::Simple qw(get);
    use JSON qw(decode_json);
    use Data::Dumper qw(Dumper);

    my $url = 'http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide';

    my $html = get $url;

    foreach my $script ($html =~ m{<script>(.*?)</script>}gs) {
        my ($json) = $script =~ /^q\("talkPage\.init",(\{"talks".*)\)/s;
        next if not $json;

        my $data = decode_json $json;
        print Dumper $data;
        say '------';
    }
Unfortunately running this script will throw an exception:
Wide character in subroutine entry at extract_json.pl line 18
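I had already seen this problem once. The decode_json function expects a string of UTF-8 encoded bytes, but the string we extracted from the page holds already-decoded characters, some of them outside the single-byte range. So I had to turn the JSON string into a real UTF-8 byte string using the encode function of the Encode module.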
examples/scraping_ted/get_and_extract_json.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;

    use LWP::Simple qw(get);
    use JSON qw(decode_json);
    use Data::Dumper qw(Dumper);
    use Encode qw(encode);

    my $url = 'http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide';

    my $html = get $url;

    foreach my $script ($html =~ m{<script>(.*?)</script>}gs) {
        my ($json) = $script =~ /^q\("talkPage\.init",(\{"talks".*)\)/s;
        next if not $json;

        my $data = decode_json encode('utf8', $json);
        print Dumper $data;
    }
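The failure and the fix can be reproduced on a tiny string without touching the network. This is a minimal sketch, assuming the core JSON::PP module in place of JSON so that it runs on a plain Perl installation; the one-key JSON string is made up, using the Czech endonym that appears in the real data.

```perl
use strict;
use warnings;
use 5.010;

use JSON::PP qw(decode_json);
use Encode qw(encode);

# A character string holding JSON with code points above 255.
my $json = qq({"endonym":"\x{10c}e\x{161}tina"});

# Passing the character string directly makes decode_json die.
my $ok = eval { decode_json $json; 1 };
say $ok ? 'decoded' : "failed: $@";

# Encoding to UTF-8 bytes first gives decode_json what it expects.
my $data = decode_json encode('utf8', $json);
say length $data->{endonym};    # 7
```

The decoded value is again a character string, which is why its length is counted in characters and not in bytes.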
The output of this script looks like this:
examples/scraping_ted/json_dump.pl
$VAR1 = { 'threadId' => 649, 'relatedTalks' => [ { 'title' => 'The next web', 'slug' => 'tim_berners_lee_on_the_next_web', 'duration' => 983, 'speaker' => 'Tim Berners-Lee', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/77260_800x600.jpg?quality=75&w=500' }, { 'slug' => 'melati_and_isabel_wijsen_our_campaign_to_ban_plastic_bags_in_bali', 'title' => 'Our campaign to ban plastic bags in Bali', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/0da6ace6197fc74eaf425c413eb5636d57e9891e_2880x1620.jpg?quality=75&w=500', 'duration' => 660, 'speaker' => 'Melati and Isabel Wijsen' }, { 'slug' => 'auke_ijspeert_a_robot_that_runs_and_swims_like_a_salamander', 'title' => 'A robot that runs and swims like a salamander', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/0b0fb52e085bad1e7834a6bfcc93f27cba088559_2880x1620.jpg?quality=75&w=500', 'duration' => 850, 'speaker' => 'Auke Ijspeert' }, { 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/017abc101c829da234618637fdfbfd09eb296fba_2880x1620.jpg?quality=75&w=500', 'duration' => 1053, 'speaker' => 'Elizabeth Lev', 'slug' => 'elizabeth_lev_the_unheard_story_of_the_sistine_chapel', 'title' => 'The unheard story of the Sistine Chapel' }, { 'duration' => 656, 'speaker' => 'Oscar Schwartz', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/952396f5b0b7aa178f6198669818c9b1cf324312_2880x1620.jpg?quality=75&w=500', 'title' => 'Can a computer write poetry?', 'slug' => 'oscar_schwartz_can_a_computer_write_poetry' }, { 'slug' => 'wael_ghonim_let_s_design_social_media_that_drives_real_change', 'title' => 'Let\'s design social media that drives real change', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/0da87f5c54fb8855274c9553595d550999e71288_2880x1620.jpg?quality=75&w=500', 'duration' => 814, 'speaker' => 'Wael Ghonim' }, { 'slug' => 
'aomawa_shields_how_we_ll_find_life_on_other_planets', 'title' => 'How we\'ll find life on other planets', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/b71083deab779a49aa52070dabd282cf96296b38_2880x1620.jpg?quality=75&w=500', 'speaker' => 'Aomawa Shields', 'duration' => 325 }, { 'title' => 'Governments don\'t understand cyber warfare. We need hackers', 'slug' => 'rodrigo_bijou_governments_don_t_understand_cyber_warfare_we_need_hackers', 'duration' => 568, 'speaker' => 'Rodrigo Bijou', 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/28a3233882ad006a57da361770cec0cbaeab5170_2880x1620.jpg?quality=75&w=500' }, { 'title' => 'The future of news? Virtual reality', 'slug' => 'nonny_de_la_pena_the_future_of_news_virtual_reality', 'duration' => 567, 'speaker' => "Nonny de la Pe\x{f1}a", 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/ddaf3e1ce01e2c3ee875970d3c7bf8bb4e9e92c3_2880x1620.jpg?quality=75&w=500' }, { 'image' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/90eeddc216ca86ad2fbf99d0823a39fe681e7513_2880x1620.jpg?quality=75&w=500', 'speaker' => "Andreas Ekstr\x{f6}m", 'duration' => 558, 'slug' => 'andreas_ekstrom_the_moral_bias_behind_your_search_results', 'title' => 'The moral bias behind your search results' } ], 'precontrol' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ), 'ratings' => [ { 'id' => 10, 'name' => 'Inspiring', 'count' => 259 }, { 'count' => 205, 'name' => 'Informative', 'id' => 8 }, { 'name' => 'Fascinating', 'count' => 164, 'id' => 22 }, { 'count' => 136, 'name' => 'Persuasive', 'id' => 24 }, { 'id' => 3, 'name' => 'Courageous', 'count' => 17 }, { 'count' => 85, 'name' => 'Ingenious', 'id' => 9 }, { 'count' => 88, 'name' => 'Jaw-dropping', 'id' => 23 }, { 'count' => 31, 'name' => 'Beautiful', 'id' => 1 }, { 'id' => 2, 'count' => 12, 'name' => 'Confusing' }, { 'count' => 24, 'name' => 'OK', 'id' => 25 }, { 'id' => 26, 'count' 
=> 11, 'name' => 'Obnoxious' }, { 'id' => 21, 'name' => 'Unconvincing', 'count' => 8 }, { 'count' => 6, 'name' => 'Longwinded', 'id' => 11 }, { 'id' => 7, 'count' => 4, 'name' => 'Funny' } ], 'language' => 'en', 'talks' => [ { 'external' => undef, 'slug' => 'tim_berners_lee_the_year_open_data_went_worldwide', 'published' => 1268040420, 'nativeLanguage' => 'en', 'thumb' => 'https://tedcdnpi-a.akamaihd.net/r/tedcdnpe-a.akamaihd.net/images/ted/154673_800x600.jpg?quality=89&w=600', 'duration' => 333, 'resources' => { 'rtmp' => [ { 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-1500k.mp4', 'height' => 720, 'width' => 1280, 'name' => '1500k', 'bitrate' => 1500 }, { 'bitrate' => 950, 'name' => '950k', 'width' => 854, 'height' => 480, 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-950k.mp4' }, { 'bitrate' => 600, 'name' => '600k', 'width' => 640, 'height' => 360, 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-600k.mp4' }, { 'name' => '450k', 'bitrate' => 450, 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-450k.mp4', 'height' => 288, 'width' => 512 }, { 'bitrate' => 320, 'name' => '320k', 'width' => 512, 'height' => 288, 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-320k.mp4' }, { 'name' => '180k', 'bitrate' => 180, 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-180k.mp4', 'height' => 288, 'width' => 512 }, { 'bitrate' => 64, 'name' => '64k', 'width' => 398, 'height' => 224, 'file' => 'mp4:talk/stream/2010U/Blank/TimBernersLee_2010U-64k.mp4' } ], 'hls' => { 'metadata' => 'https://hls.ted.com/talks/788.json', 'adUrl' => 
'https://pubads.g.doubleclick.net/gampad/ads?ciu_szs=300x250%2C512x288%2C120x60%2C320x50%2C6x7%2C6x8&correlator=%5Bcorrelator%5D&cust_params=event%3DTED2010%26id%3D788%26tag%3DInternet%2CTED%2BConference%2Ccomputers%2Cstatistics%2Cvisualizations%2Cweb%26talk%3Dtim_berners_lee_the_year_open_data_went_worldwide%26year%3D2010&env=vp&gdfp_req=1&impl=s&iu=%2F5641%2Fmobile%2Fios%2Fweb&output=xml_vast2&sz=640x360&unviewed_position_start=1&url=%5Breferrer%5D', 'stream' => 'https://hls.ted.com/talks/788.m3u8' }, 'h264' => [ { 'bitrate' => 320, 'file' => 'http://download.ted.com/talks/TimBernersLee_2010U-320k.mp4?dnt' } ] }, 'title' => 'The year open data went worldwide', 'languages' => [ { 'languageCode' => 'sq', 'ianaCode' => 'sq', 'languageName' => 'Albanian', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => 'Shqip' }, { 'languageCode' => 'ar', 'ianaCode' => 'ar', 'endonym' => "\x{627}\x{644}\x{639}\x{631}\x{628}\x{64a}\x{629}", 'isRtl' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ), 'languageName' => 'Arabic' }, { 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "\x{431}\x{44a}\x{43b}\x{433}\x{430}\x{440}\x{441}\x{43a}\x{438}", 'languageName' => 'Bulgarian', 'ianaCode' => 'bg', 'languageCode' => 'bg' }, { 'languageCode' => 'zh-cn', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "\x{4e2d}\x{6587} (\x{7b80}\x{4f53})", 'languageName' => 'Chinese, Simplified', 'ianaCode' => 'zh-Hans' }, { 'languageCode' => 'zh-tw', 'ianaCode' => 'zh-Hant', 'endonym' => "\x{4e2d}\x{6587} (\x{7e41}\x{9ad4})", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Chinese, Traditional' }, { 'languageCode' => 'hr', 'ianaCode' => 'hr', 'endonym' => 'Hrvatski', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Croatian' }, { 'languageCode' => 'cs', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Czech', 'endonym' => "\x{10c}e\x{161}tina", 'ianaCode' => 'cs' }, { 'languageCode' => 'nl', 'endonym' => 'Nederlands', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Dutch', 'ianaCode' => 'nl' 
}, { 'languageCode' => 'en', 'ianaCode' => 'en', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => 'English', 'languageName' => 'English' }, { 'endonym' => "Fran\x{e7}ais", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'French', 'ianaCode' => 'fr', 'languageCode' => 'fr' }, { 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'German', 'endonym' => 'Deutsch', 'ianaCode' => 'de', 'languageCode' => 'de' }, { 'languageCode' => 'el', 'ianaCode' => 'el', 'endonym' => "\x{395}\x{3bb}\x{3bb}\x{3b7}\x{3bd}\x{3b9}\x{3ba}\x{3ac}", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Greek' }, { 'ianaCode' => 'he', 'endonym' => "\x{5e2}\x{5d1}\x{5e8}\x{5d9}\x{5ea}", 'isRtl' => $VAR1->{'talks'}[0]{'languages'}[1]{'isRtl'}, 'languageName' => 'Hebrew', 'languageCode' => 'he' }, { 'endonym' => 'Magyar', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Hungarian', 'ianaCode' => 'hu', 'languageCode' => 'hu' }, { 'languageCode' => 'id', 'languageName' => 'Indonesian', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => 'Bahasa Indonesia', 'ianaCode' => 'id' }, { 'languageCode' => 'it', 'ianaCode' => 'it', 'endonym' => 'Italiano', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Italian' }, { 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "\x{65e5}\x{672c}\x{8a9e}", 'languageName' => 'Japanese', 'ianaCode' => 'ja', 'languageCode' => 'ja' }, { 'languageCode' => 'ko', 'ianaCode' => 'ko', 'endonym' => "\x{d55c}\x{ad6d}\x{c5b4}", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Korean' }, { 'languageCode' => 'lv', 'ianaCode' => 'lv', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "Latvie\x{161}u", 'languageName' => 'Latvian' }, { 'endonym' => "Lietuvi\x{173} kalba", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Lithuanian', 'ianaCode' => 'lt', 'languageCode' => 'lt' }, { 'ianaCode' => 'fa', 'languageName' => 'Persian', 'isRtl' => $VAR1->{'talks'}[0]{'languages'}[1]{'isRtl'}, 'endonym' => "\x{641}\x{627}\x{631}\x{633}\x{649}", 'languageCode' => 'fa' }, { 'languageCode' 
=> 'pl', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => 'Polski', 'languageName' => 'Polish', 'ianaCode' => 'pl' }, { 'languageCode' => 'pt', 'ianaCode' => 'pt', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Portuguese', 'endonym' => "Portugu\x{ea}s de Portugal" }, { 'endonym' => "Portugu\x{ea}s brasileiro", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Portuguese, Brazilian', 'ianaCode' => 'pt-BR', 'languageCode' => 'pt-br' }, { 'languageCode' => 'ro', 'languageName' => 'Romanian', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "Rom\x{e2}n\x{103}", 'ianaCode' => 'ro' }, { 'languageCode' => 'ru', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Russian', 'endonym' => "\x{420}\x{443}\x{441}\x{441}\x{43a}\x{438}\x{439}", 'ianaCode' => 'ru' }, { 'ianaCode' => 'sk', 'endonym' => "Sloven\x{10d}ina", 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Slovak', 'languageCode' => 'sk' }, { 'ianaCode' => 'es', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "Espa\x{f1}ol", 'languageName' => 'Spanish', 'languageCode' => 'es' }, { 'ianaCode' => 'sv', 'endonym' => 'Svenska', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Swedish', 'languageCode' => 'sv' }, { 'ianaCode' => 'tr', 'isRtl' => $VAR1->{'precontrol'}, 'languageName' => 'Turkish', 'endonym' => "T\x{fc}rk\x{e7}e", 'languageCode' => 'tr' }, { 'ianaCode' => 'uk', 'languageName' => 'Ukrainian', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "\x{423}\x{43a}\x{440}\x{430}\x{457}\x{43d}\x{441}\x{44c}\x{43a}\x{430}", 'languageCode' => 'uk' }, { 'languageCode' => 'vi', 'languageName' => 'Vietnamese', 'isRtl' => $VAR1->{'precontrol'}, 'endonym' => "Ti\x{1ebf}ng Vi\x{1ec7}t", 'ianaCode' => 'vi' } ], 'speaker' => 'Tim Berners-Lee', 'postAdDuration' => '0.83', 'filmed' => 1265798100, 'targeting' => { 'tag' => 'Internet,TED Conference,computers,statistics,visualizations,web', 'talk' => 'tim_berners_lee_the_year_open_data_went_worldwide', 'id' => 788, 'event' => 'TED2010', 'year' => '2010' }, 'adDuration' => 
'3.33', 'id' => 788, 'name' => 'Tim Berners-Lee: The year open data went worldwide', 'isSubtitleRequired' => $VAR1->{'precontrol'}, 'nativeDownloads' => { 'medium' => 'http://download.ted.com/talks/TimBernersLee_2010U.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-light.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p.mp4?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22' }, 'subtitledDownloads' => { 'tr' => { 'name' => 'Turkish', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-tr.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-tr.mp4' }, 'nl' => { 'name' => 'Dutch', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-nl.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-nl.mp4' }, 'lt' => { 'name' => 'Lithuanian', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-lt.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-lt.mp4' }, 'sv' => { 'name' => 'Swedish', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-sv.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-sv.mp4' }, 'zh-tw' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-zh-tw.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-zh-tw.mp4', 'name' => 'Chinese, Traditional' }, 'pt' => { 'name' => 'Portuguese', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-pt.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-pt.mp4' }, 'ja' => { 'name' => 'Japanese', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-ja.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-ja.mp4' }, 'ro' => { 'name' => 'Romanian', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-ro.mp4', 'low' => 
'http://download.ted.com/talks/TimBernersLee_2010U-low-ro.mp4' }, 'ko' => { 'name' => 'Korean', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-ko.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-ko.mp4' }, 'pt-br' => { 'name' => 'Portuguese, Brazilian', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-pt-br.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-pt-br.mp4' }, 'lv' => { 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-lv.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-lv.mp4', 'name' => 'Latvian' }, 'bg' => { 'name' => 'Bulgarian', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-bg.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-bg.mp4' }, 'ar' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-ar.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-ar.mp4', 'name' => 'Arabic' }, 'sk' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-sk.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-sk.mp4', 'name' => 'Slovak' }, 'he' => { 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-he.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-he.mp4', 'name' => 'Hebrew' }, 'cs' => { 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-cs.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-cs.mp4', 'name' => 'Czech' }, 'vi' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-vi.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-vi.mp4', 'name' => 'Vietnamese' }, 'el' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-el.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-el.mp4', 'name' => 'Greek' }, 'ru' => { 'name' => 'Russian', 'high' => 
'http://download.ted.com/talks/TimBernersLee_2010U-480p-ru.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-ru.mp4' }, 'sq' => { 'name' => 'Albanian', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-sq.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-sq.mp4' }, 'fr' => { 'name' => 'French', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-fr.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-fr.mp4' }, 'uk' => { 'name' => 'Ukrainian', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-uk.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-uk.mp4' }, 'id' => { 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-id.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-id.mp4', 'name' => 'Indonesian' }, 'de' => { 'name' => 'German', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-de.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-de.mp4' }, 'zh-cn' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-zh-cn.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-zh-cn.mp4', 'name' => 'Chinese, Simplified' }, 'en' => { 'name' => 'English', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-en.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-en.mp4' }, 'es' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-es.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-es.mp4', 'name' => 'Spanish' }, 'pl' => { 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-pl.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-pl.mp4', 'name' => 'Polish' }, 'fa' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-fa.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-fa.mp4', 'name' => 'Persian' }, 'it' => { 'low' => 
'http://download.ted.com/talks/TimBernersLee_2010U-low-it.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-it.mp4', 'name' => 'Italian' }, 'hu' => { 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-hu.mp4', 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-hu.mp4', 'name' => 'Hungarian' }, 'hr' => { 'low' => 'http://download.ted.com/talks/TimBernersLee_2010U-low-hr.mp4', 'high' => 'http://download.ted.com/talks/TimBernersLee_2010U-480p-hr.mp4', 'name' => 'Croatian' } }, 'event' => 'TED2010', 'streamer' => 'rtmp://cp358131.edgefcs.net/ted', 'introDuration' => '11.82', 'audioDownload' => 'http://download.ted.com/talks/TimBernersLee_2010U.mp3?apikey=489b859150fc58263f17110eeb44ed5fba4a3b22', 'shareUrl' => 'http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide', 'canonical' => 'http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide' } ] };
There is a lot of interesting data in that dump that we might be able to use to build nice applications.
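As one example, the list of subtitle languages that started this investigation sits under the 'languages' key of the first element of 'talks'. The sketch below walks a hand-trimmed copy of the structure (one talk, three languages, only two keys per language) instead of the real data, so the count it prints is not the real one.

```perl
use strict;
use warnings;
use 5.010;

# A hand-trimmed copy of the dumped structure, for illustration only.
my $data = {
    talks => [
        {
            title     => 'The year open data went worldwide',
            languages => [
                { languageCode => 'sq', languageName => 'Albanian' },
                { languageCode => 'ar', languageName => 'Arabic' },
                { languageCode => 'bg', languageName => 'Bulgarian' },
            ],
        },
    ],
};

# The page describes a single talk, so we take the first element.
my $talk  = $data->{talks}[0];
my @names = map { $_->{languageName} } @{ $talk->{languages} };

say $talk->{title};
say scalar(@names), ' subtitle languages: ', join ', ', @names;
```

With the full structure returned by decode_json in place of the trimmed $data, the same two lines would list every language of the talk.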
TODO
In order to be able to implement either of the ideas for TED I'll also need to fetch a large list of talks, but let's leave that for another day and another article.
Published on 2016-01-31