In this week’s roundup of browser news: a trick for loading images conditionally using the picture element, your chance to tell browser vendors about the web you want, and a reminder that the styles applied to inline SVG elements are, well, not scoped only to that SVG.
Let’s turn to the headlines…
Preventing image loads with the picture element
You can use the <picture> element to prevent an image from loading if a specific media query matches the user’s environment (e.g., if the viewport width is larger or smaller than a certain length value).
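Here is a minimal sketch of the idea, not the original demo. The 600px breakpoint and the photo.jpg file name are assumptions for illustration, and the inline 1×1 transparent GIF is one common way to give the browser something to "load" that needs no network request.

```xml
<picture>
  <!-- When the viewport is 600px wide or narrower, this source is selected.
       Its srcset is a tiny inline transparent GIF, so the real image
       is never requested over the network. -->
  <source media="(max-width: 600px)"
          srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" />
  <!-- Used whenever the media query above does not match. -->
  <img src="photo.jpg" alt="A description of the photo" />
</picture>
```

The markup is written with self-closing tags so it is valid in both HTML and XML syntax.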
The Web We Want (webwewant.fyi) is a new collaboration between browser vendors that aims to collect feedback from web developers about the current state of the web. You can submit a feature request on the website (“What do you want?”) and get a chance to present it at an event (An Event Apart, Smashing Conference, etc.).
Firefox supports a non-standard Boolean parameter for the location.reload method that can be used to hard-reload the page (bypassing the browser’s HTTP cache) [via Wilson Page]
If you use inline <svg> elements that themselves contain inline CSS code (in <style> elements), be aware that those styles are not scoped to the SVG element but are global, so they affect other SVG elements as well [via Sara Soueidan]
XSS Auditor, a Chrome feature that detects cross-site scripting vulnerabilities, has been deemed ineffective and will be removed from Chrome in a future version. You may still want to set the HTTP X-Xss-Protection: 1; mode=block header for legacy browsers [via Scott Helme]
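To illustrate the inline SVG item above, here is a small sketch with two inline SVGs in the same HTML page. The shapes and colors are made up for the example.

```xml
<!-- First inline SVG: its <style> rule looks local, but it is not. -->
<svg viewBox="0 0 100 100" width="100" height="100">
  <style>
    circle { fill: red; }
  </style>
  <circle cx="50" cy="50" r="40" />
</svg>

<!-- Second inline SVG: it has no styles of its own,
     yet its circle is also filled red by the rule above. -->
<svg viewBox="0 0 100 100" width="100" height="100">
  <circle cx="50" cy="50" r="40" />
</svg>
```

One way around this is to use more specific selectors, such as a class or id that is unique to each SVG.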
Read more news in my new, weekly Sunday issue. Visit webplatform.news for more information.
Are you planning to take the Project Management Professional (PMP) exam soon?
Getting your PMP certification will help you get a step ahead of the competition, whether you’re an aspiring or a current project manager. We’ve lined up the most useful resources we can find to help you prepare–and ace–the big test.
1. PMP Exam Guides
PMP Exam Guides is one of the most comprehensive resources you can find to help you jump-start your review. It covers many of the PMP terms, concepts, and topics you’ll need to answer the PMP exam questions.
2. PMP Exam Preparation Tool
The PMP Exam Preparation Tool is based on the Project Management Body of Knowledge (PMBOK) Guide, 6th edition. It has a training program that contains study notes, a PMP formula guide, flash cards, practice questions–and more–to help you prepare for the big test. Best of all, you can get this PMP exam prep tool online.
3. Earned Value Management
Measuring a project’s performance is a key skill for current and future project managers. To do this, you must learn Earned Value Management (EVM). It’s more than just a set of formulas; EVM is a process of finding inconsistencies between the work that’s been done and the project plan. EVM provides quantitative data crucial for making decisions.
4. PMP Formula Guide
Speaking of formulas, the PMP Formula Guide by Mohammad Usmani is one of the best PMP exam prep materials out there. It provides project managers with an easy way to understand the many PMP formulas and math-based questions in the test.
5. Pass the Exam on Second Attempt
If you failed the PMP exam the first time around, read this book by Mohammad Usmani. Titled “Pass the Exam On Your Second Attempt,” this book will guide you toward the right way to prepare for the exam, analyze your weaknesses, and solve the questions in different scenarios.
6. Learn the PMBOK Guide
PMBOK is an essential tool to help you create your study plan. It initially started out as a collection of common terms, phrases, and definitions in project management. Later, the guide was developed to include a new knowledge area for the current trends in processes and terminology. This makes it an important starting point for project managers.
7. PMP Exam Practice Kit
Taking the PMP exam can be expensive. According to project management, the PMP exam costs a hefty USD 555.00. That’s why the PMP Exam Practice Kit is a must-have for project managers, not only to get a feel for the test but also to save money. You can get a PMP exam prep PDF kit to try things first-hand.
8. PM 4 Girls
PM 4 Girls provides free to low-cost training for the PMP exam. There are online tutorials, tips, training sessions, and many other resources for you to explore to ace the exam.
9. PMHub
As the name might suggest, PMHub is a hub of almost everything that you could possibly need to prepare for the exams. You can spend countless hours digging through this website to uncover hundreds of tips and tricks, tools, lessons learned, and even PMP certification study materials in PDF format.
10. Bright Hub PM
Bright Hub PM is your online resource for the PMP exams. It’s a website that contains articles about the methodologies, templates, and tips that can be used for the exam. It also has reviews of various project management software, as well as tips and tricks that you can try.
11. PMP Forums
Want to mingle with PMP-certified managers? Log on to PMP Forums. Discuss the common issues you face as you prepare for the exam, learn study tips from other examinees, try answering PMP exam questions, and share tips and tools.
12. PMZilla
Another forum you might want to visit is PMZilla. Here you can prepare for the PMP examination with the help of its PMP exam bank, tips and tricks, and even go beyond with real-world issues and best practices in the industry.
13. PMP Best Group on Facebook
Almost everyone is on social media nowadays, so why not join a Facebook group for project managers like you who want to take the exam? PMP Best Group is one of the social media groups you could join. You can ask for tips, tricks, and review materials to use as you prepare for the test.
14. PMP StudyGroups
Joining study groups may give you the edge when it comes to preparing for the PMP test. Reviewing with other people will help you learn faster, fill in gaps in your notes, break monotony, and hone your people skills, according to the Angeles Institute.
15. PMP FAQs from PMI
The Project Management Institute’s frequently asked questions (FAQs) will fill you in on the common queries of project managers who would like to take the exam for the first time. It will help paint a picture of how the test will go and how to prepare for it, as well as give some tips and tricks to use.
16. Project Smart
Need more guidance on how to learn the EVM concept? Take a look at Project Smart’s resources for EVM. Aside from its definition, you will also get to learn the basic formulas and how to solve problems with the EVM concept.
17. The PMBOK Guide
The PMBOK Guide is a book that sets out the foundational standards for becoming a project manager. Most of the resources you see online are based on it, which makes it a must-have for project managers who want to take the test.
18. Head First PMP
This book is presented in a unique way, containing puzzles, games, problems, and exercises to help prepare you for the PMP exam. Additionally, it will teach you the latest principles from the PMBOK guide.
19. Electric Cloud WIKI
Electric Cloud is a project management platform with resources covering planning and management tools, automation, and the release process. It is a good fit for project managers who want to learn the ins and outs of the job as they study for the PMP certification exam.
20. PMP Mock Exams
Want to know how tough a PMP exam can get? Then why not try answering a plethora of mock PMP exams? These are one type of exam that can never be leaked, as they change on a yearly basis, to stay updated on the latest questions and project management trends.
The PMP exam is a tough test. If you want to increase your chances of passing, check out these resources when preparing for it. Good luck!
I’m in love with SVG. Sure, the code can look dense and difficult at first, but you’ll see the beauty in the results when you get to know it. The bonus is that those results are in code, so it can be hooked up to a CMS. Your designers can rest easy knowing they don’t have to reproduce an effect for every article or product on your site.
Today I would like to show you how I came up with this glass text effect.
Step 0: Patience and space
SVG can be a lot to take on, especially when you’re just starting to learn it (and if you are, Chris’ book is a good place to start). It’s practically a whole new language and, especially for people who lack design chops, there are lots of new techniques and considerations to know about. Like HTML, though, you’ll find there are a handful of tools that we can reach for to help make SVG much easier to grasp, so be patient and keep trying!
Also, give yourself space. Literally. SVG code is dense so I like to use two or three new lines to space things out. It makes the code easier to read and helps me see how different pieces are separated with less visual distraction. Oh, and use comments to mark where you are in the document, too. That can help organize your thoughts and document your findings.
I’ve made demos for each step we’re going to cover in the process of learning this glass effect as a way to help solidify the things we’re covering as we go.
OK, now that we’re mentally prepared, let’s get into the meat of it!
Step 1: Get the basic image in place
First things first: we need an image as the backdrop for our glass effect. Here we have an <svg> element and an <image> within it. This is similar to adding an <img> in HTML. You’ll notice the dimensions of the viewBox attribute and <image> element in the <svg> element are the same. This ensures that the <svg> is exactly the same size as the actual picture we’re linking to.
That’s a key distinction to note: we’re linking to an image. The SVG file itself does not draw a raster image, but we can reference one in the SVG code and make sure that asset is in the location we point to. If you’ve worked with Adobe InDesign before, it’s a lot like linking to an image asset in a layout — the image is in the InDesign layout, but the asset itself actually lives somewhere else.
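As a rough sketch, the starting markup might look something like this. The viewBox matches the one used later in this article, while the kyoto.jpg file name is a placeholder I’m assuming for illustration.

```xml
<svg viewBox="0 0 1890 1260" xmlns="http://www.w3.org/2000/svg">
  <!-- IMAGE - a linked raster asset; the file itself lives outside the SVG.
       Older browsers may need xlink:href instead of href. -->
  <image href="kyoto.jpg" x="0" y="0" width="1890" height="1260" />
</svg>
```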
Straightforward so far, but this is where things get complicated because we’re going to add a filter to the image we just inserted. This filter is going to distort the image. If you look closely at the difference between the demo in the last step and the one in this step, you’ll see that the edges of objects in the image are a little rough and wavy. That’s the filter at work!
First, we create another <svg> to hold our filter. This means that if we ever want to reuse our filter (for example, on multiple elements on the page), then we totally can!
Our first filter (#displacement) is going to distort our image. We’re going to use feTurbulence and feDisplacementMap, each explained by Sara Soueidan much better than I can in this post. Beau Jackson also wrote up a nice piece that shows how they can be used to make a cloud effect. Suffice it to say, these two filter primitives tend to go together, and I like to reach for them when something needs to appear “wobbly.”
With our filter container in place, we just need to apply that filter to our image with a filter attribute on the <image>. Magic!
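Here’s a minimal sketch of how those pieces could fit together when the SVGs are inline in an HTML page. Only the #displacement id comes from this article; the numeric values and the kyoto.jpg file name are assumptions to experiment with.

```xml
<!-- FILTERS - kept in their own zero-sized <svg> so they can be reused -->
<svg width="0" height="0" style="position: absolute">
  <filter id="displacement" x="0" y="0" width="100%" height="100%">
    <!-- Generate noise to drive the distortion -->
    <feTurbulence type="turbulence" baseFrequency="0.02" numOctaves="2" result="noise" />
    <!-- Shift each pixel of the image by an amount read from the noise -->
    <feDisplacementMap in="SourceGraphic" in2="noise" scale="30"
                       xChannelSelector="R" yChannelSelector="G" />
  </filter>
</svg>

<svg viewBox="0 0 1890 1260">
  <!-- IMAGE - distorted by referencing the filter's id -->
  <image href="kyoto.jpg" width="1890" height="1260" filter="url(#displacement)" />
</svg>
```

Raising scale makes the wobble stronger, and changing baseFrequency changes how fine the ripples are.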
We don’t want the entire image to be distorted though. We’re going to clip the shape of our distorted <image> to the shape of some text. This will essentially be the portion of the picture seen “through” the glass.
To do this, we need to add a <text> element in a <clipPath> and give it an id. Calling this id in the clip-path of our <image> now restricts its shape to that of our <text>. Wonderful!
OK, so it’s bueno that we have the distorted <image> clipped to the <text>, but now the rest of the image is gone. No bueno.
We can counteract this by adding a copy of the same <image>, but without the clip-path or filter attributes, before our existing <image>. This is where I like to add some nice comments to keep things neat. The idea is like placing a transparent layer over what we have so far.
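Put together, the clipping and the duplicated image could be sketched like this. It assumes the #displacement filter is defined elsewhere on the page, and the file name and font attributes are placeholders I’ve added so the text is large enough to see.

```xml
<svg viewBox="0 0 1890 1260">
  <!-- TEXT - clipped: defines a shape and is not drawn itself.
       font-size and font-weight are assumed values; they could live in CSS instead. -->
  <clipPath id="clip">
    <text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle"
          font-size="400" font-weight="bold">KYOTO</text>
  </clipPath>

  <!-- IMAGE - background: untouched copy, drawn first so it sits behind -->
  <image href="kyoto.jpg" width="1890" height="1260" />

  <!-- IMAGE - distorted: drawn on top, but only inside the text shape -->
  <image href="kyoto.jpg" width="1890" height="1260"
         filter="url(#displacement)" clip-path="url(#clip)" />
</svg>
```

The filter is applied before the clip, so it is the already distorted image that gets cut to the letter shapes.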
I know, I know, this isn’t very neat, and we’re repeating ourselves. Ideally, we would set our filter straight on the <text> element and use the in="BackgroundImage" property for feDisplacementMap to warp what’s behind the text, without the need for extra elements. Unfortunately, this has poor browser support, so we’re going to go with multiple images.
Next, we’re going to duplicate our text just as we did for the image in the last step. Unfortunately, because the text is in a clip-path, it’s now not available for rendering. This is the last time we’re going to duplicate content like this, I promise!
Now we should have something that looks like a normal image with black text over it. If the distortion filter on the <image> we’ve already made is what we can see “through” the glass, then our new <text> is going to be the glass itself.
<svg>
  <!-- more stuff -->

  <!-- TEXT - clipped -->
  <clipPath id="clip">
    <text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle">KYOTO</text>
  </clipPath>

  <!-- TEXT - visible -->
  <text x="50%" y="50%" dominant-baseline="middle" text-anchor="middle">KYOTO</text>

  <!-- more stuff -->
</svg>
This is where things start to get exciting, at least for me!
We want to create a dark edge along the text element which, when paired with a light edge (we’ll look at that next), will add depth to the appearance of the text against the image.
We want a new filter for our <text>, so let’s create one in our filter’s SVG element, give it an id="textFilter", and link it to the filter attribute of the <text> element.
SVG works from the background to the foreground, so the first thing we’re going to put in our filter is the shadow that the glass would have, as that is furthest back. I’m gonna level with you, this one is pretty complex, but we’re going to go through it one step at a time.
For this effect, we’re using four filter primitives: feMorphology, feOffset, feFlood and feComposite.
feMorphology is first. We’re using this to make the text fatter. In the demo below, comment out the next three primitives ( feOffset, feFlood, feComposite ) and play with it. I have the value radius="4" to achieve the glass effect, but see what happens if you set it to 1… or 100!
feOffset is used to move all the “pixels” in the previous primitive (feMorphology) across the x- or y-axis. The values dx="5" and dy="5" move the “pixels” right on the x-axis and down on the y-axis, respectively. The higher the number, the further they move. Put in negative numbers for dx and the “pixels” will move left. Negative dy and they’ll move up! Again, this is the sort of thing you start to learn as you play around with them.
The reason I have quotes around “pixels” is because they’re not screen pixels like you might expect in CSS. Rather, they refer to the dimensions we set on the parent <svg>. I think of them as percentages. We have used the setting viewBox="0 0 1890 1260" in our example. This means our <svg> is 1890 “pixels” wide. If we set dx="189", we’ll move our element 10% of the way across the SVG (189 divided by 1890).
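As a sketch, the first two primitives of the text filter might look like this. The radius, dx and dy values are the ones mentioned above; the dilate operator, the SourceAlpha input and the result names are my assumptions.

```xml
<filter id="textFilter">
  <!-- Fatten the letters by 4 user units in every direction -->
  <feMorphology in="SourceAlpha" operator="dilate" radius="4" result="fat" />
  <!-- Move the fattened shape 5 user units right and 5 down.
       With viewBox="0 0 1890 1260", dx="189" would be 189 / 1890 = 10% of the width. -->
  <feOffset in="fat" dx="5" dy="5" result="darkShape" />
</filter>
```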
feFlood is great. If you want to fill the screen with color, this is the primitive you need! You might wonder why we can’t read our text now when we apply it. That’s because you can only see the result of the last filter primitive that was created. The result of each of the previous primitives was related to our <text> element. The result of feFlood is just like its name: a flood of color. It doesn’t know what you did before and it doesn’t care; it’s merely going to fill an area with color.
This is where some people start getting frustrated with SVG. It’s hard to work on something when you can’t see it! Trust me, as you work with SVG more you’ll get used to this. In fact, the next few steps will need us to rely on this and trust that everything is still in place.
feComposite is going to solve this issue for us. What does it do? MDN describes it as:
The <feComposite> SVG filter primitive performs the combination of two input images pixel-wise in image space using one of the Porter-Duff compositing operations: over, in, atop, out, xor, and lighter.
That to me is jibba-jabba. I think of it as affecting the alpha layer of in with the color/alpha of in2.
With this in place we can once again see our text spelled out and, because the color we used is slightly transparent, we can even see the distorted “glass” effect coming through. Great!
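Here’s how the dark edge could be sketched with all four primitives chained together. The half-transparent black and the result names are assumptions; operator="in" keeps the flood color only where the offset text shape is opaque.

```xml
<filter id="textFilter">
  <!-- DARK EDGE -->
  <feMorphology in="SourceAlpha" operator="dilate" radius="4" result="fat" />
  <feOffset in="fat" dx="5" dy="5" result="darkShape" />
  <!-- A flood of slightly transparent black covering the whole filter region -->
  <feFlood flood-color="black" flood-opacity="0.5" result="darkColor" />
  <!-- Keep the flood only where the fattened, offset text is -->
  <feComposite in="darkColor" in2="darkShape" operator="in" result="darkEdge" />
</filter>
```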
This is essentially the same as what we literally just did, but we’re going to shift the shape up and to the left using negative dx/dy values. We’re also setting a slightly white color this time. We’re aiming for a nice depth effect.
We’re again in a position where what we can see is the most recent result from a filter primitive, but we can’t see our dark edge! feComposite isn’t what we want to use to bring them together because we don’t want the alpha of the dark edge colored by the light edge… we want to see both! Which leads us to…
feMerge! It’s a hero. It lets us take any number of primitive results and merge them, making a new image. Woohoo, we can now see both dark and light edges together!
However, we do want them to be edges rather than both filling up the entire text, so we need to remove the space that the original <text> takes up. What we need next is another feComposite to chop out the original SourceGraphic. Because we used feMorphology to fatten the letters for our edges, we can now chop the original letter shapes out of the result of our feMerge.
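A sketch of the filter at this point, with both edges merged and the original letters cut out, might look like the following. The colors, opacities, light-edge offsets and result names are assumptions for illustration.

```xml
<filter id="textFilter">
  <!-- Shared fattened letter shape -->
  <feMorphology in="SourceAlpha" operator="dilate" radius="4" result="fat" />

  <!-- DARK EDGE - shifted down and to the right -->
  <feOffset in="fat" dx="5" dy="5" result="darkShape" />
  <feFlood flood-color="black" flood-opacity="0.5" result="darkColor" />
  <feComposite in="darkColor" in2="darkShape" operator="in" result="darkEdge" />

  <!-- LIGHT EDGE - shifted up and to the left -->
  <feOffset in="fat" dx="-2" dy="-2" result="lightShape" />
  <feFlood flood-color="white" flood-opacity="0.5" result="lightColor" />
  <feComposite in="lightColor" in2="lightShape" operator="in" result="lightEdge" />

  <!-- Stack both edges into one image -->
  <feMerge result="edges">
    <feMergeNode in="darkEdge" />
    <feMergeNode in="lightEdge" />
  </feMerge>

  <!-- Keep the edges only where the original letters are NOT -->
  <feComposite in="edges" in2="SourceGraphic" operator="out" result="edgesOnly" />
</filter>
```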
Now we’re starting to look like glass, with just one piece missing.
Step 9: Yes, a bevel
We have a pretty good 3D-looking glass effect. However, the letters look flat. Let’s add one more effect and make them look more rounded.
To achieve this we’re going to create a bevelled effect.
First we’re going to use feGaussianBlur. This will blur our existing filters slightly. We’re going to use this blurred result as the basis to add some feSpecularLighting. As usual, feel free to play with the numbers here and see what effects you can get! The main one you might want to change is the lighting-color attribute. The image that we’re using here is slightly dark, so we’re using a bright lighting-color. If your image was very bright, this would make the letters hard to read, so you might use a darker lighting-color in that case.
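Here is a simplified, standalone sketch of the bevel idea. It is not the exact filter from the demo: for brevity it blurs the text’s own alpha channel rather than the merged edges, and the id, light direction and numeric values are assumptions to play with.

```xml
<filter id="bevel">
  <!-- Soften the text's alpha channel so it reads as a rounded height map -->
  <feGaussianBlur in="SourceAlpha" stdDeviation="5" result="blurred" />
  <!-- Light that height map; the bright lighting-color suits a dark photo -->
  <feSpecularLighting in="blurred" surfaceScale="5" specularConstant="1"
                      specularExponent="20" lighting-color="white" result="light">
    <feDistantLight azimuth="225" elevation="45" />
  </feSpecularLighting>
  <!-- Keep the highlights only inside the original letter shapes -->
  <feComposite in="light" in2="SourceAlpha" operator="in" />
</filter>
```

In the full text filter, these primitives would sit after the edges and be combined with them using another feMerge.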
Web design and development is a constantly evolving field and that change means you need tools and resources that you can rely on.
Building and maintaining websites that properly service both the client and the end user can be a complicated endeavor. Therefore, you want to have a strong foundation of web development tools ready to assist you in getting a project up and running. Here are five of the most fundamental web development and search engine optimization resources you should be utilizing.
Google Analytics
The basic goal of a website is to act as a digital storefront, a repository of all the information necessary for a business to make their presence known on the Internet. Whether the site wants to generate leads, increase awareness or sell products, the underlying aim is to get visitors to your website. Google Analytics offers you unparalleled insight into who is visiting your website, what they are doing once they get there and how they connected with the site in the first place.
Important information such as the bounce rate for the website as a whole and for individual pages, how long users tend to spend on areas of the site and the path users take through it is easily accessible thanks to this powerful data management and digital marketing tool. Google Analytics lets you know which pages resonate with visitors, what referral links are bringing in users and what times of day result in the highest traffic. All of this is accessible through an intuitive and clean user interface.
The best part is that Google Analytics is free for the standard version and you get access to Google’s massive cloud infrastructure to ensure your website’s data is organized and up to date. Google Analytics enables you to make smart and informed decisions about every aspect of your site and business.
Google Fonts
How a website looks and feels is critical to how well users respond and engage with it. When people read your site, you want the experience to be immediately informative and pleasant. Nowhere is this more obvious than with the choice of fonts. Google Fonts is a library of 916 free licensed fonts that allow you to pick the perfect fonts for your website.
Thanks to the interactive web directory, sorting and browsing this vast library is simple and effortless. Google Fonts is easy to integrate into any web project. With the ability to be used in over 135 languages, organizing and streamlining the typography of your website has never been so easy.
Google Search Console
It is important to see how your website is performing in search results and to have access to the tools you’ll need to adapt to the changing world of web development. Luckily, the perfect free resource to accomplish these goals exists. Google Search Console, formerly known as Google Webmaster Tools, gives developers the power to track every aspect of a website.
Google Search Console enables you to easily index every part of your website, check for coverage issues such as crawl errors, and maintain the site without worrying about it impacting the search results. Google Search Console also keeps you abreast of any potential security issues and whether your pages or site have been hit with a manual action, Google’s way of penalizing the search rankings of sites they suspect of spammy or suspicious behavior. It also offers to evaluate how your site performs on a mobile device so you can be sure that users have a good experience regardless of what device they are using.
Google Tag Manager
You’ll want to know how effective your site is and what visitors are doing, so setting up marketing tags and tracking pixels is of the utmost importance. Thanks to Google Tag Manager, you can easily implement and manage all your existing and future tags in one accessible location. Tag Manager makes tracking and analyzing the effectiveness of your site and the actions of your users much easier than it would be otherwise and without any need for developers to directly modify a site’s code. And best of all, it’s free.
Lorem Ipsum Generator
A site’s layout is critical to producing a smooth end user experience. In particular, knowing how the future text will look enables you to design web pages that are attractive and draw attention to the right areas. A Lorem ipsum generator allows you to quickly and easily place dummy text to see how a page will end up structured while you create the content that will eventually fill the page. This will save you time and energy in the long run while building web pages.
Web Design and Development Made Easier
Designing a website and then making it flourish is not a simple task. A lot of work has to go into how it looks, how it runs, how it ranks and how people will interact with it. You’ll want as many web development resources as possible to make the process smoother and more streamlined. The aforementioned tools are all free to use yet powerful in what they allow you to accomplish.
An Event Apart juuuuust wrapped up its Washington D.C. event yesterday. We hope we got to see you at the event but if not, perhaps we’ll see you at the next one happening Aug. 26-28 in Chicago.
Why would you go, you might ask? It’s three days of experts imparting their knowledge on topics ranging from CSS Houdini to intrinsic layouts — and that’s just the first day!
Seriously, there are a lot of reasons why you’d want to go. The speakers are top-notch, the opportunities to network with others will be aplenty and you’ll be upping your front-end development chops the entire time. Not a bad collection of perks, for sure.
The time to register is now and, when you do, use coupon code AEACP at checkout and get $100 off the price!
This two-part series presents three projects that teach you how to use AWS (Amazon Web Services) to transform text between its written and spoken states. The first project will use text to speech to turn a blog post or other written content into a spoken .mp3 file to give more options to blind and dyslexic users of your site.
In the next article, we will embark on the return journey, from speech to text, and consider the accuracy of these transcriptions by sending various samples through a round-trip translation. To follow these tutorials, you will need an AWS account with billing enabled, though the tutorials will stay well within the constraints of free-tier resources. Examples will focus on using the AWS console, but I will also demonstrate the AWS CLI (Command Line Interface), which requires basic command line knowledge.
Introduction And Motivation
Most of the internet is text-based. Text is lightweight (1 byte per letter), widely supported, easy to interpret, and has a precedent as old as the internet as the default medium of online communication. Sending written text predates the internet: telegraphs carried text over wires well over a century ago, and physical mail has transmitted writing for centuries. Voice transmission over radio and telephone also predates the internet, but did not become the same foundational medium online that text did. In almost all cases this is a good thing; again, text is lightweight and easy to interpret compared to audio. However, transforming between voice and text can add powerful functionality to, and improve the accessibility of, a wide variety of applications.
It has always been possible to transform between audio and text: you can read a written speech aloud or transcribe an oral sermon. Indeed, if we think back to the telegram, trained operators transcoded Morse code messages to words. In each example, moving from speech to writing or back has been very labor-intensive, even with specialized training and equipment. With a variety of cloud services, we can automate these processes to move between mediums in seconds without any human effort, which expands the possible use cases.
The most obvious benefit of implementing appropriate text to speech and speech to text options is accessibility. A visually impaired or dyslexic user would benefit from a narrated version of an article, while a deaf person could become a member of your podcasting audience by reading a transcript of the show.
Text to Speech Project
Say you wanted to add narrated versions of every post to your blog. You could purchase a microphone and invest hours into recording and editing spoken renditions of each post. This would result in a superior listener experience, but if you want most of the benefit for only a couple of minutes and a few pennies per post, consider using AWS instead. If you are the sort of person who regularly updates and revises older or evergreen content, this method also helps you keep the spoken version up to date with minimal effort.
We will begin with text to speech using Amazon Polly. For simple exploration, AWS provides a graphical user interface through its online console. After logging in to your AWS account, use the “Services” menu to find “Amazon Polly” or go to https://us-east-1.console.aws.amazon.com/polly/home/SynthesizeSpeech.
Using the Polly Console
You can use the Amazon Polly console to read 3,000 characters (about 500 words) and get an audio stream or immediate download. If you need up to 100,000 characters (about 16,600 words) read, your only option is to have AWS store the result in S3 after it has finished processing, which can take a couple of minutes. At the time of writing, Amazon Polly does not support inputs of over 100,000 billable characters. If you want to convert a longer text like a book, you will most likely have to do so in chunks and concatenate the audio files yourself.
A “billable character” is one that the service actually pronounces. Specifically, that means that SSML tags are not billable characters, which we will cover later. For your first year of using Amazon Polly, you get 5 million billable characters per month for free, which is more than enough to run the examples from this article and do your own experimentation. Beyond that, Amazon Polly costs four dollars per million billable characters at the time of writing, meaning that converting a standard-length novel would cost about two dollars.
The console also allows you to change the language, region, and voice of the reader. Though this article only covers English, at the time of writing AWS supports 21 languages and 29 distinct language-region pairs. While most regions only have one or two voices, popular ones like United States English have several options to choose between.
I often prefer to use the UK English voice “Brian.” To my American ears, the British accent covers some of the inflections in robotic speech and makes for a smoother listening experience. To be clear, Amazon Polly narrated text is very obviously read by a robot, but the resulting audio is quite listenable.
It is significantly better than the built-in reader that the macOS say terminal command uses, and is comparable to the speech quality of voice assistants like Siri and Alexa.
Writing SSML
If you want full control over the resultant speech, you can take the time to tag your input with SSML. SSML (Speech Synthesis Markup Language) is a standardized language for representing verbal cues in text. Like HTML, XML, and other markup languages, it uses opening and closing tags. Amazon Polly supports SSML input, and tags do not count as “billable characters.” Alexa skills also use SSML for pre-programmed responses, so it is a worthwhile language to know.
The foundational tag, <speak>, wraps everything that you want read. Like HTML, use <p> to divide paragraphs, which results in a significant pause in the narration. Smaller pauses come from punctuation, and you always have the option to insert pauses of up to ten seconds with <break>.
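A small sketch of these three tags working together; the sentences are placeholders of my own.

```xml
<speak>
  <p>This is the first paragraph.</p>
  <p>
    This second paragraph starts after a noticeable pause.
    <break time="3s"/>
    This sentence comes after an explicit three-second break.
  </p>
</speak>
```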
SSML provides <say-as>, a very flexible tag that supports everything from pronouncing phone numbers to censoring expletives using the interpret-as argument. Consider the options from this tag with the following sample.
<speak>
Call 5551230987 by 11'00" PM to get tips on writing clean JavaScript.<break time="1s"/>
Call <say-as interpret-as="telephone">5551230987</say-as> by 11'00" PM to get tips on writing clean <say-as interpret-as="expletive">JavaScript</say-as>
</speak>
Further flexibility comes from the <prosody> tag, which provides you with control over the rate, pitch, and volume of speech. Unfortunately, at the time of writing Polly does not support the <voice> tag, which Alexa skills can use to speak in multiple standard voices, but does support the <lang> tag that allows voices in one language to correctly pronounce words from other languages. In this example, <lang> corrects the pronunciation of “tag” from American to German.
<speak>
Guten tag, where is the airport?<break time="1s"/>
<lang xml:lang="de-DE">Guten tag</lang>, where is the airport?
</speak>
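The <prosody> tag mentioned above can be sketched the same way. The attribute values here are just one combination to try, using the named presets rather than exact percentages.

```xml
<speak>
  This sentence is read normally.
  <prosody rate="slow" pitch="low" volume="loud">
    This one is slower, lower, and louder.
  </prosody>
</speak>
```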
Finally, if you want to customize pronunciation within a language, Amazon Polly supports the <phoneme> tag.
<speak>
Philip Kiely<break time="1s"/>
Philip <phoneme alphabet="x-sampa" ph='"kaI.li'>Kiely</phoneme>
</speak>
This is not an exhaustive list of the customization options available with SSML. For a complete reference, visit the documentation.
Writing Lexicons
If you want to specify a consistent custom pronunciation or expand an abbreviation without tagging each instance with a phoneme tag, or you are using plain text instead of SSML, Amazon Polly supports lexicons of custom pronunciations. You can apply up to five lexicons of up to 4,000 characters each per language to a narration, though larger lexicons increase the processing time.
As with before, I want to make sure that Amazon Polly says my name correctly, but this time I want to do so without using SSML. I wrote the following lexicon:
The header and tag will stay mostly constant between lexicons, though the tag supports two important arguments. The first, alphabet, lets you choose between x-sampa and ipa, two standard pronunciation alphabets. I prefer x-sampa because it uses standard ASCII characters, so I am unlikely to encounter encoding issues. The xml:lang argument lets you specify language and region. A lexicon is only usable by a voice from that language and region.
The lexicon itself is a sequence of <lexeme> tags. Each one contains a <grapheme> tag, which holds the original text, and an <alias> tag, which describes what you want said instead. Aliases go beyond pronunciation; you can use them for expanding abbreviations (“Jr” becomes “Junior”) or replacing words (“Bruce Wayne” becomes “Batman”). A lexicon can have as many lexeme tags as fit within the 4,000-character limit.
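For reference, a lexicon along these lines can be sketched and sanity-checked locally. The entries reuse the alias examples from the paragraph above, the header follows the form of the W3C Pronunciation Lexicon Specification, and the helper name lexicon_entries is hypothetical. Treat it as an illustration under those assumptions, not the exact lexicon used for my name:

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Illustrative lexicon: two alias entries taken from the examples in the text.
LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="x-sampa" xml:lang="en-US">
  <lexeme>
    <grapheme>Jr</grapheme>
    <alias>Junior</alias>
  </lexeme>
  <lexeme>
    <grapheme>Bruce Wayne</grapheme>
    <alias>Batman</alias>
  </lexeme>
</lexicon>
"""


def lexicon_entries(xml_text):
    """Return (grapheme, replacement) pairs from a PLS lexicon string."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ns = {"pls": PLS_NS}
    entries = []
    for lexeme in root.findall("pls:lexeme", ns):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=ns)
        # A lexeme carries either an <alias> or a <phoneme> replacement.
        replacement = lexeme.findtext("pls:alias", namespaces=ns)
        if replacement is None:
            replacement = lexeme.findtext("pls:phoneme", namespaces=ns)
        entries.append((grapheme, replacement))
    return entries


if __name__ == "__main__":
    # Check the size limit mentioned above before uploading.
    print(len(LEXICON) <= 4000, lexicon_entries(LEXICON))
```

Parsing the file before uploading it catches malformed XML early, and the length check guards against exceeding the character limit.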
The screenshot shows the plain text that would be mispronounced and the applied lexicon. Use the “Customize Pronunciation” menu to select up to five lexicons, which you upload from the “Lexicons” tab in the left navbar. Listening to the speech verifies that my name is said correctly.
Now that we have full control over the resultant speech, let’s consider how to save the output for use in our application.
Saving and loading from S3
If you want to re-use spoken text in your application, you’ll want to choose the “Synthesize to S3” option in the Amazon Polly console. In this example, I am using the voice “Brian” to perform a surprisingly capable reading of Shakespeare’s sonnet XXIX. We begin by copying in the poem as plain text and selecting “Synthesize to S3,” which launches the following modal.
S3 buckets have globally unique names, and you can enter any S3 bucket that you own or have the appropriate permissions for. Make sure the bucket allows its contents to be made public, as that will be required in a later step. You should also set an “S3 key prefix,” which is a string that will help you identify the output in the bucket. After clicking Synthesize and giving it a moment to process, we navigate to the S3 bucket that we synthesized the speech into.
The arrow points to the entry in the bucket that we just created. Selecting that item will bring us to the following page.
Follow the arrow to select the “Make Public” option, which will make the file accessible to anyone with a link. Scroll down, copy the link, and use it in your application. For example, you can download the poem here. For many applications, you may wish to pass the URL to an HTML <audio> tag to allow for web playback.
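As a small illustration of that last step, the copied link can be dropped into an <audio> element. The sketch below generates the markup in Python; the function name, the fallback text, and the example URL are assumptions for illustration only:

```python
from html import escape


def audio_tag(url, fallback="Your browser does not support the audio element."):
    """Build an HTML <audio> element that plays the file at the given URL."""
    # quote=True also escapes double quotes so the URL is safe inside src="...".
    return f'<audio controls src="{escape(url, quote=True)}">{escape(fallback)}</audio>'


if __name__ == "__main__":
    # Hypothetical link; use the one copied from your own bucket.
    print(audio_tag("https://example.com/sonnet.mp3"))
```

The controls attribute asks the browser to render its built-in play and pause interface, and the fallback text only appears in browsers without audio support.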
We have covered every necessary component for transforming text to speech on AWS. Next, we turn our attention to a more advanced interface that can provide automation potential and save time.
Using the AWS CLI
Back to our hypothetical blog post. The simplest workflow would be to take the final written version of each article, copy it into the console, click the “Synthesize to S3” button, and embed a download link to the resultant .mp3 file in the blog. Honestly, this is a pretty decent workflow; it is exactly what I do for my personal website. However, AWS offers another option: the AWS CLI.
Make sure that you have installed and configured the AWS CLI appropriately. Begin by entering aws polly help to make sure that Polly is available and to read a list of supported commands. For troubleshooting, see the documentation.
To perform a conversion from the command line, I first copied the poem from earlier into a .txt file. I then ran the following command in the terminal (macOS/Linux):
In a few seconds, the resulting .mp3 file was downloaded to my machine, ready for inclusion in my CMS or other application. Note the special characters around the --text argument; they pass the contents of the file rather than just the file name.
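The same call can also be driven from a script. The following Python sketch assembles an aws polly synthesize-speech invocation and reads the text file itself instead of relying on shell substitution. The file names and helper names are hypothetical, and this is not necessarily the exact command used for the poem:

```python
import subprocess
from pathlib import Path


def polly_command(text_path, out_path, voice="Brian"):
    """Build the argv for `aws polly synthesize-speech` from a text file."""
    text = Path(text_path).read_text(encoding="utf-8")
    return [
        "aws", "polly", "synthesize-speech",
        "--output-format", "mp3",
        "--voice-id", voice,
        "--text", text,   # the file's contents, not its name
        str(out_path),    # positional argument: where the audio is written
    ]


def synthesize(text_path, out_path, voice="Brian"):
    """Run the command; assumes the AWS CLI is installed and configured."""
    subprocess.run(polly_command(text_path, out_path, voice), check=True)
```

Calling synthesize("sonnet.txt", "sonnet.mp3") would then write the audio next to the script. Passing the arguments as a list avoids shell quoting problems when the text contains quotes or line breaks.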
Finally, for more advanced applications, Amazon Polly has an SDK for 9 languages/platforms. The SDK would be overkill for these examples, but is exactly what you want for automating Amazon Polly calls, especially in response to user actions.
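For Python, the AWS SDK is boto3. Here is a minimal sketch of the same conversion, assuming boto3 is installed and credentials are configured. The function names are illustrative, and the client is passed in as a parameter so the helper can be exercised without network access:

```python
def synthesize_to_file(polly, text, out_path, voice="Brian"):
    """Call Polly's SynthesizeSpeech and write the MP3 bytes to out_path."""
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice
    )
    # AudioStream is a streaming body; read() returns the audio bytes.
    audio = response["AudioStream"].read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)


def default_client():
    """Create a real Polly client (needs boto3 and configured AWS credentials)."""
    import boto3  # third-party; imported lazily so the helper works offline
    return boto3.client("polly")
```

With credentials in place, synthesize_to_file(default_client(), "Hello from Amazon Polly.", "hello.mp3") would produce the file, and the same function could be called from a request handler in response to a user action.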
Conclusion
Text to speech can help you create more versatile, accessible content. Beginning in the Amazon Polly console, we can transform up to 100,000 billable characters in plain text or SSML, make the resulting .mp3 file public, and use that file in an application. We can use the AWS CLI for automation and more convenient access.
Stay tuned for the second installment of the series, in which we will convert media in the other direction, from speech to text, and consider the benefits and challenges of doing so. Part two will build on the technologies that we have used so far and introduce Amazon Transcribe.